Supercomputing Architectures: An Overview


A supercomputer is a computer that performs far more calculations per second than a general-purpose machine. That does not mean that a supercomputer has to use the fastest processors; many supercomputers are built from relatively older and slower processors used in large numbers. The main criterion for supercomputing is that total number-crunching throughput be as high as possible, whereas high performance computing, as the term is used here, is the domain that seeks more processing power through faster processors.

Supercomputing Architectures

There are three main types of supercomputing architecture.

1. Vector Processing

2. Parallel Processing

3. Grid/Distributed Computing

Vector Supercomputers

The earliest supercomputers were vector supercomputers. A vector supercomputer is a single machine with supercomputing capability. These machines are optimized for applying arithmetic operations to large vectors (one-dimensional arrays) of data. Many specialized application domains need this type of machine.
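As a rough illustration of what "applying arithmetic operations to large vectors" means, the sketch below computes a*x + y over two arrays in register-sized strips. The names `VECTOR_LENGTH`, `vector_multiply_add`, and `saxpy` are hypothetical, and the register length of 4 is an assumption chosen only to keep the output small. Each call to `vector_multiply_add` stands in for a single vector instruction; real hardware would do that work in the processor, not in a Python loop.

```python
# Assumed register length: how many elements one vector instruction handles.
VECTOR_LENGTH = 4


def vector_multiply_add(a, x_reg, y_reg):
    """Stand-in for ONE vector instruction: a*x + y over a whole register."""
    return [a * xi + yi for xi, yi in zip(x_reg, y_reg)]


def saxpy(a, x, y):
    """Compute a*x + y strip by strip; return (result, instructions issued)."""
    result = []
    instructions_issued = 0
    for start in range(0, len(x), VECTOR_LENGTH):
        stop = start + VECTOR_LENGTH
        # One "instruction" processes up to VECTOR_LENGTH elements at once.
        result.extend(vector_multiply_add(a, x[start:stop], y[start:stop]))
        instructions_issued += 1
    return result, instructions_issued


x = [float(i) for i in range(1, 11)]  # 1.0 .. 10.0
y = [10.0] * 10
values, issued = saxpy(2.0, x, y)
print(values)  # [12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
print(issued)  # 3 vector instructions instead of 10 scalar ones
```

With ten elements and an assumed register length of four, the work is issued as three vector instructions where a scalar processor would need ten separate multiply-add steps.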

Parallel Processing

Supercomputers of this kind are built by combining many computing machines or processors that split the computing load among themselves as evenly as possible. Such a machine scales well, because new machines or processors can be added to the existing ones.

Thus, parallel processing here means clustering of computing machines. There are three main types of clustering.

1. A failover cluster is one in which, if one machine breaks down, another machine in the cluster automatically takes over its responsibilities.

2. A load balancing cluster is one in which service requests are routed to different servers in the cluster so that no single server is overloaded.

3. A high performance cluster lets all of its machines work simultaneously to deliver greater number-crunching capability. The machines, connected by multiple high-speed networks, share the task among themselves as efficiently as possible.

The Beowulf cluster is a high performance cluster built out of commonly available parts, usually running a free operating system such as Linux.
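The failover and load balancing cluster types in the list above can be sketched together in a few lines. This is a minimal illustration, not how any particular cluster manager works: the `Cluster` class, its method names, and the round-robin routing policy are all assumptions made for the example.

```python
class Cluster:
    """Hypothetical cluster front end: round-robin routing with failover."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the server to try next

    def mark_failed(self, server):
        """Record that a machine has broken down."""
        self.healthy.discard(server)

    def route(self, request):
        """Return the server that should handle this request."""
        # Try each server at most once, skipping any that have failed.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers left in the cluster")


cluster = Cluster(["node-a", "node-b", "node-c"])
print([cluster.route(i) for i in range(4)])
# ['node-a', 'node-b', 'node-c', 'node-a']  (load spread in turn)

cluster.mark_failed("node-b")
print([cluster.route(i) for i in range(3)])
# ['node-c', 'node-a', 'node-c']  (node-b's share is taken over by the others)
```

Round-robin is only one possible policy; a real load balancer might instead route on measured load, which this sketch does not attempt.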

Grid Processing

Grid computing is a paradigm shift in the supercomputing domain. The idea was popularized by the SETI@home project. Most computing machines, personal computers in particular, sit idle for significant amounts of time. Hence the idea of using the idle time of millions of computers connected through the Internet to do useful work. For problems that divide into independent pieces, the combined processing power of such a grid can rival or even exceed that of a dedicated supercomputer.

Thus grid computing is a way of harnessing idle computing power in large networks. A conventional high performance computer has to be built at one location, with dedicated buildings and machines. A computing grid, on the other hand, connects existing networks across locations over the Internet.


Using two or more processors in the same machine is called SMP (Symmetric Multiprocessing). In a multiprocessor machine, a program has to be divided among the processors so that each processor can handle its own chunk independently. For this, the program has to be split into smaller parts referred to as tasks, and each task is generally run as a thread. The software developer has to write the code in such a way that the program can be broken into independent chunks.


Dividing a whole program into subtasks so that it can use more than one processor in a machine is called multithreading. Suppose the program is split into two threads and there are two processors: each processor is then assigned to run one thread. Each processor has its own cache memory. When the data a processor needs does not fit in its cache, it is fetched from system memory (RAM). On a multiprocessor machine, the processors and RAM are connected via a dedicated high-speed bus.
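The two-thread split described above might look like the following sketch, in which a summation is broken into independent chunks and each chunk is handed to its own thread. The function names `sum_chunk` and `parallel_sum` are hypothetical. Note one caveat: in standard CPython builds the global interpreter lock prevents two threads from executing Python bytecode at the same instant, so this shows the structure of a multithreaded program, not a speedup.

```python
import threading


def sum_chunk(numbers, results, index):
    """Task run by one thread: sum its own chunk and store the partial sum."""
    results[index] = sum(numbers)


def parallel_sum(numbers, thread_count=2):
    """Split the work into independent chunks, one thread per chunk."""
    chunk_size = max(1, (len(numbers) + thread_count - 1) // thread_count)
    chunks = [numbers[i:i + chunk_size]
              for i in range(0, len(numbers), chunk_size)]
    results = [0] * len(chunks)  # each thread writes only its own slot

    threads = [threading.Thread(target=sum_chunk, args=(chunk, results, i))
               for i, chunk in enumerate(chunks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every chunk has been processed

    return sum(results)


print(parallel_sum(list(range(1, 101))))  # 5050
```

Because each thread reads only its own chunk and writes only its own slot in `results`, the chunks are independent and no coordination is needed until the partial sums are combined.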

Distributed Memory Model

    Inter-thread communication

As each processor has its own memory, one thread cannot access another thread's memory. If the processor running thread 2 comes across data that is needed by the processor running thread 1, it sends that data to thread 1. On receiving it, thread 1 stores the data in its own memory. This is called inter-thread communication. Passing data among processors has overheads, however, such as packaging the data and the work each thread does to send and receive it.

Note also that communicated data has to travel via the comparatively slow system bus. It is therefore critical for a software developer writing multithreaded applications to minimize inter-thread communication by assigning independent data to each processor. This model is called the distributed memory model.
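The exchange between thread 1 and thread 2 can be imitated with message queues. The sketch below is a simulation only: Python threads actually share one address space, so the separation is a convention here (each worker touches nothing but its own local variables and its queues). The names `worker_one`, `worker_two`, `run_exchange`, and the sample message are all hypothetical.

```python
import queue
import threading


def worker_one(inbox, report):
    """Thread 1: keeps private data and only learns values sent to it."""
    private_data = {}              # by convention, only thread 1 touches this
    key, value = inbox.get()       # block until a message arrives
    private_data[key] = value      # store the received data in own memory
    report.put(dict(private_data)) # hand back a copy for inspection


def worker_two(peer_inbox):
    """Thread 2: comes across data that thread 1 needs and sends it over."""
    message = ("temperature", 21.5)  # constructing the message is overhead
    peer_inbox.put(message)          # so is sending it


def run_exchange():
    inbox_one = queue.Queue()
    report = queue.Queue()
    t1 = threading.Thread(target=worker_one, args=(inbox_one, report))
    t2 = threading.Thread(target=worker_two, args=(inbox_one,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return report.get()


print(run_exchange())  # {'temperature': 21.5}
```

Every `put` and `get` is a point where one thread may wait on the other, which is why the text recommends keeping such exchanges to a minimum.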

Shared Memory Model

This model addresses the problem of inter-thread communication. Instead of a separate memory for each processor, there is a single memory accessible to all the processors. This setup is called the shared memory model. In this model, threads do not need to pass data to one another explicitly, as all of them can access the shared memory.

But this model has its own bottlenecks. Threads may end up working on the same data set simultaneously, which raises the issue of data integrity. To avoid this, each thread must lock out the other threads while it manipulates data in the shared memory. This locking adds overhead to the processing as well as to the developer, who must take care to acquire locks and release them after manipulation in the source code.
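A minimal sketch of the locking described above, using Python's `threading.Lock`: several threads update one shared counter, and each takes the lock before touching it. The `SharedCounter` class and `run_counter` function are hypothetical names for this example.

```python
import threading


class SharedCounter:
    """Data in shared memory, guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The with-statement acquires the lock and always releases it,
        # even if the body raises, so no thread is locked out forever.
        with self._lock:
            self.value += 1


def run_counter(thread_count=4, increments=10000):
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


print(run_counter())  # 40000
```

Without the lock, two threads could read the same old value and both write back the same new one, losing an update. The price is that every increment now pays for acquiring and releasing the lock, which is the overhead the paragraph above refers to.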