NUMA (Non-Uniform Memory Access) refers to a memory design for multiprocessor servers in which some regions of memory take longer to access than others: the access time depends on the memory’s location relative to the processor.

Multiprocessor Architectures

There are 3 criteria on which the performance of a multiprocessor system can be judged: Scalability, Latency and Bandwidth.

Scalability – The ability of a system to demonstrate a proportionate increase in parallel speedup with the addition of more processors.

Latency – The time taken to send a message from node A to node B.

Bandwidth – The amount of data that can be communicated per unit of time.

The goal of a multiprocessor system is to achieve a highly scalable, low latency, high bandwidth system.
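The scalability criterion can be made concrete with Amdahl’s law (not stated in the original text, but the standard model for parallel speedup): if only a fraction of the work can run in parallel, the serial remainder caps the speedup no matter how many processors are added. A minimal sketch:

```python
def speedup(parallel_fraction, n_processors):
    """Amdahl's-law speedup for a program in which parallel_fraction
    (between 0 and 1) of the work can use n_processors; the rest
    of the work stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with 90% parallel work, 10 processors give well under 10x:
print(round(speedup(0.90, 10), 2))   # 5.26
print(round(speedup(1.00, 10), 2))   # 10.0 (ideal linear scaling)
```

This is why “proportionate increase in parallel speedup” is the benchmark for scalability: any serial bottleneck (or growing communication latency) makes the curve flatten as processors are added.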

Typical Parallel Architectures

There are 2 major types of Parallel Architectures that are common in the industry: Shared Memory Architecture and Distributed Memory Architecture. Shared Memory Architecture is of 2 types: Uniform Memory Access (UMA), and Non-Uniform Memory Access (NUMA).

Shared Memory Architecture – (As seen in the photo below) all processors share the same memory and treat it as a global address space. The major challenge to overcome in such an architecture is Cache Coherency (i.e. every read must reflect the latest write). This architecture is usually adopted in the hardware model of general-purpose CPUs in laptops and desktops.
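The “global address space” idea can be sketched in Python with `multiprocessing.Value`, which places a single integer in a shared-memory segment visible to all worker processes (the worker/increment counts below are illustrative assumptions; a lock stands in for the coherency guarantee that hardware provides):

```python
import multiprocessing as mp

def _add_many(counter, n):
    # Every worker updates the same shared location; the lock serializes
    # the read-modify-write so each read sees the latest write.
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1

def shared_counter_demo(workers=4, increments=1000):
    counter = mp.Value("i", 0)  # one int living in shared memory
    procs = [mp.Process(target=_add_many, args=(counter, increments))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(shared_counter_demo())  # 4 workers x 1000 increments = 4000
```

Without the lock, lost updates would appear: exactly the kind of stale-read problem that cache-coherency protocols exist to prevent at the hardware level.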

Uniform Memory Access (UMA) – The following photo shows a sample layout of processors and memory across a bus interconnection. All the processors are identical and have equal access times to all memory regions. These are also sometimes known as Symmetric Multiprocessor (SMP) machines. Architectures that take care of cache coherency at the hardware level are known as CC-UMA (cache-coherent UMA).

Non-Uniform Memory Access (NUMA) – The following photo shows this type of shared memory architecture: identical processors connected to a scalable network, each with a portion of memory attached directly to it. The primary difference between NUMA and a distributed memory architecture is that in distributed memory no processor can map memory attached to other processors, whereas in NUMA a processor may. This introduces a classification of memory into local and remote, based on the access latency each processor sees to different memory regions. Such systems are often made by physically linking SMP machines. UMA, by contrast, has the major disadvantage of not scaling beyond a certain number of processors.
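On Linux, the local/remote split is visible to software: each NUMA node the kernel knows about appears as a `nodeN` directory under sysfs. The sketch below lists them (the sysfs path assumes a Linux kernel; on non-NUMA or non-Linux systems it falls back to reporting a single node):

```python
import os
import re

def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the NUMA node IDs the OS reports, or [0] as a fallback
    when the sysfs directory is absent (non-NUMA or non-Linux host)."""
    try:
        entries = os.listdir(sysfs_root)
    except OSError:
        return [0]
    nodes = sorted(int(m.group(1))
                   for e in entries
                   if (m := re.fullmatch(r"node(\d+)", e)))
    return nodes or [0]

if __name__ == "__main__":
    print("NUMA nodes visible to the OS:", numa_nodes())
```

On a machine built by linking several SMP boards, this list has one entry per board; memory allocated on the caller’s own node is the “local” (faster) region, everything else is “remote”.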

Distributed Memory Architecture – (As seen in the photo below) all the processors have their own local memory, and memory addresses are not mapped across processors, so there is no concept of a global address space or cache coherency. To access data on another processor, processors use explicit communication. One example of this architecture is a cluster, with different nodes connected over a network.
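The “explicit communication” model can be sketched with a `multiprocessing.Pipe` standing in for the network link between two cluster nodes (real clusters typically use MPI for this; the function names here are illustrative, not from the original text). Note that no memory is shared: data moves only when it is explicitly sent and received.

```python
import multiprocessing as mp

def _worker(conn):
    # This process has its own private memory; the only way data
    # arrives or leaves is over the channel, message-passing style.
    data = conn.recv()        # explicit receive from the other "node"
    conn.send(sum(data))      # explicit send of the computed result
    conn.close()

def distributed_sum(data):
    parent, child = mp.Pipe()
    p = mp.Process(target=_worker, args=(child,))
    p.start()
    parent.send(data)         # ship the input across the "network"
    result = parent.recv()    # wait for the result message
    p.join()
    return result

if __name__ == "__main__":
    print(distributed_sum([1, 2, 3, 4]))  # 10
```

The design trade-off mirrors the text: there is no cache-coherency problem because nothing is shared, but every data access on another node costs an explicit round trip over the interconnect.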