NEC Earth Simulator

The World's Fastest Supercomputer!

Around the end of April 2002 it became public knowledge in the US that the Japanese had developed an astonishing supercomputer more than five times as fast as the previous record holder. The Earth Simulator, developed by NEC, was born after five years of publicly funded development and is capable of a theoretical peak of 40 trillion floating-point operations per second (40 Tflops). The popular website Top500.org granted the Earth Simulator the #1 spot as the world's fastest computer in June of 2002. This caught the complacent American supercomputing community by surprise. We thought we had the latest and greatest technology and would retain the lead for a long time, but we were proven wrong.

The Earth Simulator differs from most other modern supercomputers in that it is a specialized vector-processing machine. It is designed from the ground up not for flexibility, able to run anything thrown at it, but to run long mathematical calculations quickly. Other supercomputers, such as ASCI White, the MCR Linux Cluster, or even the still-unfinished TeraGrid, are built from commercial server products networked together in massive numbers. Those servers could theoretically run popular operating systems, common applications, and web servers. The Earth Simulator, however, is designed only for long vector operations.
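To make "long vector operations" concrete, here is a minimal C sketch of the kind of loop such a machine is built for (the function name and shape are illustrative, not taken from Earth Simulator code): one arithmetic operation applied uniformly across a very long array, which a vectorizing compiler can stream through hardware vector pipelines rather than executing element by element.

    #include <stddef.h>

    /* y = a*x + y over a very long array -- the classic pattern a
     * vector processor pipelines as one streaming operation instead
     * of handling each element individually. */
    void daxpy(size_t n, double a, const double *x, double *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }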

Hardware
The hardware in this supercomputer consists of 5,120 custom NEC vector CPUs (called arithmetic processors, AP) running at 500 MHz, closely related to NEC's SX-6 line. The APs are organized into 640 Processor Nodes (PN) of 8 CPUs each, with the 8 CPUs in a node sharing a common memory space. The 640 nodes are linked together by a 640x640 single-stage crossbar switch with a peak bandwidth of 16 GB/s. The interconnection network (IN) is housed in large cabinets located physically at the center of the supercomputer's layout. Each CPU can access 2 GB of memory, for a total of 10 TB of memory across the machine, and each CPU is individually capable of 8 Gflops, which is where the 40 Tflops aggregate figure comes from.
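As a quick sanity check, the aggregate figures follow directly from the per-CPU specifications; this throwaway C snippet (written for this writeup, not Earth Simulator software) just does the multiplication:

    #include <stdio.h>

    int main(void)
    {
        const int    nodes         = 640;  /* Processor Nodes (PN)     */
        const int    cpus_per_node = 8;    /* APs sharing one memory   */
        const double gflops_per_ap = 8.0;  /* peak per CPU             */
        const double gb_per_ap     = 2.0;  /* memory per CPU           */

        int total_cpus = nodes * cpus_per_node;              /* 5120  */
        printf("peak:   %.2f Tflops\n",
               total_cpus * gflops_per_ap / 1000.0);         /* 40.96 */
        printf("memory: %.0f TB\n",
               total_cpus * gb_per_ap / 1024.0);             /* 10    */
        return 0;
    }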

Software
The Earth Simulator runs an NEC variant of Unix called SUPER-UX. A massive parallel file system controls access to disk, which has the potential to be a major bottleneck in multiprocessor machines. A custom job scheduler was designed to automatically distribute workloads and efficiently control how a program is parallelized. The programming environment is somewhat complex, with three tiers of parallelism. MPI (Message Passing Interface), a message-passing library for distributed-memory programming, controls parallelization between nodes (each of which has its own memory space). OpenMP, a set of shared-memory compiler directives, controls parallelization among the CPUs within each Processor Node (8 CPUs per node). The third tier lives inside each processor: every AP pairs a 4-way superscalar scalar unit with vector pipelines, and the compiler vectorizes the loops within each thread it is given.
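Here is a hedged sketch of how those three tiers fit together in practice. This is not Earth Simulator code, just a generic hybrid C program in the same style: MPI ranks stand in for nodes, OpenMP threads for the APs within a node, and the plain inner loop is left in a form a vectorizing compiler can handle. Compile with an MPI wrapper and OpenMP enabled (e.g. mpicc -fopenmp).

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000   /* length of this rank's slice of the data */

    int main(int argc, char **argv)
    {
        int rank, nranks;
        static double x[N], y[N];   /* static => zero-initialized */

        /* Tier 1: one MPI rank per node, each with private memory. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Tier 2: OpenMP threads share this node's memory space.
         * Tier 3: the simple loop body is what the vector pipelines
         * (or an auto-vectorizing compiler) stream through. */
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < N; i++) {
            y[i] = 2.0 * x[i] + 1.0;
            local += y[i];
        }

        /* Tier 1 again: combine per-node results across the machine. */
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d ranks x %d threads: sum = %g\n",
                   nranks, omp_get_max_threads(), global);

        MPI_Finalize();
        return 0;
    }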