Modeling Seismic Wave Propagation on a 156GB PC Cluster

Large earthquakes in densely populated
areas can be deadly and very damaging to the local economy. Recent
earthquakes in El Salvador (magnitude 7.7 on January 13, 2001),
India (magnitude 7.6 on January 26, 2001) and Seattle (magnitude
6.8 on February 28, 2001) illustrate the need to understand better
the physics of earthquakes and motivate attempts to predict seismic
risk and potential damage to buildings and infrastructures.

Strong ground shaking during an earthquake is governed by the
seismic equations of motion, which support three types of waves:
pressure (or sound), shear and surface waves. Numerical techniques
can be used to solve the seismic wave equation for complex
three-dimensional (3-D) models. Two major classes of problems are
of interest in seismology: regional simulations (e.g., the
propagation of waves in densely populated sedimentary basins prone
to earthquakes, such as Los Angeles or Mexico City) and the
propagation of seismic waves at the scale of the entire Earth.
Every time an earthquake occurs, these waves are recorded at a few
hundred seismic stations around the globe and provide useful
information about the Earth's interior structure.
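
For reference (the article does not write them out), the seismic equations of motion for the displacement field u in an elastic medium take the standard form:

```latex
% Newton's second law for a continuum: density times acceleration equals
% the divergence of the stress tensor plus the earthquake source force.
\rho \frac{\partial^2 \mathbf{u}}{\partial t^2}
    = \nabla \cdot \boldsymbol{\sigma} + \mathbf{f},
\qquad
% Hooke's law: stress is linearly related to the displacement gradient
% through the fourth-order elastic tensor C.
\boldsymbol{\sigma} = \mathbf{C} : \nabla \mathbf{u}
```

Here ρ is the density, σ the stress tensor, C the elastic tensor and f the source term; pressure, shear and surface waves are all solutions of this one system.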

Numerical Technique

At the Seismological Laboratory at the California Institute
of Technology, we developed a highly accurate numerical technique,
called the Spectral-Element Method, for the simulation of 3-D
seismic wave propagation. The method is based upon the classical
finite element method widely used in engineering. The mesh is
divided into elements, each containing a few hundred points; each
processor solves the seismic wave equation on its local subset of
elements and communicates the results of its computations to its
neighbors in the mesh. To model seismic wave
propagation in the Earth, we create a mesh of the globe, which we
divide into a large number of slices (see Figures 1 and 2). Each
slice contains a large number of elements (typically several tens
of thousands). The size of the mesh makes it impossible to run our
application on a single workstation or shared-memory machine, so
the objective is to run the calculations on a parallel computer.
The method is therefore perfectly suited to implementation on a
cluster of PCs, with each PC handling a subset of all the
elements of the mesh. We use message-passing techniques to
communicate the results between PCs across the network. This idea
of parallel processing under Linux has developed rapidly in the
scientific community (see the articles by M. Konchady and R. A.
Sevenich listed in Resources).
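
The compute-and-exchange pattern just described can be sketched in miniature. The actual code uses MPI across the network; the sketch below instead uses Python's standard-library multiprocessing pipes, a one-dimensional decomposition and a trivial averaging stencil, all of which are illustrative assumptions rather than the real solver:

```python
# Sketch of slice-to-slice boundary ("halo") exchange: each worker updates
# its own slice after swapping edge values with its neighbors. The 1-D
# slices and the averaging stencil are placeholders for the real 3-D
# spectral-element computation.
from multiprocessing import Pipe, Process

def update_interior(left_ghost, local, right_ghost):
    # Trivial three-point average standing in for the wave-equation update.
    padded = [left_ghost] + local + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

def solve_slice(local, left_conn, right_conn, result_conn):
    # Send our boundary values to the neighbors, receive theirs (ghost
    # points), then update. Edge slices reuse their own boundary value.
    if left_conn:
        left_conn.send(local[0])
    if right_conn:
        right_conn.send(local[-1])
    left_ghost = left_conn.recv() if left_conn else local[0]
    right_ghost = right_conn.recv() if right_conn else local[-1]
    result_conn.send(update_interior(left_ghost, local, right_ghost))

if __name__ == "__main__":
    slices = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]     # one list per "PC"
    links = [Pipe() for _ in range(len(slices) - 1)]  # neighbor channels
    results = [Pipe() for _ in slices]
    procs = []
    for i, local in enumerate(slices):
        left = links[i - 1][1] if i > 0 else None
        right = links[i][0] if i < len(slices) - 1 else None
        procs.append(Process(target=solve_slice,
                             args=(local, left, right, results[i][1])))
    for p in procs:
        p.start()
    print([conn.recv() for conn, _ in results])
    for p in procs:
        p.join()
```

In the real application each process holds tens of thousands of 3-D elements, but the communication structure is the same: local computation dominates, and only thin boundary layers cross the network, which is why Fast Ethernet suffices.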

Figure 1. Mesh of the Globe

Figure 2. Slices Assigned to Processors

Research on how to use large PC clusters for scientific
purposes started in 1994 with the Beowulf Project of NASA
(beowulf.org), later
followed by the Hyglac Project at Caltech and the Loki Project at
Los Alamos (see Tom Sterling and collaborators' book How
to Build a Beowulf and
cacr.caltech.edu/resources/naegling).
Hans-Peter Bunge from Princeton University was among the first to
use such clusters to address geophysical problems, and Emmanuel
Chaljub from the Institut de Physique du Globe in Paris, France
introduced the idea of using message passing to study wave
propagation in the Earth. Clusters are now being used in many
fields in academia and industry. An application to a completely
different field, remote sensing, was presented in a recent issue of
the Linux Journal by M. Lucas (see
Resources).

Hardware

For our project we decided to build a cluster from scratch
using standard PC parts. The acronym COTS, for commodity
off-the-shelf technology, is often used to describe this approach.
The main constraint was that we needed a large number of PCs and a
lot of memory because of the size of the meshes we wanted to use in
our simulations. Communications and I/O are not a big issue for us
since the PCs spend most of their time doing computations, and the
amount of information exchanged between PCs is always comparatively
small. Therefore, our particular application would not benefit
significantly from the use of a high-performance network, such as
Gigabit Ethernet or Myrinet. Instead, we used standard 100Mbps Fast
Ethernet. Due to the large number of processors required (312 in
total), we used dual-processor motherboards to reduce the number of
boxes to 156, thus minimizing the space needed for storage (and the
footprint of the cluster). This choice trades some performance for
cost: the two processors share the memory bus, which causes bus
contention, but only one case, motherboard, hard drive and so on
is needed per pair of processors, which reduces the hardware cost. We
ruled out the option of rackmounting the nodes, essentially to
reduce cost, but chose to use standard mid-tower cases on shelves,
as illustrated in Figure 3. This approach is sometimes given the
name LOBOS (“lots of boxes on shelves”). The shelving system,
holding the 156 dual-processor PCs, was placed in a computer room
already equipped with a powerful air-conditioning system.

Figure 3. 156 Dual-Processor PC Cluster. The boxes are connected
using a standard 100Mbps Fast Ethernet network (the green and blue
cables). In the back, one can see the 192-port Cisco switch. The
height of the shelving system is approximately eight feet.

Deciding between Pentium IIIs and AMD Athlon processors was
difficult. The Athlon is said to be faster for floating-point
operations, which is the main type of operation used in most
scientific applications, including ours. At build time, no
dual-processor Athlon motherboard was available. As mentioned
above, using single nodes would have increased the total cost of
the cluster. For this reason, we selected the Pentium III.

It is tempting to use the latest technology when assembling a
PC. However, the newest processors are much more expensive than
six-month-old technology and offer only a small increase in
performance. Three- to six-month-old processors provide the best
trade-off between price and performance. We used 733MHz processors
when we assembled the machine in the summer of 2000.

Figure 4. Price/Performance Ratio for the Pentium III

Figure 4 shows the ratio between price and performance for
the Pentium III processor. The prices shown are an average of
typical prices from retailers in the US. As one can see, old
processors are cheap but relatively slow. New processors are faster
but much more expensive. The optimal price/performance ratio is
obtained in between.
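
This trade-off can be made concrete with a toy calculation. All the numbers below are hypothetical placeholders, not the retail prices behind Figure 4; the point is only that a fixed per-node cost (memory, case, disk) pushes the optimum away from the cheapest chip, while steep top-end CPU prices push it away from the fastest:

```python
# Hypothetical Pentium III price points (MHz -> USD); NOT the article's data.
CPU_PRICE_USD = {533: 90, 600: 110, 733: 160, 866: 450, 1000: 800}
NODE_FIXED_USD = 1200  # assumed per-node cost of memory, case, disk, NIC

def mhz_per_dollar(mhz):
    # Performance per dollar of *total* node cost, not just the CPU.
    return mhz / (NODE_FIXED_USD + CPU_PRICE_USD[mhz])

best = max(CPU_PRICE_USD, key=mhz_per_dollar)
print(best)  # with these assumed numbers, the optimum lands on a mid-range part
```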

We decided to put the maximum possible amount of memory on
the motherboards, i.e., fully populate the memory slots with 1GB of
RAM per PC for a total of 156GB of memory in the cluster. Each PC
is also referred to as a “node” or “compute node”. Note that
memory represents more than 50% of the total cost of the
cluster.

The rest of the hardware is fairly standard: each PC has a
Fast IDE 20GB hard drive, a Fast Ethernet network card and a cheap
1MB PCI video card, which is required for the PC to boot properly
and can be used to monitor the node if needed. We use high-quality,
mid-tower cases with ball-bearing case fans because the mechanical
parts in a cluster, such as fans and power supplies, are the most
likely to fail. Note that the total disk space in the cluster is
enormous (20GB × 156 = 3,120GB, about 3TB). To further reduce the
cost of the cluster and to have full control over the quality of
the installed parts, we decided to order the parts from different
vendors and assemble the nodes ourselves, rather than ordering
pre-assembled boxes. It took three people about a week to assemble
the entire structure. One PC, called the front end, has a special
role in the cluster: it contains the home filesystems of the users
(SCSI drives NFS-mounted on the other nodes with the autofs
automounter), the compilers, the message-passing libraries and so
on. Simulations are started and monitored from this machine. The
front end is also used to log in to the nodes for maintenance
purposes. The nodes are all connected using a 192-port Catalyst
4006 switch from Cisco, which has a backplane bandwidth of 24Gbps
(see Figure 5).
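
A front-end setup like the one described, with home filesystems exported over NFS and picked up on the nodes by the autofs automounter, might look roughly like the fragment below. The hostname frontend, the paths and the mount options are assumptions for illustration, not details taken from our cluster:

```
# /etc/exports on the front end: export home directories to the nodes
/home   node*(rw,sync)

# /etc/auto.master on each compute node: let autofs manage /home
/home   /etc/auto.home

# /etc/auto.home on each compute node: mount each user's home on demand
*   -rw,hard,intr   frontend:/home/&
```

With a map like this, a node mounts a user's home directory from the front end only when it is first accessed, which keeps idle NFS traffic off the compute network.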
