Parallel Computing Using Linux

Various classes of problems lend themselves to parallel computing solutions. This article discusses the concepts involved and shows how Linux can be used to implement them.

Parallel computing involves the design of
a computing system that uses more than one processor to solve a
single problem. For example, if two arrays of ten elements each
must be added, two processors can share the work. Before starting
the computation, both processors agree to work on independent
sub-problems: one adds the first five pairs of elements while the
second adds the last five. Each processor works on its sub-problem
and communicates its results to the other when they are available.
Theoretically, a two-processor computer should add the arrays twice
as fast as a single-processor computer. In practice, there is
overhead, and the benefit of adding processors diminishes for
larger processor configurations.
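
To make this concrete, here is a minimal sketch using two POSIX
threads on an SMP Linux machine; the array contents and the
add_half helper are illustrative assumptions, not part of any
particular application.

/* Two threads each add their own five-element half of arrays
 * a and b into c. Compile with: gcc add.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
static int b[10] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
static int c[10];

static void *add_half(void *arg)
{
    int id = *(int *)arg;           /* 0 -> elements 0-4, 1 -> 5-9 */
    int i;

    for (i = id * 5; i < (id + 1) * 5; i++)
        c[i] = a[i] + b[i];
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int id[2] = {0, 1}, i;

    for (i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, add_half, &id[i]);
    for (i = 0; i < 2; i++)
        pthread_join(t[i], NULL);   /* wait for both sub-problems */

    for (i = 0; i < 10; i++)
        printf("%d ", c[i]);
    printf("\n");
    return 0;
}

On a two-processor machine, the kernel can schedule the two threads
on separate processors, so each half of the addition proceeds in
parallel; the pthread_join calls play the role of the final
communication step.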

Affordable Supercomputers

Obtaining a Unix workstation for the cost of a PC has been
one of the benefits of using Linux. This idea has been carried a
step further by linking together a number of Linux PCs. High-speed
networking is a hot topic, and several research projects are
underway that use Linux PCs joined by high-performance networks to
build a low-latency, high-bandwidth parallel machine. (One URL is
http://yara.ecn.purdue.edu/~pplinux.)

Currently, there is not much high-level support for
shared-memory programming under SMP Linux. The basic Linux
mechanisms for sharing memory between processes are available; they
include the System V Inter-Process Communication (IPC) system calls
and a thread library. However, it will be some time before a
parallelizing C or C++ compiler is available for Linux. Parallel
programming can still be done on an SMP Linux machine, or on a
cluster of Linux PCs, using message passing.
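
As a rough illustration of those basic mechanisms, the sketch below
uses the System V IPC calls to share a single integer between a
parent and a forked child; the segment size and the value written
are illustrative assumptions.

/* A parent and a forked child communicate through one shared
 * integer obtained with the System V shared-memory calls. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* create a private shared segment large enough for one int */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    int *value = (int *)shmat(shmid, NULL, 0);

    *value = 0;
    if (fork() == 0) {          /* child writes into shared memory */
        *value = 42;
        shmdt(value);
        exit(0);
    }
    wait(NULL);                 /* parent waits, then reads */
    printf("child wrote %d\n", *value);

    shmdt(value);
    shmctl(shmid, IPC_RMID, NULL);  /* remove the segment */
    return 0;
}

The same calls scale up to sharing whole arrays among cooperating
processes on an SMP machine.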

Parallel computing is advantageous in that it makes it
possible to obtain a solution to a problem faster. Scientific
applications already rely on parallel computation: parallel
computers are routinely used in computationally intensive
applications such as climate modeling, finite element analysis and
digital signal processing. New commercial applications which
process large amounts of data in sophisticated ways are also
driving the development of faster computers. These applications
include video conferencing, data mining and advanced graphics. The
integration of parallel computation, multimedia technology and
high-performance networking has led to the development of video
servers. A video server must be capable of rapidly encoding and
decoding megabytes of data while simultaneously handling hundreds
of requests. While commercial parallel applications are gaining
popularity, scientific applications will remain important users of
parallel computers. The two application types are converging, as
scientific and engineering applications process larger amounts of
data and commercial applications perform more sophisticated
operations.

Parallel computing is a broad topic and this article will
focus on how Linux can be used to implement a parallel application.
We will look at two models of parallel programming: message passing
and shared memory constructs.

Message Passing

Conceptually, the idea behind message passing is
simple: the multiple processors of a parallel computer run the same
or different programs, each with private data, and exchange data
when needed. A message is transmitted by a sender processor to a
receiver processor; any processor can act as a sender or a receiver
at any time. The sender can either wait for an acknowledgement
after sending or continue execution. The receiver checks a message
buffer to retrieve a message; if no message is present, it can
continue execution and try again later, or it can wait until a
message arrives. Multiple sends and receives can occur
simultaneously in a parallel computer. All processors within the
parallel computer must be interconnected by a network (Figure 1).

All processors can exchange data with all other processors;
the routing of messages is handled by the operating system. The
message-passing model can be used on a network of workstations or
within a tightly coupled group of processors with distributed
memory. The number of hops between processors can vary depending on
the type of interconnection network.

Message passing between processors is achieved using a
communication protocol. Depending on the protocol, the send routine
usually accepts a destination processor ID, a message type, the
start address of the message buffer and the number of bytes to be
transmitted. The receive routine can accept a message from any
processor or from a particular processor, and it can select
messages of any type or of one specific type.
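
The article does not tie itself to one protocol, but MPI is a
common choice, and its calls carry exactly the parameters just
listed. A minimal sketch, assuming an MPI library is installed:

/* Processor 0 sends one integer to processor 1; processor 1
 * accepts a message from any processor, of any type (tag). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 123;
        /* buffer, count, datatype, destination ID, message type */
        MPI_Send(&value, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive from any processor, any message type */
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        printf("got %d from %d, type %d\n",
               value, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}

Run it under a launcher such as mpirun with two processes; the rank
plays the role of the processor ID.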

Most communication protocols maintain the order in which
messages are sent between a pair of processors. For example, if
processor 0 sends a message of type a
followed by a message of type b to processor
1, then when processor 1 issues a receive from processor 0 for a
generic message type, the message of type a
will be received first. However, in a multi-processor system, if a
processor issues a receive from any processor, there is no
guarantee of the order in which messages from the various senders
arrive; the order in which messages are transported depends on the
router and the traffic on the network. The safest way to maintain
message order is to specify both the source processor number and
the message type in the receive.
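
Continuing the MPI sketch from above (again an assumption, not the
only possible protocol), the receives below name the source and the
message type explicitly, so the order of arrival is pinned down
regardless of network traffic:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, a = 1, b = 2;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* send type "a" (tag 10), then type "b" (tag 20) */
        MPI_Send(&a, 1, MPI_INT, 1, 10, MPI_COMM_WORLD);
        MPI_Send(&b, 1, MPI_INT, 1, 20, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* naming source 0 and each tag fixes the receive order */
        MPI_Recv(&a, 1, MPI_INT, 0, 10, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(&b, 1, MPI_INT, 0, 20, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}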

Message passing has been used successfully to implement many
parallel applications, but a disadvantage of message passing is the
added programming required. Adding message-passing code to a large
program takes considerable time, and a domain decomposition
technique must be chosen. Data for the program must be divided such
that there is minimal overlap between processors, the load across
all processors is balanced, and each processor can independently
solve a sub-problem. For regular data structures, the domain
decomposition is fairly straightforward; for irregular grids,
dividing the problem so that the load is balanced across all
processors is not trivial.
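
For the regular case, a simple block decomposition suffices. The
sketch below (the function name and sizes are illustrative) assigns
each processor a contiguous slice of an n-element array, spreading
any remainder over the first few processors so the load stays
balanced:

#include <stdio.h>

/* compute the [start, end) slice owned by processor `rank` */
static void block_range(int n, int nprocs, int rank,
                        int *start, int *end)
{
    int base = n / nprocs;      /* elements every processor gets */
    int extra = n % nprocs;     /* leftover elements */

    *start = rank * base + (rank < extra ? rank : extra);
    *end   = *start + base + (rank < extra ? 1 : 0);
}

int main(void)
{
    int r, s, e;

    for (r = 0; r < 4; r++) {   /* e.g. 10 elements, 4 processors */
        block_range(10, 4, r, &s, &e);
        printf("processor %d owns [%d, %d)\n", r, s, e);
    }
    return 0;
}

With 10 elements on 4 processors, this prints slices of sizes 3, 3,
2 and 2, so no processor carries more than one extra element.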

Another disadvantage of message passing is the possibility of
deadlock. It is very easy to hang a parallel computer by misplacing
a call to the send or
receive routines. So, while message passing is
conceptually simple, it has not been fully adopted by either the
scientific or the commercial community.
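
One classic way to hang a program, sketched below with MPI (an
assumption, as before): both processors call a blocking send first.
If the messages are too large for the system to buffer, neither
send can complete, so neither receive is ever reached. Reversing
the send/receive order on one of the two processors removes the
deadlock.

#include <mpi.h>

#define N (1 << 20)             /* large enough to defeat buffering */

int main(int argc, char **argv)
{
    static double out[N], in[N];
    int rank, peer;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;            /* run with exactly two processors */

    /* DANGER: both processors send first -- possible deadlock */
    MPI_Send(out, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(in, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}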
