First Beowulf Cluster in Space

When a satellite's image-gathering power exceeds the bandwidth available to transmit the images, a Linux cluster right on the satellite helps decide which images to send back to Earth.

PPU Design

The PPU consists of two anti-fuse Actel field-programmable gate arrays
(FPGAs), known to be more radiation-resistant than alternatives from
Xilinx or Altera. Each FPGA hosts ten processing nodes (PNs), each with
a 206MHz StrongARM processor and 64MB of SDRAM. Each FPGA is
connected to three Atmel 4MB serial Flash chips containing a bootloader,
the OS kernel and filesystem images, which include selected image-processing
applications. Of course, programs can be added dynamically while the
satellite is in space, as though it were a regular Linux cluster.

The PPU is connected to the rest of the satellite by fairly slow
quad-redundant controller area network (CAN) links and two fast (200Mb/s)
low-voltage differential signalling (LVDS) links for image data
from the on-board camera. Figure 2 shows an overview of the hardware
architecture. Most notably, the PPU also
can take over satellite control from the OBC. In fact, this is
one of the experiments intended to validate that COTS software and hardware
components can fulfill mission-critical tasks.

Figure 2. The cluster is based on two FPGAs, each connected to ten
206MHz StrongARM processors.

Internally, the PPU resembles a cluster-based
computing system with the FPGAs providing
the interconnection network. In fact, these
hubs themselves can offer image-processing
capabilities. The cluster concept means we can
sacrifice PNs to failure and yet carry on system
operation regardless. It also gives each PN sufficient
autonomy to run multiple algorithms simultaneously. As
each FPGA has its own independent communication links,
PPU operation can continue even with severe failures,
such as destruction of an entire FPGA.

A parallel bus interfaces each PN to an FPGA. Given that ten PNs
communicate with one FPGA, hardware I/O pins on the FPGA become a
limitation. It is impossible to support ten full 32-bit buses. A 16-bit
data bus is the next logical choice but results in a halving of the
effective bus bandwidth. However, considerable effort was made to ensure
that this slimmed interface operates efficiently, and it has resulted in a
novel 17-bit data bus, which is discussed later. From the PN perspective,
the FPGA is memory mapped into address space using an addressable window
concept to reduce parallel bus requirements.
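
The pin pressure can be made concrete with simple arithmetic. The sketch below counts only data lines for ten PNs at each candidate bus width. Address and control lines are ignored because the text does not give their numbers, and the function name is purely illustrative.

```python
# Data-line count on one FPGA for ten processing nodes (PNs).
# Only data pins are counted; address and control pins are left out
# because the article does not state how many there are.
PNS_PER_FPGA = 10

def data_pins(bus_width_bits: int, nodes: int = PNS_PER_FPGA) -> int:
    """Total FPGA data pins needed for `nodes` buses of the given width."""
    return bus_width_bits * nodes

for width in (32, 16, 17):
    print(f"{width}-bit buses: {data_pins(width)} data pins")
```

Going from 16 to 17 bits costs only one extra pin per PN, whereas full 32-bit buses would double the data-pin count.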

Booting

Booting of the PNs is sequential to reduce peak power on startup and
consists of three stages. First, the StrongARM operates in 16-bit
access mode, executing code directly from the lowest address window
of the FPGA. Although this translates into half-bandwidth memory access,
the small size of the bootloader, a tiny routine coded in ARM assembler
of less than 512 bytes, makes it acceptable. It initialises the
StrongARM, sets up SDRAM and then
loads the second stage from serial Flash. The second stage retrieves the
kernel and ramdisk from serial Flash, executes the kernel decompressor
and boots Linux. Finally, the third bootloader stage consists of
bzImage, which decompresses itself into the appropriate memory location and then
executes the kernel, which then decompresses its ext2 initrd ramdisk.

The 17-Bit Bus Interface and Protocol

All communication to the PN occurs through the FPGA. A kernel device driver plus
a user-space library provide a standard interface API for Linux applications. The
low-level driver maintains two filesystem character devices that implement
interrupt handling and software receive/transmit buffers. In
order to keep the driver efficient and simple, kernel preemption was
disabled. The driver also periodically writes to a watchdog register in
the FPGA as a heartbeat signal; if the heartbeat stops, the timeout triggers a reboot.
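
The heartbeat idea can be illustrated with a toy model. This is only a sketch and not the actual driver or FPGA logic: the class, its method names and the timeout value are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class Watchdog:
    """Toy model of a heartbeat watchdog register (names are hypothetical)."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_kick = now

    def kick(self, now: float) -> None:
        """Driver side: the periodic heartbeat write."""
        self.last_kick = now

    def expired(self, now: float) -> bool:
        """FPGA side: True once no heartbeat has arrived within the timeout."""
        return now - self.last_kick > self.timeout_s


wd = Watchdog(timeout_s=2.0)   # assumed timeout, for illustration only
wd.kick(now=1.0)
print(wd.expired(now=2.5))     # last heartbeat was 1.5 s ago
print(wd.expired(now=3.5))     # last heartbeat was 2.5 s ago
```

In this model a PN whose driver stops kicking is detected after the timeout, which is the point at which the reboot would be triggered.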

In the PPU, writes from the PN to FPGA fall into two classes: control
and message data. Message data normally is destined for another PN,
whereas control data directs some action on the part of either the FPGA or
PN. Similarly, reads of the FPGA by a PN also fall into these categories.

In the case of message-data writes from a PN to another PN via the FPGA,
each item of data destined for a particular PN must be addressed. Either
addressing information is part of each and every word transferred,
or it is set in advance. In the PPU, message paths are set in advance
under PN control for efficiency reasons, assuming most transfers are
large, which is certainly the case for satellite images. But
a 16-bit interface conveying 16-bit data messages must have a mechanism
to distinguish between data and address packets. This could be achieved
by writing these to separate address registers in the FPGA.
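
A rough way to picture preset message paths is a hub that keeps one destination register per sender. The sketch below is an assumption-laden model rather than the real FPGA design: the class, the two "registers" and the PN numbering are hypothetical.

```python
class FpgaHub:
    """Toy model of one FPGA hub with a per-PN destination register."""

    def __init__(self, nodes: int = 10):
        self.dest = [None] * nodes               # preset path for each sender
        self.inbox = [[] for _ in range(nodes)]  # words delivered to each PN

    def write_address(self, sender: int, dest: int) -> None:
        """Write to the (hypothetical) address register: set the path once."""
        self.dest[sender] = dest

    def write_data(self, sender: int, word: int) -> None:
        """Write to the (hypothetical) data register: forward one half-word."""
        if self.dest[sender] is None:
            raise RuntimeError("no message path set for this PN")
        self.inbox[self.dest[sender]].append(word & 0xFFFF)


hub = FpgaHub()
hub.write_address(sender=0, dest=3)        # one address write...
for word in (0x1234, 0xBEEF, 0x0042):      # ...then many data writes
    hub.write_data(sender=0, word=word)
print([hex(w) for w in hub.inbox[3]])
```

Because the address is written once, every subsequent write carries a full 16 bits of payload, which is what makes the scheme attractive for large image transfers.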

The situation for reading the FPGA is trickier, however. A 16-bit bus
requires two reads for each message: one read to determine message
type and/or length and another to convey the actual message. But
because our messages have variable length, there is an immediate problem
concerning the timing of such messages. The reason is that an interrupt
signal indicates that a 16-bit value is waiting to be read, and the
PNs are obliged to respond. So for long messages, the PN
would read a sequence of 16-bit half-words. But it has no obvious means of
distinguishing a 16-bit control word inserted into this sequence. We
could prefix all half-words with a type header, but that would mean
two reads per half-word of message, halving the bandwidth.

Our solution is a 17-bit bus with the StrongARMs operating in 32-bit
access mode. Both raw data and commands share the physical link as
half-words with their type differentiated by the state of a special
17th bit that indicates to the PN whether an incoming item is data or
a control message. Most important, it does this without requiring any
extra read cycles or extra bus bandwidth.

This approach wouldn't be of interest if the driver module couldn't
take direct advantage of the load-store nature of the ARM and the fact
that all instructions are conditional. The former implies that the
32-bit read from the 17-bit bus is loaded into an internal register
before being moved to memory. The latter implies that if the 17th bit
of the interface is wired to the most significant data bit, D31, rather
than the more obvious choice of D16, it can be used to affect
the zero flag. As a result, the data destination to one of two internal
memory buffers can be controlled through conditional data moves. This
is far more efficient than the conditional branch
that most other processors would require. The following assembler code provides
an example with r0, r1, r2 and r3 being the registers for the address of
the FPGA data transfer, the control word buffer, the message word buffer
and the type, respectively. In summary, the code for the optimised solution
is 33% faster and uses one register less:
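
The assembler listing is not reproduced in this text. Independently of it, the effect of wiring the flag to D31 can be modelled in Python. This is only an illustrative sketch: the function and buffer names are hypothetical, and the choice that a set flag marks a control word is an assumption. What it shows is that a single 32-bit read delivers both the 16-bit payload and its type, so sorting the item into one of two buffers needs no second read.

```python
FLAG_D31 = 1 << 31      # 17th bus bit, wired to the top data line D31
PAYLOAD_MASK = 0xFFFF   # the 16 data bits on D15..D0

def bus_word(payload: int, is_control: bool) -> int:
    """What one 32-bit read of the 17-bit bus returns (assumed polarity)."""
    return (FLAG_D31 if is_control else 0) | (payload & PAYLOAD_MASK)

def drain(words, control_buf, message_buf):
    """Sort each word into one of two buffers using one read per item."""
    for word in words:
        # In a 32-bit register D31 is the sign bit, so this is a sign test.
        target = control_buf if word & FLAG_D31 else message_buf
        target.append(word & PAYLOAD_MASK)

control, message = [], []
stream = [bus_word(0x0001, False), bus_word(0x00FF, True), bus_word(0x0002, False)]
drain(stream, control, message)
print(control, message)
```

The conditional expression here stands in for the conditional stores described above; on the StrongARM the same selection is made by the condition codes rather than by a branch.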
