Software

A cluster like HPU4Science's requires careful software selection to maximize the performance of the hardware. This page describes the software choices for the HPU4Science cluster; a more detailed description appears in the second Ars Technica article.

Linux is the current standard bearer among operating systems for high
performance computation, running on almost 92 percent of the top 500
supercomputers. Major scientific computation projects around the world,
including the Large Hadron Collider, run on Linux. The key features of
Linux are stability, the ability to pick from a wide variety of file
systems (more on this in a moment), the large existing code base for
high performance computing, and the ease of tailoring the OS to the
specific hardware requirements. The availability of open source projects
with code that can be configured for highly parallelized processing was
also an important consideration.

File System - BTRFS

For writing data at the highest possible speeds, one would normally
choose a good PCI RAID card or on-motherboard RAID. However,
on-motherboard RAID chips are not as fast as PCI cards, and their
configuration can be both quirky and unstable. Hardware RAID also makes
you dependent upon a hardware vendor’s choices and whims, which may not
align with the specific usage profiles of this system. Moreover,
hardware RAID controllers come at a price: the HPU4Science budget is
large but not infinite, and that money is better spent on more
computational power (GPUs!).

These considerations led to the investigation of BTRFS as a file system
because it allows software RAID with performance on par with hardware
RAID. If it can perform as well as hardware RAID, software RAID is ideal
because it can be more easily tailored to the specific system
requirements and offers substantially more flexibility. It also makes
future upgrades much easier, as they do not require reconfiguring
firmware or investing in new hardware. On top of that, the large amount
of data that needs to be stored raises the risk of data corruption, and
BTRFS’s checksums on data and metadata allow such corruption to be
detected.
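The way per-block checksums expose silent corruption can be sketched in a few lines of Python. This is an illustration of the idea only, not BTRFS code: the block size, the function names `write_blocks` and `find_corrupt_blocks`, and the sample data are all hypothetical, and `zlib.crc32` stands in for the CRC32C checksum that BTRFS uses by default.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration


def write_blocks(data: bytes) -> list[tuple[bytes, int]]:
    """Split data into blocks and store a checksum next to each block."""
    blocks = []
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        blocks.append((block, zlib.crc32(block)))
    return blocks


def find_corrupt_blocks(blocks: list[tuple[bytes, int]]) -> list[int]:
    """Return indices of blocks whose contents no longer match their checksum."""
    return [
        index
        for index, (block, checksum) in enumerate(blocks)
        if zlib.crc32(block) != checksum
    ]


# 16 KiB of sample data, which splits into four 4 KiB blocks.
stored = write_blocks(bytes(range(256)) * 64)

# Simulate silent corruption: flip one bit in block 2 but keep the old checksum.
block, checksum = stored[2]
damaged = bytearray(block)
damaged[10] ^= 0x01
stored[2] = (bytes(damaged), checksum)

corrupt = find_corrupt_blocks(stored)
print(corrupt)
```

Detection is only half of the story: with a redundant RAID profile, a file system that knows which copy is bad can read the good copy instead, which is why checksums and software RAID fit well together.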

Primary Programming Language - Python

Given the choice of a highly parallelized GPU-based cluster, there are
very few choices for programming languages. The freely available
libraries from NVIDIA for interfacing with the GPUs (CUDA) are written
in C. Relative to previous general purpose GPU computation approaches,
CUDA makes GPU processing easier by abstracting away most of the
interaction between the software and the hardware, and it opens up
several hardware functions, such as shared memory, that some open source
implementations leave out.

SAGE

The research performed on the HPU4Science cluster is expected to require
extensive mathematical exploration. The researchers are not trying to
create new theorems, but they do use high-level math, say level 3 on a
log scale that ranges from 1 for Sudoku to 5 for Weinberg's Quantum
Field Theory. Therefore, the system must support both numeric and
symbolic math. Since the programming for the cluster is largely written
in Python, it would be nice if the mathematical software interacted well
with Python.

Sage, a Computer Algebra System (CAS) whose development was initiated by
a number theorist, offers the power of commercial CASs, and it is
written in Python and also uses Python as its user-facing language. Sage
combines many open source mathematical and scientific packages,
including Maxima, Octave, Numeric Python, Scilab, SymPy, Matplotlib,
LaTeX, etc., bound together into a single framework that lets users work
in a single language but access a wide universe of software. It can also
work with commercial software including Mathematica and MATLAB. Sage
provides an interactive graphical user interface (through any web
browser) that is stylistically similar to Mathematica, but also very
light and ideal for server configurations.
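The difference between numeric and exact computation, which motivates wanting both in one environment, can be illustrated with nothing but Python's standard library. This sketch does not use Sage or its API: exact rational arithmetic via `fractions.Fraction` is only the simplest relative of what a full CAS does symbolically, and the variable names are hypothetical.

```python
from fractions import Fraction

# Numeric: binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so the rounded sum does not compare equal to the rounded target.
numeric_equal = (0.1 + 0.2 == 0.3)

# Exact: rationals are kept as ratios of integers, so no rounding occurs.
exact_equal = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print(numeric_equal)
print(exact_equal)
```

Floating point is fast and usually good enough for large simulations, while exact or symbolic manipulation is what one wants when deriving or checking the formulas that those simulations implement.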

Literate Programming and Reproducible Computational Research

Donald Knuth, master of us all, has long advocated literate programming
as the way forward for technical programming, but these ideas have been
superbly ignored by the large majority of people who code for science
(let's not even mention people who just "code"). The core concept behind
literate programming is to describe what you want to do and code it all
at the same time, making sure that humans can understand what you’re
doing, not just machines.
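One lightweight way to approximate this idea in Python is to keep the explanation and a worked example inside the docstring and let the standard `doctest` module check that the prose still matches the code. This is not Knuth's own tooling, only a sketch in the same spirit; the function `kinetic_energy` and its example values are hypothetical.

```python
import doctest


def kinetic_energy(mass_kg: float, speed_m_s: float) -> float:
    """Return the kinetic energy in joules of a body moving at a given speed.

    The classical formula is E = (1/2) * m * v**2. A 2 kg body moving at
    3 m/s therefore carries 9 J:

    >>> kinetic_energy(2.0, 3.0)
    9.0
    """
    return 0.5 * mass_kg * speed_m_s ** 2


# Parse the worked example out of the explanation and execute it.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    kinetic_energy.__doc__,
    {"kinetic_energy": kinetic_energy},  # names visible to the example
    "kinetic_energy",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # TestResults(failed=..., attempted=...)
print(results)
```

If someone later changes the formula without updating the explanation, the embedded example fails, which keeps the human-readable description and the machine-readable code from drifting apart.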