Hardware Recommendations

Benchmark systems for ALMA data processing

ALMA data are challenging to process. General considerations for computing hardware on which to run CASA may be found on the CASA hardware considerations page, which has a detailed discussion of the trade-offs between processing power and I/O (and some alternative, though similar, system specifications). Note that the CASA software package is evolving rapidly, and our recommendations may change in the near future once the performance of CASA in parallel mode (which will start to become available with the CASA 4.1 release) is fully understood.

As a very rough guide to data volume, Cycle 0 execution blocks (EBs) from ALMA contain about 1GB of raw data if taken in continuum mode (128 channels per baseband), or 10GB of raw data if taken in spectral line mode (3840 channels per baseband), with a typical project containing 3-10 EBs (though a few have many more). In Cycle 1, the increased number of antennas will inflate these volumes, though the ability to use mixed modes and channel averaging will counteract this to some extent, resulting in a likely net increase of about a factor of two in volume over Cycle 0 (i.e. 2GB per continuum EB, 20GB per spectral line EB). Experience suggests that typical processing will temporarily inflate these numbers by about a factor of ten. Furthermore, the Cycle 0 data deliveries often contain multiple EBs, and also contain measurement sets, which inflate the data volume of the deliveries by a factor of about four compared to the raw ALMA Science Data Models (ASDMs). (The Cycle 1 packages do not include measurement sets.) The largest Cycle 0 delivery is about 1.4TB in size. With the increased number of antennas in Cycle 1 and beyond, these volumes will increase.
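The rough figures above can be turned into a quick disk-space estimate. The following Python sketch is illustrative only: the function name estimate_disk_gb and the table layout are our own, and it assumes that the peak working volume is simply the raw volume multiplied by the factor of ten quoted above. Real projects may differ considerably.

```python
# Approximate raw data volume per execution block (EB), in GB,
# using the rough figures quoted in the text above.
RAW_GB_PER_EB = {
    ("cycle0", "continuum"): 1.0,
    ("cycle0", "spectral_line"): 10.0,
    ("cycle1", "continuum"): 2.0,
    ("cycle1", "spectral_line"): 20.0,
}

# Assumption: processing temporarily inflates the raw volume tenfold.
PROCESSING_INFLATION = 10.0


def estimate_disk_gb(cycle, mode, n_ebs, inflation=PROCESSING_INFLATION):
    """Return (raw_gb, peak_gb) for a project with n_ebs execution blocks.

    Hypothetical helper for illustration; not part of CASA.
    """
    raw_gb = RAW_GB_PER_EB[(cycle, mode)] * n_ebs
    peak_gb = raw_gb * inflation
    return raw_gb, peak_gb


if __name__ == "__main__":
    raw, peak = estimate_disk_gb("cycle1", "spectral_line", 10)
    print(f"raw: {raw:.0f} GB, peak during processing: {peak:.0f} GB")
```

Under these assumptions, a Cycle 1 spectral line project with 10 EBs comes out at roughly 200GB of raw data and about 2TB of temporary working space.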

Currently, only the smallest continuum datasets can be effectively processed on a laptop. For desktop computers, 8GB is probably the minimum memory needed for data reduction, and substantial improvements are seen when additional memory is added (up to at least 128GB).

The systems on this page are representative of the machines the regional centres are purchasing to support visitors. We have been benchmarking these systems in order to assist users with their purchasing decisions should they wish to perform data processing and analysis at their home institutions.

Runtimes

Our runtime benchmarking has been performed using the Antennae Band 7 Science Verification dataset. The benchmarking scripts are based on those in the CASA Guides for calibration and imaging, with the interactive parts removed. These data were taken with only eleven antennas, compared with the 16+ offered in Cycle 0 and 32+ in Cycle 1, so these times are lower bounds for a typical Early Science program. These tests used CASA 4.0. (Note that none of these tests used parallel CASA, and thus effectively ran CASA on only a single core. We expect that when parallelization is enabled both the absolute and relative performances will change significantly, favouring single machines with large numbers of cores, and clusters.)