A Distributed Datacube Analysis Service for Radio Telescopes

Mahadevan, Venkat

Current- and next-generation radio telescopes are poised to produce data at an unprecedented rate. We are developing the cyberinfrastructure to enable distributed processing and storage of FITS datacubes from these telescopes. In this contribution, we present the data storage and network infrastructure that enables efficient searching, extraction, and transfer of FITS datacubes. The infrastructure combines the iRODS distributed data management system with a custom, spatially enabled PostgreSQL database. The data management system ingests FITS cubes and automatically populates the metadata database from the FITS header data. Queries to the metadata service return matching records in VOTable format.
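The header-driven ingestion step can be illustrated with a minimal sketch. It assumes only the standard FITS layout of 80-character header cards and is not the service's actual ingestion code; the names `parse_fits_header` and `make_card` and the sample header values are hypothetical, and features such as escaped quotes and CONTINUE cards are deliberately left out.

```python
def parse_fits_header(raw: bytes) -> dict:
    """Parse 80-character FITS header cards into a keyword -> value dict."""
    header = {}
    for start in range(0, len(raw), 80):
        card = raw[start:start + 80].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, and blank cards carry no value
        field = card[10:]
        if field.lstrip().startswith("'"):
            # String value: the text between the first pair of single quotes
            # (escaped '' inside strings is not handled in this sketch).
            value = field.split("'")[1].rstrip()
        else:
            token = field.split("/")[0].strip()  # drop any trailing comment
            if token in ("T", "F"):
                value = token == "T"
            else:
                try:
                    value = int(token)
                except ValueError:
                    try:
                        value = float(token)
                    except ValueError:
                        value = token  # keep unparsed text as-is
        header[keyword] = value
    return header


def make_card(keyword: str, value: str = "") -> bytes:
    """Hypothetical helper that builds one 80-byte card for the demo below."""
    text = f"{keyword:<8}= {value}" if value else keyword
    return text.ljust(80).encode("ascii")


# Illustrative header only; these values do not come from a real survey.
raw_header = b"".join([
    make_card("SIMPLE", "T"),
    make_card("NAXIS", "3 / number of axes"),
    make_card("NAXIS1", "512"),
    make_card("CTYPE1", "'RA---SIN'"),
    make_card("CRVAL1", "83.75"),
    make_card("OBJECT", "'M42     '"),
    make_card("END"),
])

header = parse_fits_header(raw_header)
# A metadata row could then be assembled from the keywords of interest.
record = {key: header[key] for key in ("OBJECT", "NAXIS", "CTYPE1", "CRVAL1")}
print(record)
```

In practice an established FITS library would normally replace the hand-written parser; the sketch is only meant to show how header keywords map onto fields of a metadata record.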

The iRODS system allows a distributed network of file servers to store large data sets redundantly with a minimum of upkeep. Transfers between iRODS data sites use parallel I/O streams for maximum speed. Files are staged to the optimal host for download by an end user. The service can automatically extract subregions of individual or adjacent cubes, registered to user-defined astrometric grids, using the Montage package. The data system can query multiple surveys and return spatially registered datacubes to the user. Future development will allow the data system to use a distributed processing environment to analyze data sets, returning only the calculation results to the end user.
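The idea of selecting the cubes that cover a requested region, and the part of each cube to extract, can be sketched as follows. This is a simplified illustration, not the service's method: the actual system relies on the spatially enabled database and on Montage for astrometric registration, whereas this sketch assumes rectangular RA/Dec footprints with no wraparound at RA = 0 and a strictly linear spectral axis. The names `CubeFootprint`, `find_overlapping`, and `channel_range`, and all numbers, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CubeFootprint:
    """Rectangular sky footprint in degrees (assumes no RA wraparound)."""
    name: str
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float

    def overlaps(self, other: "CubeFootprint") -> bool:
        # Two boxes overlap when their intervals intersect on both axes.
        return (self.ra_min < other.ra_max and other.ra_min < self.ra_max
                and self.dec_min < other.dec_max and other.dec_min < self.dec_max)


def find_overlapping(cubes, query):
    """Return the names of all cubes whose footprint intersects the query box."""
    return [cube.name for cube in cubes if cube.overlaps(query)]


def channel_range(crval, crpix, cdelt, naxis, lo, hi):
    """Map a world-coordinate interval [lo, hi] onto 1-based channel indices.

    Assumes a linear axis: world = crval + (pixel - crpix) * cdelt.
    The result is clipped to the axis; first > last means no overlap.
    """
    p_lo = crpix + (lo - crval) / cdelt
    p_hi = crpix + (hi - crval) / cdelt
    first = max(1, math.floor(min(p_lo, p_hi)))
    last = min(naxis, math.ceil(max(p_lo, p_hi)))
    return first, last


# Illustrative footprints: A and B are adjacent, C lies elsewhere on the sky.
cubes = [
    CubeFootprint("A", 80.0, 85.0, -7.0, -3.0),
    CubeFootprint("B", 85.0, 90.0, -7.0, -3.0),
    CubeFootprint("C", 120.0, 125.0, 10.0, 15.0),
]
query = CubeFootprint("query", 84.0, 86.0, -6.0, -4.0)
matches = find_overlapping(cubes, query)

# Velocity axis from -100 to +100 km/s in 2 km/s channels; request -10..+10.
channels = channel_range(crval=-100.0, crpix=1.0, cdelt=2.0, naxis=101,
                         lo=-10.0, hi=10.0)
print(matches, channels)
```

A query straddling the boundary between two adjacent cubes returns both of them, which is the case where the extracted subregions would need to be registered to a common grid before being returned.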

This cyberinfrastructure project combines many existing open-source packages into a single deployable data system. The codebase can also operate on two-dimensional images. The project is funded by CANARIE under the Network-Enabled Platforms 2 program.