Data management platform to power collaborative data network

Written by Kate McDonald on 17 October 2014.

Melbourne-based data management systems specialist Arcitecta and supercomputing giant SGI are working together to build customised research platforms that will give the research community access to the massive amounts of data held in the national Research Data Storage Infrastructure (RDSI) project.

Arcitecta's Mediaflux data management platform has been chosen as a key data management engine for the RDSI project, which is providing storage for some of Australia's huge data sets generated through genomics, DNA sequencing, population health research and cancer tissue banks.

The data sets are held in eight distributed data centres or nodes, which currently contain over 11 petabytes (11,000 terabytes) of content and are expected to grow to over 55 petabytes as part of the project. The idea is to allow researchers to access, analyse and re-use the data held in the nodes in a coherently governed environment.

Researchers will be able to browse data collections by name, type, owner, date and linked publications, because Mediaflux makes disparate types of data available to users through its metadata search engine. Its automatic metadata extraction also means data can be quickly discovered and queried as it is ingested.

Mediaflux is also being used to build a web-based repository for the Cooperative Research Centre (CRC) for Mental Health, where researchers will be able to capture, access and query clinical observation data from longitudinal studies of biomarkers.

In the RDSI project, Mediaflux will facilitate rapid collaboration across data types and data repositories that would otherwise be incompatible.

RDSI project director Nick Tate said the benefits of better-managed and more accessible research data were being sought across the research sector.

“At the same time, the acceleration in data creation is outstripping growth in data storage capabilities,” Dr Tate said. “A national data environment of the scale planned means new questions can be asked on topics, and at levels, not previously thought possible.”

Arcitecta chief technology officer Jason Lohrey said the research community was on the cusp of achieving a national data management environment.

“This is creating a sustainable foundation for curating the collective output of Australian researchers and our international collaborators, leading to better research outcomes and a more efficient way to conduct large studies,” he said.