AIDA Data Hub

The most important factor for training world-class AI is access to massive amounts of high-quality training data. The AIDA data hub is a place where researchers can collaboratively gather, annotate, share and enrich large volumes of research data for machine learning in medical imaging diagnostics. The purpose is to facilitate research, innovation and clinical adoption of world-class AI technology in Sweden.

The AIDA dataset register provides information on datasets that have been shared on the data hub, and makes them citable in scientific publications using DOI identifiers.

So far, more than 4 TB of image data from radiology and pathology has been shared on the AIDA data hub.

Data acquisition priorities are set by the AIDA data hub clinical council. AIDA can fund efforts to bring in prioritized data and annotations for sharing on the data hub according to these priorities.

Access and privacy

AIDA is a collaboration arena for academia, industry and healthcare, and supports Open Science and FAIR data. AIDA can facilitate large-scale data exports for research from clinical production systems, and can host research data for sharing. Because AIDA mainly uses data from medical imaging, it has an obligation to adequately safeguard the privacy of the individuals concerned. AIDA therefore shares only data that is ethically approved for sharing, and only when a contractual agreement that includes non-disclosure of data is in place, such as AIDA partner contracts for innovation projects, clinical or technical fellowships, on-site development projects, or network partnerships. For data sharing options outside of AIDA, AIDA also facilitates contact with the data controllers of datasets shared on the data hub.

Please see the AIDA GDPR policy for more details. AIDA is planning a revision of its platform and will update this section as extended capabilities become available.

Anonymous vs identifiable data

Currently, AIDA exclusively uses anonymized data. However, to facilitate the export of and work with larger amounts of data, AIDA is planning to establish a platform that is secure enough to also store, share and process identifiable data.
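
To illustrate the distinction between anonymized and identifiable data, the following is a minimal sketch of screening image metadata for directly identifying attributes before sharing. It is not AIDA's actual procedure. The function names are hypothetical, the metadata is assumed to be available as a plain dictionary keyed by standard DICOM attribute keywords, and the keyword list is a small illustrative selection, far from a complete de-identification profile.

```python
# Illustrative sketch only: NOT AIDA's anonymization procedure.
# The keys are standard DICOM attribute keywords, but which ones to screen
# for is an assumption here, and the list is deliberately incomplete.
IDENTIFYING_KEYWORDS = frozenset({
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "AccessionNumber",
    "InstitutionName",
})


def find_identifying_attributes(metadata):
    """Return the sorted keywords of identifying attributes that hold a value.

    `metadata` is assumed to be a dict mapping attribute keywords to values.
    Attributes that are absent, None, or blank are treated as already removed.
    """
    return sorted(
        key
        for key, value in metadata.items()
        if key in IDENTIFYING_KEYWORDS
        and value is not None
        and str(value).strip()
    )


def looks_anonymized(metadata):
    """Return True if none of the screened attributes carries a value."""
    return not find_identifying_attributes(metadata)
```

A check like this can only flag obvious leftovers in metadata. It says nothing about identifying information in the image content itself, such as burned-in text, so it would complement rather than replace a proper anonymization process.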

Design

The core idea of AIDA is to help bring good, novel AI tools into the clinic, where they can make a positive impact on day-to-day work in medical care. For this reason, the AIDA data hub is built around a Picture Archiving and Communication System (PACS) of the same kind that medical professionals use in their day-to-day work at radiology and digital pathology clinics. This allows clinicians to use their normal tools to conveniently carry out data filtering, inspection, annotation and quality control of results, and to see immediately, in a realistic setting, whether a research idea has the potential to add value in everyday clinical work.

Apart from the PACS, the AIDA platform also offers services for file sharing and source code collaboration, as well as computation servers that allow machine learning experts to process the data with their own preferred tools, using operating systems and applications of their choice on cutting-edge GPU accelerators.