INTERCONNECT Report Summary

Project Context and Objectives:

InterConnect is a European Union FP7-funded project that aims to change the way that data are used in population research into the causes of diabetes and obesity. It seeks to create the foundation to enable research to move from explaining the differences in the risk of diabetes and obesity within populations to being able to explain differences in risk between populations.

There is considerable global variation in the risk of diabetes and obesity between populations and the evidence suggests a complex interaction between genetic and environmental risk factors across the life course. Current strategies for studying variation between populations are limited:

• It is not feasible to establish sufficiently large, de novo multi-ethnic international prospective cohort studies and the elapsed time required for disease outcomes to be manifest is also prohibitive.

• Meta-analysis of results from individual studies can work well for some risk factors but not for others.

• Individual participant meta-analysis based on pooling data from separate cohorts is analytically preferable but there are considerable organisational and regulatory challenges with such an approach.

InterConnect therefore aims to enable a solution which directly investigates between-population variation via meta-analyses of individual participant data across studies but without physical pooling of the data. Such an approach is currently limited by a lack of knowledge of relevant studies, their design and the data available; by methodological diversity in the assessment of exposures and outcomes; and by the lack of an IT framework for federated meta-analysis.

InterConnect seeks to build the foundation for cross-cohort analysis on a sustainable basis through the following objectives which address the limitations described above:

• Developing and populating an online registry of studies relevant to the study of gene-environment interaction in diabetes and obesity, and providing a mechanism to link to study-level meta-data including standardised descriptions of the populations studied, measures used and materials stored.

• Creating a virtual forum for harmonisation of methods between relevant studies, including objective approaches to the measurement of key environmental exposures such as diet and physical activity and the use of biomarkers.

• Establishing an appropriately governed framework in which individual participant data from contributing studies can be analysed in a secure setting that protects privacy, is aligned with the consent and legal arrangements of the studies and maximises the utility of the information that has been collected.

• Establishing a funders’ network to ensure connectivity with the project and a forum for stakeholders who have an interest in the policy, social and economic benefits of the research that will be enabled by InterConnect, acting together to promote cultural change towards a new paradigm of optimising use of existing data.

Project Results:

We have made significant progress towards the objectives of the project during the first (1st October 2013 to 31st March 2015) and second (1st April 2015 to 30th September 2016) reporting periods. The achievements in each period are summarised below.

Registry

To establish the registry, in period 1 we:

• Determined the specification, defining the essential data to be included, the study search protocols, and the data entry, verification and publication process.

• Built the registry ‘shell’, comprising the registry database, an online data collection form (with the functionality to track the status of each entry) and presentation of summary information in a searchable format.

• Populated the registry with an initial set of studies and made the resource available to researchers by linking the registry to the public-facing InterConnect website (www.interconnect-diabetes.eu).

• Decided our overall strategy for continued development, which is to ensure high coverage of studies by inclusion of a limited set of information that can be initially collected from the public domain, so creating little burden for investigators while enabling sign-posting of a large number of useful studies.

In period 2, we continued to develop the registry and have:

• Completed systematic reviews of (i) gene-environment interaction and type 2 diabetes in adults, (ii) gene-environment interactions on diabetes and obesity among ethnic and migrant populations, and (iii) the contextual factors that are relevant to ethnic and migrant populations.

• Populated the registry with a larger number of studies identified through (i) completion of the above systematic reviews to identify existing published studies and (ii) additional survey activities (existing registries, consortia websites, direct contact with experts) to identify unpublished studies.

• Collected meta-data from studies involved in the first working exemplar project which provides in-depth information to be added to the online registry once the software upgrade is complete (see also below).

Data harmonisation

To develop data harmonisation activities, in period 1 we:

• Decided to prioritise the exposures of diet and physical activity and the outcome measure of body composition.

• Developed exemplar research questions, which serve as working examples to establish principles, develop specific protocols and provide an incentive for investigators running cohorts to engage with InterConnect and use the tools that are being developed.

• Developed our understanding of the scientific and technical issues associated with data harmonisation, learning from the experience of Maelstrom Research at McGill University (partner 11) and the EU FP7-funded BioSHaRE project (www.bioshare.eu).

• Approached 22 cohorts relevant to the lead working exemplar question (effect of physical activity during pregnancy on foetal / neonatal adiposity and whether this varies by foetal sex) to develop a collaboration; this and other working exemplars will guide specific data harmonisation activities in the next period.

In period 2, we have:

• Completed harmonisation of exposures and outcomes for the first working exemplar project, coded the algorithms and applied them remotely to datasets hosted by the relevant cohorts. In doing so we have developed systematic procedures and re-usable tools, as well as a data schema that will be shared for future use.

• Progressed development of the registry software to make it customisable. This will enable us to upgrade the public-facing site to store the meta-data (i.e. data dictionaries of the relevant variables and the harmonisation algorithms) from studies involved in working exemplars to provide a re-usable resource for future harmonisation and cross-cohort analyses.

• Developed an R-script for a principal component analysis to describe population structure and applied this to a sub-set of studies within the registry.

• Made progress towards the development of an online tool to signpost researchers to methods for self-reported and objective measures of diet, physical activity and body composition (anthropometry).
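To illustrate what a coded harmonisation algorithm of the kind mentioned above might look like, the following is a minimal sketch only. It assumes two hypothetical cohorts that record physical activity in different units and maps both onto one shared target variable. The cohort names, variable names and units are illustrative assumptions and are not taken from the InterConnect data schema.

```python
# Hypothetical harmonisation rules: one entry per cohort, describing how its
# own physical-activity variable maps onto a shared target variable.
# Cohort names, variable names and units are illustrative assumptions.
HARMONISATION_RULES = {
    "cohort_a": {"source_variable": "pa_minutes_week", "units_per_hour": 60},
    "cohort_b": {"source_variable": "activity_hours_wk", "units_per_hour": 1},
}

TARGET_VARIABLE = "pa_hours_per_week"


def harmonise_record(cohort, record):
    """Convert one cohort-specific record to the shared target variable."""
    rule = HARMONISATION_RULES[cohort]
    raw_value = record.get(rule["source_variable"])
    if raw_value is None:
        # Keep missing values missing rather than guessing.
        return {TARGET_VARIABLE: None}
    return {TARGET_VARIABLE: raw_value / rule["units_per_hour"]}


# The same rule set is applied locally at each cohort.
example_a = harmonise_record("cohort_a", {"pa_minutes_week": 90})
example_b = harmonise_record("cohort_b", {"activity_hours_wk": 2.0})
example_missing = harmonise_record("cohort_a", {})
```

Keeping the rules as data rather than as cohort-specific code is one way such algorithms could be stored alongside the data dictionaries and re-used in later analyses.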

Data management and governance

To develop a data management and governance framework, in period 1 we:

• Decided to apply and adapt the tools that have been tested in the FP7-funded BioSHaRE project and made available as an open source toolkit by Maelstrom Research.

• Set up the required IT systems at the University of Cambridge (partner 1) and piloted their use, demonstrating that an approach based on federated analysis of data (‘the analysis comes to the data’) gave equivalent results to the conventional data pooling process (‘the data comes to the analysis’) in a pilot analysis.

• Worked with BioSHaRE ethico-legal experts to define ethical, legal and social issues associated with the federated approach to data sharing that will need to be addressed for implementation in diabetes and obesity research.
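The equivalence between federated and pooled analysis reported in the pilot can be illustrated with a simple case. The sketch below is not the BioSHaRE toolkit or the pilot analysis itself. It assumes a simple linear regression with one exposure, three hypothetical cohorts and made-up values, and shows that combining only aggregate sums returned by each cohort reproduces the coefficients obtained from physically pooled data.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate statistics returned by one cohort; no individual rows."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarise_site(x, y):
    """Runs where the data are held ('the analysis comes to the data')."""
    return SiteSummary(
        n=len(x),
        sum_x=sum(x),
        sum_y=sum(y),
        sum_xx=sum(v * v for v in x),
        sum_xy=sum(a * b for a, b in zip(x, y)),
    )


def fit_from_summaries(summaries):
    """Runs at the coordinating centre using only the aggregate sums."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_pooled(x, y):
    """Conventional least squares on pooled rows ('the data comes to the analysis')."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up (exposure, outcome) values for three hypothetical cohorts.
cohorts = [
    ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]),
    ([4.0, 5.0], [8.1, 9.8]),
    ([0.5, 1.5, 2.5, 3.5], [1.0, 3.2, 4.9, 7.1]),
]

fed_intercept, fed_slope = fit_from_summaries(
    [summarise_site(x, y) for x, y in cohorts]
)

pooled_x = [v for x, _ in cohorts for v in x]
pooled_y = [v for _, y in cohorts for v in y]
pooled_intercept, pooled_slope = fit_pooled(pooled_x, pooled_y)
```

Under these assumptions the two sets of coefficients agree up to floating-point rounding, because the least-squares solution depends on the data only through sums that can be accumulated across cohorts.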

In period 2, we have:

• Completed a federated meta-analysis of data from the 8 geographically dispersed cohorts participating in the first working exemplar project, generating scientifically interesting results of public health relevance.

• Established a second working exemplar project which has successfully engaged additional cohorts in the InterConnect approach. We have also begun to establish further exemplar projects which will be central to enabling the transition to a sustainable platform for federated meta-analysis of environmentally and ecologically diverse cohort data.

• Developed a bespoke ‘Data Access and Results Sharing Network Agreement’ which has been shared with the 8 cohorts participating in the first exemplar project; conventional data sharing agreements are not well suited to the InterConnect / BioSHaRE approach in which researchers are accessing data and sharing results, rather than physically sharing data.
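As a complement to the points above, one common way to combine results across cohorts is a fixed-effect, inverse-variance weighted meta-analysis of cohort-level estimates. The sketch below shows that calculation in general terms; it is not a description of the method used in the first working exemplar, and the estimates and standard errors are made-up numbers for illustration.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / (se ** 2) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Made-up effect estimates and standard errors from two hypothetical cohorts.
pooled_estimate, pooled_se = fixed_effect_meta([1.0, 3.0], [1.0, 1.0])
```

Each cohort would share only its estimate and standard error, so this two-stage approach, like the federated approach described above, avoids physical sharing of individual participant data.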

Engaging stakeholders and dissemination

To develop stakeholder networks and disseminate information, in period 1 we:

• Set up a project website and disseminated information via newsletters and exhibition stands at conferences.

• Hosted an international meeting for research funders and stakeholders that led to the nucleus of a virtual funder and stakeholder network; work will continue to develop and substantiate this network through the duration of the project.

• Created tailored information packs to facilitate collaborations to take forward the working exemplars that will enable us to begin to understand the real-life issues affecting implementation and uptake of InterConnect tools.

In period 2, we have:

• Completed a major update of the content and structure of the project website, and updated the information packs described above.

• Held two international symposia in 2015 and 2016 as adjuncts to the European Association for the Study of Diabetes (EASD) conferences; these were attended primarily by diabetes researchers and also by some research funders and wider stakeholders.

• Held a workshop to introduce InterConnect to researchers studying ethnic and migrant populations.

Potential Impact:

The ambition of InterConnect is to create the foundations for future cross-cohort analyses, in particular research to explain the differences in risk of diabetes and obesity between populations. We anticipate that this will take the form of an infrastructure that identifies relevant studies, provides study-level meta-data and approaches to inform the harmonisation of data, and enables federated data analysis, so providing a new, secure, scalable and potentially sustainable approach to optimising use of existing data. Alongside this, we plan to establish a virtual network of research funders and stakeholders that will promote the cultural change required for a paradigm shift in approaches to the analysis of data across cohorts and facilitate the sustainability of the analytical infrastructure. While InterConnect can seek to engage and catalyse action by funders and stakeholders, steps towards cultural change and sustainability are ultimately dependent on outside agencies taking ownership.

Potential impacts of the project can be summarised as:

• Maximising the investment made by different international research funders through improved re-use of existing data and co-ordination of complementary research.

• Provision of a forum for exchange of information and best practice between projects, catalysed by the study registry and online tools for harmonisation of exposures and outcomes.

• Creation of a self-sustaining virtual network of funders and stakeholders to better co-ordinate research, consider shared investment in new studies or data to overlay, and thereby better ‘connect’ existing studies.

• Enabling future research that is likely to have major implications for understanding what explains the large differences seen in inherent susceptibility to diabetes and obesity; this enhanced knowledge will increase our understanding of risk and will in turn inform future preventive strategies.