Abstract:

Patients with type 1 diabetes (T1D) may develop a wide variety of slowly progressing complications, which have been shown to be partly heritable and to correlate with one another. However, the genetic and biological mechanisms behind them remain largely unknown. The goal of this work was to apply machine learning and data mining approaches that capture the progressive nature of multiple complications simultaneously, and to create novel phenotype classes that could help elucidate the pathogenesis and genetics of diabetic complications.

To achieve this, a dual-layer self-organizing map (SOM) was trained using clinical and environmental patient data from the FinnDiane study, and the trained SOM node prototypes were clustered into classes using agglomerative hierarchical clustering. The genetic differences between the created classes were evaluated using heritability estimates, and the class assignments that showed significant heritability were then analysed in genome-wide association studies (GWAS) to identify associated genetic markers.
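The clustering step described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: it trains a plain single-layer SOM rather than the dual-layer SOM, runs on synthetic data rather than FinnDiane data, and assumes Ward linkage, a 5×5 grid and three classes purely for illustration. The function names (train_som, cluster_prototypes, assign_classes) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a simple single-layer SOM and return its node prototypes."""
    rng = np.random.default_rng(seed)
    # Initialise prototypes from randomly chosen samples.
    prototypes = data[rng.integers(0, len(data), rows * cols)].astype(float)
    # Grid coordinates of every node, used for the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the node whose prototype is closest to x.
        bmu = np.argmin(np.sum((prototypes - x) ** 2, axis=1))
        grid_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        prototypes += lr * h[:, None] * (x - prototypes)
    return prototypes


def cluster_prototypes(prototypes, n_classes):
    """Agglomerative clustering of the prototypes (Ward linkage is an assumption)."""
    tree = linkage(prototypes, method="ward")
    return fcluster(tree, t=n_classes, criterion="maxclust")


def assign_classes(data, prototypes, node_labels):
    """Give each sample the class of its best-matching SOM node."""
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return node_labels[np.argmin(d2, axis=1)]


# Synthetic stand-in for standardised patient variables (NOT real data).
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 4)) for c in (0.0, 2.0, 4.0)])
data = (data - data.mean(axis=0)) / data.std(axis=0)

prototypes = train_som(data, rows=5, cols=5)
node_labels = cluster_prototypes(prototypes, n_classes=3)
patient_classes = assign_classes(data, prototypes, node_labels)
```

Clustering the node prototypes rather than the raw samples keeps the hierarchical step small (here 25 points instead of 300), and each patient then inherits the class of the node it maps to.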

The created class assignments were biologically plausible, and their heritability was estimated at up to 42%. The GWAS analyses detected a genetic marker (rs202095311, located in the last intron of the gene NRIP1) associated at genome-wide significance (p < 5×10^-8) with one of the created class assignments. In addition, the GWAS detected multiple other genomic regions with suggestive p-values, most of which contained genes, or implicated processes, previously linked to diabetic complications or their risk factors.