Using Python, I took the smoker status reported by users and correlated it with a mutation (rs1051730) in CHRNA3, the gene for the nicotinic acetylcholine receptor alpha 3 subunit. A few genome-wide association studies (GWAS) have linked this mutation to nicotine dependence, alcohol abuse, and susceptibility to lung cancer.

My point with the post was to offer a proof of concept and to reveal/interpret the data I got out of my Python analysis. I wanted to create a precedent so that others could freely use and improve my scripts and my approach.

Of course, if you’re a user of OpenSNP, you can gain a lot of insight by looking at your own genotype for this SNP (single nucleotide polymorphism) and comparing it with my findings. To see the exact details of what I did and to download the Python code, go and read the post.

Anyhow, I decided to continue with another analysis.

This time I am looking into the group of users who have so far shared phenotype information about their diabetes status. The initial number of users in this phenotype group was around 180, but not all of them shared their genetic information, and without a genotype no association can be made.

Thus I only included the users who shared both phenotype (diabetes status) and genotype (23andMe raw genetic data) information. After cleaning the data, 115 users (subjects) made it into the analysis.
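The filtering step can be sketched roughly as follows. This is a minimal illustration and not the actual script from the analysis: the dictionaries `phenotypes` and `genotype_files`, the user IDs, the file names, and the helper `users_with_both` are all hypothetical.

```python
def users_with_both(phenotypes, genotype_files):
    """Return the sorted user IDs that have both a phenotype and genotype data.

    phenotypes:     dict of user ID -> reported diabetes status (assumed layout)
    genotype_files: dict of user ID -> path to a raw genotype file (assumed layout)
    """
    usable = []
    for user_id, status in phenotypes.items():
        # Skip users whose phenotype entry is missing or blank.
        if status is None or not status.strip():
            continue
        # Skip users who never uploaded genetic data.
        if user_id not in genotype_files:
            continue
        usable.append(user_id)
    return sorted(usable)


# Toy data, invented purely for illustration.
phenotypes = {1: "normal", 2: "Type 2", 3: "", 4: "normal"}
genotype_files = {1: "user1.txt", 2: "user2.txt", 5: "user5.txt"}

print(users_with_both(phenotypes, genotype_files))  # [1, 2]
```

User 3 drops out for an empty phenotype, user 4 for a missing genotype file, and user 5 for having no phenotype entry at all.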

You don’t have to be an elite researcher to recognize the first potential drawback, an unbalanced sample: there are not enough subjects with a ‘diabetes’ status. In the greater scheme of things, of course, this is good!
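A quick way to see how lopsided a sample is would be to tally the reported statuses. The sketch below uses made-up labels, not the real OpenSNP data, and `phenotype_balance` is a hypothetical helper name.

```python
from collections import Counter


def phenotype_balance(statuses):
    """Count each reported status and return (counts, share of the largest class)."""
    # Normalise case and whitespace so 'Normal' and 'normal ' count as one label.
    counts = Counter(s.strip().lower() for s in statuses)
    total = sum(counts.values())
    largest_share = max(counts.values()) / total if total else 0.0
    return counts, largest_share


# Invented labels for illustration only.
statuses = ["normal", "Normal", "type 2", "normal", "type 1", "normal "]
counts, largest_share = phenotype_balance(statuses)

print(counts.most_common())
print(f"largest class share: {largest_share:.2f}")
```

A largest-class share close to 1.0 means almost everyone sits in one group, which is the situation described above.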

Putting this aside, I decided to go on with the analysis, in the hope that as more people start using the platform, we’ll get a larger and more balanced sample. When this happens, it will be much easier to repeat the analysis and we will probably gain much more insight from it.

In my first analysis I only looked at one mutation (SNP, rs1051730). Now I looked at 7 genetic loci (SNPs) that have been associated with diabetes in GWAS. I’ll go through them one by one.

If you are one of the users included in this sample (if you’ve submitted your 23andMe genetic report to OpenSNP) and your reported ‘diabetes’ phenotype is normal, please make sure to double check your genotype for the SNPs I discuss below. If you carry the risk alleles for these mutations, this should help you make appropriate lifestyle decisions so that you can avoid developing Type 2 Diabetes or other conditions of the metabolic syndrome. I have made this report primarily for you.
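If you want to check your own genotypes programmatically, something like the sketch below could work. It assumes the usual 23andMe raw data layout: comment lines starting with `#`, then tab-separated columns for rsid, chromosome, position and genotype. The helper name `read_genotypes` is hypothetical, and the toy lines (including the positions, the genotypes, and the placeholder IDs `rs0000001` and `rs0000002`) are invented.

```python
import io


def read_genotypes(lines, wanted_rsids):
    """Return {rsid: genotype} for the requested SNPs from 23andMe-style lines."""
    wanted = set(wanted_rsids)
    found = {}
    for line in lines:
        line = line.strip()
        # Header and comment lines start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # malformed line, ignore it
        rsid, _chromosome, _position, genotype = fields[:4]
        if rsid in wanted:
            found[rsid] = genotype
    return found


# Toy stand-in for a raw data file; every value here is made up.
toy_file = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1051730\t15\t100\tAG\n"
    "rs0000001\t1\t200\tCC\n"
    "rs0000002\t2\t300\t--\n"
)

print(read_genotypes(toy_file, ["rs1051730", "rs0000002"]))
```

With a real file you would pass an open file handle instead of `toy_file`. A genotype of `--` is a no-call and should be treated as missing.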

Genetics and Diabetes

– mutation in the KCNJ11 gene, CC is the normal genotype
– T is the risk allele
– CT may be at 1.3x higher risk for developing Type 2 diabetes
– TT may be at 2.5x higher risk for developing Type 2 diabetes
– in our sample below, for those who are normal and carry CT or TT, I’d be on the lookout for the rest of the mutations associated with T2D as the effects could be additive

– mutation in the IGF2BP2 gene, AA is the normal genotype
– C is the risk allele, AC and CC may be at 1.2x higher risk of developing T2D
– many of you who reported the ‘normal’ phenotype carry a risk allele

– mutation in the PPARG gene, associated with diabetes and fat metabolism, CC is the normal genotype
– G is the risk allele, CG and GG should be careful when following high fat diets
– the majority of you reporting the ‘normal’ phenotype are CC

– mutation in the TCF7L2 gene, CC is the normal genotype and may be at lower risk of developing T2D and gestational diabetes
– T is the risk allele, CT may be at 1.4x higher risk of developing T2D, while TT may be at 2x higher risk of developing T2D
– a significant number of you who reported the ‘normal’ phenotype carry a risk allele

– mutation in the FTO gene, CC is the normal genotype
– A is the risk allele, AC may be at 1.2x and AA at 1.4x higher risk of developing T2D
– same story as above: careful if you reported ‘normal’ and carry one of the risk alleles!

– mutation near position 22134095 on chromosome 9, CC is the normal genotype
– T is the risk allele, CT and TT may be at 1.2x higher risk of developing T2D
– about 2/3 of you who reported the normal phenotype are TT (highest risk)

– mutation in the SLC30A8 gene that codes for a zinc transporter protein, TT is the normal genotype
– C is the risk allele, CT and CC carriers may be at increased risk of developing T2D
– most (82 out of 89) of you reporting the normal phenotype carry at least one risk allele (CT or CC), while only 7 of you are TT
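Since the effects of these loci could be additive, one simple way to look at all seven together is to count how many risk alleles you carry at each. The sketch below uses the risk alleles exactly as listed above, keyed by gene or locus name. The key names, the helper `count_risk_alleles`, and the example user are hypothetical, and the code assumes your genotype letters are reported on the same strand as in the lists above.

```python
# Risk alleles as listed in this post, keyed by gene/locus name (labels are my own).
RISK_ALLELES = {
    "KCNJ11": "T",
    "IGF2BP2": "C",
    "PPARG": "G",
    "TCF7L2": "T",
    "FTO": "A",
    "chr9:22134095": "T",
    "SLC30A8": "C",
}


def count_risk_alleles(genotypes):
    """Return {locus: number of risk alleles (0, 1 or 2)} for loci with a usable call."""
    counts = {}
    for locus, risk in RISK_ALLELES.items():
        call = genotypes.get(locus)
        # Skip missing loci and no-calls such as '--'.
        if call is None or len(call) != 2 or "-" in call:
            continue
        counts[locus] = call.count(risk)
    return counts


# Hypothetical user, invented for illustration.
user = {
    "KCNJ11": "CT",
    "IGF2BP2": "AA",
    "PPARG": "CC",
    "TCF7L2": "TT",
    "FTO": "AC",
    "chr9:22134095": "TT",
    "SLC30A8": "--",
}

per_locus = count_risk_alleles(user)
print(per_locus)
print("total risk alleles:", sum(per_locus.values()))  # 6
```

This is an unweighted tally, so it treats every locus the same even though the reported risk increases differ. It is meant as a quick overview, not a risk score.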

Concluding Thoughts

I’ve addressed most of my remarks to those who reported the normal phenotype. If you look at your genotypes for these mutations and notice that you carry risk alleles for one or more of them, please use this information to help you make better lifestyle decisions. Destiny is not set in genes; even if you carry the highest risk alleles for all these mutations, you can still reduce (to a smaller or larger extent) the risk of developing negative health conditions as long as you make good lifestyle decisions.

Ending…

One thing that crossed my mind as I was building the Python code was that as more users share their data, I could create a machine learning algorithm to run over it. More about that…well, when the time comes. Until then, I’ll try to look at other phenotypes and see if I can get more insight from the data.