Tuesday, April 23, 2013

Genome on a Hard Drive

I ignored all of 23andMe's warnings about loss of security and downloaded my raw chromosome/SNP-level genome data from their servers onto my hard drive. It's a 10 Megabyte text file which starts at chromosome 1 and ends with the X and Y chromosomes and my mitochondrial DNA. Here is how the file starts:

    # This data file generated by 23andMe at: Tue Apr 23 09:13:29 2013
    #
    # Below is a text version of your data. Fields are TAB-separated
    # Each line corresponds to a single SNP. For each SNP, we provide its identifier
    # (an rsid or an internal id), its location on the reference human genome, and the
    # genotype call oriented with respect to the plus strand on the human reference sequence.
    # We are using reference human assembly build 37 (also known as Annotation Release 104).
    # Note that it is possible that data downloaded at different times may be different due to ongoing
    # improvements in our ability to call genotypes. More information about these changes can be found at:
    # https://www.23andme.com/you/download/revisions/
    #
    # More information on reference human assembly build 37 (aka Annotation Release 104):
    # http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
    #
    # rsid        chromosome  position  genotype
    rs4477212     1           82154     AA
    rs3094315     1           752566    AA
    rs3131972     1           752721    GG
    rs12124819    1           776546    AA
    rs11240777    1           798959    GG
    rs6681049     1           800007    CC
    rs4970383     1           838555    AC
    rs4475691     1           846808    CT
    rs7537756     1           854250    AG
    rs13302982    1           861808    GG
    rs1110052     1           873558    GT

... and so on for 16,563 pages ...

In Microsoft Word it takes ten minutes to open due to its vast size and occupies 20 Megabytes.

As you can see, to the human eye this enormous heap of data is both incomprehensible and useless, but there are obviously tools - programs - which can access it. These were used by 23andMe itself to profile my health risks, inherited traits and ethnicity/ancestry.
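To give a flavour of what the very first step of such a program might look like, here is a minimal Python sketch that reads the file format shown above. It is an illustration only and certainly not 23andMe's own tooling: the names Snp and read_snps are my own inventions, and it assumes exactly what the file header states, namely comment lines starting with # and four TAB-separated fields per SNP. A few lines from the start of my file are embedded as sample data so the sketch runs on its own.

```python
import io
from collections import Counter
from typing import Iterator, NamedTuple, TextIO


class Snp(NamedTuple):
    """One row of the raw data file (hypothetical name, for illustration)."""
    rsid: str
    chromosome: str  # kept as text: not every label is a number (e.g. X, Y)
    position: int
    genotype: str


def read_snps(handle: TextIO) -> Iterator[Snp]:
    """Yield one Snp per data line, skipping blank lines and '#' comments."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield Snp(rsid, chromosome, int(position), genotype)


# Four rows copied from the start of the file, with the TABs restored.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAA\n"
    "rs4970383\t1\t838555\tAC\n"
    "rs4475691\t1\t846808\tCT\n"
)

snps = list(read_snps(io.StringIO(SAMPLE)))

# Count SNPs per chromosome.
per_chromosome = Counter(snp.chromosome for snp in snps)

# Heterozygous calls: two letters that differ (one variant from each parent).
heterozygous = [
    snp.rsid
    for snp in snps
    if len(snp.genotype) == 2 and snp.genotype[0] != snp.genotype[1]
]

print(per_chromosome)
print(heterozygous)
```

To run it over a real download, you would pass an open file to read_snps instead of the embedded sample. Turning the resulting list into health risks or ancestry is of course the hard part, and needs reference data this sketch knows nothing about.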

The insurance company argument is that insurers would run their programs over your genome and raise your premiums (or deny you cover) for heritable conditions. This is meant to be illegal in some places but that won't stop them.

The police/security argument is that these agencies would like nothing more than everyone's full genotype publicly available (on Facebook?) because (i) it makes DNA matching so much more powerful; and (ii) as 23andMe show, you can get a lot of phenotype even from today's restricted genotyping (i.e. 23andMe know a scary amount about me just from running their analysis).

The personal identity argument is that, assuming a benign prenatal and childhood environment, most of the key facts about personal identity are gene-encoded (how could they not be - we were built by these things). So personality, intelligence, appearance, height and even many social attitudes are heavily influenced by the genome: see this surprising graphic from here where genetic contribution is to the right in blue.

Graphics like this are built from twin studies, not genomic analysis, but there is intense ongoing research looking at the specific gene-variants (alleles) which drive such phenotypical characteristics. At some stage after the research is in, there will be tools which can grab your or my genome and read off this kind of rather personal information.

I guess those nice folk from the security and police services will be first in line, followed by employers and then potential life partners. Actually, the line could be in any order!

I am ignoring the really futuristic options, open perhaps to our great-great-grandchildren, of cloning their genome-publicising ancestor either in virtuality or in the flesh! (I wrote about this at science-fiction.com.)