Friday, January 30, 2015

Visualizing one’s DNA

23andme has been around for years, and offers a relatively cheap DNA analysis service. Their tech allows anyone to see health issues related to their specific DNA composition, as well as familial connections and history. As a customer of 23andme, I enjoyed the info they provided, but I was also looking for a way to visualize my own unique DNA…similar to how you’d see it on crime shows like CSI.

Unfortunately, 23andme doesn’t offer any info or service on doing this, so I had to investigate myself. Thankfully, they DO allow you to download your “raw” DNA, which is a 20+ MB text file with tons of rows like this:

RSID        chromosome  position  genotype
4477212     1           82154     AA
3094315     1           752566    AA
3131972     1           752721    GG
12124819    1           776546    AA
11240777    1           798959    AG
6681049     1           800007    CC
4970383     1           838555    AC
4475691     1           846808    CT
7537756     1           854250    AG
13302982    1           861808    GG
1110052     1           873558    GT

The full table is actually almost a million rows long, so turning all this into something visible isn’t that simple. The first question is WHAT it is that we want to visualize. Well, DNA is composed of 4 nucleotides – A, C, G and T, but it’s far more complex than even those million lines. In fact, a full DNA sequence is typically several BILLION items long. If you stretched it out, it would be 2-3 meters long, yet only about 2.5 nanometers wide. If you scaled it up to the thickness of a human hair, which is about the smallest thing we can see without a magnifier, it would be about 100 kilometers long!

Research has found that only a small fraction of the DNA is actually used in our “construction”, and much of the rest has long been regarded as filler. When DNA is scientifically analyzed this way, most of it is disregarded, and scientists have created a database of DNA pieces that “matter”. In this database, each such piece is numbered, and is known as a “Single Nucleotide Polymorphism” or SNP for short (it’s pronounced “snip”). A SNP is a single position in the DNA where the nucleotide commonly differs from one person to the next.

When a person’s DNA is analyzed, as 23andme does, instead of just listing out your entire DNA, they match your nucleotides to those listed in the database, and the result is a long list of those. If we look again at the table above (my 1st chromosome), we can see that nucleotides from position 1 through 82153 were disregarded, but the nucleotide in position 82154 is significant: that position is recorded as SNP number 4477212 in the database (anyone can look it up here: http://www.ncbi.nlm.nih.gov/SNP/), and my genotype there is AA. After that, we jump ahead over 600,000 nucleotides to reach the next one.
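For anyone who wants to poke at their own raw file, here is a minimal Python sketch of reading those rows. It rests on a few assumptions that are not spelled out above: that the columns are separated by tabs, that lines starting with "#" are comments, and that the IDs in the file carry an "rs" prefix. The names `Snp` and `read_snps` are made up for this example, and this is not the code my application uses.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Snp(NamedTuple):
    rsid: str
    chromosome: str  # kept as text, since not every chromosome label is a number
    position: int
    genotype: str


def read_snps(lines: TextIO) -> Iterator[Snp]:
    """Yield one Snp per data row, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")  # assumption: tab-separated columns
        if len(fields) != 4:
            continue  # ignore rows that do not have exactly four columns
        rsid, chromosome, position, genotype = fields
        if not position.isdigit():
            continue  # e.g. a header row without a leading '#'
        yield Snp(rsid, chromosome, int(position), genotype)


# A tiny in-memory stand-in for the real 20+ MB file.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAA\n"
    "rs3131972\t1\t752721\tGG\n"
)
snps = list(read_snps(sample))

# Distance between the first two recorded positions on chromosome 1.
gap = snps[1].position - snps[0].position
print(len(snps), gap)
```

To run it on a real download, you would pass an open file to `read_snps` instead of the `sample` text, for example `with open("my_raw_dna.txt") as f:` (that file name is hypothetical).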

23andme doesn’t actually give you the ENTIRE DNA, so even if I wanted to visualize the whole thing, I wouldn’t be able to. Even the reduced set is a bit much, as it’s almost a million records, each with a SNP ID between 0 and about 80 million. What I elected to do is represent each SNP ID with a color in the range of 16,777,216 shades a computer can display. In case you weren’t aware, a computer displays color as a combination of Red, Green and Blue, with each of the three ranging from 0 to 255 (256 x 256 x 256 = 16,777,216 combinations). The range of colors goes from 0 Red, 0 Green and 0 Blue (black) to 255 Red, 255 Green and 255 Blue (white). Technically speaking, this type of visualization is for entertainment purposes only, and has NO scientific accuracy or value. You would never be able to convert the graphics back into a real DNA sample.
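To make the ID-to-color idea concrete, here is one possible way to do it in Python. This is only a sketch under my own assumptions, not necessarily what the application does: it squeezes the ID range (0 to roughly 80 million) linearly into the 16,777,216 available colors, then splits the resulting number into its Red, Green and Blue bytes. The names `snp_id_to_rgb` and `MAX_SNP_ID` are invented for the example.

```python
from typing import Tuple

MAX_SNP_ID = 80_000_000    # rough upper bound on SNP IDs mentioned above
MAX_COLOR = 256 ** 3 - 1   # 16,777,215, the highest 24-bit color value


def snp_id_to_rgb(snp_id: int, max_id: int = MAX_SNP_ID) -> Tuple[int, int, int]:
    """Map a SNP ID to an (R, G, B) triple, each component in 0..255."""
    # Clamp so that unexpected IDs still produce a valid color.
    clamped = min(max(snp_id, 0), max_id)
    # Linear scaling from the ID range onto the 24-bit color range.
    value = clamped * MAX_COLOR // max_id
    red = (value >> 16) & 0xFF
    green = (value >> 8) & 0xFF
    blue = value & 0xFF
    return red, green, blue


print(snp_id_to_rgb(0))           # lowest ID maps to black
print(snp_id_to_rgb(MAX_SNP_ID))  # highest ID maps to white
print(snp_id_to_rgb(4477212))     # the first SNP in the table above
```

Since there are more possible IDs than colors, several neighboring IDs end up sharing the same shade with this scaling, which is one more reason the picture can't be turned back into DNA. Other mappings (for example, taking the ID modulo 16,777,216) would give a different look.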

To actually produce this, I’ve created a simple program, which processes one’s DNA file and creates one of 4 styles of visualization. You could choose to use any or all, and do whatever you like with them – use as Wallpaper, print on a shirt, mousepad or poster, or anything else. Here are two of the visualizations:

Depending on demand and feedback, I might come up with additional variations of the visualizations. Here are the instructions:

1. To get the application, click here. It’s 135KB, and there’s no installer…just drop it somewhere on your hard drive.

4. On the top-right, click Download. You will be asked to sign in again.

5. Download the file as ZIP, and expand it.

6. Launch my application, and browse to the text file you extracted from your ZIP

7. Click Pre-process, and wait for it to complete (should be about 5 seconds)

8. Click on the samples to see the visualizations, and then on one of the corresponding buttons

9. Take a screenshot, and paste it into your favorite graphic application (I highly recommend Paint.NET)

***I should note that since DNA files don’t really come around easily, I was only able to test this with my own DNA. I can’t promise the app won’t choke, crash or fail to process your own file. If it does, though, I would appreciate it if you contact me through the form on the right and let me know, so I can fix whatever error there is.