Raw Data Technical Details

The raw data provided by 23andMe has undergone a general quality review; however, only a subset of markers has been individually validated for accuracy. The data from 23andMe’s Browse Raw Data feature is suitable only for informational use and not for medical, diagnostic, or other use. Consult with a healthcare professional before making any major lifestyle changes.

The Browse Raw Data feature is provided for customers who are interested in additional research into their genome, but it may be of limited utility for many. The raw data provided by 23andMe is an advanced view of all your uninterpreted raw genotype data, including data that is not used in 23andMe reports.

How 23andMe Reports Genotypes

The 23andMe genotyping platform detects single nucleotide polymorphisms (SNPs). A SNP is a DNA location, or "marker," in the genome that has been shown to vary among people in terms of the DNA base or bases. There are four DNA bases: adenine (A), thymine (T), guanine (G), and cytosine (C). So, for example, at the same genomic location, you might have a C and someone else might have a T. These DNA base differences are known as "variants."

For most SNPs on the 23andMe platform, the 23andMe Raw Data feature reports the marker name (usually an rsID or internal ID number), its exact genomic location, the possible variants at that marker (A, T, G, or C), and the specific variants you have, i.e., your genotype. Because you have two sets of autosomal chromosomes, one from your mother and one from your father, you usually have two variants at every location, and your genotype will be reported as a pair of variants, e.g., "G/A."

In some cases, your genotype will be reported as a single variant because not all DNA is inherited in chromosome pairs (e.g., mitochondrial DNA and, for the most part, the X and Y chromosomes in men).

Occasionally, for some SNPs on the 23andMe platform, your genotype may be reported as an insertion or deletion (--) of DNA bases instead of just a simple variant pair. Depending on the genomic location, either an insertion or deletion could represent the typical version of the SNP. In other words, there are some markers in which having an extra base (insertion) is the typical variant and having a deletion is the less common variant. Conversely, there are some places in the genome where an insertion is rare, making a deletion the typical variant at that location.

23andMe does not report on all possible insertions or deletions. In general, the ones reported on are small, spanning only one or a few bases.
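As an illustration of the genotype notation described in this section, the following is a minimal Python sketch that splits a genotype string into its individual variants. The function names split_genotype and is_heterozygous are hypothetical and are not part of any 23andMe tool. The sketch assumes a genotype is written either as a pair ("G/A" or "GA") or as a single variant ("G"), and it does not attempt to interpret insertions or deletions.

```python
def split_genotype(genotype):
    """Split a genotype string such as "G/A", "GA", or "G" into a tuple of variants.

    Assumption: an optional "/" separates the two variants of a pair.
    """
    cleaned = genotype.strip().replace("/", "")
    return tuple(cleaned)


def is_heterozygous(genotype):
    """Return True when two variants are present and they differ."""
    variants = split_genotype(genotype)
    return len(variants) == 2 and variants[0] != variants[1]


# A paired genotype from the autosomes and a single variant (for example,
# from mitochondrial DNA).
print(split_genotype("G/A"))
print(split_genotype("T"))
print(is_heterozygous("G/A"))
```

A single-variant genotype is never counted as heterozygous in this sketch, because only one copy of that DNA is present.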

Reference Genome and Strandedness

23andMe results indicate SNP (single nucleotide polymorphism) positions and DNA bases based on the NCBI human reference genome (a standard version of the nucleotide sequence of the human genome). The raw data, site features, and reports all currently use human genome assembly GRCh37 (build 37).

DNA consists of two strands that are complementary to each other. The DNA base "A" always pairs with "T," and "G" always pairs with "C" across these two strands. One strand is called the positive (+) strand, and the other is called the negative (-) strand.

The genotypes displayed on the 23andMe website, including in the Raw Data feature, always refer to the positive (+) strand on build 37 of the human reference genome.

Be aware that other websites or publications may sometimes refer to the negative strand when reporting genotypes.

If the possible genotypes reported by 23andMe and another source do not match, it is likely that they are referring to complementary DNA strands rather than the same strand. For example, 23andMe might report that a SNP has two versions, G and A. But other sources may report that the versions for that SNP are C and T. Both ways of reporting the SNP are correct, because the G is paired with a C on the opposite strand, and A is paired with T.
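The strand comparison described above can be sketched in Python. This is an illustrative example only: the names COMPLEMENT, complement_variants, and compare_versions are hypothetical, and the sketch assumes each source lists the possible versions of a SNP as a short string of bases such as "GA".

```python
# Base pairing across the two DNA strands: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_variants(variants):
    """Return the set of bases found on the opposite strand."""
    return {COMPLEMENT[base] for base in variants}


def compare_versions(ours, theirs):
    """Compare the possible versions of a SNP reported by two sources."""
    ours, theirs = set(ours), set(theirs)
    if ours == theirs:
        return "same strand"
    if complement_variants(ours) == theirs:
        return "opposite strand"
    return "mismatch"


# The example from the text: G and A on one strand, C and T on the other.
print(compare_versions("GA", "CT"))
```

Note that this check cannot tell the strands apart for SNPs whose two versions are A and T, or G and C, because the complement of such a pair is the same pair. For those markers, the sketch reports "same strand" regardless of which strand the other source used.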

Not Determined

In some cases, we are not able to provide a genotype result for a particular SNP. If results cannot be provided, you will see a “not determined” message. In the downloaded raw data file, the entry for any uncalled SNP displays '--' instead of a two-letter genotype. If you see this result, our algorithm may not have been able to confidently determine your genotype at that marker. This can be caused by random test error or other factors that interfere with the test. Some “not determined” variants are expected in the raw data and are not a cause for concern.
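For readers who want to see how many markers in a downloaded file were not determined, here is a minimal Python sketch. It rests on assumptions not stated in this article: that the file is tab-separated with four columns (marker name, chromosome, position, genotype) and that lines beginning with "#" are comments. The function name summarize_calls is hypothetical, and the sample rows are made-up placeholders, not real results.

```python
import io


def summarize_calls(lines):
    """Count data rows and collect the markers whose genotype is '--'."""
    total = 0
    not_determined = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with "#").
        if not line or line.startswith("#"):
            continue
        marker, chromosome, position, genotype = line.split("\t")
        total += 1
        if genotype == "--":
            not_determined.append(marker)
    return total, not_determined


# Placeholder data standing in for a downloaded raw data file.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\t--\n"
    "i0000003\tMT\t300\tC\n"
)
total, not_determined = summarize_calls(sample)
print(total, not_determined)
```

To run this against a real download, an open file object can be passed in place of the sample, since the function only needs something that yields lines of text.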

RS Numbers (rsIDs)

The rsID number is a unique label ("rs" followed by a number) used by researchers and databases to identify a specific SNP (Single Nucleotide Polymorphism). It stands for Reference SNP cluster ID and is the naming convention used for most SNPs.

If a probe on our genotyping platform doesn't correspond to a SNP with a clear rsID, or it assays a DNA change that is not a known SNP and therefore has no rsID, that marker is usually assigned an "internal" ID ("i" followed by a number). Our researchers may have included some of these "custom" SNPs on our genotyping platform in order to maximize the number of 23andMe features available to customers, as well as to offer flexibility for future research.

In general, many SNPs labeled with an "internal" ID in the Raw Data feature may not have a corresponding rsID in outside scientific literature or other third-party services.
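The two naming conventions described in this section ("rs" followed by a number, and "i" followed by a number) can be told apart with a simple pattern check. The following Python sketch is illustrative only; the function name classify_marker is hypothetical, and the marker names in the example are placeholders.

```python
import re

# "rs" followed by digits: a Reference SNP cluster ID.
RSID_PATTERN = re.compile(r"rs\d+")
# "i" followed by digits: an internal ID.
INTERNAL_PATTERN = re.compile(r"i\d+")


def classify_marker(marker_name):
    """Label a marker name as 'rsid', 'internal', or 'unknown'."""
    if RSID_PATTERN.fullmatch(marker_name):
        return "rsid"
    if INTERNAL_PATTERN.fullmatch(marker_name):
        return "internal"
    return "unknown"


for name in ["rs0000001", "i0000003", "example"]:
    print(name, classify_marker(name))
```

A check like this can help when looking up markers in outside databases, since only names classified as "rsid" are likely to be found there.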