The 23andMe presenter (Nick Eriksson) delivered an overview of the potential of the 23andMe cohort for association studies: all 23andMe customers have genetic information for over 500,000 common genetic variants, and they are also encouraged to provide self-reported phenotype data on a wide range of traits, from the presence of detached earlobes to longitudinal tracking of Parkinson’s disease symptoms. Eriksson reported that the company now had enough returned surveys to perform genome-wide association studies for 22 traits, with sample sizes between 2,500 and 6,000 individuals: reasonable numbers for an initial look at the genetic architecture of a complex trait.

The company seems to be doing a reasonable job of identifying and controlling for the various potential confounders that plague genome-wide association studies, such as population structure. However, 23andMe faces an unusual challenge that standard academic GWAS consortia don’t: the possibility that a subject will give a biased trait report after seeing their own genetic data.

This was powerfully illustrated by results from the “athlete gene” ACTN3 (a gene close to my own heart). There was no association between the athletic performance-associated variant in this gene and self-reported sprinter/endurance preference in individuals who hadn’t seen their genetic data – but in individuals who had already seen their genotype there was a marked shift towards carriers of the “sprint” or “endurance” allele self-identifying with those respective categories. In other words, people were altering their self-reported athletic affiliation on the basis of their genotype; Eriksson estimated that around 25% of individuals must be shifting their self-identification to explain the effect, a staggeringly large number.
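To make the mechanism concrete, here is a minimal simulation sketch of how genotype-driven re-reporting can manufacture an association where none exists. It is a hypothetical toy model, not 23andMe’s analysis and not calibrated to their data: it assumes the usual ACTN3 framing (R allele = “sprint”, X allele = “endurance”), a true self-reported preference that is completely independent of genotype, and a `switch_rate` fraction of people who, if homozygous, change their report to match the category their genotype supposedly favours. The function name `simulate` and all parameters are illustrative.

```python
import random


def simulate(n, freq_x, switch_rate, seed=0):
    """Allelic odds ratio for the X allele in 'endurance' vs 'sprint' reporters.

    Hypothetical toy model: the underlying preference is a coin flip unrelated
    to genotype; with probability `switch_rate`, an RR homozygote reports
    'sprint' and an XX homozygote reports 'endurance'. Heterozygotes never switch.
    """
    rng = random.Random(seed)
    # Per reported category: [X allele count, R allele count]
    counts = {"sprint": [0, 0], "endurance": [0, 0]}
    for _ in range(n):
        # Number of X alleles (0, 1 or 2), assuming Hardy-Weinberg proportions
        n_x = int(rng.random() < freq_x) + int(rng.random() < freq_x)
        report = rng.choice(["sprint", "endurance"])  # no true association
        if rng.random() < switch_rate:
            if n_x == 0:
                report = "sprint"
            elif n_x == 2:
                report = "endurance"
        counts[report][0] += n_x
        counts[report][1] += 2 - n_x
    (end_x, end_r), (spr_x, spr_r) = counts["endurance"], counts["sprint"]
    return (end_x * spr_r) / (end_r * spr_x)
```

With no switching the odds ratio hovers around 1, as it should. Under this toy model with `freq_x=0.5` and `switch_rate=0.25`, the expected odds ratio works out analytically to (9/7)² ≈ 1.65, so `simulate(100_000, 0.5, 0.25)` should land close to that: a sizeable spurious signal produced entirely by reporting behaviour.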

Eriksson played down the potential impact of this effect, but it is still a rather worrying finding for a company that relies on self-reported (and often quite subjective) phenotype data from customers who have frequently peeked at their genetic data before ever filling in a survey. At the very least, there is potential for inflated associations with known markers already in the 23andMe database. One way around this might be to give customers some kind of incentive to complete phenotype surveys before they ever see their genotype data, perhaps in the form of discounts on future product updates.

Aside from this niggling concern, the major message from the talk is that 23andMe’s approach works in terms of generating genome-wide significant associations for complex traits: the company has successfully replicated a series of known associations with eye, skin and hair colour, for instance. More interestingly, 23andMe has also nailed down a handful of genuinely novel genetic associations: a massively significant association between an olfactory receptor region and “asparagus anosmia” (the inability to smell asparagus in one’s own urine), and two regions associated with hair curl.

These traits seem pretty trivial, but this is precisely the sort of area where 23andMe will be able to out-compete academic consortia, and these types of associations are also extremely (perhaps perversely) attractive to personal genomics customers; it’s just cool to be able to see the region of the genome that underlies a trait you can see in yourself, and to follow the inheritance of these traits through a family. These types of associations won’t contribute to clinical genetics, but they are likely to non-trivially boost 23andMe’s appeal to consumers.

Will 23andMe be able to uncover novel associations with greater relevance to disease genetics? I suspect their impact here will be much more modest, at least in the near future; academic consortia are generally far better powered to pick up disease risk associations, given their more stringent quality control and phenotype definitions. However, it would be a mistake to underestimate the value of 23andMe’s ability to recruit and retain an active base of participants, and of its Facebook-like viral marketing appeal (customers have an incentive to recruit other people). This may make it possible for 23andMe to track long-term phenotypic change, such as the progression of symptoms in patients with Parkinson’s disease.

It’s been interesting to watch the genomics community’s perception of 23andMe shift over time. There’s still some hostility out there – and indeed, the first question directed at Eriksson was a needlessly combative and rather incoherent one about the ascertainment bias in 23andMe’s sample towards wealthier individuals – but the strangeness of the 23andMe model is starting to wear off, and presentations like this one will no doubt help to convince scientists that this is a company that is at least capable of doing solid science.

There’s one other small nugget of data worth mentioning. It’s always been hard to get a solid estimate of the number of customers in 23andMe’s database, but we now have a conservative lower bound: the company has at least 6,000 unrelated individuals of European ancestry enrolled who have taken phenotype surveys, suggesting a total active (i.e. engaged in phenotype surveys) customer base substantially higher than this. I don’t think this number would surprise many regular readers, but it’s a useful antidote to the sorts of ridiculously low recruitment numbers I’ve heard quoted by personal genomics critics.