Citizen Science Genomics as a Model for Crowdsourced Preventive Medicine Research

Abstract

Summary: A research model for the conduct of citizen science genomics is described in which personal genomic data is integrated with physical biomarker data to study the impact of various interventions on a predefined endpoint. This research model can be used for large-scale preventive medicine studies by both institutional researchers and citizen science groups. The genome-phenotype-outcome methodology comprises seven steps: 1) identifying an area of genotype/phenotype linkage for study, 2) conducting a thorough literature review of data supporting this genotype/phenotype linkage, 3) elucidating the underlying biological mechanism, 4) reviewing related studies and clinical trials, 5) designing the study protocol, 6) testing the study design and protocol in a small pilot study, and 7) modifying study design and protocol based on information from the pilot study for a large-scale prospective study. This paper describes a real-world example of the methodology implemented for a proposed study of polymorphisms in the MTHFR gene, and how these polymorphisms may influence homocysteine levels and vitamin B deficiency. The current study looks at the possibility of optimizing personalized interventions per the genotype-phenotype profiles of individuals, and tests the hypothesis that simple interventions may be effective in reducing homocysteine in individuals with high baseline levels, particularly in the presence of a polymorphism in the MTHFR variant rs1801133.

Keywords: MTHFR, homocysteine, genomics, polymorphism, variant, citizen science, patient-driven clinical trial, crowdsourced clinical trial, research study, self-experimentation, intervention, personalized medicine, preventive medicine, participatory medicine, quantified self, genome-phenotype-outcome study, citizen science genomics.

Citation: Swan M, Hathaway K, Hogg C, McCauley R, Vollrath A. Citizen science genomics as a model for crowdsourced preventive medicine research. J Participat Med. 2010 Dec 23; 2:e20.

Published: December 23, 2010.

Competing Interests: The authors have declared that no competing interests exist.

Introduction

Continually decreasing costs in genomic sequencing have made it possible for individuals to obtain their own genomic data. An estimated 80,000 individuals have subscribed to consumer genomic services. Genotyping provider 23andMe counted 50,000 subscribers as of June 2010.[1] Navigenics and deCODEme had an estimated 20,000 and 10,000, respectively, as of March 2010.[2] Others may be clients of Pathway Genomics or other services. Today, individuals can view the 200 or so variants analyzed by consumer genomic companies for a variety of disease, drug response, trait, and carrier status conditions via a web-based interface, and a question naturally arises as to what else can be done with the data.

Tools do not yet exist to identify and prevent disease before clinical onset. Integration of genomic, phenotypic, environmental, and microbiometric health data streams will be required to create reliable predictive tools. The potential volume of this data is staggering, numbering, perhaps, a billion data points per person,[3] which may routinely generate zettabytes of medical data.[4]

The combination of multiple health data streams, the anticipated data deluge, and the challenges and expense of recruiting subjects for studies all suggest that there could be a benefit to supplementing traditional randomized clinical trials with other techniques.[5] Crowdsourced cohorts of citizen scientists (eg, patient registries) could be a significant resource for testing multiple hypotheses as research could be quickly and dynamically applied in various populations. Engaged citizen scientists could collect, synthesize, review, and analyze data. They could interpret algorithms, and run bioinformatic experiments. This paper proposes a research model that could be used in conducting citizen science genomics, that integrates personal genomic data with physical biomarker data and interventions, and that could be applied in large-scale preventive medicine studies by both institutional researchers and citizen science groups.

Methods

An increasing number of individuals have access to their own genomic data, would like to contribute this data to scientific research, and would like to put it to use in managing their own health. Scalable models for conducting citizen science studies are needed. The authors designed a methodology for the conduct of citizen science genomics which links genomic data to corresponding phenotypic measures and relevant interventions. The purpose is to create mechanisms for establishing and monitoring baseline measures of wellness, and tools for the conduct of preventive medicine. The key steps in the methodology include:

1. Selecting a specific area of genotype/phenotype linkage for potential study and generating a testable hypothesis
2. Conducting a literature review to validate the selected study area
3. Analyzing the underlying biological pathway and mechanism
4. Reviewing related studies and clinical trials
5. Designing the study protocol
6. Testing the study design in a small non-statistically significant pilot
7. Identifying the next steps for a full-scale launch of the study

Results

The results are presented as a detailed outline of the seven-step methodology for operating citizen science genomic studies. The methodology is implemented in the specific case of a proposed study looking at polymorphisms in the MTHFR gene and how these polymorphisms relate to homocysteine levels and vitamin B deficiency.

1. Select a specific area of genotype/phenotype linkage for potential study and generate a testable hypothesis.

For the inaugural citizen science genomic study, 40 potential ideas were identified in a variety of health and behavioral genomic areas in recently published research (http://diygenomics.pbworks.com). One area that seemed conducive to study was the potential association of the MTHFR gene and vitamin B deficiency. MTHFR polymorphisms may keep vitamin B-9 (folic acid) from being metabolized into its active form, folate. This may lead to the potentially harmful accumulation of homocysteine. There is a strong research-supported association between the principal MTHFR variant (rs1801133) and homocysteine levels.[6] Genotyping data for MTHFR variants are available in 23andMe data. Furthermore, blood tests for homocysteine, vitamin B-12, and vitamin B-9 are readily obtainable, as are over-the-counter vitamin supplement interventions. A testable hypothesis was generated that supplements may be effective in reducing homocysteine levels, particularly for those with a genetic polymorphism.

Studying MTHFR and vitamin B deficiency could have an important public health benefit since approximately half of the US population is estimated to have one or more MTHFR polymorphisms. The distribution of genotypes in the US for rs1801133 is 49% CC (homozygous normal), 40% CT (heterozygous), and 11% TT (homozygous risk).[7] In addition, vitamin B-12 deficiency is a common nutritional deficiency in both the US and the developing world,[8] particularly for the elderly and vegetarians (approximately 3% of the US population).[9]
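The arithmetic behind the "approximately half" estimate can be checked directly from the genotype shares quoted above. The following is a minimal sketch; the function names are illustrative and not part of the study, and the T-allele frequency is a derived figure that the text itself does not report.

```python
# Genotype shares for rs1801133 quoted in the text (CC, CT, TT), in percent.
genotype_pct = {"CC": 49, "CT": 40, "TT": 11}


def carrier_share(pct):
    """Percent of people carrying at least one T allele (CT or TT)."""
    return pct["CT"] + pct["TT"]


def minor_allele_frequency(pct):
    """T-allele frequency: each CT carries one T, each TT carries two."""
    total_alleles = 2 * sum(pct.values())
    t_alleles = pct["CT"] + 2 * pct["TT"]
    return t_alleles / total_alleles


print(carrier_share(genotype_pct))           # share with one or more polymorphisms
print(minor_allele_frequency(genotype_pct))  # derived T-allele frequency
```

With the quoted shares, carriers of at least one T allele make up 51% of the population, which is the basis for the "approximately half" statement.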

2. Conduct a literature review to validate the selected study area

The majority of published literature relates to cardiovascular disease. A meta-analysis of 30 prospective and retrospective studies (involving a total of 5,073 ischemic heart disease (IHD) events and 1,113 stroke events) showed that a 25% lower homocysteine level was independently associated with an 11% lower risk of coronary heart disease and a 19% lower risk of stroke.[10] Despite this, the causal relationship between elevated homocysteine and cardiovascular outcomes has not been conclusively proven. A large (n=12,064), recently published (June 2010), prospective, randomized study of patients with a prior myocardial infarction compared supplementation with either folic acid or vitamin B-12 against placebo. The authors tracked coronary events over an average of 6.7 years. They found an average reduction of 28% in plasma homocysteine levels, but no difference between the vitamin group and placebo group in the occurrence of coronary events or death.[15] However, a prospective, randomized study of the impact of homocysteine levels on the progression of atherosclerosis showed that folic acid supplementation led to reduced homocysteine levels and a regression in carotid intima-media thickness (CIMT), compared to an increase in CIMT for the placebo group.[16]

Although more research is needed, there appears to be adequate evidence that low homocysteine levels are desirable, and may reduce risk for a number of conditions.

3. Analyze the underlying biological pathway and mechanism

The MTHFR pathway and homocysteine metabolism are the underlying biological mechanisms in this study. There are a number of ways in which genetic variation and intervention may impact homocysteine metabolism. Homocysteine is a naturally-occurring amino acid in the blood which is broken down (metabolized) through three interconnected pathways: the folate cycle, methionine cycle, and transsulfuration pathway (Figure 1).[17] A detailed explanation of homocysteine metabolism is presented in the Supplementary Material. The pathways are fairly complex and involve two other enzymes in addition to MTHFR. It is possible that different interventions could impact overall homocysteine metabolism in different ways. In Figure 1, the red boxes show the different places where the first intervention (the inactive form of B-9, further described below) may impact the pathway; the green box shows where the second intervention (the active form of B-9) may impact the pathway.

Figure 1: Homocysteine metabolism.

4. Review related studies and clinical trials

Several clinical trials have been conducted to investigate the ability of interventions to lower homocysteine levels. A detailed review of nine studies was conducted and is presented in the Supplementary Material. The average overall result was to lower homocysteine by 23%. Two studies[18][19] specifically compared folic acid with the active form of folate, 5-MTHF (5-methyltetrahydrofolate). Both found that the active formulation was most effective in reducing homocysteine levels (Akoglu 37% versus 24%;[18] Lamers 19% versus 12%[19]).

The existing clinical trials suggest that several factors may influence baseline homocysteine levels, in particular, age, health status, and genotype. Individuals who were older (especially over 50), had just experienced a major health disruption, or had one or more polymorphisms in the main MTHFR variant rs1801133, were more likely to have higher baseline homocysteine levels than those that did not (Supplementary Material: Figure 2). Further, the reduction proportion from the baseline level was greater for those individuals with higher initial levels of homocysteine.

5. Design the study protocol

The required genomic and phenotypic data were identified. Approximately 20 variants have been linked with homocysteine in genome-wide association studies.[20] MTHFR 677C>T (rs1801133) was selected as the variant with the strongest association to mild enzyme deficiency, and MTHFR 1298A>C (rs1801131) as the leading secondary variant.[6] The corresponding phenotypic measures selected were blood tests for homocysteine, vitamin B-12, and folate (vitamin B-9).

The type and timing of interventions were determined based on published literature. The background research on the MTHFR mechanism suggests that individuals with one or more polymorphisms may not be able to metabolize folic acid (the inactive form of B-9) into its active form (tetrahydrofolate or folate) as efficiently as individuals without a polymorphism. Therefore, the first intervention selected was administration of the inactive form of B-9, which is commonly present in over-the-counter B vitamin products such as Centrum multivitamins. The second intervention involved administration of the active form of folate (L-methylfolate), and the third was administration of the inactive and active forms together (also being tested by a current clinical trial).[21] The supplement contents were as follows: the Centrum multivitamin contained 2 mg of pyridoxine hydrochloride (B-6), 400 mcg of folic acid (B-9), and 6 mcg of methylcobalamin (B-12); the Life Extension Foundation L-methylfolate contained 1,000 mcg of L-methylfolate. The interventions were to be taken on a daily basis, at the same time of day, with food.

For this pilot phase of the study, the authors opted to use a crossover study design. Each individual tried each intervention in sequence, essentially serving as his or her own control. While other homocysteine clinical trials typically had at least four-week periods for testing interventions, two representative trials confirmed that most of the observed effect occurred within the first two weeks.[18][19] Therefore, in the pilot study, two-week minimum intervention periods were selected, with a two-week washout period at the beginning.
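The crossover sequence can be laid out as a simple per-participant calendar. The sketch below assumes fixed two-week periods and a blood test at the end of each period; the start date, period labels, and function name are hypothetical and are not taken from the study records.

```python
from datetime import date, timedelta


def crossover_schedule(start, periods, weeks_per_period=2):
    """Return (period name, start date, blood-test date) for each period in order."""
    schedule = []
    cursor = start
    for name in periods:
        end = cursor + timedelta(weeks=weeks_per_period)
        schedule.append((name, cursor, end))
        cursor = end  # the next period begins when the previous one ends
    return schedule


# Hypothetical start date and labels, following the order described in the text.
periods = [
    "initial washout",
    "Centrum multivitamin",
    "L-methylfolate",
    "Centrum multivitamin + L-methylfolate",
]
for name, begins, test_day in crossover_schedule(date(2010, 6, 1), periods):
    print(f"{name}: {begins} to {test_day}")
```

Because each participant passes through every period, the same function could be called with a longer `weeks_per_period` for a participant who chooses three-week intervals.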

Participant recruitment was accomplished by talking about the study in public speaking engagements and targeting special interest groups such as the DIYbio, Quantified Self, Health 2.0, Singularity University, futurist, and life extension communities, particularly 23andMe clients. Some potential participants were motivated to sign up for 23andMe in order to participate in citizen science genomic studies. Many potential participants were interested, but did not join the study for a variety of reasons. The biggest barrier was the self-supported cost of blood tests and supplement interventions ($291). In a full-scale launch, other strategies will be necessary to target a more representative segment of the population.

6. Test the study design in a small non-statistically significant pilot

To test the study design, a small pilot study was conducted in three phases: execution, results collection, and results analysis. The type of analysis that could be conducted on the resulting data is presented here, with the caveat that the pilot cohort (n=7) is too small to support statistically significant conclusions.

Seven healthy men and women, ages 26-47, who had not taken any vitamin supplements for two weeks or more and met other usual study exclusion criteria, were enrolled in the study. The study was conducted from June to December 2010. Three participants cycled through the study at nearly exact two-week intervals. Three participants went through the study in two- to three-week periods on average, and one participant specifically tested three-week intervals. Six participants ordered blood tests from the Life Extension Foundation as they offered the lowest cost, and lab work orders were fulfilled at local LabCorp (standardized testing) facilities in the US. The remaining participant had homocysteine levels tested at a Japanese medical facility in Tokyo. The L-methylfolate supplement was mail-ordered by the group from the Life Extension Foundation. The Centrum multivitamin was purchased individually at local drug stores. All seven of the study participants collaborated in the study design or an active review of the protocol.

The study relied on self-reporting that the supplement protocol was followed. Participants tried to avoid unusual variance in nutrition, exercise, stress levels, sleep, and other behaviors. Participants looked up their genotype data for the relevant MTHFR variants in their 23andMe data files (genotyping is assumed to be accurate[22]), and recorded them in the study’s public wiki (http://diygenomics.pbworks.com/MTHFR_Results). Blood test measurements from LabCorp PDFs or other reports were likewise entered in the public wiki. All participants were interested in full transparency and public accessibility of their genotypic and phenotypic study results, and allowed their names to be associated with the study. Participants were enumerated as Citizen 1, 2, etc., with their initials.
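The genotype lookup described above could be automated along the following lines. This is a sketch only: it assumes the raw data file is tab-separated with four columns (rsid, chromosome, position, genotype) and header lines beginning with "#", which is an assumption about the file layout rather than something the study specifies. The sample lines are illustrative and are not taken from any participant's file.

```python
import io

TARGET_RSIDS = {"rs1801133", "rs1801131"}


def read_genotypes(handle, rsids=TARGET_RSIDS):
    """Return {rsid: genotype} for the requested rsids from a raw-data file."""
    found = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        rsid, _chrom, _pos, genotype = fields[:4]
        if rsid in rsids:
            found[rsid] = genotype
    return found


# Illustrative content only (assumed layout, made-up rows).
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1801131\t1\t11854476\tGT\n"
    "rs1801133\t1\t11856378\tAG\n"
)
print(read_genotypes(sample))
```

In practice the handle would come from `open(path)` on a participant's own downloaded file instead of the in-memory sample.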

Genotype results: Table 1 lists the pilot study participants and their genotype data for the two reviewed variants. For the main associated variant, rs1801133, three participants are homozygous normal (GG) and four are heterozygous (AG). Two of the heterozygous participants are also vegetarians/vegans which further increases their potential risk of vitamin B deficiency. For the secondary variant, rs1801131, two participants are homozygous normal (TT), four are heterozygous (GT), and one is homozygous for the polymorphism (GG). The table then includes maternal and paternal haplotype group information from 23andMe and demographic information regarding participant ethnicity, gender, age, and vegetarian status.

23andMe’s genotype reporting method (all genotypes are listed as their forward strand values) means that sometimes their genotyping values need to be mapped to other conventions for interpretation. Commonly used resources for obtaining major/minor allele mappings indicate C/T as the major/minor alleles for rs1801133, and A/C for rs1801131 (dbSNP;[23] SNPedia;[24] HuGE Navigator[7]). The mapping of the alleles from the standard resources to 23andMe would be that rs1801133 C/T is G/A in 23andMe data, and rs1801131 A/C is T/G in 23andMe data (C maps to G and vice versa; A maps to T and vice versa). The mapping was confirmed by comparing deCODEme, Navigenics, and 23andMe data files for the same individuals, and by reviewing genotype prevalence across multiple 23andMe files.
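The allele mapping described above is a base-by-base complement, and can be sketched in a few lines. The function name is illustrative; sorting the alleles alphabetically is a convention chosen here so that the output matches the genotype labels used in this paper (AG, GT).

```python
# Watson-Crick complements: C maps to G and vice versa; A maps to T and vice versa.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(genotype):
    """Complement each allele of a genotype and list the alleles alphabetically."""
    return "".join(sorted(COMPLEMENT[allele] for allele in genotype))


# rs1801133: the C/T convention maps to G/A in 23andMe data.
for genotype in ("CC", "CT", "TT"):
    print("rs1801133", genotype, "->", flip_strand(genotype))

# rs1801131: the A/C convention maps to T/G in 23andMe data.
for genotype in ("AA", "AC", "CC"):
    print("rs1801131", genotype, "->", flip_strand(genotype))
```

Under this mapping, the homozygous normal, heterozygous, and homozygous risk genotypes for rs1801133 (CC, CT, TT) appear as GG, AG, and AA in 23andMe data.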

Table 1: Genotype results and demographic profiles.

Phenotype results: Figure 2 and Table 2 illustrate how homocysteine levels shifted during the pilot study. Table 3 contains the blood test data for vitamin B-12. At baseline, homocysteine levels ranged from 6.4 – 14.1 µmol/L. The cohort mean was 10.4 (SD (standard deviation) 3.03), and was higher for vegan/vegetarian individuals with a polymorphism in rs1801133 (12.8 versus 9.5). After the first intervention (Centrum multivitamin), homocysteine went down for six individuals and up for one individual, and had a tighter range (5.7-10.6; mean 8.8; SD 1.50).

After the second intervention (L-methylfolate), homocysteine was higher for five individuals, including the four with a polymorphism, and lower for two (mean 10.3; SD 2.77). For the four individuals who included a plasma folate test, levels were at or above the high point of the test reference range (19.9 ng/mL) (Supplementary Material – Table 3) after the second intervention. After the third intervention (Centrum multivitamin + L-methylfolate), for three of the four participants who tried it, homocysteine was higher than with L-methylfolate alone. In the final step, five individuals completed an ending washout blood test, with three participants, including two with a polymorphism, having lower homocysteine levels than after the third intervention. The fourth participant had slightly higher homocysteine, and the fifth participant had markedly higher homocysteine as compared with the last intervention tried, the L-methylfolate. For three out of four participants who included the vitamin B-12 test (Table 3), B-12 levels went up an average of 17.5% after the first intervention, and one participant’s went down 17%. B-12 levels then generally stayed flat or increased slightly for the duration of the study.

An analysis of the test data was performed to calculate the percent declines for each period from baseline and for each period relative to the prior period (smoothing was employed for one missing value). There was a 19% average decline in homocysteine for the best solution in any period versus the baseline (Table 2) and a 21% average decline in homocysteine for the best solution in any period versus the prior period. There was not a significant difference between homozygous normal individuals (GG) for the main variant rs1801133 (18% average reduction) and heterozygous individuals (AG) (19% reduction), but the two vegan/vegetarian heterozygous individuals experienced a 28% average reduction. A larger study that investigated genotype polymorphisms found a greater reduction in heterozygous subjects (12% versus 9%).[25] The secondary variant, rs1801131, did not seem to have an impact, either in isolation or when considered together with rs1801133.
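The two percent-decline measures can be sketched as follows. This reflects one reading of "best solution in any period" (the largest decline observed across periods), which is an assumption; the smoothing applied to the missing value is not specified in the text and is not reproduced. The readings shown are hypothetical and are not study data.

```python
def pct_decline(before, after):
    """Percent decline from `before` to `after` (negative if the value rose)."""
    return (before - after) / before * 100.0


def best_decline_vs_baseline(series):
    """Largest percent decline of any later period relative to the first reading."""
    baseline = series[0]
    return max(pct_decline(baseline, value) for value in series[1:])


def best_decline_vs_prior(series):
    """Largest percent decline of any period relative to the period before it."""
    return max(pct_decline(prev, cur) for prev, cur in zip(series, series[1:]))


# Hypothetical homocysteine readings in µmol/L: baseline, then three interventions.
readings = [10.0, 8.0, 9.0, 7.2]
print(round(best_decline_vs_baseline(readings), 1))
print(round(best_decline_vs_prior(readings), 1))
```

Averaging either measure across participants would give cohort-level figures comparable in form to the 19% and 21% reported above.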

Discussion

The overall result seen in this pilot study was a 19% average reduction in homocysteine levels. While not statistically significant, this is consistent with the 23% average reduction achieved in reported clinical trials.

While a homocysteine range of 0.0-15.0 µmol/L is considered clinically normal, many scientists contend that lower levels are preferable. Suggested preferred levels are less than 11.4 µmol/L for men and less than 10.4 µmol/L for women in one paper cited.[26] According to these measures, four out of the seven pilot participants had high baseline homocysteine levels which they were able to meaningfully reduce with supplement interventions.
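The two sets of thresholds above can be applied to a reading as in the sketch below. The category labels and function name are illustrative, and treating a value exactly at the preferred cutoff as above the preferred level follows from the "less than" wording; the example values are hypothetical.

```python
# Thresholds quoted in the text, in µmol/L.
CLINICAL_MAX = 15.0
PREFERRED_MAX = {"male": 11.4, "female": 10.4}


def classify_homocysteine(level, sex):
    """Place a homocysteine reading relative to the clinical and preferred levels."""
    if level > CLINICAL_MAX:
        return "above clinical range"
    if level >= PREFERRED_MAX[sex]:
        return "within clinical range, above preferred level"
    return "within preferred level"


# Hypothetical readings, not participant data.
print(classify_homocysteine(12.5, "male"))
print(classify_homocysteine(9.0, "female"))
```

A reading such as 12.5 µmol/L would count as clinically normal yet still fall in the group described here as having high baseline homocysteine.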

The best intervention for five out of seven individuals was the regular B vitamin as opposed to the active form of B-9 (folate). The active form of B-9 worked better for one individual. The remaining individual, who is homozygous for the minor allele of rs1801131, did not have high initial homocysteine and found that the active form of B-9 was better than the regular B vitamin, but that taking no intervention was best of all. This suggests that targeted solutions may be optimal for groups of individuals with certain profiles.

The biggest question was why the blood test values for homocysteine increased in five individuals (three with high baseline homocysteine; four with a polymorphism) after the active form of B-9 (L-methylfolate), when other clinical trials found the active form of B-9 to be the superior intervention for lowering homocysteine. Participant behavior was generally consistent, and the reproducibility of testing results (within-person, between-person, and in labs) was also confirmed (Supplementary Material: Variability in homocysteine test results). Variation could have been introduced by a number of factors including the natural variability in homocysteine levels, variability in the active ingredient amounts in the intervention supplements, carryover effects between interventions (also suggested by the failure of homocysteine levels to return to baseline in the final washout cases), or other complexities related to the homocysteine pathway.

This pilot study represents an example of how genomic-phenotypic-outcome research can be conducted in the era of personalized genetic data availability. It also illustrates the potential importance of including genomics as a data element in preventive medicine research, and illustrates the potential of using motivated individuals in citizen science genomic studies. Several participants also indicated the value of their experience and how it translated into post-study behavioral changes (Supplementary Material: Personal statements from study participants).

7. Identify the next steps for a full-scale launch of the study

There are a number of steps required for a full-scale cohort launch including implementing an independent ethical review and informed consent process, adjusting the study protocol, forming strategies for study financing and representative population targeting, and creating a data collection and analysis platform:

Independent Ethical Review and Informed Consent
Citizen science genomics is human subjects research and as such, should have independent ethical review and oversight. There are at least two independent review boards in the US which have indicated their willingness to discuss the potential review of citizen science studies: IRC, Independent Review Consulting, Inc., in Corte Madera, CA (http://www.irb-irc.com) and WIRB, the Western Institutional Review Board, in Olympia, WA (http://www.wirb.com). A related model of consumer genomic research conducted by 23andMe[27] brought up a number of ethical issues,[28] and ultimately IRC reviewed their study. As citizen science models develop, oversight models could evolve to include citizen ethicists, citizen review boards, health advisors (analogous to financial advisors), and insurance mechanisms for personal health experimentation communities. Informed consent would be an obviously required process to include in any full-scale human subjects research study.

Protocol Adjustment
The pilot study confirmed that the central point for investigation in a full-size cohort is whether interventions can be optimized according to the genotype-phenotype profiles of individuals. The pilot study also suggested that a regular B vitamin may be most effective in lowering homocysteine in individuals with high baseline homocysteine levels, especially in the presence of one or more rs1801133 polymorphisms. A number of structural changes could be made to improve scientific rigor in a broader launch, including participant blinding, inclusion of a placebo arm, and standardized monitoring, testing, and interventions.

Strategies for Funding and Representative Population Targeting
To date, citizen science genomics has relied on the study recruitment pool being the limited number of individuals (approximately 100,000) who have subscribed to personal genotyping services. These individuals may not be representative of the population at large; the literature characterizes direct-to-consumer genomic customers as early adopters and self-driven information seekers.[29][30] For widespread public health studies, it will be necessary to target a broad diversity of participants across multiple dimensions including information-seeking and action-taking propensity, ethnicity, and socioeconomic background. To accomplish this, traditional recruitment techniques could be used together with new patient-centered social media strategies.

Conclusion

This paper presents citizen science genomics, a research model contemplated for large-scale execution of preventive medicine research in crowdsourced cohorts. The model integrates personal genomic data with physical biomarker data to study the impact of various interventions on a predefined endpoint. Citizen science genomics could allow both traditional researchers and citizen scientists to access crowdsourced subjects who are ready to engage in research studies. Citizen scientists could be important resources as they increasingly have access to their health information, may be willing to contribute their data to various studies, have the interest and motivation to investigate conditions of personal relevance, and can leverage crowdsourced labor for data collection, monitoring, synthesis, and analysis, and new tool development.

Preventive medicine is a key public health challenge in the coming decades. New models like citizen science genomics are needed to answer important questions. Dropping prices and new technologies for collecting data regarding microbiomes, proteomics, imaging, personal tracking, and other information streams will increase the feasibility of this approach. Preventive medicine has the potential to take on new relevance and meaning through the use of citizen science genomic studies, as crowdsourced participants establish baseline and ongoing longitudinal measures for wellness, health maintenance, and customized intervention.

Acknowledgments

The authors would like to acknowledge Takashi Kido and William Reinhardt for sharing their genotypic and phenotypic data, and many individuals who shared their genetic data for research purposes including David Orban, Geoffrey Shmigelsky, Eri Gentry, Todd Huffman, Fadi Bishara, Richard Leis, Jr., Mark Even Jensen, Misha Angrist, and several parties who wish to remain anonymous. We would like to acknowledge Lyn Powell and Lucymarie Mantese for their advisory contribution and study support.

Hartwell L. The promise and progress of personalized medicine. Paper presented at the Sandra Day O’Connor College of Law Personalized Medicine Conference; March 8-9, 2010; Scottsdale, AZ. Available at: http://online.law.asu.edu/events/Personalized_Medicine. Accessed September 20, 2010.
