16 March 2010. The study of schizophrenia genetics, like all of human genetics, has been driven forward in large part by successive technological advances. While early linkage techniques were sufficient to locate the single-gene causes of Mendelian disorders, more power was needed to parse the causes of diseases with more complex patterns of inheritance. The Human Genome Project opened the door to high-throughput, factory-style sequencing and genetic analysis. After its completion in 2003, and the first haplotype map in 2005, the floodgates opened to a torrent of new data on human genetic variation.

These new methods gave researchers exceptional power to investigate the genetic variation that underlies individual risk for complex diseases. Family-based studies were out; association studies using case-control designs and hundreds to thousands of unrelated subjects were in. The assumption underlying genomewide association studies (GWAS) is that common gene variants (that is, those present in more than about 5 percent of people) contribute in small but significant ways to risk of disease in a population (the "common disease/common variant" hypothesis as proposed by Reich and Lander, 2001). The variants are marked by single nucleotide polymorphisms (SNPs); by testing enough SNPs in enough people, these common, low-risk variants can be found.

These days, a typical GWAS involves genotyping thousands of patients and controls using chips covering half a million to a million common SNPs. Then, the search is on for genotypes that differ in frequencies between control and disease groups. Because of the massive number of comparisons that are done, stringent statistical treatment is required to weed out false positives, and the accepted threshold for genomewide significance in SNP frequency is a P value less than 5 × 10⁻⁸. Effect sizes in these studies tend to be small: in other complex diseases like diabetes or heart disease, highly significant SNPs increase risk of disease by a few percent, and finding these small effects has required study sizes of 8,000 to 20,000 cases, plus their controls (for example, see reviews by Arking and Chakravarti, 2009 on cardiovascular disease, or Kronenberg, 2008 on diabetes, atherosclerosis, and cancer).
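
As a rough illustration of where a cutoff of that size comes from, the sketch below applies a simple Bonferroni correction. The figure of one million independent tests is a conventional approximation assumed here for illustration, not a number taken from any of the studies discussed, and the function names are hypothetical.

```python
# Illustrative sketch: a Bonferroni-style genomewide significance cutoff.
# ASSUMPTION: roughly one million independent common-variant tests;
# this is a conventional approximation, not a figure from the article.
FAMILY_WISE_ALPHA = 0.05
INDEPENDENT_TESTS = 1_000_000


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value cutoff that holds the family-wise error rate at alpha."""
    return alpha / n_tests


def is_genomewide_significant(p_value: float) -> bool:
    """True when a single SNP's P value clears the corrected cutoff."""
    return p_value < bonferroni_threshold(FAMILY_WISE_ALPHA, INDEPENDENT_TESTS)


threshold = bonferroni_threshold(FAMILY_WISE_ALPHA, INDEPENDENT_TESTS)
print(f"cutoff: {threshold:.0e}")
print(is_genomewide_significant(1e-9), is_genomewide_significant(1e-6))
```

A SNP with a P value of one in a million, which would look overwhelming in a single-hypothesis experiment, falls short under this correction.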

The early returns: little in common?
For diseases of unknown etiology like schizophrenia, GWAS are attractive because they are agnostic. Unlike candidate gene studies, GWAS do not depend on prior knowledge or guesses regarding which genes might be important. In contrast to linkage studies, which require collecting data from affected families, GWAS can be done in a case/control design with unrelated subjects taken from the general population.

Nonetheless, the early GWAS data in schizophrenia were disappointing. In the first seven genomewide studies, ranging in size from a few hundred to a few thousand patients, not one reported any statistically significant SNPs for schizophrenia alone (for a thorough review of the studies preceding the most recent GWAS, see Owen et al., 2009). Furthermore, promising SNPs in one study often failed to replicate in other studies.

In the largest study (total of 6,286 cases and 12,993 controls), one SNP in the zinc finger gene ZNF804A, a putative transcription factor, did achieve significance when schizophrenia and bipolar patients were analyzed together (O’Donovan et al., 2008; see SRF related news story). As expected, the effect size was quite small (odds ratio 1.09, or about a 9 percent increase in risk). However, coauthor Michael Owen of Cardiff University, Wales, said that this study established the proof of principle of GWAS in schizophrenia. “There are many more risk genes that we’ll find, and that’s been proven by this approach,” he told SRF.
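
For readers unfamiliar with odds ratios, the following minimal sketch shows how one is computed from allele counts in cases and controls, with an approximate 95 percent confidence interval (Woolf's log method). The counts are invented for illustration and are not data from O’Donovan et al.; the function names are hypothetical.

```python
import math


def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (case_risk * control_other) / (case_other * control_risk)


def woolf_confidence_interval(case_risk, case_other, control_risk,
                              control_other, z=1.96):
    """Approximate confidence interval for the odds ratio on the log scale."""
    log_or = math.log(odds_ratio(case_risk, case_other,
                                 control_risk, control_other))
    standard_error = math.sqrt(1 / case_risk + 1 / case_other
                               + 1 / control_risk + 1 / control_other)
    return (math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))


# HYPOTHETICAL allele counts, chosen only to give an odds ratio near 1.09.
counts = dict(case_risk=4_300, case_other=5_700,
              control_risk=8_200, control_other=11_800)

estimate = odds_ratio(**counts)
lower, upper = woolf_confidence_interval(**counts)
print(f"OR = {estimate:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An odds ratio this close to 1 is why very large samples are needed before the confidence interval excludes 1 at genomewide stringency.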

Then, in July 2009, a trio of long-awaited GWAS, each analyzing between 2,600 and 3,300 cases, was published online in Nature. None of the three detected any genomewide significant SNPs, but one group’s meta-analyses of the top SNPs in 8,000 patients and more than 19,000 controls pointed to a region of chromosome 6. The region, which includes genes of the major histocompatibility locus, has been previously implicated in linkage studies (Lewis et al., 2003). The studies also found evidence to support the ZNF804A SNP identified by O’Donovan and colleagues.

The response to the studies was mixed. Some called the exercise a failure; others saw it as an advance. In comments to SRF, Daniel Weinberger of the National Institute of Mental Health in Bethesda, Maryland, termed the outcome “decidedly disappointing,” while in a press conference, study coauthor Michael O’Donovan of Cardiff University, Wales, called the work “a significant increment in knowledge” over the zinc finger gene SNP (see SRF related news story and complete text of Weinberger comment, as well as other comments on the subject). The mixed reviews reflect an ongoing and complex controversy, which stems as much from funding questions—should large sums be directed to GWAS at the expense of other types of research?—as from scientific debate about the contribution of common gene variants to schizophrenia and other mental illnesses. (Part 4 of this series will take up that question in more detail.)

A new picture
What was clear from those studies was that, with thousands of patients recruited and billions of SNPs genotyped, no single gene or allele was likely to contribute a great deal to the risk of disease in a large number of people. “If there was a truly common variant, the technique would have found it,” said David Porteous, of the University of Edinburgh in the United Kingdom. He added, “If you were looking for a single, defining marker in the general population of individuals with schizophrenia, there isn’t one. That is for certain now.”

Instead, the evidence points to many genes (hundreds and perhaps even thousands), each contributing a small amount to the cause of schizophrenia. The data to date suggest that common variants account for just a fraction of the risk of schizophrenia in a population. Estimates of that fraction range from a low of 4 percent (Purcell et al., 2009) to a generous 30 percent, says Porteous; there is no wide agreement on the number. Writing about the GWAS published in Nature in 2009, Kevin Mitchell, Trinity College, Dublin, opined on SRF that, “Based on the meager haul of common variants dredged up by these three studies and their forerunners, this [common variants] hypothesis should clearly now be resoundingly rejected” (see full text of Mitchell’s comment on SRF related news story).

Richard Straub of the National Institute of Mental Health, Bethesda, Maryland, put it more gently. “The common disease, common variant hypothesis has really not panned out all that well, probably due in large part to the lack of any serious attempts in GWAS publications to examine haplotypes and to model gene-gene and gene-environment interactions, which we know are tremendously important in the architecture of risk,” he said. But, he allowed, “The complexity is deep—it is a puzzle where you don’t know how best to solve it overall until you’ve learned from solving some of the components.”

Glass half empty or half full?
Others argue that GWAS has already led to significant insights. Kenneth Kendler, of Virginia Commonwealth University in Richmond, thinks the failure to find truly common risk factors turns out to be one of the more informative results from GWAS, because it reveals something about the genetic architecture of schizophrenia. “It is telling us about a deep background polygene signal,” he said. That means that estimating individual risk will not be a matter of adding up five or 10 risk SNPs into a genetic profile. “The polygene background is a completely different ballgame,” Kendler said. He likes the approach of Purcell and colleagues, who calculated that the top half of all positive SNPs in aggregate explain about 30 percent of the risk of schizophrenia. Although these are extremely early days, Kendler said, that type of aggregation appears to do a much better job at describing risk.

About these findings, Craddock, O’Donovan, and Owen write, “This is obviously a small part of the picture, but it is certainly better than no picture at all.” They add that these results offer “a much more secure foundation than the earlier findings upon which to build follow-up studies,” such as brain imaging, cognitive phenotype (Esslinger et al., 2009), and even candidate gene studies. “We would not regard the first convincing evidence that altered calcium channel function is a primary etiological event in at least some forms of psychosis as a trivial gain in knowledge,” Craddock and colleagues write.

Adding to the list, the researchers cite other insights gleaned from GWAS, including the discovery of an important role in schizophrenia for copy number variations, rare structural changes that affect gene dosage. In addition, GWAS, along with other genetic and epidemiological studies, have generated clear evidence for a high degree of genetic risk-sharing between schizophrenia and bipolar disease, and to a lesser extent, schizophrenia and autism (see SRF related news story; SRF news story; SRF news story). “As clinicians, we do not regard this knowledge as a trivial achievement,” Craddock, O’Donovan, and Owen write.

The reason that GWAS had not turned up more genes, some argued, was simply that the right experiments had not yet been done. The ability to detect common variants depends on sample size: the smaller the effect size, the larger the number of cases and controls needed to get a statistically compelling result. When they published their study fingering ZNF804A, O’Donovan and colleagues estimated that it may take a study of 30,000 patients and an equal number of controls to capture the majority of low-effect risk alleles for psychiatric disorders (Owen et al., 2009). After seeing the 2009 GWAS data, David Collier of the Institute of Psychiatry, London, United Kingdom, upped the number to 100,000 of each to capture statistical significance for many of the hundreds (if not thousands) of common schizophrenia susceptibility alleles of small effect (see SRF related news story).
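
The dependence of sample size on effect size can be made concrete with a back-of-the-envelope calculation. The sketch below uses the standard normal-approximation formula for comparing two allele frequencies, assuming equal numbers of cases and controls, a control allele frequency of 30 percent, 80 percent power, and the genomewide threshold of 5 × 10⁻⁸. These inputs and the function names are illustrative assumptions; this is not the calculation used by any of the researchers quoted here.

```python
import math
from statistics import NormalDist

GENOMEWIDE_ALPHA = 5e-8


def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def cases_needed(control_freq, odds_ratio,
                 alpha=GENOMEWIDE_ALPHA, power=0.80):
    """Approximate number of cases (with equal controls) for a two-sided
    allele-frequency comparison. Each person contributes two alleles."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = -NormalDist().inv_cdf(alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_power * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2.0)


# ASSUMED inputs: 30 percent control allele frequency, three example effect sizes.
for effect in (1.05, 1.10, 1.20):
    print(effect, cases_needed(0.30, effect))
```

Because the required sample grows roughly with the inverse square of the frequency difference, halving the excess risk roughly quadruples the number of subjects needed.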

Bigger studies, more genes
Where is the field in relation to those numbers? In 2007, Patrick Sullivan of the University of North Carolina, Chapel Hill, and other researchers in the United States spearheaded the formation of the Psychiatric GWAS Consortium to pool samples for meta-analysis (for an overview, see Psychiatric GWAS Consortium Coordinating Committee et al., 2009). At last report, the consortium included 165 investigators from 68 institutions in 19 countries who are studying schizophrenia, bipolar disorder, major depressive disorder, autism, and attention-deficit hyperactivity disorder (see Figure 1 for overview). Pablo Gejman of North Shore University Health System and Northwestern University, Evanston, Illinois, presented the first results of the consortium’s schizophrenia genomewide analysis last November at the World Congress of Psychiatric Genetics in San Diego.

In that effort, boosting the sample size (to 12,200 cases and 9,300 controls) increased the yield of statistically significant SNPs to 129, spread over six regions of the genome (see SRF related news story). Gejman stressed that the results are preliminary, but they did replicate associations in the HLA locus on chromosome 6 and identified five other loci. The effect sizes (odds ratios) fall in the 1.1-1.2 range. Gejman said replication is ongoing with a new set of 20,000 samples from the SGENE Consortium, a group of European labs who have pooled their cohorts of schizophrenia subjects for GWAS analysis (Stefansson et al., 2009).

The Psychiatric GWAS Consortium results raised confidence among some researchers that conducting large GWAS is paying off. After seeing the consortium data, Anil Malhotra of Zucker Hillside Hospital in Glen Oaks, New York, told SRF, “I was not an initial believer that this was going to work out that well, but the HLA findings and ZNF804A findings are more convergent than I would have originally imagined. They are replicating in many samples, and the samples are very large, albeit with relatively modest effect sizes.” Malhotra credited a combination of improved technology and larger samples for the progress.

Building on that progress, researchers are continuing to collect more patients. Pamela Sklar, of the Broad Institute of the Massachusetts Institute of Technology and Harvard University, said she is involved in collaborations with a goal of genotyping 10,000 to 20,000 additional patients with schizophrenia or bipolar disorder, as well as matched controls. But what is the ultimate goal? At the World Congress, Sullivan presented a power calculation where a total sample size of 50,000 (that is, 25,000 cases and 25,000 controls) would allow for identification of more than 95 percent of all reported GWAS findings across all biomedical disorders.

However, Sullivan told SRF, “How many GWAS cases is enough is exceptionally complex, and the answer you get depends crucially on your assumptions. There is no unique and definitive answer. Using other medical disorders as a crude guide, sample sizes in excess of 50,000 subjects were needed before sensible and consistent results emerged for common variant associations. Schizophrenia could well be more complicated than other medical disorders. Other scenarios could require 100,000 subjects, particularly if you want to make a serious effort to detect gene-environment interactions.”

Sullivan pointed out that rapid technological development adds an additional level of complexity. “The next generation of GWAS arrays and genome and exome sequencing technologies will allow unprecedented genomic resolution. The technology and pricing are evolving rapidly. If GWAS pricing continues its sharp decline, costs per subject could be far lower than for functional MRI or even for a panel of endocrine biomarkers.”

Others say they have seen enough GWAS to know that more and larger sample sizes are not the answer. “One reason why the GWAS [approach] has not been very informative to date in schizophrenia may be that it is the wrong strategy,” said Straub’s National Institute of Mental Health colleague Daniel Weinberger. He sees two key faults with the approach: it does not take into account the biologic complexity of the disease or the diversity of pathology that falls under the clinical diagnosis of schizophrenia.

“The GWAS analytic model assumes fixed, predictable relationships between genetic risk and illness, but simple relationships between genetic risk and complex pathophysiological mechanisms are unlikely,” Weinberger said. In addition, the approach assumes that the diagnosis of schizophrenia represents a single biological entity, but most clinical diagnoses are more complicated. “Other approaches, e.g., family studies, studies of smaller but much better characterized samples, and studies of genetic interactions in these samples, will be necessary to understand the variable genetic architectures of such biologically complex and heterogeneous disorders.”

Malhotra thinks that the GWAS approach has nearly run its course in schizophrenia. “In an interesting twist, it appears that the much anticipated GWAS era will be quite short-lived, with perhaps a lifespan of less than five years in psychiatric genetics,” he said. “Our group and many others are now focusing on next-generation genome sequencing, first aimed at the exome, but then rapidly transitioning to the entire genome. I don't think it will be too long before we start to see the first data from some of these studies.”—Pat McCaffrey.

I wonder whether the relative lack of success in schizophrenia GWAS may be because the origin of schizophrenia may lie not so much in the genetic make-up of people with schizophrenia themselves, but in their prenatal experience, and possibly with the genes of the mother rather than with those of the offspring. Famine, rubella, influenza, herpes (HSV1 and HSV2), and poliovirus infection as well as high fever during pregnancy have all been listed as risk factors for the offspring developing schizophrenia in later life, as have maternal preeclampsia and obstetric complications. (See page at Polygenic Pathways for the many references.)

Maternal resistance to these effects is likely to be gene-dependent. Is it worth considering GWAS in the mothers rather than in the offspring?

The two recent papers in Nature, from the Icelandic group (Stefansson et al., 2008), and the International Schizophrenia Consortium (2008) led by Pamela Sklar, represent a landmark in psychiatric genetics. For the first time two large studies have yielded highly significant consistent results using multiple population samples. Furthermore, they arrived at these results using quite different methods. The Icelandic group used transmission screening and focused on de novo events, using the Illumina platform in both a discovery population and a replication population. By contrast, the ISC study was a large population-based case-control study using the Affymetrix platform, which did not specifically search for de novo events.

Both identified the same two regions on chromosome 1 and chromosome 15, as well as replicating the previously well studied VCFS region on chromosome 22. Thus, we now have three copy number variants which are replicated and consistent across studies. This provides data on rare highly penetrant variants complementary to the family based study of DISC1 (Porteous et al., 2006), in which the chromosomal translocation clearly segregates with disease, but in only one family. In addition, they are in general congruent with three other studies (Walsh et al., 2008; Kirov et al., 2008; Xu et al., 2008) which also demonstrate a role for copy number variation in schizophrenia. These studies together should put to rest many of the arguments about the value of genetics in psychiatry, so that future studies can now begin from a firmer base.

However, these studies also raise at least as many questions as they answer. One is the role of copy number variation in schizophrenia in the general population. The number of cases accounted for by the deletions on chromosome 1 and 15 in the ISC and Icelandic studies is extremely small, on the order of 1% or less. The extent to which copy number variation, including very rare or even private de novo variants, will account for the genetic risk for schizophrenia in the general population is still unknown. The ISC study indicated that there is a higher overall load of copy number variations in schizophrenia, broadly consistent with Walsh et al. and Xu et al. but backed up by a much larger sample size, allowing the results to achieve high statistical significance. The implications of these findings are still undeveloped.

Another issue is the relationship to the phenotype of schizophrenia in the general population. Many more genotype-phenotype studies will need to be done. It will be important to determine whether there is a higher rate of mental retardation in the schizophrenia cases in these studies than in other populations.

Another question is the relationship between these copy number variations (and other rare events) and the more common variants accounting for smaller increases in risk, as in the recent O’Donovan et al. (2008) association study in Nature Genetics. It is far too early to know, but there may well be some combination of rare mutations plus risk alleles that account for cases in the general population. This would then be highly reminiscent of Alzheimer’s disease, Parkinson’s disease, and other diseases which have been studied for a longer period of time.

For instance, in Alzheimer’s disease there are rare mutations in APP and presenilin, as well as copy number variation in APP, with duplications causing the accelerated Alzheimer’s disease seen in Down syndrome. These appear to interact with the risk allele in APOE, and possibly other risk alleles, and are part of a pathogenic pathway (Tanzi and Bertram, 2005). Similarly in Parkinson’s disease, rare mutations in α-synuclein, LRRK2 and other genes can be causative of PD, though notably the G2019S mutation in LRRK2 has incomplete penetrance. In addition, duplications or triplications of α-synuclein can cause familial PD, and altered expression due to promoter variants may contribute to risk. By contrast, deletions in Parkin cause an early onset Parkinsonian syndrome (Hardy et al., 2006). Finally, much of PD may be due to genetic risk factors or environmental causes that have not yet been identified. Further studies will likely lead to the elucidation of pathogenic pathways. These diseases can provide a paradigm for the study of schizophrenia and other psychiatric diseases. One difference is that the copy number variations in the neurodegenerative diseases are often increases in copies (as in APP and α-synuclein), consistent with gain of function mechanisms, while the schizophrenia associations were predominantly with deletions, suggesting loss of function mechanisms. The hope is that as genes are identified, they can be linked together in pathways, leading to understanding of the neurobiology of schizophrenia (Ross et al., 2006).

The key unanswered questions, of course, are what genes or other functional domains are deleted at the chromosome 1, 15, and 22 loci, whether the deletions at these loci are sufficient in themselves to cause schizophrenia, and, if sufficient, the extent to which the deletions are penetrant. Both of the current studies identified deletions large enough to include several genes. The hope is that at least a subset of copy number variations (unlike SNP associations identified in schizophrenia to date) may be causative, making the identification of the relevant genes or other functional domains—at least in principle—more feasible.

Another tantalizing observation is that the copy number variations associated with schizophrenia were defined by flanking repeat regions. This raises the question of the extent to which undetected smaller insertions, deletions or other copy number variations related to other repetitive motifs, such as long tandem repeats, may also be associated with schizophrenia. Identification and testing of these loci may prove a fruitful approach to finding additional genetic risk factors for schizophrenia.

Several recent reports have suggested that rare CNVs may be highly penetrant genetic factors in the pathogenesis of schizophrenia, perhaps even singular etiologic events in those cases of schizophrenia who have them. This is potentially of enormous importance, as the definitive identification of such a “causative” factor may be a major step in unraveling the biologic mystery of the condition. I would stress several issues that need to be considered in putting these recent findings into a broader perspective.

It is very difficult to attribute illness to a private CNV, i.e., one found only in a single individual. This point has been potently illustrated by a study of clinically discordant MZ twins who share CNVs (Bruder et al., AJHG, 2008). Inherited CNVs, such as those that made up almost all of the CNVs described in the childhood onset cases of the study by Walsh et al. (Science, 2008), are by definition not highly penetrant (since they are inherited from unaffected parents). The finding by Xu et al. (Nat Gen, 2008) that de novo (i.e., non-inherited) CNVs are much more likely to be associated with cases lacking a family history is provocative but difficult to interpret as no data are given about the size of the families having a family history and those not having such a history. Unless these family samples are of comparable size and obtained by a comparable ascertainment strategy, it is hard to know how conclusive the finding is. Indeed, in the study of Walsh et al., rare CNVs were just as likely to be found in patients with a positive family history. Finally, in contrast to private CNVs, recurrent (but still rare) CNVs, such as those identified on 1q and 15q in the studies of the International Schizophrenia Consortium (Nature, 2008) and Stefansson et al. (Nature, 2008), are strongly implicated as being associated with the diagnosis of schizophrenia and therefore likely involved in the causation of the illnesses in the cases having these CNVs. In all, these new CNV regions, combined with the VCFS region on 22q, suggest that approximately five to 10 patients out of 1,000 who carry the diagnosis of schizophrenia may have a well-defined genetic lesion (i.e., a substantial deletion or duplication).

The overarching question now is how relevant these findings are to the other 99 percent of individuals with this diagnosis who do not have these recurrent CNVs. Before we had the capability to perform high-density DNA hybridization and SNP array analyses, chromosomal anomalies associated with the diagnosis of schizophrenia were identified using cytogenetic techniques. Indeed, VCFS, XXX, XXY (Klinefelter’s syndrome), and XO (Turner syndrome) have been found with similarly increased frequency in cases with this diagnosis in a number of studies. Now that we have greater resolution to identify smaller structural anomalies, the list of congenital syndromes that increase the possibility that people will manifest symptoms that earn them this diagnosis appears to be growing rapidly. Are we finding causes for the form of schizophrenia that most psychiatrists see in their offices, or are we instead carving out a new set of rare congenital syndromes that share some clinical characteristics, as syphilis was carved out from the diagnosis of schizophrenia at the turn of the twentieth century? Is schizophrenia a primary expression of these anomalies or a secondary manifestation? VCFS is associated with schizophrenia-like phenomena but even more often with mild mental retardation, autism spectrum, and other psychiatric manifestations. The same is true of the aneuploidies that increase the probability of manifesting schizophrenia symptoms. The two new papers in Nature allude to the possibility that epilepsy and intellectual limitations may also be associated with these CNVs. The diagnostic potential of any of these new findings cannot be determined until the full spectrum of their clinical manifestations is clarified.

One of the important insights that might emerge from identification of these new CNV syndromes is the identification of candidate genes that may show association with schizophrenia based on SNPs in these regions. VCFS has been an important source of promising candidate genes with broader clinical relevance (e.g., PRODH, COMT). Stefansson et al. report, however, that none of the 319 SNPs in the CNV regions showed significant association with schizophrenia in quite a large sample of individuals not having deletions in these regions. The Consortium report also presumably has the results of SNP association testing in these regions in their large sample but did not report them. It is very important to explore in greater genetic detail these regions of the genome showing association with the diagnosis of schizophrenia in samples lacking these lesions and to fully characterize the clinical picture of individuals who have them. It is hoped that insights into the pathogenesis of symptoms related to this diagnosis will emerge from these additional studies.

Anyone who has worked in a public state hospital or chronic schizophrenia care facility (where I spent over 20 years) is not surprised to find an occasional patient with a rare congenital or acquired syndrome who expresses symptoms similar to those individuals also diagnosed with schizophrenia who do not have such rare syndromes. Our diagnostic procedures are not precise, and the symptoms that earn someone this diagnosis are not specific. Schizophrenia is not something someone has; it is a diagnosis someone is given. In an earlier comment for SRF on structural variations in the genome related to autism, I suggested that, “From a genetic point of view, autism is a syndrome that can be reached from many directions, along many paths. It is not likely that autism is any more of a discrete disease entity than say, blindness or mental retardation.” These new CNV syndromes manifesting schizophrenia phenomena are probably a reminder that the same is true of what we call schizophrenia.

The results of the family/adoption study by Lichtenstein et al. (2009) and our twin study (Cardno et al., 2002) are remarkably similar. Using a non-hierarchical diagnostic approach, the genetic correlation between schizophrenia and bipolar/mania was 0.60 in the family/adoption study and 0.68 in the twin study. The heritability estimates were somewhat lower in the family/adoption study (~60 percent) than in the twin study (~80 percent), but can still be said to be substantial and similar for both disorders.

When we adopted a hierarchical approach, with schizophrenia above mania, we found no monozygotic twin pairs where one twin had schizophrenia and the other had bipolar/mania, but with their considerably larger sample, Lichtenstein et al. (2009) were able to confirm a significantly elevated risk for bipolar disorder in siblings of probands with schizophrenia (RR = 2.7), even when individuals with co-occurrence of both disorders were excluded.

I think there is a potentially interesting link between the family/adoption and twin studies focusing mainly on non-hierarchical diagnoses: Owen and Craddock’s (2009) commentary on the family/adoption study, where they advocate a dimensional approach, and Will Carpenter’s SRF comment regarding the value of domains of psychopathology. The non-hierarchical approach (where individuals can have a diagnosis of both schizophrenia and bipolar disorder during their lifetime) could be viewed as a form of dimensional/domains of psychopathology approach, with schizophrenia and bipolar disorder each having a dimension of liability, and there is now evidence from family, twin, and adoption analyses that these dimensions are correlated, i.e., that there is some overlap in etiological influences.

If schizophrenia and bipolar disorder share some causal factors in common, what might be the implications for the unresolved status of schizoaffective disorder? Our twin study suggested that the genetic (but not environmental) liability to schizoaffective disorder is entirely shared with schizophrenia and mania, defined non-hierarchically (Cardno et al., 2002). If so, and if schizophrenia and bipolar disorder share some genetic susceptibility loci in common, while other loci are not shared, then risk of schizoaffective disorder (or perhaps the bipolar subtype) could be elevated either by the coincidental co-occurrence of non-shared susceptibility loci, or by the occurrence of loci that are common to both disorders.

In this case, any loci that influence risk of schizoaffective disorder (bipolar subtype?) should also increase risk of schizophrenia and/or bipolar disorder, and this model would be refuted if any relatively specific susceptibility loci for schizoaffective disorder were confidently identified.

Some further outstanding issues:

The relative usefulness of: 1) a hierarchical versus non-hierarchical approach to diagnosis of schizoaffective disorder (and if hierarchical, which one?); 2) the various definitions of schizoaffective disorder; 3) schizoaffective disorder per se compared with its subtypes.

Also, to what extent do environmental risk factors for schizophrenia, bipolar disorder, and schizoaffective disorder have different relationships from genetic risk factors?

The three companion papers published in Nature provide important new evidence for a role of the MHC complex and common variation across the genome in risk for schizophrenia. These studies have exploited the availability of comprehensive genotyping technologies, coupled with large cohorts of cases and controls, to identify candidate loci for disease susceptibility.

A notable feature of these papers is the clear willingness of each of the groups to share its data, and to provide overlapping presentations of each other’s results. The combination of datasets permitted the statistical significance of the MHC findings to emerge, thereby increasing confidence in the results. The implication that immune processes may interact with genetic risk to influence schizophrenia risk is consistent with several lines of evidence, including our own small GWAS (Lencz et al., 2007) implicating cytokine receptors in schizophrenia susceptibility.

Perhaps most intriguing is the finding from the International Schizophrenia Consortium demonstrating that a “score” test—combining information from many thousands of common variants—can reliably differentiate patients and controls across multiple psychiatric cohorts. These results indicate that hundreds, if not thousands, of genes of small effect may contribute to schizophrenia risk. Moreover, these same genes were shown to contribute to bipolar risk (but not risk for non-psychiatric disorders such as diabetes).

Much more work remains to be done in psychiatric genetics. While the score test accounted for about 3 percent of the observed case-control variance, statistical modeling suggested that common variation could explain as much as one-third or more of the total risk. Nevertheless, there remains a substantial proportion of genetic “dark matter” (unexplained variance), given the high heritability of a disorder such as schizophrenia. Complementary approaches are needed to further parse the source of the common genetic variance, as well as to identify rare yet highly penetrant mutations. Additional techniques, such as pharmacogenetic studies and endophenotypic research, will help to explicate the functionality and clinical significance of observed risk alleles.

The three Nature papers reporting GWAS results in a large sample of cases of schizophrenia and controls from around Western Europe and the U.S. are decidedly disappointing to those expecting this strategy to yield conclusive evidence of common variants predicting risk for schizophrenia. Why has this extensive and very costly effort not produced more impressive results? There are likely to be many explanations for this, involving the usual refrains about clinical and genetic heterogeneity, diagnostic imprecision, and technical limitations in the SNP chips. But the more fundamental problem in psychiatric genetics likely involves the biologic complexity of the conditions themselves, which renders them especially poorly suited to the standard GWAS strategy. The GWA analytic model assumes fixed, predictable relationships between genetic risk and illness, but simple relationships between genetic risk and complex pathophysiological mechanisms are unlikely. Many biologic functions show non-linear relationships, and depending on the biologic context, more of a potential pathogenic factor can make things worse or it can make them better. Studies of complex phenotypes in model systems illustrate that individual gene effects depend upon non-linear interactions with other genes (Toma et al., 2002; Shao et al., 2008). Similar observations are beginning to emerge in human disorders, e.g., in risk for cancer (Lo et al., 2008) and depression (Pezawas et al., 2008).

The GWA approach also assumes that diagnosis represents a unitary biological entity, but most clinical diagnoses are syndromal and biologically heterogeneous, and this is especially true in psychiatric disorders. Type 2 diabetes is the clinical expression of changes in multiple physiologic processes, including pancreatic function, adipose cell function, and eating behavior. Likewise, hypertension results from abnormalities in many biologic processes (e.g., vascular reactivity, kidney function, CNS control of blood pressure, metabolic factors, sodium regulation), and even a large effect on any specific process within a subset of individuals will seem small when measured in large unrelated samples (Newton-Cheh et al., 2009). In the case of the cognitive and emotional problems associated with psychiatric disorders, the biologic pathways to clinical manifestations are probably much more heterogeneous. While the results of GWAS in disorders like type 2 diabetes and hypertension have been more informative than the schizophrenia results so far, they, too, have been disappointing, considering the fanfare that preceded them. But given the pathophysiologic realities of diabetes, hypertension, or psychiatric disorders, how could the effect of any common genetic variant acting on only one of the diverse pathophysiological mechanisms implicated in these disorders be anything other than small when measured in large pathophysiologically heterogeneous populations? Other approaches, e.g., family studies, studies of smaller but much better characterized samples, and studies of genetic interactions in these samples, will be necessary to understand the variable genetic architectures of such biologically complex and heterogeneous disorders.

The synthesis and extraction of the essence of the 3 Nature papers by Heimer and Farley represents science reporting at its best. Completion of the task while the ink was still wet shows that SRF is indeed in good hands. Congratulations on being concise, even-handed, non-judgmental, and challenging under the pressure of time.

Schizophrenia Genetics: Glass Half Full?
While it may be disappointing that the GWAS described above did not identify more genes, they nevertheless represent a landmark in psychiatric genetics and suggest a dual approach for the future: continued large-scale genetic association studies along with alternative genetic approaches leading to the discovery of new genetic etiologies, and more functional investigations to identify pathways of pathogenesis—which may themselves suggest new etiologies.

The consistent identification of an association with the MHC locus reinforces (without proving, as pointed out in the SRF news story) long-standing interest in the involvement of infectious or immune factors in schizophrenia pathogenesis (Yolken and Torrey, 2008). Epidemiologic and neuropathological studies that include patients selected for the presence or absence of immunologic genetic risk variants could potentially clarify etiology; cell and mouse model studies could clarify pathogenesis (Ayhan et al., 2009). It is striking that a major genetic finding in schizophrenia serves to reinforce the concept of environmental risk factors.

The two specific genes identified by the SGENE consortium, NRGN and TCF4, offer intriguing new leads into schizophrenia. This should foster a number of further genetic and neurobiological studies. Deep resequencing (and CNV analysis) can detect rare causative mutations, as exemplified by TCF4 mutations leading to Pitt-Hopkins syndrome. Neurogranin already has clear connections to interesting signaling pathways related to glutamate transmission. A hope is that further studies of both gene products and their interactions will identify pathogenic pathways.

The ISC used common genetic variants “en masse” to generate a “polygene score” from discovery samples of patients; that score was able to predict case status in test populations. The success of this approach provides very strong evidence that a portion of schizophrenia risk status is attributable to common genetic variants acting in concert and that schizophrenia shares genetic factors with bipolar disorder, but not with other diseases. This analysis has multiple practical implications for the direction of research. First, since polygenic factors explain only a portion of the genetic risk, the search for other genetic factors—rare mutations of major effect detectable by deep sequencing, CNVs, variations in tandem repeats (Bruce et al., 2009, in press), and other genomic lesions—takes on new importance. Second, a meaningful integration of polygenic factors in a way that facilitates understanding of schizophrenia pathogenesis and the discovery of therapeutic targets will require identification of relevant pathways. Examination of patient-derived material—such as neurons differentiated from induced pluripotent stem cells taken from well-characterized patient populations—may be of great value.

The remarkable overlap between the genetic factors of schizophrenia and bipolar disorder suggests the need for further and more inclusive clinical studies—not just of “endophenotypes,” but also of the phenotypes themselves, together, rather than in isolation (Potash and Bienvenu, 2009). For instance, it is only within the past few years that the importance of cognitive dysfunction in schizophrenia has been appreciated. Cognition in bipolar disorder is even less well studied.

How much is really known about the longitudinal course of both disorders? Do genetic factors predict disease outcome? It is only recently that studies have focused intensively on the early course of schizophrenia and its prodrome. Much more is still to be learned, and even less is known about bipolar disorder. In conjunction with this greater understanding of clinical phenotype, it will clearly be necessary to refine the approach to phenotype by establishing the biological framework for these diseases and by establishing biomarkers, such as disruption in white matter (Karlsgodt et al., 2009) or abnormalities in functional networks (Demirci et al., 2009), that cut across current nosological categories. In turn, longitudinal study of clinical, imaging, and functional outcomes of schizophrenia and bipolar disorders should facilitate both focused candidate genetic studies and GWAS of large populations.

References:

Yolken RH, Torrey EF. Are some cases of psychosis caused by microbial agents? A review of the evidence. Mol Psychiatry. 2008 May;13(5):470-9.

Comment by: David Collier
Submitted 6 July 2009
Posted 6 July 2009
I recommend the Primary Papers

This report is unnecessarily negative, from my point of view. The three studies show not only that GWAS can identify susceptibility alleles for schizophrenia, but that the majority of risk comes from common variants of small effect. These can be found, but as in other complex traits and diseases, such as obesity and height, considerable power is needed because effect sizes are small, which means larger sample sizes. This approach works: there are now almost 60 variants influencing height (Hirschhorn et al., 2009; Soranzo et al., 2009; Sovio et al., 2009). Furthermore, the genes identified so far from traditional mapping, CNV analysis, and GWAS point to two biological pathways: the integrity of the synapse (neurexin 1, neurogranin, etc.) and the wnt/GSK3β signaling pathway (DISC1, TCF4, etc.), which is involved in functions such as neurogenesis in the brain. The identification of disease pathways for schizophrenia has major implications and should not be underestimated. It would be daft to lose nerve now and turn away from GWAS just as they are bearing fruit.

I would like to correct/expand on my comments to Peter Farley, to say that while statistical significance for some markers may be reached sooner, significance for many of the hundreds if not thousands of common schizophrenia susceptibility alleles of small effect might not emerge until samples of 100,000 cases and more than 100,000 controls have been collected. Another point is that organizations such as the Wellcome Trust are already assembling case samples for schizophrenia as well as control samples.
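To give a rough sense of why small effects demand large samples, the following is a minimal sketch of a textbook sample-size approximation for a two-proportion z-test on allele frequencies, with equal numbers of cases and controls. It is not the calculation used by any of the studies discussed here. The function name `cases_needed`, the control allele frequency of 0.3, and the 80 percent power target are assumptions chosen for illustration; the significance threshold of 5 x 10-8 is the genomewide level quoted in the news story, and the odds ratio of 1.14 is the figure cited elsewhere in this discussion for type 2 diabetes loci.

```python
from statistics import NormalDist


def cases_needed(odds_ratio, control_freq, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls).

    Uses the normal approximation for comparing two allele frequencies.
    All default values are illustrative assumptions.
    """
    z = NormalDist()
    # Allele frequency in cases implied by the odds ratio (hypothetical inputs).
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = odds / (1.0 + odds)
    mean_freq = (case_freq + control_freq) / 2.0

    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided significance threshold
    z_power = z.inv_cdf(power)

    null_sd = (2.0 * mean_freq * (1.0 - mean_freq)) ** 0.5
    alt_sd = (case_freq * (1.0 - case_freq)
              + control_freq * (1.0 - control_freq)) ** 0.5

    alleles_per_group = ((z_alpha * null_sd + z_power * alt_sd)
                         / (case_freq - control_freq)) ** 2
    return alleles_per_group / 2.0  # each person carries two alleles


if __name__ == "__main__":
    for oratio in (1.05, 1.14, 1.30):
        print(f"odds ratio {oratio}: about {cases_needed(oratio, 0.3):,.0f} cases")
```

Under these assumptions the required sample grows steeply as the odds ratio approaches 1, which is the general point being made about alleles of small effect; the exact numbers depend heavily on the assumed allele frequency and power.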

Also, I would like to clarify that I believe the remainder of genetic variation, after common variation has been taken into account, will come from some combination of rare CNVs; other rare variants such as SNPs; other types of genetic marker, such as variable number of tandem repeats (VNTRs); and, of course, the much neglected contribution from gene-environment interactions, in which main genetic effects may be obscured.

Some commentators in their reflections take a rather negative view on what has been achieved through the application of GWAS technology to schizophrenia and psychiatric disorders more generally. We strongly disagree with this position. Below, we give examples of a number of statements that can be made about the aetiology of schizophrenia and bipolar disorder that could not be made at high levels of confidence even two years ago that are based upon evidence deriving from the application of GWAS.

1. We know with confidence that the role of rare copy number variants in schizophrenia is not limited to 22q11DS (VCFS) (reviewed recently in O’Donovan et al., 2009). We do not yet know how much of a contribution, but we know the identity of an increasing number of these. Most span multiple genes so it may prove problematic as it has in 22q11DS to identify the relevant molecular mechanisms. However, for one locus, the CNVs are limited to a single gene: Neurexin1 (Kirov et al., 2008; Rujescu et al., 2009). Genetic findings are merely the start of the journey to a deeper biological understanding, but no doubt many neurobiological researchers have already embarked on that journey in respect of neurexin1.

2. Although we have argued in this forum that some of the major pre-GWAS findings in schizophrenia very likely reflect true susceptibility genes (DTNBP1, NRG1, etc.), we now have at least 4 novel loci where the evidence is more definitive (ZNF804A, MHC, NRGN, TCF4) (O’Donovan et al., 2008a; ISC, 2009; Shi et al., 2009; Stefansson et al., 2009) and two novel loci (Ferreira et al., 2008) in bipolar disorder (ANK3 and CACNA1C), at least one of which (CACNA1C) additionally confers risk of schizophrenia (Green et al., 2009). This is obviously a small part of the picture, but it is certainly better than no picture at all. These findings also offer a much more secure foundation than the earlier findings upon which to build follow-up studies, for example, of brain imaging and cognitive phenotypes (Esslinger et al., 2009), and even candidate gene studies. We would not regard the first convincing evidence that altered calcium channel function is a primary aetiological event in at least some forms of psychosis as a trivial gain in knowledge.

3. We can say with confidence that common alleles of small effect are abundant in schizophrenia, and that they contribute to a substantial part of the population risk (ISC, 2009). Identifying any one of these at stringent levels of statistical significance may be challenging in terms of sample sizes. As we have pointed out before, merging multiple datasets may indeed obscure some true associations because of sometimes unpredictable relationships between risk alleles and those assayed indirectly in GWAS studies (Moskvina and O’Donovan, 2007). Nevertheless, that many of the same alleles are overrepresented in multiple independent GWAS datasets from different countries (ISC, 2009) means that larger samples offer the prospect of identifying many more of these. This is not to say that large samples are the only approach; genetic heterogeneity may well underpin some aspects of clinical heterogeneity (Craddock et al., 2009a). However, with the exception of individual large pedigrees, it is not yet evident which type of clinical sample one should base a small scale study on. It should also be self-evident that the analysis of multiple samples, each with a different phenotypic structure, will pose major problems in respect of multiple testing and subsequent replication. Moreover, ascertaining special samples that represent putative subtypes of the clinical (and endophenotypic) spectrum of psychosis will first require large samples to be carefully assessed and the relevant subjects extracted. Subsequently, downstream, evaluation of specific genotype-phenotype relationships will require the remainder of the clinical population to be genotyped in a suitably powered way to show that those effects are specific to some clinical features of the disorder. Increasingly, it is ascertainment and assessment that dominate the cost of GWAS studies so it is not clear this approach will achieve any economies. We must also remember that after a GWAS study, there remains the opportunity to look in a controlled manner for relatively specific associations in the context of the heterogeneous clinical picture. For example, we are aware of a number of papers in development that will exploit the sorts of multi-locus tests reported by the ISC to refine diagnostics, and no doubt many other applications of this will emerge in the next year or so.

Critics should bear in mind that the GWAS data are not just there for the ‘headline’ genome-wide findings, but that the data will be available to mine for years to come. The findings reported to date are based on only the simplest analyses.

4. If it were the case that the thousands of SNPs of small effect were randomly distributed across biological systems, none being of more relevance to pathophysiology than another, identifying them would probably be a pointless endeavour. However, there is no reason to believe this will be the case. We have recently shown that in bipolar disorder, the GWAS signals are enriched in particular biological pathways (Holmans et al., 2009) and we also published strong evidence for a relatively selective involvement of the GABAergic system in schizoaffective disorder (Craddock et al., 2009b). We are aware of an as-yet unpublished independent sample with similar findings. We would not regard the first convincing evidence that altered GABA function is a primary aetiological event in at least some forms of psychosis as a trivial gain in knowledge.

Incidentally it is a common misconception that the identification of risk alleles of small effect necessarily confers no useful insights into pathogenesis and possible drug targets. For example, common alleles in PPARG and KCNJ11 have been robustly shown to confer risk to Type 2 diabetes (T2D) but with odds ratios in the region of only 1.14 (of similar magnitude to those revealed by GWAS of schizophrenia). PPARG encodes the target for the thiazolidinedione class of drugs used to treat T2D. KCNJ11 encodes part of the target for another class of diabetes drug, the sulphonylureas (Prokopenko et al., 2008). Moreover, studies of novel associated variants identified in T2D GWAS in healthy, non-diabetic, populations have demonstrated that for most, the primary effect on T2D susceptibility is mediated through deleterious effects on insulin secretion, rather than insulin action (Prokopenko et al., 2008). Further examples of insights into the biology of common diseases coming from the identification of loci of small effect are the implication of the complement system in age-related macular degeneration and autophagy in Crohn’s disease (Hirschhorn, 2009). Already, efforts are under way to translate the new recognition of the role of autophagy in Crohn’s disease into new therapeutic leads (Hirschhorn, 2009). Of course many of the loci identified in GWAS implicate genes whose functions are as yet largely or completely unknown, and determining those functions is a prerequisite of translating those findings. Nevertheless, we believe that the greatest benefits that will accrue from the continued discovery of risk loci through GWAS will come from the assembly of that information into novel disease pathways leading to novel therapeutic targets.

5. We can say with confidence that bipolar disorder and schizophrenia substantially overlap, at least in terms of polygenic risk (ISC, 2009). As clinicians, we do not regard that knowledge as a trivial achievement.

6. We can say with confidence from studies of CNVs that schizophrenia and autism share at least some risk factors in common (O’Donovan et al., 2009). We believe that is also an important insight.

The above are major achievements in what we expect to be a long but accelerating process of picking apart the origins of schizophrenia and other psychotic disorders. We do not think that any other research discipline in psychiatry has done more to advance that knowledge in the past 100 years.

Like that of other common familial diseases, the genetics of schizophrenia and bipolar disorder is a “mixed economy” of common alleles of small effect and rare alleles of large and small effects, including CNVs. Those who are concerned at the cost of collecting large samples for GWAS studies must bear in mind that the robust identification of both types of mutation will require similarly large samples; we will just have to get used to that fact if we want to make progress. Collecting samples like this may be expensive, but as clinicians, we know those costs are trivial compared with the human and economic costs of psychotic disorders.

The question of phenotype definition is one which we have repeatedly addressed (Craddock et al., 2009a). Unquestionably, if we knew the true pathophysiological basis of these disorders, we could do better. The fact is that we don’t. Given that, it must be extremely encouraging that despite the problems, risk loci can be robustly identified by GWAS using samples defined by current diagnostic criteria. Moreover, armed with GWAS data in these heterogeneous populations, additional risk genes can be identified through strategies aimed at refining the phenotype that are not constrained by the current dichotomous view of the functional psychoses. We have shown at least one way in which this might be achieved without imposing a further burden of multiple testing (Craddock et al., 2009b), and have little doubt that others will emerge. We agree that approaches to phenotyping that more directly index underlying pathophysiology are highly appealing, and will ultimately be necessary for understanding the mechanistic relationships between gene and disorder. However, the two cardinal assumptions upon which the use of endophenotypes is predicated for gene discovery are questionable. First, there is little good evidence that putative endophenotypes are substantially simpler genetically than “exophenotypes” (Flint and Munafo, 2007). Second, there is rarely good evidence that the current crop of popular putative endophenotypes lie on the disease pathway; indeed, there seems to be substantial pleiotropy in the genetics of complex traits, psychosis included (Prokopenko et al., 2008; O’Donovan et al., 2008b).

Finally, we reiterate that while only small parts of the heritability of any complex disorder have been accounted for, large-scale genetic approaches have been extremely successful in studies of non-psychiatric diseases (Manolio et al., 2008) and have led to substantial advances in our understanding of pathogenesis, even for diseases like Crohn’s disease where there was already prior knowledge of pathogenesis from other research methods (Mathew, 2008).

Psychiatry starts from a situation in which there is no robust prior knowledge of pathogenesis for the major phenotypes. Recent findings suggest that mental illness may be the medical field that will actually benefit most over the coming years from application of these powerful molecular genetic technologies.

Moskvina V, O'Donovan MC. Detailed analysis of the relative power of direct and indirect association studies and the implications for their interpretation. Hum Hered. 2007;64(1):63-73.

GWAS Results: Is the Glass Half Full or 95 Percent Empty?
The publication of the latest schizophrenia GWAS papers represents the culmination of a tremendous amount of work and unprecedented cooperation among a large number of researchers, for which they should be applauded. In addition to the hope of finding new “schizophrenia genes,” GWAS have been described by some of the researchers involved as, more fundamentally, a stern test of the common variants hypothesis. Based on the meagre haul of common variants dredged up by these three studies and their forerunners, this hypothesis should clearly now be resoundingly rejected—at least in the form that suggests that there is a large, but not enormous, number of such variants, which individually have modest, but not minuscule, effects. There are no common variants of even modest effect.

However, Purcell and colleagues now argue for a model involving vast numbers of variants, each of almost negligible effect alone. The authors show that an aggregate score derived from the top 10-50 percent of a set of 74,000 SNPs from the association results in a discovery sample can predict up to 3 percent of the variance in a target group. Simply put, a set of putative “risk alleles” can be defined in one sample and shown, collectively, to be very slightly (though highly significantly in a statistical sense) enriched in the test sample, compared to controls. This is consistent across several different schizophrenia samples and even in two bipolar disorder samples. The authors go on to perform a set of control analyses that suggest that these results are not due to obvious population stratification or genotype rate effects (although effects at this level are obviously prone to cryptic artifacts).
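The aggregate score described above can be sketched in a few lines. This is a minimal illustration of one common way to build such a score, not the consortium's actual pipeline: it assumes each SNP is weighted by the log of its discovery-sample odds ratio and that a person's score is the average weighted allele count over the selected SNPs. The function names, the P value threshold, and all of the data are hypothetical and invented for the example.

```python
import math


def select_score_alleles(discovery, p_threshold):
    """Keep SNPs whose discovery-sample P value is below the threshold.

    `discovery` maps SNP id -> (odds_ratio, p_value).
    Returns SNP id -> weight, where the weight is the log odds ratio
    of the counted allele (an assumed weighting scheme).
    """
    return {
        snp: math.log(odds_ratio)
        for snp, (odds_ratio, p_value) in discovery.items()
        if p_value < p_threshold
    }


def polygenic_score(genotype, weights):
    """Average weighted allele count over the score SNPs typed in one person.

    `genotype` maps SNP id -> count (0, 1, or 2) of the counted allele.
    """
    typed = [snp for snp in weights if snp in genotype]
    if not typed:
        return 0.0
    return sum(weights[snp] * genotype[snp] for snp in typed) / len(typed)


# Hypothetical toy data: (odds ratio, P value) from a discovery sample.
discovery = {
    "snp1": (1.10, 0.04),
    "snp2": (0.93, 0.20),
    "snp3": (1.05, 0.45),
    "snp4": (1.02, 0.80),
}
# A liberal threshold admits many SNPs, most of which may be false positives.
weights = select_score_alleles(discovery, p_threshold=0.5)

# Hypothetical genotypes from an independent target sample.
person_a = {"snp1": 2, "snp2": 0, "snp3": 1}
person_b = {"snp1": 0, "snp2": 2, "snp3": 0}
score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
```

In an analysis of this kind, the scores of cases and controls in the target sample would then be compared, for example by regressing case status on the score; the variance explained by that regression is the quantity reported as "up to 3 percent."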

If taken at face value, what do these results mean? They imply some kind of polygenic effect on risk, but of what magnitude? The answer to that depends on the interpretation of the additional simulations performed by the authors. They argue that the risk allele set inevitably contains very many false positives, which dilute the predictive power of the real positives hidden among them. Based on this logic, if we only knew which were the real variants to look at, then the variance explained in the target group would be much greater.

To try to estimate the magnitude of the effect of the polygenic load of “true risk” alleles, the authors conducted a series of simulations, varying parameters such as allele frequencies, genotype relative risks, and linkage disequilibrium with genotyped markers. They claim that these analyses yield a set of models that recapitulate the observed data and that all converge on a true level of variance explained of around 34 percent, demonstrating a large polygenic component to the genetic architecture of schizophrenia.

These simulations adopt a level of statistical abstraction that should induce a healthy level of skepticism or at least reserved judgment on their findings. Most fundamentally, they rely explicitly for their calculations of the true variance on a liability-threshold model of the genetic architecture of schizophrenia. In effect, the “test” of the model incorporates the assumption that the model is correct.

The liability-threshold model is an elegant statistical abstraction that allows the application of the powerful statistics of normal distributions. Unfortunately, it suffers from the fact that it has no support whatsoever and makes no biological sense. First, there is no justification for assuming a normal distribution of “underlying liability,” whatever that term is taken to mean. Second, as usual when it is invoked, the nature of this putative threshold is not explained, though it surreptitiously implies some form of very strong epistasis (to explain the difference in risk between someone with x liability alleles and someone else with x+1 alleles). If this model is not correct, then these simulations are fatally flawed.
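For readers unfamiliar with the model being criticized, the following is a minimal sketch of how a liability-threshold model converts a genetic score into disease risk. It assumes a standard-normal total liability, a population prevalence of 1 percent (an assumed round figure, not taken from the papers), and a genetic component explaining one-third of liability variance (the figure under dispute above). The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def liability_threshold(prevalence):
    """Threshold on a standard-normal liability scale above which a person is affected."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def risk_given_score(genetic_value, variance_explained, prevalence):
    """P(affected | genetic value) when liability = genetic value + residual.

    The residual is assumed normal with variance (1 - variance_explained),
    so that total liability has unit variance in the population.
    """
    threshold = liability_threshold(prevalence)
    residual_sd = (1.0 - variance_explained) ** 0.5
    return 1.0 - STD_NORMAL.cdf((threshold - genetic_value) / residual_sd)


if __name__ == "__main__":
    prevalence = 0.01               # assumed for illustration
    variance_explained = 1.0 / 3.0  # the "one-third" figure discussed above
    score_sd = variance_explained ** 0.5
    for z in (-2, 0, 2):
        risk = risk_given_score(z * score_sd, variance_explained, prevalence)
        print(f"genetic score {z:+d} SD: risk {risk:.4f}")
```

Running the loop shows how risk rises non-linearly as the genetic value approaches the threshold. Note that the sketch simply takes the normality of liability and the existence of a sharp threshold as given, which are exactly the assumptions questioned in this comment.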

Even if the model were correct, the calculations are far from convincing. From a starting set of 560 models, the authors arrive at seven that are consistent with the observed degree of prediction in the target samples. According to the authors, the fact that these seven models converge on a small range of values for the underlying variance explained by the markers is evidence that this value (around 34 percent) represents the true situation. What is not highlighted is the fact that the values for the actual additive genetic variance (taking into account incomplete linkage disequilibrium between the markers and the assumed causal variants) across these models range from 34 percent to 98 percent and that the number of SNPs assumed to be having an effect ranges from 4,625 to 74,062. This extreme variation in the derived models hardly inspires confidence in the authors’ claim that their data “strongly support a polygenic basis to schizophrenia that (1) involves common SNPs, [and] (2) explains at least one-third of the total variation in liability.” (italics added)

From a more theoretical perspective, it should be noted that a polygenic model involving thousands of common variants of tiny effect cannot explain and will not contribute to the observed heightened familial relative risks. Such risk can only be explained by a variant of large effect or by an oligogenic model involving at most two to three loci (Bodmer and Bonilla, 2008; Hemminki et al., 2008; Mitchell and Porteous, in preparation). It seems much more likely that the observed predictive power in the target samples represents a modest “genetic background” effect, which could influence the penetrance and expressivity of rare, causal mutations. However, if the point of GWAS is to find genetic variants that are predictive of risk or that shed light on the pathogenic mechanisms of the disease, then clearly, even if such variants can be found by massively increasing sample sizes, their identification alone would not achieve or even appreciably contribute to either of these goals.

Thumbs up or down on schizophrenia GWAS?
The triumvirate of schizophrenia GWAS studies just published in Nature gives cause for thought, and bears close scrutiny and reflection. To my reading, these three studies individually and collectively lead to an unambiguous conclusion—there is a lot of genetic heterogeneity and not one individual variant of common ancient origin accounts for a significant fraction of the genetic liability. To put it another way, there is no ApoE equivalent for schizophrenia. Strong past claims for ZNF804A and others look to have fallen by the statistical wayside. Putting the results of all three studies together does appear to provide support for a long known, pre-GWAS association with HLA, but otherwise it is hard to give a strong "thumbs up" to any specific result, not least because of the lack of replication between studies. The results are nevertheless important because the common disease, common variant model, on which GWAS are based and the associated cost justified, is strongly rejected as the main contributor to the genetic variance.

The ISC proposes a highly polygenic model with thousands of variants having an additive effect on both schizophrenia and bipolar disorder. I find no fault with their evidence, but its meaning and interpretation remain speculative. Simply consider the fact that SNPs carefully selected to tag half the genome account for about a third of the variance. It follows that the lion's share has gone undetected and will, by design and limitation, remain impervious to the GWAS strategy.

Part of the GWAS appeal is that the genotyping is technically facile and it is easier to collect lots of cases than it is families, but for as long as a diagnosis of schizophrenia or BP depends upon DSM-IV or ICD-10 classification, diagnostic uncertainty will have a major effect on the true power and validity of statistical association, whether positive or negative. Indeed, the longstanding evidence from variable psychopathology amongst related individuals, the recent epidemiology evidence for shared genetic risk for schizophrenia and BP, and the further evidence supporting this from the ISC GWAS, all suggest that we should be returning more to family-based studies as a strategy to reduce genetic heterogeneity and find explanatory genetic variants. Plainly, adding ever more uncertainty through ever larger sample sizes is neither smart nor efficient.

I would certainly give the thumbs up to the full and unencumbered release of the primary data to the community as a whole, as this could usefully recoup some of the GWAS investment. It would facilitate a range of statistical and bioinformatics analyses and, who knows, there might be hidden nuggets of statistical support for independent genetic and biological studies.

The main question that arises from the three large genomewide association studies published in Nature is, What should we do next?

One important way forward would be to follow up the association findings in the MHC region. We need to understand the biological mechanism underlying this association. If the association signal is indeed related to infectious diseases, this line of inquiry may lead to the highly desired development of a treatment that might prevent the diseases in some cases.

One possible explanation for the association between schizophrenia and the MHC region (6p22.1) is that infection during pregnancy leads to disturbances of fetal brain development and increases the risk of schizophrenia later in life. A possible test for the theory of infectious diseases as risk factors for schizophrenia would be to study the associated SNPs in 6p22.1 in fathers and mothers of subjects with schizophrenia relative to parents of control subjects. If the 6p22.1 region is related to the tendency of mothers to be infected by viruses during pregnancy, we would expect the SNPs in 6p22.1 to be most strongly associated with being a mother to a subject with schizophrenia.
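As a rough illustration of the comparison proposed above, the following Python sketch runs a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts, once for mothers and once for fathers. The function name `allele_chi_square` and all counts are hypothetical and chosen only for illustration; a real analysis would also need to separate the maternal genotype effect from the alleles transmitted to the child, which this sketch does not attempt.

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows: parents of cases, parents of controls.
    Columns: copies of the associated allele, copies of the other allele.
    Returns (chi-square statistic, two-sided p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 df, chi2 is a squared standard normal, so the tail
    # probability can be written with the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts (500 parents = 1,000 alleles per group).
mothers = allele_chi_square(case_alt=240, case_ref=760,
                            control_alt=200, control_ref=800)
fathers = allele_chi_square(case_alt=205, case_ref=795,
                            control_alt=200, control_ref=800)

print("mothers: chi2=%.3f p=%.4f" % mothers)
print("fathers: chi2=%.3f p=%.4f" % fathers)
```

Under the maternal-infection hypothesis, one would expect the association in mothers to be clearly stronger than in fathers, as it is in these made-up counts.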

Another broader and more complicated part of the question is: What would be the best strategy for continued study of the genetic causes of schizophrenia? There shouldn’t be only one way to proceed. Testing samples that are 10 times larger seems likely to lead to the identification of more genes, but with much smaller effect size. Testing the association of common variants with schizophrenia is unlikely to lead to the development of genetic diagnostic tools in the near future. If we want to understand the biology of the disease, it might be easier to concentrate our efforts on the identification of rare inherited and non-inherited variants with large effect on the phenotype. Such rare variants are easier to model in animals (relative to common variants with very small functional effect) and might even account for a larger proportion of cases.

The three companion papers in this week’s issue of Nature, in our view, support the case for investigating interaction between susceptibility genes and infectious exposures in schizophrenia. We and others have argued previously that genetic studies conducted in isolation from environmental factors, and studies of environmental influences in the absence of genetic data, are necessarily limited. Maternal influenza, rubella, toxoplasmosis, herpes simplex virus, and other infections have each been associated with an increased risk of schizophrenia, with effect sizes ranging from twofold to over fivefold. While these epidemiologic findings clearly require replication in independent cohorts, two new developments provide further support for the hypothesis. First, a growing number of animal studies of maternal immune activation have documented behavioral and brain phenotypes in offspring that are analogous to findings from clinical research in schizophrenia, and these findings are mediated in large part by specific cytokines (Meyer et al., 2009; Patterson, 2008). Second, recent evidence indicates that maternal infection is also related to deficits in executive and other cognitive functions and neuropathology thought to arise from disruptions in brain development (Brown et al., 2009a; Brown et al., 2009b).

While the MHC region contains genes not involved in the immune system, the importance of this region in the response to infectious insults cannot be ignored, and in light of the epidemiologic findings on maternal infection, it is intriguing to see it once more implicated in genetic studies of schizophrenia. Although it is heartening to see that the potential implications of these findings for infectious etiologies were raised in the article from the SGENE plus group, an analysis of the frequency of SNPs by season of birth falls well short of the type of research that will yield definitive findings on the relationships between susceptibility genes and infectious insults. Hence, we advocate a strategy aimed at large-scale genetic analyses of schizophrenia cases using birth cohorts with infectious exposures documented from prospectively collected biological samples from the prenatal period. If the schizophrenia-related pathogenic mechanisms by which MHC-related genetic variants operate involve interactions with prenatal infection, we would expect that studies of gene-infection interaction will yield larger effect sizes than those found in these new papers. The evidence from these papers and the epidemiologic literature should also facilitate narrowing of the number of candidate genes to be tested for interactions with infectious insults, thereby ameliorating the potential for type I error due to multiple comparisons.
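The point about multiple comparisons can be made concrete with a small Python sketch. It assumes a simple Bonferroni correction, which is only one of several possible approaches; the function name `bonferroni_threshold` and the figure of 100 candidate tests are hypothetical. With roughly a million tests, the correction reproduces the genomewide threshold of 5 x 10-8 mentioned earlier.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the familywise
    type I error rate at or below alpha across n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# About one million SNP tests in a typical genomewide scan.
genomewide = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical: 100 gene-infection interaction tests after narrowing
# the candidate list using the GWAS and epidemiologic evidence.
candidate = bonferroni_threshold(0.05, 100)

print("genomewide threshold: %.1e" % genomewide)
print("candidate threshold:  %.1e" % candidate)
```

The fewer hypotheses tested, the less stringent the per-test threshold needs to be, which is why a narrowed candidate list eases the multiple-comparison burden.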

Two hundred years after Darwin’s birth and 150 years after the publication of On the Origin of Species, these three papers in Nature show the important role of natural selection in shaping the genetic architecture of schizophrenia susceptibility. If we compare the GWAS results for schizophrenia with those obtained for other diseases, it seems that there are fewer common risk alleles and/or lower effect sizes in schizophrenia than in many other complex diseases (see, for instance, the online catalog of published GWAS at NHGRI). This fact strongly suggests that negative selection limits the spread of susceptibility alleles, as expected due to the decreased fertility of schizophrenic patients.

Interestingly, the MHC region may be an exception. This region represents a classical example of balancing selection, i.e., the presence of several variants at a locus maintained in a population by positive natural selection (Hughes and Nei, 1988). In the case of the MHC, this balancing selection seems to be related to pathogen resistance or MHC-dependent mating choice. Therefore, the presence of common schizophrenia susceptibility alleles at this locus might be explained by antagonistic pleiotropic effects of alleles maintained by natural selection.

If negative selection limits the spread of schizophrenia risk alleles, most of the genetic susceptibility to schizophrenia is likely due to rare variants. Resequencing technologies will allow the identification of many of these variants in the near future. In the meantime, it would be interesting to focus our attention on non-synonymous SNPs at low frequency. Based on human-chimpanzee comparisons and human sequencing data, Kryukov et al. (2008) have shown that a large fraction of de novo missense mutations are mildly deleterious (i.e., they are subject to weak negative selection) and therefore they can still reach detectable frequencies. Assuming that most of these mildly deleterious alleles may be detrimental (i.e., they confer risk for disease), the authors conclude that numerous rare functional SNPs may be major contributors to susceptibility to common diseases (Kryukov et al., 2008). Similar conclusions were obtained by the analysis of the relative frequency distribution of non-synonymous SNPs depending on their probability to alter protein function (Barreiro et al., 2008; Gorlov et al., 2008). As shown by Evans et al. (2008), genomewide scans of non-synonymous SNPs might complement GWAS, being able to identify rare non-synonymous variants of intermediate penetrance not detectable by current GWAS panels.