Genome-wide association studies

A number of important genome-wide association studies (GWASs) have come to my attention in the last few weeks, and I anticipate that the current steady stream of them will very soon become a roaring river. These are studies that sort through the genomes of large numbers of people looking for systematic differences in gene variations, for example by comparing the genomes of people affected by a disease with the genomes of people not affected. “Association analysis: A method of genetic analysis that compares the frequency of alleles between affected and unaffected individuals; a given allele is considered to be associated with the disease if that allele occurs at a significantly higher frequency among affected individuals(ref).” Association studies may also compare the genomes of specific samples of people (such as aged Ashkenazi Jews living in Brooklyn or older women from Tanegashima island) or the genomes of disease tissues (such as from specific kinds of cancers) against the general human genome to determine possibly causal correlations between genomic variations and effects, such as extended longevity or the presence of a disease.
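To make the quoted definition concrete, here is a minimal Python sketch of an allelic association test for a single variant. The allele counts are invented for illustration, the function name is my own, and the 1-degree-of-freedom chi-square test is one common choice that I am assuming here; it is not necessarily the statistic used by any study mentioned in this post.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Compare minor-allele frequency in cases vs. controls.

    Runs a Pearson chi-square test (1 degree of freedom) on the 2x2 table
    of allele counts. Assumes every row and column total is non-zero.
    """
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_minor + control_minor, case_major + control_major]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    case_freq = case_minor / row_totals[0]
    control_freq = control_minor / row_totals[1]
    return case_freq, control_freq, chi2, p_value


# Hypothetical counts: 1,000 affected and 1,000 unaffected people,
# two alleles each, so 2,000 alleles per group.
case_freq, control_freq, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"minor-allele frequency: cases {case_freq:.2f}, controls {control_freq:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
```

A genome-wide study repeats a comparison of this kind for hundreds of thousands of variants, which is why far stricter p-value thresholds are applied than the conventional 0.05.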

I have already created a number of blog entries reporting on GWAS studies. My focus here is on the general characteristics of GWAS studies, why they are important, why we will see more and more of them, and where they will lead us.

GWAS studies and why they are important

A good example is discussed in the recent blog post New telomerase finding only a small-medium sized deal. The publication Common variants near TERC are associated with mean telomere length relates: “We conducted genome-wide association analyses of mean leukocyte telomere length in 2,917 individuals, with follow-up replication in 9,492 individuals. We identified an association with telomere length on 3q26 (rs12696304, combined P = 3.72 x 10(-14)) at a locus that includes TERC, which encodes the telomerase RNA component.” I go on in that post to comment that the study says that people who possessed the gene variation (minor allele of rs12696304) had shorter telomere lengths, equivalent to 3.6 years of aging. People who had two copies of the variation had telomere lengths expected for people 7.2 years older. The implication is that people with the gene defect age faster. The study required massive efforts to gather the data: mean leukocyte telomere lengths of 2,917 plus 9,492 individuals. Then it required a herculean data processing and pattern-recognition process to end up with a correlation-based association of shorter telomere lengths with a minor allele of rs12696304 instead of millions of other possibilities. And, finally, from this association an inference was drawn that people who have the allele will generally age faster and die sooner.
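The arithmetic behind those figures can be sketched in a few lines of Python. I am assuming a simple additive (per-allele) effect, which is what the 3.6 and 7.2 year figures suggest; the constant and function names are hypothetical, introduced only for this illustration.

```python
# Figures quoted above from the TERC study discussion.
YEARS_PER_MINOR_ALLELE = 3.6   # telomere shortening per copy, in years of aging
DISCOVERY_SAMPLE = 2917        # individuals in the initial analysis
REPLICATION_SAMPLE = 9492      # individuals in the follow-up replication


def telomere_age_equivalent(minor_allele_copies):
    """Years of 'extra' telomere aging under an assumed additive model."""
    if minor_allele_copies not in (0, 1, 2):
        raise ValueError("a person carries 0, 1, or 2 copies of an allele")
    return YEARS_PER_MINOR_ALLELE * minor_allele_copies


for copies in (0, 1, 2):
    years = telomere_age_equivalent(copies)
    print(f"{copies} copies: about {years:.1f} years of telomere aging")

print("combined sample size:", DISCOVERY_SAMPLE + REPLICATION_SAMPLE)
```

Under this assumption the effect of two copies is simply twice the effect of one, and the combined sample comes to 12,409 individuals.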

The 2008 review study Genome-wide association studies for complex traits: consensus, uncertainty and challenges describes progress as of two years ago and highlights problems as seen at that time: “The first wave of large-scale, high-density genome-wide association (GWA) studies has improved our understanding of the genetic basis of many complex traits. For several diseases, including type 1 and type 2 diabetes, inflammatory bowel disease, prostate cancer and breast cancer, there has been rapid expansion in the numbers of loci implicated in predisposition. For others, such as asthma, coronary heart disease and atrial fibrillation, fewer novel loci have been found, although opportunities for mechanistic insights are equally promising. Several common variants influencing important continuous traits, such as lipids, height and fat mass, have also been found. — These findings are providing valuable clues to the allelic architecture of complex traits in general. At the same time, many methodological and technical issues that are relevant to the successful prosecution of large-scale association studies have been addressed. — However, despite understandable celebration of these achievements, sober reflection reveals many challenges ahead. — Much work remains to obtain a complete inventory of the variants at each locus that contribute to disease risk and to define the molecular mechanisms through which these variants operate. The ultimate objectives — full descriptions of the susceptibility architecture of major biomedical traits and translation of the findings into clinical practice — remain distant.” Much distance still remains, but since this was written there has been a significant and steady acceleration in the rate of publication of genome-wide association studies.

Because of their importance, the National Human Genome Research Institute has created a Catalog of Published Genome-Wide Association Studies. “The curated, searchable and publically accessible database contains information on over 350 publication, linking around 1,640 single nucleotide polymorphisms (SNPs) to more than 80 different diseases and traits. — This catalogue allows some of the trends and genomic characteristics of trait or disease associated SNPs to be analysed across multiple different publications [Hindorff LA et al. (2009) PNAS doi/10.1073], leading to a number of important insights(ref).”

What is included in the catalog is selective: “The genome-wide association study (GWAS) publications listed here include only those attempting to assay at least 100,000 single nucleotide polymorphisms (SNPs) in the initial stage. Publications are organized from most to least recent date of publication, indexing from online publication if available. Studies focusing only on candidate genes are excluded from this catalog. — SNP-trait associations listed here are limited to those with p-values < 1.0 x 10^-5 (see full methods for additional details).”
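The inclusion rules quoted above amount to a simple filter, sketched here in Python. The record layout, the field names and the sample entries are all hypothetical, invented for illustration; the catalog's real data format may look quite different. Only the two thresholds come from the quoted text.

```python
MIN_SNPS_ASSAYED = 100_000   # "at least 100,000" SNPs in the initial stage
P_VALUE_CUTOFF = 1.0e-5      # associations must have p < 1.0 x 10^-5


def passes_catalog_criteria(record):
    """Apply the three quoted inclusion rules to one hypothetical record."""
    return (
        record["snps_assayed"] >= MIN_SNPS_ASSAYED
        and not record["candidate_genes_only"]
        and record["p_value"] < P_VALUE_CUTOFF
    )


# Invented example records, one per way of passing or failing.
records = [
    {"name": "large scan, strong signal", "snps_assayed": 500_000,
     "candidate_genes_only": False, "p_value": 3.0e-9},
    {"name": "too few SNPs", "snps_assayed": 50_000,
     "candidate_genes_only": False, "p_value": 1.0e-8},
    {"name": "candidate-gene study", "snps_assayed": 300_000,
     "candidate_genes_only": True, "p_value": 1.0e-8},
    {"name": "weak signal", "snps_assayed": 500_000,
     "candidate_genes_only": False, "p_value": 1.0e-4},
]

included = [r["name"] for r in records if passes_catalog_criteria(r)]
print(included)
```

Of the four invented records, only the first satisfies all three rules.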

One implication of the studies in the catalog is the critical importance of epigenetic mechanisms of gene regulation. As stated in a phg Foundation article on the catalog, “ — the vast majority of genetic variation associated with complex diseases or traits lies outside of the coding regions of the genome – 45% of SNPs are located inside introns, which are located within genes but are spliced out prior to translation into functional proteins, and 43% of SNPs lie between genes. Whilst in some ways this result is unsurprising, as coding genes only account for around 1% of the genome, it is still unexpected and suggests that regulation of gene expression plays an important role in determining common traits and diseases.” The catalog shows other interesting patterns. “Interestingly, amongst those associations that have been attributed to specific genes (which are located near the trait or disease associated SNPs), 18 regions have been linked with multiple different diseases, suggesting a common underlying aetiological pathway. For example, the major histocompatibility complex (MHC), which plays an important role in the immune system, has been implicated in 10 different conditions ranging from autoimmune disorders to lung cancer. Discoveries of a shared underlying genetic basis for different diseases are likely to become increasingly common as more gene-disease associations are uncovered, and raise a complex set of ethical implications with regards to genetic testing(ref).”

Association studies have provided the basis for construction of specific genomic-association databases like RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes. “The RegPrecise database — was developed for capturing, visualization and analysis of predicted transcription factor regulons in prokaryotes that were reconstructed and manually curated by utilizing the comparative genomic approach. A significant number of high-quality inferences of transcriptional regulatory interactions have been already accumulated for diverse taxonomic groups of bacteria.”

Along with the development of databases has come the development of research and computational tools. For example, the publication Platform for accurate semi-automatic inference of regulons by comparative genomics approach describes an approach to “providing effective tools to enable high-quality reconstruction of transcriptional regulatory networks (TRN).” – “We implemented a web-based computational platform for fast and accurate semi-automatic inference of regulons in well-populated groups of closely-related bacterial genomes.”

Why more and more GWAS studies?

There are likely to be more and more GWAS studies, and they are likely to involve larger and larger population samples. Factors driving this growth are:

1. Knowledge breeds a quest for more knowledge, and studies can be built on earlier studies; for example, the genome of glioblastoma cells is known(ref), facilitating GWA studies related to glioblastoma.
2. The underlying cost of genome sequencing continues to plummet, making GWA studies ever more economically feasible (see this recent blog post).
3. As more and more studies are added to the catalog and complete databases like RegPrecise are built up, new studies can be partially based on them.
4. New and ever-better software tools are becoming available for identifying associations(ref)(ref).
5. Ever more powerful and cheaper computers are allowing association computations which were virtually impossible a few years back when the human genome was first being sequenced.

In other words, the factors which empower Giuliano’s Law are at work here and the rate of change is exponential, not linear.

Implications of GWAS studies

Going back to my blog post My personal longevity – the race between death-stalker and life-prolonger: watch out, Death Stalker. The men and women doing genome-wide association studies are ultimately working for Life Prolonger, not for you. They are seriously on your case and what they are turning up is going to help convince you to give us lots more years in our life spans.

About Vince Giuliano

Being a follower, connoisseur, and interpreter of longevity research is my latest career. I have been at this part-time for well over a decade, and in 2007 this became my mainline activity. In earlier reincarnations of my career, I was founding dean of a graduate school and a university professor at the State University of New York, a senior consultant working in a variety of fields at Arthur D. Little, Inc., Chief Scientist and COO of Mirror Systems, a software company, and an international Internet consultant. I got off the ground with one of the earliest PhDs from Harvard in a field later to become known as computer science. Because there was no academic field of computer science at the time, to get through I had to qualify myself in hard sciences, so my studies focused heavily on quantum physics. In various ways I contributed to the Computer Revolution starting in the 1950s and the Internet Revolution starting in the late 1980s. I am now engaged in doing the same for The Longevity Revolution. I have published something like 200 books and papers as well as over 430 substantive entries in this blog, and have enjoyed various periods of notoriety. If you do a Google search on Vincent E. Giuliano, most if not all of the entries on the first few pages that come up will be ones relating to me. I have a general writings site at www.vincegiuliano.com and an extensive site of my art at www.giulianoart.com.
Please note that I have recently changed my mailbox to vegiuliano@agingsciences.com.

I don’t think enough is known about CNVs yet to give a definitive answer, but my guess is yes. Obviously a lot of gene deletions can result in shorter lifespans. We know that the average lifespan of mice can be increased by 50%, if I remember correctly, by providing extra copies of the TERT telomerase gene and the P53 anti-tumor gene. I suspect this could also occur rarely as a CNV.