tag:blogger.com,1999:blog-104430502015-02-24T10:41:43.530-05:00Epistasis BlogFrom the Computational Genetics Laboratory at Dartmouth Medical School (www.epistasis.org)Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.comBlogger510125tag:blogger.com,1999:blog-10443050.post-50198833209622546512015-02-24T10:06:00.001-05:002015-02-24T10:41:43.552-05:00Great feature selection method for detecting epistasis using random forests<span style="font-family: inherit;">This is a really neat approach that is worth exploring for using machine learning methods such as random forests for the detection and modeling of statistical epistasis in genetic studies of human health.</span><br /><span style="font-family: inherit;"><br />Holzinger ER, Szymczak S, Dasgupta A, Malley J, Li Q, Bailey-Wilson JE. Variable selection method for the identification of epistatic models. Pac Symp Biocomput. 2015;20:195-206. [<a href="http://www.worldscientific.com/doi/pdf/10.1142/9789814644730_0020" target="_blank">PDF</a>]</span><br /><span style="font-family: inherit;"><br />Abstract</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;"><span style="background-color: white; line-height: 17.9998016357422px; white-space: normal;">Standard analysis methods for genome wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power. 
To this end, we have developed a novel variable selection method called relative recurrency variable importance metric (</span><span class="highlight" style="background-color: white; line-height: 17.9998016357422px; white-space: normal;">r2VIM</span><span style="background-color: white; line-height: 17.9998016357422px; white-space: normal;">). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e. no marginal effects). Our results show that with appropriate parameter settings,&nbsp;</span><span class="highlight" style="background-color: white; line-height: 17.9998016357422px; white-space: normal;">r2VIM</span><span style="background-color: white; line-height: 17.9998016357422px; white-space: normal;">&nbsp;can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All Supplementary Data can be found here: http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/</span><span class="highlight" style="background-color: white; line-height: 17.9998016357422px; white-space: normal;">r2VIM</span><span style="background-color: white; line-height: 17.9998016357422px; white-space: normal;">_epi/).</span></span>Jason H. 
Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-41263315623419262082015-02-20T20:40:00.004-05:002015-02-24T10:06:42.930-05:00Is Big Data a 21st Century Maginot Line?We have just published this open access editorial in <a href="http://www.biodatamining.org/content/pdf/s13040-015-0037-5.pdf" target="_blank">BioData Mining</a> on whether 'big data' is a 21st century <a href="http://en.wikipedia.org/wiki/Maginot_Line" target="_blank">Maginot Line</a>. This is relevant because we as scientists sometimes let the data define the research questions rather than the other way around. As the size and complexity of data grow, we may find ourselves asking simpler and simpler questions, only some of which are important to advancing our understanding of human health and disease.<br /><br />Huang X, Jennings SF, Bruce B, Buchan A, Cai L, Chen P, Cramer CL, Guan W, Hilgert UK, Jiang H, Li Z, McClure G, McMullen DF, Nanduri B, Perkins A, Rekepalli B, Salem S, Specker J, Walker K, Wunsch D, Xiong D, Zhang S, Zhang Y, Zhao Z, Moore JH. Big data - a 21st century science Maginot Line? No-boundary thinking: shifting from the big data paradigm. BioData Min. 2015 Feb 6;8:7. [<a href="http://www.biodatamining.org/content/pdf/s13040-015-0037-5.pdf" target="_blank">PDF</a>]<br /><br />See also our previous related essay on 'no-boundary thinking' in bioinformatics.<br /><br />Huang X, Bruce B, Buchan A, Congdon CB, Cramer CL, Jennings SF, Jiang H, Li Z, McClure G, McMullen R, Moore JH, Nanduri B, Peckham J, Perkins A, Polson SW, Rekepalli B, Salem S, Specker J, Wunsch D, Xiong D, Zhang S, Zhao Z. No-boundary thinking in bioinformatics research. BioData Min. 2013 Nov 6;6(1):19. [<a href="http://www.biodatamining.org/content/pdf/1756-0381-6-19.pdf" target="_blank">PDF</a>]Jason H. 
Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-48689123178556604102015-01-31T10:52:00.000-05:002015-01-31T10:52:31.649-05:00Epistasis: Methods and Protocols<br /><div class="separator" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: center;"><a href="http://www.springer.com/biomed/human+genetics/book/978-1-4939-2154-6" target="_blank"><img border="0" src="http://4.bp.blogspot.com/-PbRp1ZDSxjg/VMz6BEk0CUI/AAAAAAAAAIA/09r2cDD5pIE/s1600/9781493921546.jpg" height="320" width="225" /></a></div><br />Our new edited volume on <a href="http://www.springer.com/biomed/human+genetics/book/978-1-4939-2154-6" target="_blank">epistasis</a>.<br /><br />This volume presents a valuable and readily reproducible collection of established and emerging techniques on modern genetic analyses. Chapters focus on statistical or data mining analyses, genetic architecture, the burden of multiple testing, genetic variance, measuring epistasis, multifactor dimensionality reduction, and ReliefF. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and key tips on troubleshooting and avoiding known pitfalls.<br /><br /><br />Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-11772332164778501582015-01-03T10:34:00.000-05:002015-01-31T10:37:58.829-05:00Heuristic identification of biological architectures for simulating complex hierarchical genetic interactionsMoore JH, Amos R, Kiralis J, Andrews PC. Heuristic identification of biological architectures for simulating complex hierarchical genetic interactions. Genet Epidemiol. 2015 Jan;39(1):25-34. 
[<a href="http://www.ncbi.nlm.nih.gov/pubmed/25395175" target="_blank">PubMed</a>]<br /><br />Abstract<br /><br />Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype-phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene-gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. 
We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-puGgAV2_HTE/VMz2xRJOS8I/AAAAAAAAAH0/6uUF7P9q_tw/s1600/B7pDPauCIAAdqDx.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-puGgAV2_HTE/VMz2xRJOS8I/AAAAAAAAAH0/6uUF7P9q_tw/s1600/B7pDPauCIAAdqDx.jpg" height="255" width="320" /></a></div><br />Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-68439667763070023562014-12-16T10:31:00.000-05:002015-01-31T10:31:32.710-05:00SNP characteristics predict replication success in association studiesGorlov IP, Moore JH, Peng B, Jin JL, Gorlova OY, Amos CI. SNP characteristics predict replication success in association studies. Hum Genet. 2014 Dec;133(12):1477-86. [<a href="http://www.ncbi.nlm.nih.gov/pubmed/25273843" target="_blank">PubMed</a>]<br /><br />Abstract<br /><br />Successful independent replication is the most direct approach for distinguishing real genotype-disease associations from false discoveries in genome-wide association studies (GWAS). Selecting SNPs for replication has been primarily based on P values from the discovery stage, although additional characteristics of SNPs may be used to improve replication success. We used disease-associated SNPs from more than 2,000 published GWASs to identify predictors of SNP reproducibility. SNP reproducibility was defined as a proportion of successful replications among all replication attempts. The study reporting association for the first time was considered to be discovery and all consequent studies targeting the same phenotype replications. We found that -Log(P), where P is a P value from the discovery study, is the strongest predictor of the SNP reproducibility. 
Other significant predictors include type of the SNP (e.g., missense vs intronic SNPs) and minor allele frequency. Features of the genes linked to the disease-associated SNP also predict SNP reproducibility. Based on empirically defined rules, we developed a reproducibility score (RS) to predict SNP reproducibility independently of -Log(P). We used data from two lung cancer GWAS studies as well as recently reported disease-associated SNPs to validate RS. Minus Log(P) outperforms RS when the very top SNPs are selected, while RS works better with relaxed selection criteria. In conclusion, we propose an empirical model to predict SNP reproducibility, which can be used to select SNPs for validation and prioritization.Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-66567260792156071472014-11-04T10:29:00.000-05:002015-01-31T10:29:23.738-05:00The effects of recombination on phenotypic exploration and robustness in evolutionHu T, Banzhaf W, Moore JH. The effects of recombination on phenotypic exploration and robustness in evolution. Artif Life. 2014 Fall;20(4):457-70. [<a href="http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;arnumber=6926028&amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel7%2F6720217%2F6926024%2F06926028.pdf%3Farnumber%3D6926028" target="_blank">IEEE</a>]<br /><br />Abstract<br /><br />Recombination is a commonly used genetic operator in artificial and computational evolutionary systems. It has been empirically shown to be essential for evolutionary processes. However, little has been done to analyze the effects of recombination on quantitative genotypic and phenotypic properties. The majority of studies only consider mutation, mainly due to the more serious consequences of recombination in reorganizing entire genomes. Here we adopt methods from evolutionary biology to analyze a simple, yet representative, genetic programming method, linear genetic programming. 
We demonstrate that recombination has less disruptive effects on phenotype than mutation, that it accelerates novel phenotypic exploration, and that it particularly promotes robust phenotypes and evolves genotypic robustness and synergistic epistasis. Our results corroborate an explanation for the prevalence of recombination in complex living organisms, and helps elucidate a better understanding of the evolutionary mechanisms involved in the design of complex artificial evolutionary systems and intelligent algorithms.<br /><br /><br />Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-64212639228266247242014-10-01T10:25:00.000-04:002015-01-31T10:25:26.015-05:00Phenotypic robustness and the assortativity signature of human transcription factor networksPechenick DA, Payne JL, Moore JH. Phenotypic robustness and the assortativity signature of human transcription factor networks. PLoS Comput Biol. 2014 Aug 14;10(8):e1003780. [<a href="http://www.ploscompbiol.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pcbi.1003780&amp;representation=PDF" target="_blank">PDF</a>]<br /><br />Abstract<br /><br />Many developmental, physiological, and behavioral processes depend on the precise expression of genes in space and time. Such spatiotemporal gene expression phenotypes arise from the binding of sequence-specific transcription factors (TFs) to DNA, and from the regulation of nearby genes that such binding causes. These nearby genes may themselves encode TFs, giving rise to a transcription factor network (TFN), wherein nodes represent TFs and directed edges denote regulatory interactions between TFs. Computational studies have linked several topological properties of TFNs - such as their degree distribution - with the robustness of a TFN's gene expression phenotype to genetic and environmental perturbation. 
Another important topological property is assortativity, which measures the tendency of nodes with similar numbers of edges to connect. In directed networks, assortativity comprises four distinct components that collectively form an assortativity signature. We know very little about how a TFN's assortativity signature affects the robustness of its gene expression phenotype to perturbation. While recent theoretical results suggest that increasing one specific component of a TFN's assortativity signature leads to increased phenotypic robustness, the biological context of this finding is currently limited because the assortativity signatures of real-world TFNs have not been characterized. It is therefore unclear whether these earlier theoretical findings are biologically relevant. Moreover, it is not known how the other three components of the assortativity signature contribute to the phenotypic robustness of TFNs. Here, we use publicly available DNaseI-seq data to measure the assortativity signatures of genome-wide TFNs in 41 distinct human cell and tissue types. We find that all TFNs share a common assortativity signature and that this signature confers phenotypic robustness to model TFNs. Lastly, we determine the extent to which each of the four components of the assortativity signature contributes to this robustness.Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-59463197623635423702014-09-01T10:19:00.000-04:002015-01-31T10:20:43.940-05:00Computational genetics analysis of grey matter density in Alzheimer's diseaseZieselman AL, Fisher JM, Hu T, Andrews PC, Greene CS, Shen L, Saykin AJ, Moore JH. Computational genetics analysis of grey matter density in Alzheimer's disease. BioData Min. 2014 Aug 22;7:17. 
[<a href="http://www.biodatamining.org/content/pdf/1756-0381-7-17.pdf" target="_blank">PDF</a>]<br /><br />Abstract<br /><br />BACKGROUND:<br /><br />Alzheimer's disease is the most common form of progressive dementia and there is currently no known cure. The cause of onset is not fully understood but genetic factors are expected to play a significant role. We present here a bioinformatics approach to the genetic analysis of grey matter density as an endophenotype for late onset Alzheimer's disease. Our approach combines machine learning analysis of gene-gene interactions with large-scale functional genomics data for assessing biological relationships.<br /><br />RESULTS:<br /><br />We found a statistically significant synergistic interaction among two SNPs located in the intergenic region of an olfactory gene cluster. This model did not replicate in an independent dataset. However, genes in this region have high-confidence biological relationships and are consistent with previous findings implicating sensory processes in Alzheimer's disease.<br /><br />CONCLUSIONS:<br /><br />Previous genetic studies of Alzheimer's disease have revealed only a small portion of the overall variability due to DNA sequence differences. Some of this missing heritability is likely due to complex gene-gene and gene-environment interactions. We have introduced here a novel bioinformatics analysis pipeline that embraces the complexity of the genetic architecture of Alzheimer's disease while at the same time harnessing the power of functional genomics. 
These findings represent novel hypotheses about the genetic basis of this complex disease and provide open-access methods that others can use in their own studies.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-ul8DknaU4qI/VMzyjIEHDOI/AAAAAAAAAHo/k0YSIvjmKT0/s1600/Untitled.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-ul8DknaU4qI/VMzyjIEHDOI/AAAAAAAAAHo/k0YSIvjmKT0/s1600/Untitled.jpg" height="166" width="320" /></a></div><br />Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-72646986774747570962014-08-12T07:56:00.000-04:002014-08-12T07:56:23.940-04:00Diverse convergent evidence in the genetic analysis of complex disease<span style="font-family: inherit;">The genetic analysis of common human diseases should not rely solely on one piece of evidence (e.g. a p-value derived from a univariate test). In this paper, we explore integrating multiple sources of evidence in the search for valid genetic associations.</span><br /><span style="font-family: inherit;"><br />Ciesielski TH, Pendergrass SA, White MJ, Kodaman N, Sobota RS, Huang M, Bartlett J, Li J, Pan Q, Gui J, Selleck SB, Amos CI, Ritchie MD, Moore JH, Williams SM. Diverse convergent evidence in the genetic analysis of complex disease: coordinating omic, informatic, and experimental evidence to better identify and validate risk factors. BioData Min. 2014 Jun 30;7:10. 
[<a href="http://www.biodatamining.org/content/pdf/1756-0381-7-10.pdf" target="_blank">PDF</a>]</span><br /><span style="font-family: inherit;"><br />Abstract</span><br /><span style="font-family: inherit;"><br /><span style="background-color: white; line-height: 17.999801635742188px;">In omic research, such as genome wide association studies, researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues beg the question: How can we increase the amount of knowledge gained from high throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof of principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. 
If used as an adjunct to standard validation methods this approach can leverage multiple distinct data types to improve genetic risk factor discovery/validation, promote effective science communication, and guide future research directions.</span></span>Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-71367018060674600532014-08-07T07:50:00.003-04:002014-08-07T07:50:50.699-04:00A classification and characterization of two-locus, pure, strict, epistatic models for simulation and detection<span style="font-family: inherit;">Our latest open-access paper on developing epistasis models that can be used to simulate data for evaluating machine learning methods.&nbsp;</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Urbanowicz RJ, Granizo-Mackenzie AL, Kiralis J, Moore JH. A classification and characterization of two-locus, pure, strict, epistatic models for simulation and detection. BioData Min. 2014 Jun 9;7:8. [<a href="http://www.biodatamining.org/content/pdf/1756-0381-7-8.pdf" target="_blank">PDF</a>]</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Abstract</span><br /><span style="font-family: inherit;"><br /></span><div class="" style="background-color: white; line-height: 17.999801635742188px;"><div style="margin-bottom: 0.5em;"><span style="font-family: inherit;">The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. In order to evaluate strategies for detecting these complex multi-locus disease associations, simulation studies are required. The development of the GAMETES software for the generation of complex genetic models, has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e. all n loci, but no fewer, are predictive of phenotype. 
Previous theoretical work characterizing complex genetic models has yet to examine pure, strict, epistasis which should be the most challenging to detect. This study addresses three goals: (1) Classify and characterize pure, strict, two-locus epistatic models, (2) Investigate the effect of model 'architecture' on detection difficulty, and (3) Explore how adjusting GAMETES constraints influences diversity in the generated models.</span></div><div style="margin-bottom: 0.5em;"><span style="font-family: inherit;">In this study we utilized a geometric approach to classify pure, strict, two-locus epistatic models by "shape". In total, 33 unique shape symmetry classes were identified. Using a detection difficulty metric, we found that model shape was consistently a significant predictor of model detection difficulty. Additionally, after categorizing shape classes by the number of edges in their shape projections, we found that this edge number was also significantly predictive of detection difficulty. Analysis of constraints within GAMETES indicated that increasing model population size can expand model class coverage but does little to change the range of observed difficulty metric scores. A variable population prevalence significantly increased the range of observed difficulty metric scores and, for certain constraints, also improved model class coverage.</span></div><div style="margin-bottom: 0.5em;"><span style="font-family: inherit;">These analyses further our theoretical understanding of epistatic relationships and uncover guidelines for the effective generation of complex models using GAMETES. Specifically, (1) we have characterized 33 shape classes by edge number, detection difficulty, and observed frequency (2) our results support the claim that model architecture directly influences detection difficulty, and (3) we found that GAMETES will generate a maximally diverse set of models with a variable population prevalence and a larger model population size. 
However, a model population size as small as 1,000 is likely to be sufficient.</span></div><div style="margin-bottom: 0.5em;"><span style="font-family: inherit;"><br /></span></div><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-K_CZ_FCDWIg/U-Nn3RxRvzI/AAAAAAAAAHE/qpOm__0l18Q/s1600/epi.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-K_CZ_FCDWIg/U-Nn3RxRvzI/AAAAAAAAAHE/qpOm__0l18Q/s1600/epi.jpg" height="130" width="400" /></a></div><div style="margin-bottom: 0.5em;"><span style="font-family: inherit;"><br /></span></div></div>Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-91019031746995433822014-08-04T19:02:00.001-04:002014-08-04T19:04:09.462-04:00Why epistasis is important for tackling complex human disease geneticsMy new epistasis review / comment with <a href="http://genetics.sciences.ncsu.edu/index.php/people/trudy-mackay/" target="_blank">Trudy Mackay</a> from NC State. Unfortunately, it is not open access. Email me for a pdf.<br /><br />Mackay TF, Moore JH. Why epistasis is important for tackling complex human disease genetics. Genome Med. 2014 Jun 9;6(6):42. [<a href="http://www.ncbi.nlm.nih.gov/pubmed/25031624" target="_blank">PubMed</a>]Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-79657402847937194332014-07-31T16:17:00.004-04:002014-07-31T16:19:38.867-04:00First complex, then simpleMy new&nbsp;BioData Mining&nbsp;editorial with Dr. James Malley from NIH on approaching machine learning and data science modeling from a complexity point of view.<br /><br />Malley, JD, Moore JH. First complex, then simple. BioData Mining 2014;7:13 [<a href="http://www.biodatamining.org/content/pdf/1756-0381-7-13.pdf" target="_blank">PDF</a>]Jason H. 
Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-54147571965973967492014-07-29T10:09:00.003-04:002014-07-29T10:09:54.482-04:00Innovation is often unnerving: the door into summer<span style="font-family: inherit;">My latest BioData Mining editorial with Dr. James Malley from NIH. We discuss innovation and interestingness within the context of biological data mining.</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Malley JD, Moore JH. Innovation is often unnerving: the door into summer.&nbsp;</span><span style="font-family: inherit;">BioData Min. 2014 Jul 17;7:12. [<a href="http://www.biodatamining.org/content/pdf/1756-0381-7-12.pdf" target="_blank">PDF</a>]</span>Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-42583408879996457952014-06-09T08:57:00.000-04:002014-06-09T08:57:08.646-04:00Validated context-dependent associations of coronary heart disease risk with genotype variation<span style="background-color: white; line-height: 17px;"><span style="font-family: inherit;">An absolutely fabulous paper from Charlie Sing and Andy Clark on the context-dependent effects of GWAS hits. A must-read for anyone who believes, as I do, that the assumption of a simple genetic architecture is not realistic for most biological traits in humans. I have included below a paper on the PRIM method they used, which is also worth a read.</span></span><br /><span style="font-family: inherit;"><span style="background-color: white; line-height: 17px;"><br /></span>Lusk CM, Dyson G, Clark AG, Ballantyne CM, Frikke-Schmidt R, Tybjærg-Hansen A,&nbsp;Boerwinkle E, Sing CF. Validated context-dependent associations of coronary heart&nbsp;disease risk with genotype variation in the chromosome 9p21 region: the&nbsp;Atherosclerosis Risk in Communities study. Hum Genet. 2014 Jun 3. 
[<a href="http://www.ncbi.nlm.nih.gov/pubmed/24889828" target="_blank">PubMed</a>]</span><br /><span style="font-family: inherit;"><br />Dyson G, Sing CF. Efficient identification of context dependent subgroups of&nbsp;risk from genome-wide association studies. Stat Appl Genet Mol Biol. 2014 Apr&nbsp;1;13(2):217-26. [<a href="http://www.ncbi.nlm.nih.gov/pubmed/24570412" target="_blank">PubMed</a>]</span><br /><span style="font-family: inherit;"><br />Dyson G, Frikke-Schmidt R, Nordestgaard BG, Tybjaerg-Hansen A, Sing CF.&nbsp;Modifications to the Patient Rule-Induction Method that utilize non-additive&nbsp;combinations of genetic and environmental effects to define partitions that&nbsp;predict ischemic heart disease. Genet Epidemiol. 2009 May;33(4):317-24. [<a href="http://www.ncbi.nlm.nih.gov/pubmed/19025787" target="_blank">PubMed</a>]</span>Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-84814260076286247382014-05-02T10:44:00.004-04:002014-05-02T10:44:37.161-04:00Models in BiologyThis is easily one of the best papers I have read in the past year. It is very relevant to our efforts to build mathematical, statistical, and computational models of the genotype-phenotype relationship in population-based studies of common diseases.<br /><br />Gunawardena <i>BMC Biology</i> 2014, 12:29 [<a href="http://www.biomedcentral.com/content/pdf/1741-7007-12-29.pdf" target="_blank">PDF</a>]<br /><br /><br />Jason H. 
Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-28309543897007973862014-04-27T09:36:00.002-04:002014-04-27T09:39:39.914-04:00Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibilityWe have previously published a statistical epistasis network associated with bladder cancer in a population based study (<a href="http://www.biomedcentral.com/content/pdf/1471-2105-12-364.pdf">Hu et al., BMC Bioinformatics</a>). This network was highly non-random and contained many genes that were part of the aryl hydrocarbon receptor pathway. Below is a short paper reporting functional genomics annotation of the network.<br /><br />Hu T, Pan Q, Andrew AS, Langer JM, Cole MD, Tomlinson CR, Karagas MR, Moore JH. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility. BioData Min. 2014 Apr 11;7(1):5. [<a href="http://www.biodatamining.org/content/pdf/1756-0381-7-5.pdf">BioData Mining</a>]<br /><br />Abstract<br /><br />BACKGROUND:&nbsp;Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. 
By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility.<div><br />FINDINGS:&nbsp;To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types.</div><div><br />CONCLUSIONS:&nbsp;The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. 
These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.</div><div><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-aVPIbndfcfY/U10HvaoEP6I/AAAAAAAAAGw/960QcWM3f8o/s1600/net.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-aVPIbndfcfY/U10HvaoEP6I/AAAAAAAAAGw/960QcWM3f8o/s1600/net.jpg" height="176" width="320" /></a></div></div>Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-5948722979773424132014-04-14T13:50:00.002-04:002014-04-14T13:50:31.589-04:00To replicate or not to replicate? The case of pharmacogenetic studies<span style="font-family: inherit;">Statistical replication has always been the gold standard in genome-wide association studies (GWAS). However, as we have previously pointed out, there are many good reasons why true genetic associations might not replicate (<a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0005639" target="_blank">Greene et al. 2009</a>). This 2013 paper explores the issue with respect to pharmacogenetic studies. The mantra of GWAS is now focused on the identification of new drug targets using genetic association results. If this is true, should biological validation matter more than statistical replication?&nbsp;</span><br /><span style="font-family: inherit;"><br />Aslibekyan S, Claas SA, Arnett DK. To replicate or not to replicate: the case&nbsp;of pharmacogenetic studies: Establishing validity of pharmacogenomic findings:&nbsp;from replication to triangulation. 
Circ Cardiovasc Genet. 2013 Aug;6(4):409-12 [<a href="http://www.ncbi.nlm.nih.gov/pubmed/23963160" target="_blank">PubMed</a>]</span>Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-10862090957832790832014-04-13T10:28:00.004-04:002014-04-27T09:40:11.458-04:00Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis<span style="font-family: inherit;">X</span>u J, Zhang J. Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis. Mol Biol Evol. 2014 [<a href="http://www.ncbi.nlm.nih.gov/pubmed/24723421">PubMed</a>]<br /><br />Abstract<br /><br />Many human-disease associated amino acid residues (DARs) appear as the wild-type in other species. This phenomenon is commonly explained by the presence of compensatory residues in these other species that alleviate the deleterious effects of the DARs. The general validity of this hypothesis, however, is unclear, because few compensatory residues have been identified. Here we test the compensation hypothesis by assembling and analyzing 1077 DARs located in 177 proteins of known crystal structures. Because destabilizing protein structures is a primary reason why DARs are deleterious, we focus on protein stability in this analysis. We discover that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This and other findings provide genome-scale evidence for the compensation hypothesis and have important implications for understanding epistasis in protein evolution and for using animal models of human diseases.<br />Jason H. 
Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-12379174164262242642014-04-12T11:11:00.001-04:002014-04-27T09:44:06.949-04:00Detection and replication of epistasis influencing transcription in humansThis study demonstrates that replicable epistasis is common at the level of transcription.<br /><br />Hemani G, Shakhbazov K, Westra HJ, Esko T, Henders AK, McRae AF, Yang J, Gibson G, Martin NG, Metspalu A, Franke L, Montgomery GW, Visscher PM, Powell JE. Detection and replication of epistasis influencing transcription in humans. Nature. 2014 Apr 10;508(7495):249-53. [<a href="http://www.ncbi.nlm.nih.gov/pubmed/24572353">PubMed</a>]<br /><br />Abstract<br /><br />Epistasis is the phenomenon whereby one polymorphism's effect on a trait depends on other polymorphisms present in the genome. The extent to which epistasis influences complex traits and contributes to their variation is a fundamental question in evolution and human genetics. Although often demonstrated in artificial gene manipulation studies in model organisms, and some examples have been reported in other species, few examples exist for epistasis among natural polymorphisms in human traits. Its absence from empirical findings may simply be due to low incidence in the genetic control of complex traits, but an alternative view is that it has previously been too technically challenging to detect owing to statistical and computational issues. Here we show, using advanced computation and a gene expression study design, that many instances of epistasis are found between common single nucleotide polymorphisms (SNPs). In a cohort of 846 individuals with 7,339 gene expression levels measured in peripheral blood, we found 501 significant pairwise interactions between common SNPs influencing the expression of 238 genes (P = 2.91 × 10(-16)). 
Replication of these interactions in two independent data sets showed both concordance of direction of epistatic effects (P = 5.56 × 10(-31)) and enrichment of interaction P values, with 30 being significant at a conservative threshold of P &lt; 9.98 × 10(-5). Forty-four of the genetic interactions are located within 5 megabases of regions of known physical chromosome interactions (P = 1.8 × 10(-10)). Epistatic networks of three SNPs or more influence the expression levels of 129 genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. For example, MBNL1 is influenced by an additive effect at rs13069559, which itself is masked by trans-SNPs on 14 different chromosomes, with nearly identical genotype-phenotype maps for each cis-trans interaction. This study presents the first evidence, to our knowledge, for many instances of segregating common polymorphisms interacting to influence human traits.Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-46216454218552552992014-02-14T17:59:00.000-05:002014-02-14T17:59:11.620-05:00Promotion and tenure review at universitiesThere is an interesting thread on promotion and tenure by <a href="https://twitter.com/phylogenomics" target="_blank">@phylogenomics</a> on Twitter using the hashtag #publishperish14. He is tweeting comments from <a href="https://twitter.com/lindakatehi" target="_blank">@lindakatehi</a>, the chancellor at UC Davis. All the tweets from the conference have been archived <a href="https://docs.google.com/spreadsheet/ccc?key=0AkrYZRk8BPm2dGFJR3R1WUJKdHRMcTczYkFuaG5Qamc&amp;usp=sharing#gid=82" target="_blank">here</a>. Some of the discussion was related to whether everyone should have tenure, since very few faculty are actually denied it. I also like the comments about the tenure process reducing creativity and risk-taking. She also discussed the idea of rewarding faculty for other forms of expression including blogs.Jason H. 
Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-47682393929966624902014-02-12T19:12:00.003-05:002014-04-27T09:40:43.677-04:00P-values, the 'gold standard' of statistical validity, are not as reliable as many scientists assumeThis piece in <a href="http://www.nature.com/news/scientific-method-statistical-errors-1.14700">Nature</a> about the limitations of p-values is a must-read. It is very relevant to genetics, epidemiology and, especially, genome-wide association studies (GWAS) that have put so much emphasis on p-values. We have also written about the limitations of p-values in a short <a href="http://www.biodatamining.org/content/6/1/10">editorial</a>.Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-20958513133853737692014-01-30T11:09:00.005-05:002014-04-27T09:41:15.844-04:00Reconciling clinical importance and statistical significance in GWASGenome-wide association studies (GWAS) have identified many risk-associated SNPs with very small effects. The mantra for identifying more associations is to greatly increase the sample size to be able to detect smaller and smaller effects. This wonderful letter in the <a href="http://www.nature.com/ejhg/journal/v22/n2/full/ejhg2013110a.html">European Journal of Human Genetics</a> points out that at some point the effect size goes below the measurement error, calling into question the clinical significance of these GWAS hits. If I were funding a big GWAS I would first want to know whether increasing the sample size is justified given the effect sizes to be detected and the error of the phenotype measures.<br /><br />Shriner D, Adeyemo A, Rotimi CN. Reconciling clinical importance and statistical significance. Eur J Hum Genet. 2014 Feb;22(2):158-9. [<a href="http://www.nature.com/ejhg/journal/v22/n2/full/ejhg2013110a.html">EJHG</a>]<br />Jason H. 
Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-4854229725193615562014-01-25T08:41:00.001-05:002014-01-25T08:41:31.405-05:00My take of the FDA's decision to regulate 23andMeIn 2013 the FDA ordered <a href="https://www.23andme.com/" target="_blank">23andMe</a> to stop selling its genetic testing services for health-related purposes. This was a very controversial ruling that generated lots of discussion in the media. A collection of links to media coverage and opinions put together by writer David Dobbs can be found <a href="http://daviddobbs.net/smoothpebbles/i-got-your-23andme-v-fda-links-right-here/" target="_blank">here</a>. My take on the issue can be found in this <a href="http://geiselmed.dartmouth.edu/news/2014/01/23_moore/" target="_blank">Dartmouth Medicine Magazine</a> piece.Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-69411205955962971672014-01-11T09:24:00.003-05:002014-01-11T09:26:59.099-05:00Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 GrantsI just ran across this interesting new study that evaluated the relationship between the score that an NIH R01 grant receives during peer-review and the future impact of the grant as measured by number and quality of publications. The bottom line is that a grant that receives a top score in the 10th percentile does not produce publications with impact above and beyond a grant in the 30th percentile that would not be funded by 2014 criteria.<br /><br />Danthi N, Wu CO, Shi P, Lauer MS. Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants. Circ Res. 2014 Jan 9. 
[<a href="http://www.ncbi.nlm.nih.gov/pubmed/?term=Percentile+Ranking+and+Citation+Impact+of+a+Large+Cohort+of+NHLBI-Funded+Cardiovascular+R01+Grants" target="_blank">PubMed</a>]<br /><br />Abstract<br /><br />Rationale: Funding decisions for cardiovascular R01 grant applications at NHLBI largely hinge on percentile rankings. It is not known whether this approach enables the highest impact science.<br /><br />Objective: To conduct an observational analysis of percentile rankings and bibliometric outcomes for a contemporary set of funded NHLBI cardiovascular R01 grants.<br /><br />Methods and Results: We identified 1492 investigator-initiated de novo R01 grant applications that were funded between 2001 and 2008, and followed their progress for linked publications and citations to those publications. Our co-primary endpoints were citations received per million dollars of funding, citations obtained within 2-years of publication, and 2-year citations for each grant's maximally cited paper. In 7654 grant-years of funding that generated $3004 million of total NIH awards, the portfolio yielded 16,793 publications that appeared between 2001 and 2012 (median per grant 8, 25th and 75th percentiles 4 and 14, range 0 - 123), which received 2,224,255 citations (median per grant 1048, 25th and 75th percentiles 492 and 1,932, range 0 - 16,295). We found no association between percentile ranking and citation metrics; the absence of association persisted even after accounting for calendar time, grant duration, number of grants acknowledged per paper, number of authors per paper, early investigator status, human versus non-human focus, and institutional funding. 
An exploratory machine-learning analysis suggested that grants with the very best percentile rankings did yield more maximally cited papers.<br /><br />Conclusions: In a large cohort of NHLBI-funded cardiovascular grants, we were unable to find a monotonic association between better percentile ranking and higher scientific impact as assessed by citation metrics.Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0tag:blogger.com,1999:blog-10443050.post-14167457251025239652013-12-30T11:07:00.001-05:002013-12-30T11:20:37.852-05:00Epistasis Blog Posts from 2013<strong style="text-decoration: none;"><span style="font-family: inherit; font-size: large; text-decoration: none;"><a href="http://compgen.blogspot.com/2013_01_01_archive.html" style="text-decoration: none;" target="_blank">January, 2013</a>&nbsp;</span></strong><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Gene-gene interactions in a pathway-based analysis of genetic susceptibility to bladder cancer</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_02_01_archive.html" style="text-decoration: none;" target="_blank">February, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Complex effects of nucleotide variants in a mammalian cis-regulatory element</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_03_01_archive.html" style="text-decoration: none;" target="_blank">March, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Four tips for success in graduate school and beyond</span><br /><span style="font-family: inherit;"><br /></span><span 
style="font-family: inherit;">Gene-based testing of interactions in association studies of quantitative traits</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Alternative definitions of epistasis</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Multifactor dimensionality reduction reveals a three-locus epistatic interaction associated with susceptibility to pulmonary tuberculosis</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">ViSEN: Methodology and software for visualization of statistical epistasis networks</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Statistical epistasis networks reduce the computational complexity of searching three-locus genetic models</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">An information-gain approach to detecting three-way epistatic interactions in genetic association studies</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_04_01_archive.html" style="text-decoration: none;" target="_blank">April, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Things genes can't do. 
Shall we have pie or stew?</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_05_01_archive.html" style="text-decoration: none;" target="_blank">May, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Probabilistic multifactor causation - what do we mean?</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Journal impact factors - updated</span><br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_06_01_archive.html" style="text-decoration: none;" target="_blank">June, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">JAMIA special issue on Translational Bioinformatics</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_07_01_archive.html" style="text-decoration: none;" target="_blank">July, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">A simple extension of Multifactor Dimensionality Reduction (MDR) for detecting epistasis effects on quantitative traits</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_08_01_archive.html" style="text-decoration: none;" target="_blank">August, 
2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">The effect of genetic background on genetic interaction networks</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_09_01_archive.html" style="text-decoration: none;" target="_blank">September, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Genotype-environment interactions reveal causal pathways that mediate genetic effects on phenotype</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_10_01_archive.html" style="text-decoration: none;" target="_blank">October, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Best paper award at Translational Bioinformatics Conference</span><br /><span style="font-family: inherit;"><br /></span><br /><div><strong><span style="font-family: inherit; font-size: large;"><a href="http://compgen.blogspot.com/2013_12_01_archive.html" style="text-decoration: none;" target="_blank">December, 2013</a></span></strong></div><div><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Big data analysis on autopilot?</span></div></div></div></div></div></div></div></div></div></div>Jason H. Moore, Ph.D.http://www.blogger.com/profile/07692025646640606430noreply@blogger.com0