Sample records for genome wide scan

... 1999 Spotlight on Research 2012 July 2012 (historical) Genome-Wide Scan Reveals Mutation Associated with Melanoma A ... out to see if a technology called whole genome sequencing would help them find other genetic risk ...

Preeclampsia, hallmarked by de novo hypertension and proteinuria in pregnancy, has a familial tendency. Recently, a large Icelandic genome-wide scan provided evidence for a maternal susceptibility locus for preeclampsia on chromosome 2p13 which was confirmed by a genome scan from Australia and New Zealand (NZ). The current study reports on a genome-wide scan of Dutch affected sib-pair families. In total 67 Dutch affected sib-pair families, comprising at least two siblings with proteinuric preeclampsia, eclampsia or HELLP-syndrome, were typed for 293 polymorphic markers throughout the genome and linkage analysis was performed. The highest allele sharing lod score of 1.99 was seen on chromosome 12q at 109.5 cM. Two peaks overlapped in the same regions between the Dutch and Icelandic genome-wide scans at chromosome 3p and chromosome 15q. No overlap was seen on 2p. Re-analysis in 38 families without HELLP-syndrome (preeclampsia families) and 34 families with at least one sibling with HELLP syndrome (HELLP families), revealed two peaks with suggestive evidence for linkage in the non-HELLP families on chromosome 10q (lod score 2.38, D10S1432, 93.9 cM) and 22q (lod score 2.41, D22S685, 32.4 cM). The peak on 12q appeared to be associated with HELLP syndrome; it increased to a lod score of 2.1 in the HELLP families and almost disappeared in the preeclampsia families. A nominal peak on chromosome 11 in the preeclampsia families showed overlap with the second highest peak in the Australian/NZ study. Results from our Dutch genome-wide scan indicate that HELLP syndrome might have a different genetic background than preeclampsia. PMID:11781687

Keloids are benign fibroproliferative tumors of the skin which commonly occur after injury mainly in darker skinned patients. Medical treatment is fraught with high recurrence rates mainly because of an incomplete understanding of the biological mechanisms that lead to keloids. The purpose of this project was to examine keloid pathogenesis from the epigenome perspective of DNA methylation. Genome-wide profiling used the Infinium HumanMethylation450 BeadChip to interrogate DNA from 6 fresh keloid and 6 normal skin samples from 12 anonymous donors. A 3-tiered approach was used to call out genes most differentially methylated between keloid and normal. When compared to normal, of the 685 differentially methylated CpGs at Tier 3, 510 were hypomethylated and 175 were hypermethylated with 190 CpGs in promoter and 495 in nonpromoter regions. The 190 promoter region CpGs corresponded to 152 genes: 96 (63%) were hypomethylated and 56 (37%) hypermethylated. This exploratory genome-wide scan of the keloid methylome highlights a predominance of hypomethylated genomic landscapes, favoring nonpromoter regions. DNA methylation, as an additional mechanism for gene regulation in keloid pathogenesis, holds potential for novel treatments that reverse deleterious epigenetic changes. As an alternative mechanism for regulating genes, epigenetics may explain why gene mutations alone do not provide definitive mechanisms for keloid formation. PMID:26074660

Background There is evidence for a genetic contribution to chronic periodontitis. In this study, we conducted a genomewide association study among 866 participants of the University of Pittsburgh Dental Registry and DNA Repository, whose periodontal diagnosis ranged from healthy (N = 767) to severe chronic periodontitis (N = 99). Methods Genotypes of over half a million single nucleotide polymorphisms were determined. Analyses were done twice, first in the complete dataset of all ethnicities, and second including only samples defined as self-reported Whites. From the top 100 results, twenty single nucleotide polymorphisms had consistent results in both analyses (borderline p-values ranging from 1E-05 to 1E-6) and were selected to be tested in two independent datasets derived from 1,460 individuals from Porto Alegre, and 359 from Rio de Janeiro, Brazil. Meta-analyses of the single nucleotide polymorphisms showing a trend for association in the independent dataset were performed. Results The rs1477403 marker located on 16q22.3 showed suggestive association in the discovery phase and in the Porto Alegre dataset (p = 0.05). The meta-analysis suggested the less common allele decreases the risk of chronic periodontitis. Conclusions Our data offer a clear hypothesis to be independently tested regarding the contribution of the 16q22.3 locus to chronic periodontitis. PMID:25008200

Background No single-nucleotide polymorphisms (SNPs) specific for aggressive prostate cancer have been identified in genome-wide association studies (GWAS). Objective To test if SNPs associated with other traits may also affect the risk of aggressive prostate cancer. Design, setting, and participants SNPs implicated in any phenotype other than prostate cancer (p ≤ 10^-7) were identified through the catalog of published GWAS and tested in 2891 aggressive prostate cancer cases and 4592 controls from the Breast and Prostate Cancer Cohort Consortium (BPC3). The 40 most significant SNPs were followed up in 4872 aggressive prostate cancer cases and 24 534 controls from the Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) consortium. Outcome measurements and statistical analysis Odds ratios (ORs) and 95% confidence intervals (CIs) for aggressive prostate cancer were estimated. Results and limitations A total of 4666 SNPs were evaluated by the BPC3. Two signals were seen in regions already reported for prostate cancer risk. rs7014346 at 8q24.21 was marginally associated with aggressive prostate cancer in the BPC3 trial (p = 1.6 × 10^-6), whereas after meta-analysis by PRACTICAL the summary OR was 1.21 (95% CI 1.16–1.27; p = 3.22 × 10^-18). rs9900242 at 17q24.3 was also marginally associated with aggressive disease in the meta-analysis (OR 0.90, 95% CI 0.86–0.94; p = 2.5 × 10^-6). Neither of these SNPs remained statistically significant when conditioning on correlated known prostate cancer SNPs. The meta-analysis by BPC3 and PRACTICAL identified a third promising signal, marked by rs16844874 at 2q34, independent of known prostate cancer loci (OR 1.12, 95% CI 1.06–1.19; p = 4.67 × 10^-5); it has been shown that SNPs correlated with this signal affect glycine concentrations. The main limitation is the heterogeneity in the definition of aggressive prostate cancer between BPC3 and PRACTICAL. Conclusions We did

Genome-wide scans were conducted in a search for genetic locations linked to energy expenditure and substrate oxidation in children. Pedigreed data of 1030 Hispanic children and adolescents were from the Viva La Familia Study, which was designed to investigate genetic and environmental risk factors ...

Background. Several genome scans have explored the linkage of chronic kidney disease phenotypes to chromosomal regions with disparate results. Genome scan meta-analysis (GSMA) is a quantitative method to synthesize linkage results from independent studies and assess their concordance. Methods. We searched PubMed to identify genome linkage analyses of renal function traits in humans, such as estimated glomerular filtration rate (GFR), albuminuria, serum creatinine concentration and creatinine clearance. We contacted authors for numerical data and extracted information from individual studies. We applied the GSMA nonparametric approach to combine results across 14 linkage studies for GFR, 11 linkage studies for albumin creatinine ratio, 11 linkage studies for serum creatinine and 4 linkage studies for creatinine clearance. Results. No chromosomal region reached genome-wide statistical significance in the main analysis which included all scans under each phenotype; however, regions on Chromosomes 7, 10 and 16 reached suggestive significance for linkage to two or more phenotypes. Subgroup analyses by disease status or ethnicity did not yield additional information. Conclusions. While heterogeneity across populations, methodologies and study designs likely explains this lack of agreement, it is possible that linkage scan methodologies lack the resolution for investigating complex traits. Combining family-based linkage studies with genome-wide association studies may be a powerful approach to detect private mutations contributing to complex renal phenotypes. PMID:21622988
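
The rank-based idea behind a genome scan meta-analysis can be sketched in a few lines. The record above does not describe its binning, weighting, or significance procedure, so everything here is an assumption made for illustration: the genome is taken to be pre-divided into bins, each study contributes one linkage statistic per bin, bins are ranked within each study, and ranks are summed across studies. The toy LOD values, the function names `rank_bins` and `gsma`, and the unweighted simulation of the null are all hypothetical.

```python
import random

def rank_bins(stats):
    """Rank bins within one study (1 = weakest evidence); ties share the average rank."""
    order = sorted(range(len(stats)), key=lambda i: stats[i])
    ranks = [0.0] * len(stats)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and stats[order[j + 1]] == stats[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def gsma(studies, n_perm=2000, seed=0):
    """Return (summed_ranks, pointwise_p) per bin for equally weighted studies."""
    rng = random.Random(seed)
    ranked = [rank_bins(s) for s in studies]
    n_bins = len(ranked[0])
    summed = [sum(r[b] for r in ranked) for b in range(n_bins)]
    exceed = [0] * n_bins
    for _ in range(n_perm):
        # Null assumption: a bin's rank in each study is a random draw
        # from that study's ranks, independently across studies.
        null_sum = sum(rng.choice(r) for r in ranked)
        for b in range(n_bins):
            if null_sum >= summed[b]:
                exceed[b] += 1
    p_values = [(e + 1) / (n_perm + 1) for e in exceed]
    return summed, p_values

# Toy input (hypothetical): three studies, six bins, maximum LOD per bin.
toy_studies = [
    [0.2, 0.5, 2.1, 0.1, 0.9, 0.3],
    [0.4, 0.1, 1.8, 0.6, 0.2, 0.7],
    [0.0, 0.3, 1.5, 0.8, 0.4, 0.1],
]
summed_ranks, p_values = gsma(toy_studies)
for b, (s, p) in enumerate(zip(summed_ranks, p_values)):
    print(f"bin {b}: summed rank = {s:.1f}, pointwise p = {p:.4f}")
```

Working on ranks rather than raw statistics is what lets scans with different marker maps and test statistics be combined; the cost is resolution, since a bin can only be as narrow as the coarsest study allows.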

In order to take into account the complex genomic distribution of SNP variations when identifying chromosomal regions with significant SNP effects, a single nucleotide polymorphism (SNP) association scan statistic was developed. To address the computational needs of genomewide association (GWA) studies, a fast Java application, which combines single-locus SNP tests and a scan statistic for identifying chromosomal regions with significant clusters of significant SNP effects, was developed and implemented. To illustrate this application, SNP associations were analyzed in a pharmacogenomic study of the blood pressure lowering effect of thiazide-diuretics (N=195) using the Affymetrix Human Mapping 100K Set. 55,335 tagSNPs (pair-wise linkage disequilibrium r^2 < 0.5) were selected to reduce the frequency correlation between SNPs. A typical workstation can complete the whole genome scan including 10,000 permutation tests within 3 hours. The most significant regions are located on chromosomes 3, 6, 13 and 16, two of which contain candidate genes that may be involved in the underlying drug response mechanism. The computational performance of ChromoScan-GWA and its scalability were tested with up to 1,000,000 SNPs and up to 4,000 subjects. Using 10,000 permutations, the computation time grew linearly in these datasets. This scan statistic application provides a robust statistical and computational foundation for identifying genomic regions associated with disease and provides a method to compare GWA results even across different platforms. PMID:20161066
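
A windowed scan statistic with a permutation test, in the spirit of the record above, might look like the following minimal sketch. It is not the ChromoScan-GWA algorithm: the window definition (a fixed number of consecutive SNPs), the nominal cutoff `alpha`, and the null model (shuffling significance flags along the chromosome rather than permuting phenotypes) are simplifying assumptions, and the function names are hypothetical.

```python
import random

def max_window_count(flags, window):
    """Largest number of flagged SNPs in any run of `window` consecutive SNPs."""
    if window <= 0:
        raise ValueError("window must be positive")
    current = sum(flags[:window])
    best = current
    for i in range(window, len(flags)):
        current += flags[i] - flags[i - window]  # slide the window by one SNP
        best = max(best, current)
    return best

def scan_statistic_pvalue(p_values, alpha=0.05, window=10, n_perm=1000, seed=0):
    """Return (observed scan statistic, permutation p-value)."""
    rng = random.Random(seed)
    flags = [1 if p < alpha else 0 for p in p_values]
    observed = max_window_count(flags, window)
    shuffled = flags[:]
    exceed = 0
    for _ in range(n_perm):
        # Simplified null: significant SNPs fall at random positions.
        rng.shuffle(shuffled)
        if max_window_count(shuffled, window) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy chromosome (hypothetical): 200 SNPs with a cluster of 8 hits at 50..57.
toy_p = [0.5] * 200
for pos in range(50, 58):
    toy_p[pos] = 0.001
observed_count, cluster_p = scan_statistic_pvalue(toy_p)
print(f"max hits in a 10-SNP window: {observed_count}, permutation p = {cluster_p:.4f}")
```

Because each permutation costs one linear pass over the SNPs, run time in this sketch grows linearly with the number of SNPs and permutations, which is consistent with the linear scaling the record reports.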

Aberrant connectivity is implicated in many neurological and psychiatric disorders, including Alzheimer’s disease and schizophrenia. However, other than a few disease-associated candidate genes, we know little about the degree to which genetics play a role in the brain networks; we know even less about specific genes that influence brain connections. Twin and family-based studies can generate estimates of overall genetic influences on a trait, but genome-wide association scans (GWASs) can screen the genome for specific variants influencing the brain or risk for disease. To identify the heritability of various brain connections, we scanned healthy young adult twins with high-field, high-angular resolution diffusion MRI. We adapted GWASs to screen the brain’s connectivity pattern, allowing us to discover genetic variants that affect the human brain’s wiring. The association of connectivity with the SPON1 variant at rs2618516 on chromosome 11 (11p15.2) reached connectome-wide, genome-wide significance after stringent statistical corrections were enforced, and it was replicated in an independent subsample. rs2618516 was shown to affect brain structure in an elderly population with varying degrees of dementia. Older people who carried the connectivity variant had significantly milder clinical dementia scores and lower risk of Alzheimer’s disease. As a posthoc analysis, we conducted GWASs on several organizational and topological network measures derived from the matrices to discover variants in and around genes associated with autism (MACROD2), development (NEDD4), and mental retardation (UBE2A) significantly associated with connectivity. Connectome-wide, genome-wide screening offers substantial promise to discover genes affecting brain connectivity and risk for brain diseases. PMID:23471985

Prostate cancer is the most common non-skin cancer and the second leading cause of cancer related mortality for men in the United States. There is strong empirical and epidemiological evidence supporting a stronger role of genetics in early-onset prostate cancer. We performed a genome-wide association scan for early-onset prostate cancer. Novel aspects of this study include the focus on early-onset disease (defined as men with prostate cancer diagnosed before age 56 years) and use of publicly available control genotype data from previous genome-wide association studies. We found genome-wide significant (p < 5 × 10^-8) evidence for variants at 8q24 and 11p15 and strong supportive evidence for a number of previously reported loci. We found little evidence for individual or systematic inflated association findings resulting from using public controls, demonstrating the utility of using public control data in large-scale genetic association studies of common variants. Taken together, these results demonstrate the importance of established common genetic variants for early-onset prostate cancer and the power of including early-onset prostate cancer cases in genetic association studies. PMID:24740154

This article provides an introductory overview of the investigative strategy employed to evaluate the genetic basis of 17 endophenotypes examined as part of a 20-year data collection effort from the Minnesota Center for Twin and Family Research. Included are characterization of the study samples, descriptive statistics for key properties of the psychophysiological measures, and rationale behind the steps taken in the molecular genetic study design. The statistical approach included (a) biometric analysis of twin and family data, (b) heritability analysis using 527,829 single nucleotide polymorphisms (SNPs), (c) genome-wide association analysis of these SNPs and 17,601 autosomal genes, (d) follow-up analyses of candidate SNPs and genes hypothesized to have an association with each endophenotype, (e) rare variant analysis of nonsynonymous SNPs in the exome, and (f) whole genome sequencing association analysis using 27 million genetic variants. These methods were used in the accompanying empirical articles comprising this special issue, Genome-Wide Scans of Genetic Variants for Psychophysiological Endophenotypes. PMID:25387703

Mutation mapping in mice can be readily accomplished by genomewide segregation analysis of polymorphic DNA markers. In this study, we showed the efficacy of Ion Torrent next generation sequencing for conducting genome-wide scans to map and identify a mutation causing congenital heart disease in a mouse mutant, Bishu, recovered from a mouse mutagenesis screen. The Bishu mutant line generated in a C57BL/6J (B6) background was intercrossed with another inbred strain, C57BL/10J (B10), and the resulting B6/B10 hybrid offspring were intercrossed to generate mutants used for the mapping analysis. For each mutant sample, a panel of 123 B6/B10 polymorphic SNPs distributed throughout the mouse genome was PCR amplified, bar coded, and then pooled to generate a single library used for Ion Torrent sequencing. Sequencing carried out using the 314 chip yielded >600,000 usable reads. These were aligned and mapped using a custom bioinformatics pipeline. Each SNP was sequenced to a depth >500×, allowing accurate automated calling of the B6/B10 genotypes. This analysis mapped the mutation in Bishu to an interval on the proximal region of mouse chromosome 4. This was confirmed by parallel capillary sequencing of the 123 polymorphic SNPs. Further analysis of genes in the map interval identified a splicing mutation in Dnaic1 (c.204+1G>A), an intermediate chain dynein, as the disease causing mutation in Bishu. Overall, our experience shows Ion Torrent amplicon sequencing is high throughput and cost effective for conducting genome-wide mapping analysis and is easily scalable for other high volume genotyping analyses. PMID:24306492

In population genomics studies, accounting for the neutral covariance structure across population allele frequencies is critical to improve the robustness of genome-wide scan approaches. Elaborating on the BayEnv model, this study investigates several modeling extensions (i) to improve the estimation accuracy of the population covariance matrix and all the related measures, (ii) to identify significantly overly differentiated SNPs based on a calibration procedure of the XtX statistics, and (iii) to consider alternative covariate models for analyses of association with population-specific covariables. In particular, the auxiliary variable model allows one to deal with multiple testing issues and, providing the relative marker positions are available, to capture some linkage disequilibrium information. A comprehensive simulation study was carried out to evaluate the performances of these different models. Also, when compared in terms of power, robustness, and computational efficiency to five other state-of-the-art genome-scan methods (BayEnv2, BayScEnv, BayScan, flk, and lfmm), the proposed approaches proved highly effective. For illustration purposes, genotyping data on 18 French cattle breeds were analyzed, leading to the identification of 13 strong signatures of selection. Among these, four (surrounding the KITLG, KIT, EDN3, and ALB genes) contained SNPs strongly associated with the piebald coloration pattern while a fifth (surrounding PLAG1) could be associated to morphological differences across the populations. Finally, analysis of Pool-Seq data from 12 populations of Littorina saxatilis living in two different ecotypes illustrates how the proposed framework might help in addressing relevant ecological issues in nonmodel species. Overall, the proposed methods define a robust Bayesian framework to characterize adaptive genetic differentiation across populations. The BayPass program implementing the different models is available at http://www1.montpellier

Using next-generation sequencing, we conducted a genome-wide scan of selective sweeps associated with selection toward genetic improvement in Thoroughbreds. We investigated the potential phenotypic consequences of putative candidate loci by candidate gene association mapping for the finishing time in 240 Thoroughbred horses. We found a significant association with the trait for Ral GApase alpha 2 (RALGAP2) that regulates a variety of cellular processes of signal trafficking. Neighboring genes around RALGAP2, including insulinoma-associated 1 (INSM1), pallid (PLDN), and Ras and Rab interactor 2 (RIN2), have similar roles in signal trafficking, suggesting that a co-evolving gene cluster located on chromosome 22 is under strong artificial selection in racehorses. PMID:26333666

Objective: Genes likely play a substantial role in the etiology of attention-deficit/hyperactivity disorder (ADHD). However, the genetic architecture of the disorder is unknown, and prior genome-wide association studies (GWAS) have not identified a genome-wide significant association. We have conducted a third, independent, multisite GWAS of…

Soybean oil and meal are major contributors to world-wide food production. Consequently, the genetic basis for soybean seed composition has been intensely studied using family-based mapping. Population-based mapping approaches, in the form of genome-wide association (GWA) scans, have been able to re...

We revisited 46 families with two or more siblings affected with an orofacial cleft that participated in previous genomewide studies and collected complete dental information. Genotypes from 392 microsatellite markers at 10 cM intervals were reanalyzed. We carried out four sets of genomewide analyses. First, we ran the analysis solely on the cleft status. Second, we assigned to any dental anomaly (tooth agenesis, supernumerary teeth, and microdontia) an affection status, and repeated the analysis. Third, we ran only the 19 families where the proband had a cleft with no dental anomalies. Finally, we ran only the 27 families that had a proband with cleft and additional dental anomalies outside the cleft area. Chromosomes (1, 2, 6, 8, 16, and 19) presented regions with LOD scores >2.0. Chromosome 19 has the most compelling results in our study. The LOD scores increased from 3.11 (in the scan of all 46 families with clefts as the only assigned affection status) to 3.91 when the 19 families whose probands present with no additional dental anomalies were studied, suggesting the interval 19p13.12-19q12 may contain a gene that contributes to clefts but not to dental anomalies. On the other hand, we found a LOD score of 3.00 in the 2q22.3 region when dental anomalies data were added to the analysis to define affection status. Our preliminary results support the hypothesis that some loci may contribute to both clefts and congenital dental anomalies. Also, adding dental anomalies information will provide new opportunities to map susceptibility loci for clefts. PMID:18442096

Iron overload phenotypes in persons with and without hemochromatosis are variable. To investigate this further, probands with hemochromatosis or evidence of elevated iron stores and their family members were recruited for a genome-wide linkage scan to identify potential quantitative trait loci (QTL) that contribute to variation in transferrin saturation (TS), unsaturated iron-binding capacity (UIBC), and serum ferritin (SF). Genotyping utilized 402 microsatellite markers with average spacing of 9 cM. A total of 943 individuals, 64% Caucasian, were evaluated from 174 families. After adjusting for age, gender, and race/ethnicity, there was evidence for linkage of UIBC to chromosome 4q (logarithm of the odds (LOD) = 2.08, p = 0.001) and of UIBC (LOD = 9.52), TS (LOD = 4.78), and SF (LOD = 2.75) to the chromosome 6p region containing HFE (each p < 0.0001). After adjustments for HFE genotype and other covariates, there was evidence of linkage of SF to chromosome 16p (LOD = 2.63, p = 0.0007) and of UIBC to chromosome 5q (LOD = 2.12, p = 0.002) and to chromosome 17q (LOD = 2.19, p = 0.002). We conclude that these regions should be considered for fine mapping studies to identify QTL that contribute to variation in SF and UIBC. PMID:17539901
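
For readers unused to LOD scores, the usual large-sample link between a LOD score and a pointwise p-value can be sketched as below. This is a textbook approximation, not the procedure used in the record above: it assumes the likelihood-ratio statistic 2·ln(10)·LOD follows a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. The p-values quoted in the record come from the authors' own analysis and need not agree with this approximation. The function names are hypothetical.

```python
import math

def lod_to_chi2(lod: float) -> float:
    """Convert a LOD score (log10 likelihood ratio) to a likelihood-ratio chi-square."""
    return 2.0 * math.log(10.0) * lod

def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p-value under the assumed 50:50 mixture null."""
    if lod <= 0.0:
        return 1.0
    chi2 = lod_to_chi2(lod)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    upper_tail = math.erfc(math.sqrt(chi2 / 2.0))
    return 0.5 * upper_tail

# LOD scores quoted in the record above.
for lod in (2.08, 2.12, 2.19, 2.63):
    print(f"LOD = {lod:.2f} -> chi2 = {lod_to_chi2(lod):.2f}, "
          f"approx. pointwise p = {lod_to_pointwise_p(lod):.5f}")
```

These are pointwise values; they are not corrected for testing many positions across the genome, which is why linkage studies apply stricter LOD thresholds before claiming genome-wide significance.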

Personality traits are summarized by five broad dimensions with pervasive influences on major life outcomes, strong links to psychiatric disorders, and clear heritable components. To identify genetic variants associated with each of the five dimensions of personality we performed a genomewide association (GWA) scan of 3,972 individuals from a genetically isolated population within Sardinia, Italy. Based on analyses of 362,129 single nucleotide polymorphisms (SNPs) we found several strong signals within or near genes previously implicated in psychiatric disorders. They include the association of Neuroticism with SNAP25 (rs362584, P = 5 × 10^-5), Extraversion with BDNF and two cadherin genes (CDH13 and CDH23; Ps < 5 × 10^-5), Openness with CNTNAP2 (rs10251794, P = 3 × 10^-5), Agreeableness with CLOCK (rs6832769, P = 9 × 10^-6), and Conscientiousness with DYRK1A (rs2835731, P = 3 × 10^-5). Effect sizes were small (less than 1% of variance), and most failed to replicate in the follow-up independent samples (N up to 3,903), though the association between Agreeableness and CLOCK was supported in two of three replication samples (overall P = 2 × 10^-5). We infer that a large number of loci may influence personality traits and disorders, requiring larger sample sizes for the GWA approach to identify significant genetic variants. PMID:18957941

A genome-wide scanning of Sorghum bicolor resulted in the identification of 25 SbHsf genes. Phylogenetic analysis shows the ortholog genes that are clustered with only rice, representing a common ancestor. Promoter analysis revealed the identification of different cis-acting elements that are responsible for abiotic as well as biotic stresses. Hsf domains like DBD, NLS, NES, and AHA have been analyzed for their sequence similarity and functional characterization. Tissue specific expression patterns of Hsfs in different tissues like mature embryo, seedling, root, and panicle were studied using real-time PCR. While Hsfs4 and 22 are highly expressed in panicle, 4 and 9 are expressed in seedlings. Sorghum plants were exposed to different abiotic stress treatments but no expression of any Hsf was observed when seedlings were treated with ABA. High level expression of Hsf1 was noticed during high temperature as well as cold stresses, 4 and 6 during salt and 5, 6, 10, 13, 19, 23 and 25 during drought stress. This comprehensive analysis of SbHsf genes will provide an insight on how these genes are regulated in different tissues and also under different abiotic stresses and help to determine the functions of Hsfs during drought and temperature stress tolerance. PMID:27006630

OBJECTIVE— Genome-wide association scans (GWASs) have identified novel diabetes-associated genes. We evaluated how these variants impact diabetes incidence, quantitative glycemic traits, and response to preventive interventions in 3,548 subjects at high risk of type 2 diabetes enrolled in the Diabetes Prevention Program (DPP), which examined the effects of lifestyle intervention, metformin, and troglitazone versus placebo. RESEARCH DESIGN AND METHODS— We genotyped selected single nucleotide polymorphisms (SNPs) in or near diabetes-associated loci, including EXT2, CDKAL1, CDKN2A/B, IGF2BP2, HHEX, LOC387761, and SLC30A8 in DPP participants and performed Cox regression analyses using genotype, intervention, and their interactions as predictors of diabetes incidence. We evaluated their effect on insulin resistance and secretion at 1 year. RESULTS— None of the selected SNPs were associated with increased diabetes incidence in this population. After adjustments for ethnicity, baseline insulin secretion was lower in subjects with the risk genotype at HHEX rs1111875 (P = 0.01); there were no significant differences in baseline insulin sensitivity. Both at baseline and at 1 year, subjects with the risk genotype at LOC387761 had paradoxically increased insulin secretion; adjustment for self-reported ethnicity abolished these differences. In ethnicity-adjusted analyses, we noted a nominal differential improvement in β-cell function for carriers of the protective genotype at CDKN2A/B after 1 year of troglitazone treatment (P = 0.01) and possibly lifestyle modification (P = 0.05). CONCLUSIONS— We were unable to replicate the GWAS findings regarding diabetes risk in the DPP. We did observe genotype associations with differences in baseline insulin secretion at the HHEX locus and a possible pharmacogenetic interaction at CDKN2A/B. PMID:18544707

Genetic tests for beef tenderness are currently limited to single nucleotide polymorphisms (SNPs) within µ-calpain (CAPN1) and calpastatin (CAST) and explain little of the phenotypic variation in Warner-Bratzler shear force (WBSF). We performed a genome-wide association study for WBSF by genotyping...

Summary Although autism is a highly heritable neurodevelopmental disorder, attempts to identify specific susceptibility genes have thus far met with limited success. Genome-wide association studies (GWAS) using half a million or more markers, particularly those with very large sample sizes achieved through meta-analysis, have shown great success in mapping genes for other complex genetic traits (http://www.genome.gov/26525384). Consequently, we initiated a linkage and association mapping study using half a million genome-wide SNPs in a common set of 1,031 multiplex autism families (1,553 affected offspring). We identified regions of suggestive and significant linkage on chromosomes 6q27 and 20p13, respectively. Initial analysis did not yield genome-wide significant associations; however, genotyping of top hits in additional families revealed a SNP on chromosome 5p15 (between SEMA5A and TAS2R1) that was significantly associated with autism (P = 2 × 10^-7). We also demonstrated that expression of SEMA5A is reduced in brains from autistic patients, further implicating SEMA5A as an autism susceptibility gene. The linkage regions reported here provide targets for rare variation screening while the discovery of a single novel association demonstrates the action of common variants. PMID:19812673

Breastfeeding has been an important survival trait during human history, though it has long been recognised that individuals differ in their exact breastfeeding behaviour. Here our aims were, first, to explore to what extent genetic and environmental influences contributed to the individual differences in breastfeeding behaviour; second, to detect possible genetic variants related to breastfeeding; and lastly, to test if the genetic variants associated with breastfeeding have been previously found to be related with breast size. Data were collected from a large community-based cohort of Australian twins, with 3,364 women for the twin modelling analyses and 1,521 of them included in the genomewide association study. Monozygotic twin correlations (rMZ = .52, 95% CI .46 – .57) were larger than dizygotic twin correlations (rDZ = .35, 95% CI .25 – .43) and the best-fitting model was the one composed by additive genetics and unique environmental factors, explaining 53% and 47% of the variance in breastfeeding behaviour, respectively. No breastfeeding-related genetic variants reached genome-wide significance. The polygenic risk score analyses showed no significant results, suggesting breast size does not influence breastfeeding. This study confers a replication of a previous one exploring the sources of variance of breastfeeding and, to our knowledge, is the first one to conduct a Genome-Wide Association Study on breastfeeding and look at the overlap with variants for breast size. PMID:25475840
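
The twin correlations in the record above lend themselves to a back-of-envelope variance decomposition. The sketch below uses Falconer's classical formulas, which is an assumption on our part: the authors fitted structural twin models and retained an AE model (additive genetics plus unique environment), so their 53% and 47% estimates are not expected to match these quick figures. The function name `falconer` is hypothetical.

```python
def falconer(r_mz: float, r_dz: float):
    """Falconer's estimates of additive genetic (h2), shared-environment (c2)
    and unique-environment (e2) variance shares from twin correlations."""
    h2 = 2.0 * (r_mz - r_dz)   # MZ twins share twice the additive variance of DZ twins
    c2 = 2.0 * r_dz - r_mz     # what remains of the twin resemblance
    e2 = 1.0 - r_mz            # everything that makes MZ twins differ
    return h2, c2, e2

# Correlations quoted in the record above.
h2, c2, e2 = falconer(r_mz=0.52, r_dz=0.35)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

Under an AE model the shared-environment term is dropped, so the MZ correlation itself approximates the additive genetic share; the reported 53% is close to rMZ = .52, which is in line with that reading.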

Background and Purpose Ischemic stroke has a strong familial component to risk. The Siblings with Ischemic Stroke Study (SWISS) is a genome-wide family-based analysis that included use of imputed genotypes. SWISS was conducted to examine associations between SNPs and risk of stroke and stroke subtypes within pairs. Methods SWISS enrolled 312 probands with ischemic stroke across 70 US and Canadian centers. Affected siblings were ascertained by centers and confirmed by central record review; unaffected siblings were ascertained by telephone contact. Ischemic stroke was subtyped using TOAST criteria. Genotyping was performed using an Illumina 610 quad array (probands) and an Illumina linkage V array (affected siblings). SNPs were imputed using 1000 Genomes Project data and MACH software. Family-based association analyses were conducted using the sibling-transmission disequilibrium test. Results For all pairs, the correlation of age at stroke within pairs of affected siblings was r = 0.83 (95% CI, 0.78 to 0.86; P < 2.2 × 10^-16). The correlation did not differ substantially by subtype. The concordance of stroke subtypes among affected pairs was 33.8% (kappa = 0.13; P = 5.06 × 10^-4) and did not differ by age at stroke in the proband. Although no SNP achieved genome-wide significance for risk of ischemic stroke, there was clustering of the most associated SNPs on chromosomes 3p (NOS1) and 6p. Conclusions Stroke subtype and age at stroke in affected sibling pairs exhibit significant clustering. No individual SNP reached genome-wide significance. However, two promising candidate loci were identified, including one that contains NOS1, though these risk loci warrant further examination in larger sample collections. PMID:21940970

We report a genome-wide association scan in over 6,000 Latin Americans for features of scalp hair (shape, colour, greying, balding) and facial hair (beard thickness, monobrow, eyebrow thickness). We found 18 signals of association reaching genome-wide significance (P values 5 × 10(-8) to 3 × 10(-119)), including 10 novel associations. These include novel loci for scalp hair shape and balding, and the first reported loci for hair greying, monobrow, eyebrow and beard thickness. A newly identified locus influencing hair shape includes a Q30R substitution in the Protease Serine S1 family member 53 (PRSS53). We demonstrate that this enzyme is highly expressed in the hair follicle, especially the inner root sheath, and that the Q30R substitution affects enzyme processing and secretion. The genome regions associated with hair features are enriched for signals of selection, consistent with proposals regarding the evolution of human hair. PMID:26926045

Objective: Genes likely play a substantial role in the etiology of attention-deficit hyperactivity disorder (ADHD). However, the genetic architecture of the disorder is unknown, and prior genome-wide association studies have not identified a genome-wide significant association. We have conducted a third, independent multi-site GWAS of DSM-IV-TR ADHD. Method: Families were ascertained at Massachusetts General Hospital (MGH, N=309 trios), Washington University at St Louis (WASH-U, N=272 trios), and University of California at Los Angeles (UCLA, N=156 trios). Genotyping was conducted with the Illumina Human1M or Human1M-Duo BeadChip platforms. After applying quality control filters, association with ADHD was tested with 835,136 SNPs in 735 DSM-IV ADHD trios from 732 families. Results: Our smallest p-value (6.7E-07) did not reach the threshold for genome-wide statistical significance (5.0E-08), but one of the 20 most significant associations was located in a candidate gene of interest for ADHD (SLC9A9, rs9810857, p=6.4E-6). We also conducted gene-based tests of candidate genes identified in the literature and found additional evidence of association with SLC9A9. Conclusion: We and our colleagues in the Psychiatric GWAS Consortium are working to pool together GWAS samples to establish the large data sets needed to follow up on these results and to identify genes for ADHD and other disorders. PMID:20732626
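The arithmetic behind the significance threshold quoted above can be sketched as follows. This is an illustration under the assumption of a simple Bonferroni correction at a family-wise alpha of 0.05; the abstract states only the threshold it used (5.0E-08), not how it was derived, and the helper name `bonferroni_threshold` is hypothetical.

```python
# Illustrative Bonferroni arithmetic for a genome-wide scan.
# Assumption: family-wise alpha of 0.05; the study itself used the
# conventional fixed threshold of 5.0E-08.

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests

n_snps = 835_136                 # SNPs tested after quality control (from the abstract)
per_study = bonferroni_threshold(n_snps)

conventional = 5.0e-8            # threshold stated in the abstract
smallest_p = 6.7e-7              # smallest p-value stated in the abstract

print(f"Bonferroni threshold for {n_snps:,} tests: {per_study:.2e}")
print(f"Smallest p-value is genome-wide significant: {smallest_p < conventional}")
```

The conventional 5.0E-08 threshold corresponds to 0.05 divided by one million tests, so it is slightly stricter than a Bonferroni correction for the 835,136 SNPs actually tested. The smallest p-value falls short of either threshold by roughly an order of magnitude.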

Rapidly evolving viruses and other pathogens can have an immense impact on human evolution as natural selection acts to increase the prevalence of genetic variants providing resistance to disease. With the emergence of large datasets of human genetic variation, we can search for signatures of natural selection in the human genome driven by such disease-causing microorganisms. Based on this approach, we have previously hypothesized that Lassa virus (LASV) may have been a driver of natural selection in West African populations where Lassa haemorrhagic fever is endemic. In this study, we provide further evidence for this notion. By applying tests for selection to genome-wide data from the International Haplotype Map Consortium and the 1000 Genomes Consortium, we demonstrate evidence for positive selection in LARGE and interleukin 21 (IL21), two genes implicated in LASV infectivity and immunity. We further localized the signals of selection, using the recently developed composite of multiple signals method, to introns and putative regulatory regions of those genes. Our results suggest that natural selection may have targeted variants giving rise to alternative splicing or differential gene expression of LARGE and IL21. Overall, our study supports the hypothesis that selective pressures imposed by LASV may have led to the emergence of particular alleles conferring resistance to Lassa fever, and opens up new avenues of research pursuit. PMID:22312054

Hereditary underdevelopment of the ear, a condition also known as microtia, has been observed in several sheep breeds as well as in humans and other species. Its genetic basis in sheep is unknown. The Awassi sheep, a breed native to southwest Asia, carries this phenotype and was targeted for molecular characterization via a genome-wide association study. DNA samples were collected from sheep in Jordan. Eight affected and 12 normal individuals were genotyped with the Illumina OvineSNP50(®) chip. Multilocus analyses failed to identify any genotypic association. In contrast, a single-locus analysis revealed a statistically significant association (P = 0.012, genome-wide) with a SNP at basepair 34 647 499 on OAR23. This marker is adjacent to the gene encoding transcription factor GATA-6, which has been shown to play a role in many developmental processes, including chondrogenesis. The lack of extended homozygosity in this region suggests a fairly ancient mutation, and the time of occurrence was estimated to be approximately 3000 years ago. Many of the earless sheep breeds may thus share the causative mutation, especially within the subgroup of fat-tailed, wool sheep. PMID:26990958

Context: Age at menarche (AAM) is an important trait both biologically and socially, a clearly defined event in female pubertal development, and has been associated with many clinically significant phenotypes. Objective: The objective of the study was to identify genetic loci influencing variation in AAM in large population-based samples from three countries. Design/Participants: Recalled AAM data were collected from 13,697 individuals and 4,899 pseudoindependent sister-pairs from three different populations (Australia, The Netherlands, and the United Kingdom) by mailed questionnaire or interview. Genome-wide variance components linkage analysis was implemented on each sample individually and in combination. Results: The mean, sd, and heritability of AAM across the three samples was 13.1 yr, 1.5 yr, and 0.69, respectively. No loci were detected that reached genome-wide significance in the combined analysis, but a suggestive locus was detected on chromosome 12 (logarithm of the odds = 2.0). Three loci of suggestive significance were seen in the U.K. sample on chromosomes 1, 4, and 18 (logarithm of the odds = 2.4, 2.2 and 3.2, respectively). Conclusions: There was no evidence for common highly penetrant variants influencing AAM. Linkage and association suggest that one trait locus for AAM is located on chromosome 12, but further studies are required to replicate these results. PMID:18647812

Background Since the times of domestication, cattle have been continually shaped by the influence of humans. Relatively recent history, including breed formation and the still enduring enormous improvement of economically important traits, is expected to have left distinctive footprints of selection within the genome. The purpose of this study was to map genome-wide selection signatures in ten cattle breeds and thus improve the understanding of the genome response to strong artificial selection and support the identification of the underlying genetic variants of favoured phenotypes. We analysed 47,651 single nucleotide polymorphisms (SNP) using Cross Population Extended Haplotype Homozygosity (XP-EHH). Results We set the significance thresholds using the maximum XP-EHH values of two essentially artificially unselected breeds and found up to 229 selection signatures per breed. Through a confirmation process we verified selection for three distinct phenotypes typical for one breed (polledness in Galloway, double muscling in Blanc-Bleu Belge and red coat colour in Red Holstein cattle). Moreover, we detected six genes strongly associated with known QTL for beef or dairy traits (TG, ABCG2, DGAT1, GH1, GHR and the Casein Cluster) within selection signatures of at least one breed. A literature search for genes lying in outstanding signatures revealed further promising candidate genes. However, in concordance with previous genome-wide studies, we also detected a substantial number of signatures without any yet known gene content. Conclusions These results show the power of XP-EHH analyses in cattle to discover promising candidate genes and raise the hope of identifying phenotypically important variants in the near future. The finding of plausible functional candidates in some short signatures supports this hope. For instance, MAP2K6 is the only annotated gene of two signatures detected in Galloway and Gelbvieh cattle and is already known to be associated with carcass

Although autism spectrum disorders (ASDs) have a substantial genetic basis, most of the known genetic risk has been traced to rare variants, principally copy number variants (CNVs). To identify common risk variation, the Autism Genome Project (AGP) Consortium genotyped 1558 rigorously defined ASD families for 1 million single-nucleotide polymorphisms (SNPs) and analyzed these SNP genotypes for association with ASD. In one of four primary association analyses, the association signal for marker rs4141463, located within MACROD2, crossed the genome-wide association significance threshold of P < 5 × 10(-8). When a smaller replication sample was analyzed, the risk allele at rs4141463 was again over-transmitted; yet, consistent with the winner's curse, its effect size in the replication sample was much smaller; and, for the combined samples, the association signal barely fell below the P < 5 × 10(-8) threshold. Exploratory analyses of phenotypic subtypes yielded no significant associations after correction for multiple testing. They did, however, yield strong signals within several genes, KIAA0564, PLD5, POU6F2, ST8SIA2 and TAF1C. PMID:20663923

The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes. PMID:17463249

Introduction Breast cancer is a heterogeneous disease and may be characterized on the basis of whether estrogen receptors (ER) are expressed in the tumour cells. ER status of breast cancer is important clinically, and is used both as a prognostic indicator and treatment predictor. In this study, we focused on identifying genetic markers associated with ER-negative breast cancer risk. Methods We conducted a genome-wide association analysis of 285,984 single nucleotide polymorphisms (SNPs) genotyped in 617 ER-negative breast cancer cases and 4,583 controls. We also conducted a genome-wide pathway analysis on the discovery dataset using permutation-based tests on pre-defined pathways. The extent of shared polygenic variation between ER-negative and ER-positive breast cancers was assessed by relating risk scores, derived using ER-positive breast cancer samples, to disease state in independent, ER-negative breast cancer cases. Results Association with ER-negative breast cancer was not validated for any of the five most strongly associated SNPs followed up in independent studies (1,011 ER-negative breast cancer cases, 7,604 controls). However, an excess of small P-values for SNPs with known regulatory functions in cancer-related pathways was found (global P = 0.052). We found no evidence to suggest that ER-negative breast cancer shares a polygenic basis to disease with ER-positive breast cancer. Conclusions ER-negative breast cancer is a distinct breast cancer subtype that merits independent analyses. Given the clinical importance of this phenotype and the likelihood that genetic effect sizes are small, greater sample sizes and further studies are required to understand the etiology of ER-negative breast cancers. PMID:21062454

Background Over 90% of adults aged 20 years or older with permanent teeth have suffered from dental caries leading to pain, infection, or even tooth loss. Although caries prevalence has decreased over the past decade, there are still about 23% of dentate adults who have untreated carious lesions in the US. Dental caries is a complex disorder affected by both individual susceptibility and environmental factors. Approximately 35-55% of caries phenotypic variation in the permanent dentition is attributable to genes, though few specific caries genes have been identified. Therefore, we conducted the first genome-wide association study (GWAS) to identify genes affecting susceptibility to caries in adults. Methods Five independent cohorts were included in this study, totaling more than 7000 participants. For each participant, dental caries was assessed and genetic markers (single nucleotide polymorphisms, SNPs) were genotyped or imputed across the entire genome. Due to the heterogeneity among the five cohorts regarding age, genotyping platform, quality of dental caries assessment, and study design, we first conducted genome-wide association (GWA) analyses on each of the five independent cohorts separately. We then performed three meta-analyses to combine results for: (i) the comparatively younger, Appalachian cohorts (N = 1483) with well-assessed caries phenotype, (ii) the comparatively older, non-Appalachian cohorts (N = 5960) with inferior caries phenotypes, and (iii) all five cohorts (N = 7443). Top ranking genetic loci within and across meta-analyses were scrutinized for biologically plausible roles in caries. Results Different sets of genes were nominated across the three meta-analyses, especially between the younger and older age cohorts. In general, we identified several suggestive loci (P-value ≤ 10E-05) within or near genes with plausible biological roles for dental caries, including RPS6KA2 and PTK2B, involved in p38-dependent MAPK signaling

A time-to-onset analysis for family-based samples was performed on the genome-wide association study (GWAS) data for attention deficit hyperactivity disorder (ADHD) to determine if associations exist with the age at onset of ADHD. The initial dataset consisted of 958 parent-offspring trios that were genotyped on the Perlegen 600,000 SNP array. After data cleaning procedures, 429,981 autosomal SNPs and 930 parent-offspring trios were found suitable for use, and a family-based logrank analysis was performed using the age at first ADHD symptoms as the quantitative trait of interest. No SNP achieved genome-wide significance, and the lowest P-values had a magnitude of 10(-7). Several SNPs among a pre-specified list of candidate genes had nominal associations, including SLC9A9, DRD1, ADRB2, SLC6A3, NFIL3, ADRB1, SYT1, HTR2A, ARRB2, and CHRNA4. Of these findings, SLC9A9 stood out as a promising candidate, with nominally significant SNPs in six distinct regions of the gene. PMID:18937294

Metastasis is the leading cause of death in osteosarcoma patients, the most common pediatric bone malignancy. We conducted a multi-stage genome-wide association study of osteosarcoma metastasis at diagnosis in 935 osteosarcoma patients to determine whether germline genetic variation contributes to risk of metastasis. We identified a SNP, rs7034162, in NFIB significantly associated with metastasis in European osteosarcoma cases, as well as in cases of African and Brazilian ancestry (meta-analysis of all cases: P=1.2×10(-9), OR 2.43, 95% CI 1.83–3.24). The risk allele was significantly associated with lowered NFIB expression, which led to increased osteosarcoma cell migration, proliferation, and colony formation. Additionally, a transposon screen in mice identified a significant proportion of osteosarcomas harboring inactivating insertions in Nfib, and had lowered Nfib expression. These data suggest that germline genetic variation at rs7034162 is important in osteosarcoma metastasis, and that NFIB is an osteosarcoma metastasis susceptibility gene. PMID:26084801

Brazilian Nelore cattle have been selected for growth traits over more than four decades. In recent years, reproductive and meat quality traits have become more important because of increasing consumption, exports and consumer demand. The identification of genomic regions altered by artificial selec...

Since their divergence from the terrestrial artiodactyls, cetaceans have fully adapted to an aquatic lifestyle, which represents one of the most dramatic transformations in mammalian evolutionary history. Numerous morphological and physiological characters of cetaceans have been acquired in response to this drastic habitat transition, such as thickened blubber, echolocation, and ability to hold their breath for a long period of time. However, knowledge about the molecular basis underlying these adaptations is still limited. The sequence of the genome of Tursiops truncatus provides an opportunity for comparative genomic analyses to examine the molecular adaptation of this species. Here, we constructed 11,838 high-quality orthologous gene alignments culled from the dolphin and four other terrestrial mammalian genomes and screened for positive selection occurring in the dolphin lineage. In total, 368 (3.1%) of the genes were identified as having undergone positive selection by the branch-site model. Functional characterization of these genes showed that they are significantly enriched in the categories of lipid transport and localization, ATPase activity, sensory perception of sound, and muscle contraction, areas that are potentially related to cetacean adaptations. In contrast, we did not find a similar pattern in the cow, a closely related species. We resequenced some of the positively selected sites (PSSs), within the positively selected genes, and showed that most of our identified PSSs (50/52) could be replicated. The results from this study should have important implications for our understanding of cetacean evolution and their adaptations to the aquatic environment. PMID:23246795

Background In tropical countries, losses caused by bovine tick Rhipicephalus (Boophilus) microplus infestation have a tremendous economic impact on cattle production systems. Genetic variation between Bos taurus and Bos indicus in tick resistance and molecular biology tools might allow for the identification of molecular markers linked to resistance traits that could be used as an auxiliary tool in selection programs. The objective of this work was to identify QTL associated with tick resistance/susceptibility in a bovine F2 population derived from the Gyr (Bos indicus) × Holstein (Bos taurus) cross. Results Through a whole genome scan with microsatellite markers, we were able to map six genomic regions associated with bovine tick resistance. For most QTL, we found that, depending on the tick evaluation season (dry or rainy), different sets of genes could be involved in the resistance mechanism. We identified dry season specific QTL on BTA 2 and 10, and rainy season specific QTL on BTA 5, 11 and 27. We also found a highly significant genome-wide QTL for both dry and rainy seasons in the central region of BTA 23. Conclusions The experimental F2 population derived from the Gyr × Holstein cross successfully allowed the identification of six highly significant QTL associated with tick resistance in cattle. QTL located on BTA 23 might be related to the bovine histocompatibility complex. Further investigation of these QTL will help to isolate candidate genes involved in tick resistance in cattle. PMID:20433753

Although there is considerable evidence that individual differences in language development are highly heritable, there have been few genome-wide scans to locate genes associated with the trait. Previous analyses of language impairment have yielded replicable evidence for linkage to regions on chromosomes 16q, 19q, 13q (within lab) and at 13q (between labs). Here we report the first linkage study to screen the continuum of language ability, from normal to disordered, as found in the general population. 383 children from 147 sib-ships (214 sib-pairs) were genotyped on the Illumina® Linkage IVb Marker Panel using three composite language-related phenotypes and a measure of phonological memory (PM). Two regions (10q23.33; 13q33.3) yielded genome-wide significant peaks for linkage with PM. A peak suggestive of linkage was also found at 17q12 for the overall language composite. This study presents two novel genetic loci for the study of language development and disorders, but fails to replicate findings by previous groups. Possible reasons for this are discussed. PMID:25997078

The extent of recent selection in admixed populations is currently an unresolved question. We scanned the genomes of 29,141 African Americans and failed to find any genome-wide-significant deviations in local ancestry, indicating no evidence of selection influencing ancestry after admixture. A recent analysis of data from 1,890 African Americans reported that there was evidence of selection in African Americans after their ancestors left Africa, both before and after admixture. Selection after admixture was reported on the basis of deviations in local ancestry, and selection before admixture was reported on the basis of allele-frequency differences between African Americans and African populations. The local-ancestry deviations reported by the previous study did not replicate in our very large sample, and we show that such deviations were expected purely by chance, given the number of hypotheses tested. We further show that the previous study's conclusion of selection in African Americans before admixture is also subject to doubt. This is because the FST statistics they used were inflated and because true signals of unusual allele-frequency differences between African Americans and African populations would be best explained by selection that occurred in Africa prior to migration to the Americas. PMID:25242497

Soybean oil and meal are major contributors to world-wide food production. Consequently, the genetic basis for soybean seed composition has been intensely studied using family-based mapping. Population-based mapping approaches, in the form of genome-wide association (GWA) scans, have been able to resolve loci controlling moderately complex quantitative traits (QTL) in numerous crop species. Yet, it is still unclear how soybean’s unique population history will affect GWA scans. Using one of the populations in this study, we simulated phenotypes resulting from a range of genetic architectures. We found that with a heritability of 0.5, ∼100% and ∼33% of the 4 and 20 simulated QTL can be recovered, respectively, with a false-positive rate of less than ∼6×10(-5) per marker tested. Additionally, we demonstrated that combining information from multi-locus mixed models and compressed linear-mixed models improves QTL identification and interpretation. We applied these insights to exploring seed composition in soybean, refining the linkage group I (chromosome 20) protein QTL and identifying additional oil QTL that may allow some decoupling of highly correlated oil and protein phenotypes. Because the value of protein meal is closely related to its essential amino acid profile, we attempted to identify QTL underlying methionine, threonine, cysteine, and lysine content. Multiple QTL were found that have not been observed in family-based mapping studies, and each trait exhibited associations across multiple populations. Chromosomes 1 and 8 contain strong candidate alleles for essential amino acid increases. Overall, we present these and additional data that will be useful in determining breeding strategies for the continued improvement of soybean’s nutrient portfolio. PMID:25246241

A novel method for calculating the surface shapes for subreflectors in a suboptic assembly of a tri-reflector spherical antenna system is introduced, modeled from a generalization of Galindo-Israel's method of solving partial differential equations to correct for spherical aberration and provide uniform feed to aperture mapping. In a first embodiment, the suboptic assembly moves as a single unit to achieve scan while the main reflector remains stationary. A feed horn is tilted during scan to maintain the illuminated area on the main spherical reflector fixed throughout the scan thereby eliminating the need to oversize the main spherical reflector. In an alternate embodiment, both the main spherical reflector and the suboptic assembly are fixed. A flat mirror is used to create a virtual image of the suboptic assembly. Scan is achieved by rotating the mirror about the spherical center of the main reflector. The feed horn is tilted during scan to maintain the illuminated area on the main spherical reflector fixed throughout the scan.

Cytokines are major immune system regulators. Previously, innate cytokine profiles determined by lipopolysaccharide stimulation were shown to be highly heritable. To identify regulating genes in innate immunity, we analyzed data from a genome-wide linkage scan using microsatellites in osteoarthritis (OA) patients (The GARP study) and their innate cytokine data on interleukin (IL)-1β, IL-1Ra, IL-10 and tumor necrosis factor (TNF)α. A confirmation cohort consisted of the Leiden 85-Plus study. In this study, a linkage analysis was followed by manual selection of candidate genes in linkage regions showing LOD scores over 2.5. A single-nucleotide polymorphism (SNP) gene tagging method was applied to select SNPs on the basis of the highest level of gene tagging and possible functional effects. QTDT was used to identify the SNPs associated with innate cytokine production. Initial association signals were modeled by a linear mixed model. Through these analyses, we identified 10 putative genes involved in the regulation of TNFα. SNP rs6679497 in gene CD53 showed significant association with TNFα levels (P=0.001). No association of this SNP was observed with OA. A novel gene involved in the innate immune response of TNFα is identified. Genetic variation in this gene may have a role in diseases and disorders in which TNFα is closely involved. PMID:20407468

Generalized epilepsy with febrile seizures plus (GEFS+) is an autosomal dominant disorder. In the literature, 5 responsible genes were identified and 2 novel susceptibility loci for GEFS+ at 2p24 and 8p23-p21 were reported, indicating the genetic heterogeneity of this disorder. The aim of this report is to identify the responsible loci in a large affected Tunisian family by performing a 10 cM density genome-wide scan. The highest multipoint logarithm of odds (LOD) score (1.04) was found for D5S407 in the absence of recombination. Two other interesting regions were found around markers D19S210 (LOD=0.799) and D7S484 (LOD=0.61). To fine map these loci, additional markers in 2 regions on 5q13.3 and 7p14.2 were analyzed and positive LOD scores for both loci were obtained. Sequencing of the sodium channel subunit beta-1 gene (SCN1B) (19q13.1) showed the absence of any causal mutation. Our findings emphasize the genetic heterogeneity of febrile seizures. PMID:20382841
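For readers unfamiliar with LOD scores, the sketch below shows the textbook two-point calculation for phase-known, fully informative meioses. It is a simplification for illustration only: the abstract reports multipoint LOD scores, which are computed by dedicated linkage software on the full pedigree, and the function name `two_point_lod` is hypothetical.

```python
import math

# Textbook two-point LOD score.
# Assumption: phase-known, fully informative meioses; real pedigree
# analyses (including the multipoint scores in the abstract) are more involved.

def two_point_lod(n_meioses, n_recombinants, theta):
    """log10 of L(theta) / L(0.5) for k recombinants observed in n meioses."""
    n, k = n_meioses, n_recombinants
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= k <= n:
        raise ValueError("n_recombinants must lie in [0, n_meioses]")
    if theta == 0.0:
        if k > 0:
            return float("-inf")   # any recombinant excludes complete linkage
        log_l_theta = 0.0          # likelihood is 1 when no recombinants are seen
    else:
        log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)   # free recombination (no linkage)
    return log_l_theta - log_l_null

# With no recombinants, each informative meiosis adds log10(2), about 0.30.
for n in (3, 4, 10):
    print(n, round(two_point_lod(n, 0, 0.0), 2))
```

Under these simplifying assumptions, roughly ten recombination-free informative meioses are needed to reach the classical LOD threshold of 3, which helps explain why a single family often yields only modest scores such as those reported above.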

We conducted a genome-wide scan for visceral leishmaniasis in mixed-breed dogs from a highly endemic area in Brazil using 149,648 single nucleotide polymorphism (SNP) markers genotyped in 20 cases and 28 controls. Using a mixed model approach, we found two candidate loci on canine autosomes 1 and 2....

A genome-wide scan for chromosomal regions influencing carcass traits was conducted spanning 2.497 Morgans on 29 bovine autosomes using 170 microsatellite markers. There were 151 progeny from a single Hereford x composite bull produced by backcross matings to both Hereford and composite dams. Cattl...

To balance the demand for uptake of essential elements with their potential toxicity living cells have complex regulatory mechanisms. Here, we describe a genome-wide screen to identify genes that impact the elemental composition (‘ionome’) of yeast Saccharomyces cerevisiae. Using inductively coupled...

Background: Genomic regions controlling abdominal fatness (AF) were studied in the Northeast Agricultural University broiler line divergently selected for AF. In this study, the chicken 60K SNP chip and extended haplotype homozygosity (EHH) test were used to detect genome-wide signatures of AF. Results: A total of 5357 and 5593 core regions were detected in the lean and fat lines, and 51 and 57 reached a significant level (P<0.01), respectively. A number of genes in the significant core regions, including RB1, BBS7, MAOA, MAOB, EHBP1, LRP2BP, LRP1B, MYO7A, MYO9A and PRPSAP1, were detected. These genes may be important for AF deposition in chickens. Conclusions: We provide a genome-wide map of selection signatures in the chicken genome and contribute to a better understanding of the mechanisms of selection for AF content in chickens. Using this information in selection for low AF in commercial breeding will accelerate breeding progress. PMID:23241142
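
The EHH statistic named in this abstract can be illustrated with a minimal sketch. It assumes the usual definition (the probability that two randomly chosen chromosomes carrying the same core haplotype are identical over the whole interval from the core to a given marker); the function name `ehh` and the toy 0/1 haplotype strings are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, end_index):
    """Extended haplotype homozygosity between a core marker and another marker.

    haplotypes: equal-length strings of alleles, all assumed to carry the
    same core haplotype (hypothetical toy encoding, one character per SNP).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core_index, end_index))
    # Group chromosomes that are identical across the whole interval.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    # Identical pairs divided by all possible pairs.
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


sample = ["0101", "0101", "0111", "0101"]
print(ehh(sample, 0, 1))  # homozygosity close to the core
print(ehh(sample, 0, 3))  # homozygosity after extending to the last marker
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break haplotypes apart; unusually slow decay for a common haplotype is what such tests treat as a candidate selection signature.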

Our genetic diversity study uses microsatellites of known map position to estimate genome level population structure and linkage disequilibrium, and to identify genomic regions that have undergone selection during watermelon domestication and improvement. Thirty regions that showed evidence of selective sweep were scanned for the presence of candidate genes using the watermelon genome browser (www.icugi.org). We localized selective sweeps in intergenic regions, close to the promoters, and within the exons and introns of various genes. This study provides evidence of convergent evolution for the presence of diverse ecotypes, with special reference to American and European ecotypes. Our search for the location of linked markers in the whole-genome draft sequence revealed that BVWS00358, a GA repeat microsatellite, is the GAGA type transcription factor located in the 5' untranslated regions of a structure and insertion element that expresses a Cys2His2 Zinc finger motif, with presumed biological processes related to chitin response and transcriptional regulation. In addition, BVWS01708, an ATT repeat microsatellite, is located in the promoter of a DTW domain-containing protein (Cla002761), and 2 other simple sequence repeats were linked by association mapping to fruit length and rind thickness. PMID:25425675

Genetic markers associated with parasite indicator traits are ideal targets for study of marker assisted selection aimed at controlling infections and reducing herd use of anthelmintics. For this study, we collected gastrointestinal (GI) nematode fecal egg count (FEC) data from post-weaning animals of an Angus resource population challenged by a 26 week natural exposure on pasture. In all, data from 487 animals were collected over a 16 year period between 1992 and 2007, most of which were selected for a specific DRB1 allele to reduce the influence of potential allelic variant effects of the MHC locus. A genome-wide association study (GWAS) based on BovineSNP50 genotypes revealed six genomic regions located on bovine Chromosomes 3, 5, 8, 15 and 27, which were significantly associated (-log10 p=4.3) with Box-Cox transformed mean FEC (BC-MFEC). DAVID analysis of the genes within the significant genomic regions suggested a correlation between our results and annotation for genes involved in inflammatory response to infection. Furthermore, ROH and selection signature analyses provided strong evidence that the genomic regions associated with BC-MFEC have not been affected by local autozygosity or recent experimental selection. These findings provide useful information for parasite resistance prediction for young grazing cattle and suggest new candidate gene targets for development of disease-modifying therapies or future studies of host response to GI parasite infection. PMID:25803687

Developmental dyslexia (DD) is a complex heritable disorder with unexpected difficulty in learning to read and spell despite adequate intelligence, education, environment, and normal senses. We performed genome-wide screening for copy number variations (CNVs) in 10 large Indian dyslexic families using Affymetrix Genome-Wide Human SNP Array 6.0. Results revealed the complex genomic rearrangements due to one non-contiguous deletion and five contiguous micro duplications and micro deletions at 17q21.31 region in three dyslexic families. CNVs in this region harbor the genes KIAA1267, LRRC37A, ARL17A/B, NSFP1, and NSF. The CNVs in case 1 and case 2 at this locus were found to be in homozygous state and case 3 was a de novo CNV. These CNVs were found with at least one CNV having a common break and end points in the parents. This cluster of genes containing NSF is implicated in learning, cognition, and memory, though not formally associated with dyslexia. Molecular network analysis of these and other dyslexia related module genes suggests NSF and other genes to be associated with cellular/vesicular membrane fusion and synaptic transmission. Thus, we suggest that NSF in this cluster would be the nearest gene responsible for the learning disability phenotype. PMID:25139666

This special issue addresses the heritability and molecular genetic basis of 17 putative endophenotypes involving resting EEG power, P300 event-related potential amplitude, electrodermal orienting and habituation, antisaccade eye tracking, and affective modulation of the startle eye blink. These measures were collected from approximately 4,900 twins and parents who provided DNA samples through their participation in the Minnesota Twin Family Study. Included are papers that detail the methodology followed, genome-wide association analyses of single nucleotide polymorphisms and genes, analysis of rare variants in the human exome, and a whole genome sequencing study. Also included are 11 articles by leading experts in psychophysiology and genetics that provide perspective and commentary. A final integrative report summarizes findings and addresses issues raised. This introduction provides an overview of the aims and rationale behind these studies. PMID:25387700

Primary open-angle glaucoma (POAG) is the most common form of glaucoma and one of the leading causes of vision loss worldwide. The genetic etiology of POAG is complex and poorly understood. The purpose of this work is to identify genomic regions of interest linked to POAG. This study is the largest genetic linkage study of POAG performed to date: genomic DNA samples from 786 subjects (538 Caucasian ancestry, 248 African ancestry) were genotyped using either the Illumina GoldenGate Linkage 4 Panel or the Illumina Infinium Human Linkage-12 Panel. A total of 5233 SNPs was analyzed in 134 multiplex POAG families (89 Caucasian ancestry, 45 African ancestry). Parametric and non-parametric linkage analyses were performed on the overall dataset and within race-specific datasets (Caucasian ancestry and African ancestry). Ordered subset analysis was used to stratify the data on the basis of age of glaucoma diagnosis. Novel linkage regions were identified on chromosomes 1 and 20, and two previously described loci—GLC1D on chromosome 8 and GLC1I on chromosome 15—were replicated. These data will prove valuable in the context of interpreting results from genome-wide association studies for POAG. PMID:21765929

Genome-wide association studies are routinely used to identify genomic regions associated with traits of interest. However, this ignores an important class of genomic associations, that of epistatic interactions. A genome-wide interaction analysis between single nucleotide polymorphisms (SNPs) using highly dense markers can detect epistatic interactions, but is a difficult task due to multiple testing and computational demand. Nevertheless, it is important for revealing complex trait heredity. This study considers analytical methods that detect statistical interactions between pairs of loci. We investigated a three-stage modelling procedure: (i) a model without the SNP to estimate the variance components; (ii) a model with the SNP using variance component estimates from (i), thus avoiding iteration; and (iii) using the significant SNPs from (ii) for genome-wide epistasis analysis. We fitted these three-stage models to field data for growth and ultrasound measures for subcutaneous fat thickness in Brahman cattle. The study demonstrated the usefulness of modelling epistasis in the analysis of complex traits as it revealed extra sources of genetic variation and identified potential candidate genes affecting the concentration of insulin-like growth factor-1 and ultrasound scan measure of fat depth traits. Information about epistasis can add to our understanding of the complex genetic networks that form the fundamental basis of biological systems. PMID:25754883

Coffee is one of the most consumed beverages world-wide and one of the primary sources of caffeine intake. Given its important health and economic impact, the underlying genetics of its consumption has been widely studied. Despite these efforts, much has still to be uncovered. In particular, the use of non-additive genetic models may uncover new information about the genetic variants driving coffee consumption. We have conducted a genome-wide association study in two Italian populations using additive, recessive and dominant models for analysis. This has uncovered a significant association in the PDSS2 gene under the recessive model that has been replicated in an independent cohort from the Netherlands (ERF). The identified gene has been shown to negatively regulate the expression of the caffeine metabolism genes and can thus be linked to coffee consumption. Further bioinformatics analysis of eQTL and histone marks from Roadmap data has evidenced a possible role of the identified SNPs in regulating PDSS2 gene expression through enhancers present in its intron. Our results highlight a novel gene which regulates coffee consumption by regulating the expression of the genes linked to caffeine metabolism. Further studies will be needed to clarify the biological mechanism which links PDSS2 and coffee consumption. PMID:27561104
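
The additive, recessive and dominant models mentioned in this abstract differ only in how a genotype is turned into a predictor before the association test. The sketch below shows the conventional coding, assuming genotypes are stored as the number of copies (0, 1 or 2) of the effect allele; the function name `encode_genotype` is hypothetical and the abstract does not describe the software actually used.

```python
def encode_genotype(effect_allele_count, model):
    """Map a count of effect alleles (0, 1 or 2) to a model-specific predictor."""
    if effect_allele_count not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 copies of the effect allele")
    if model == "additive":
        # Each extra copy shifts the trait by the same amount.
        return effect_allele_count
    if model == "dominant":
        # One copy is enough to show the full effect.
        return 1 if effect_allele_count >= 1 else 0
    if model == "recessive":
        # Only carriers of two copies show the effect.
        return 1 if effect_allele_count == 2 else 0
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive"):
    print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```

Under the recessive coding, heterozygotes are grouped with non-carriers, which is why a signal such as the one reported for PDSS2 can be weak or absent under a purely additive test.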

Human pygmy populations inhabit different regions of the world, from Africa to Melanesia. In Asia, short-statured populations are often referred to as "negritos." Their short stature has been interpreted as a consequence of thermoregulatory, nutritional, and/or locomotory adaptations to life in tropical forests. A more recent hypothesis proposes that their stature is the outcome of a life history trade-off in high-mortality environments, where early reproduction is favored and, consequently, early sexual maturation and early growth cessation have coevolved. Some serological evidence of deficiencies in the growth hormone/insulin-like growth factor axis has been previously associated with pygmies' short stature. Using genome-wide single-nucleotide polymorphism genotype data, we first tested whether different negrito groups living in the Philippines and Papua New Guinea are closely related and then investigated genomic signals of recent positive selection in African, Asian, and Papuan pygmy populations. We found that negritos in the Philippines and Papua New Guinea are genetically more similar to their nonpygmy neighbors than to one another and have experienced positive selection at different genes. These results indicate that geographically distant pygmy groups are likely to have evolved their short stature independently. We also found that selection on common height variants is unlikely to explain their short stature and that different genes associated with growth, thyroid function, and sexual development are under selection in different pygmy groups. PMID:24297229

Approximately 15-30% of all breast cancer tumors are estrogen receptor negative (ER-). Compared with ER-positive (ER+) disease they have an earlier age at onset and worse prognosis. Despite the vast number of risk variants identified for numerous cancer types, only seven loci have been unambiguously identified for ER-negative breast cancer. With the aim of identifying new susceptibility SNPs for this disease we performed a pleiotropic genome-wide association study (GWAS). We selected 3079 SNPs associated with a human complex trait or disease at genome-wide significance level (P<5 × 10^-8) to perform a secondary analysis of an ER-negative GWAS from the National Cancer Institute's Breast and Prostate Cancer Cohort Consortium (BPC3), including 1998 cases and 2305 controls from prospective studies. We then tested the top ten associations (i.e. with the lowest P-values) using three additional populations with a total sample size of 3509 ER+ cases, 2543 ER- cases and 7031 healthy controls. None of the 3079 selected variants in the BPC3 ER- GWAS were significant at the adjusted threshold. 186 variants were associated with ER- breast cancer risk at a conventional threshold of P<0.05, with P-values ranging from 0.049 to 2.3 × 10^-4. None of the variants reached statistical significance in the replication phase. In conclusion, this study did not identify any novel susceptibility loci for ER- breast cancer using a "pleiotropic approach". PMID:24523857
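
The abstract does not say how its "adjusted threshold" was computed. As an illustration only, the sketch below assumes a Bonferroni correction over the 3079 tested SNPs at a family-wise alpha of 0.05 and compares it with the smallest reported P-value; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: alpha = 0.05 and one test per selected SNP.
threshold = bonferroni_threshold(0.05, 3079)
smallest_reported_p = 2.3e-4  # lowest P-value quoted in the abstract

print(f"threshold = {threshold:.3g}")
print("significant:", smallest_reported_p < threshold)
```

Under this assumed correction the threshold is roughly an order of magnitude below the best observed P-value, which is consistent with the abstract's statement that no variant was significant after adjustment.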

The torque-velocity relationship is known to be affected by ageing, decreasing its protective role in the prevention of falls. Interindividual variability in this torque-velocity relationship is partly determined by genetic factors (h^2: 44-67%). As a first attempt, this genome-wide linkage study aimed to identify chromosomal regions linked to the torque-velocity relationship of the knee flexors and extensors. A selection of 283 informative male siblings (17-36 yr), belonging to 105 families, was used to conduct a genome-wide SNP-based (Illumina Linkage IVb panel) multipoint linkage analysis for the torque-velocity relationship of the knee flexors and extensors. The strongest evidence for linkage was found at 15q23 for the torque-velocity slope of the knee extensors (TVSE). Other interesting linkage regions with LOD scores >2 were found at 7p12.3 [logarithm of the odds ratio (LOD) = 2.03, P = 0.0011] for the torque-velocity ratio of the knee flexors (TVRF), at 2q14.3 (LOD = 2.25, P = 0.0006) for TVSE, and at 4p14 and 18q23 for the torque-velocity ratio of the knee extensors TVRE (LOD = 2.23 and 2.08; P = 0.0007 and 0.001, respectively). We conclude that many small contributing genes are involved in causing variation in the torque-velocity relationship of the knee flexor and extensor muscles. Several earlier reported candidate genes for muscle strength and muscle mass and new candidates are harbored within or in close vicinity of the linkage regions reported in the present study. PMID:18682575
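
The LOD and P pairs quoted in this abstract can be related with the common asymptotic conversion: 2 ln(10) × LOD is treated as a chi-square statistic with 1 degree of freedom and the tail probability is halved because linkage is tested one-sidedly. The abstract does not state that this is how its P-values were obtained, so the sketch below is an assumption, and `lod_to_p` is a hypothetical helper name.

```python
import math


def lod_to_p(lod):
    """Approximate pointwise P-value for a LOD score (one-sided, 1 df)."""
    if lod < 0:
        raise ValueError("this approximation is only meant for LOD >= 0")
    chi_square = 2.0 * math.log(10.0) * lod
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)); halve it for a one-sided test.
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))


for lod in (2.03, 2.25, 2.23, 2.08):
    print(f"LOD = {lod:.2f}  ->  P ~ {lod_to_p(lod):.4f}")
```

These are pointwise values; genome-wide significance in a linkage scan requires a considerably higher LOD because many positions are tested.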

Summary: Van der Woude syndrome (VWS) is an autosomal dominant developmental malformation presenting with bilateral lower lip pits related to cleft lip, cleft palate and other malformations. We performed a whole-genome copy number variations (CNVs) scan in an Indian family with members suffering from VWS using 2.6 million combined SNP and CNV markers. We found CNVs affecting IRF6, a known candidate gene for VWS, in all three cases, while none of the non-VWS members showed any CNVs in the IRF6 region. The duplications and deletions of the chromosomal critical region in 1q32-q41 confirm the involvement of CNVs in IRF6 in South Indian VWS patients. Molecular network analysis of these and other cleft lip/palate related module genes suggests that they are associated with cytokine-mediated signalling pathways and response to interferon-gamma mediated signalling pathways. This is a maiden study indicating the involvement of CNVs in IRF6 in causing VWS in the Indian population. PMID:25579819

As the methodologies available for the detection of positive selection from genomic data vary in terms of assumptions and execution, weak correlations are expected among them. However, if there is any given signal that is consistently supported across different methodologies, it is strong evidence that the locus has been under past selection. In this paper, a straightforward frequentist approach based on the Stouffer Method to combine P-values across different tests for evidence of recent positive selection in common variations, as well as strategies for extracting biological information from the detected signals, were described and applied to high density single nucleotide polymorphism (SNP) data generated from dairy and beef cattle (taurine and indicine). The ancestral Bovinae allele state of over 440,000 SNP is also reported. Using this combination of methods, highly significant (P<3.17×10^-7) population-specific sweeps pointing to candidate genes and pathways that may be involved in beef and dairy production were identified. The most significant signal was found in the Cornichon homolog 3 gene (CNIH3) in Brown Swiss (P = 3.82×10^-12), and may be involved in the regulation of pre-ovulatory luteinizing hormone surge. Other putative pathways under selection are the glycolysis/gluconeogenesis, transcription machinery and chemokine/cytokine activity in Angus; calpain-calpastatin system and ribosome biogenesis in Brown Swiss; and gangliosides deposition in milk fat globules in Gyr. The composite method, combined with the strategies applied to retrieve functional information, may be a useful tool for surveying genome-wide selective sweeps and providing insights into the source of selection. PMID:23696874
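
The Stouffer Method referred to in this abstract combines P-values by converting each to a standard normal Z-score, summing, and rescaling by the square root of the number of tests. The sketch below is the unweighted textbook form using only the Python standard library; the abstract does not say whether the authors applied weights or other adjustments, and the name `stouffer_combine` is hypothetical.

```python
import math
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine one-sided P-values with the unweighted Stouffer Z method.

    Returns (combined_z, combined_p). Every P-value must lie strictly
    between 0 and 1.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must be strictly between 0 and 1")
    std_normal = NormalDist()
    # By symmetry, the upper-tail Z-score for p is -inv_cdf(p); this avoids
    # the precision loss of computing 1 - p for very small p.
    z_scores = [-std_normal.inv_cdf(p) for p in p_values]
    combined_z = sum(z_scores) / math.sqrt(len(z_scores))
    combined_p = std_normal.cdf(-combined_z)
    return combined_z, combined_p


# Three tests that each give only modest evidence at the same locus.
z, p = stouffer_combine([0.04, 0.03, 0.05])
print(f"combined Z = {z:.3f}, combined P = {p:.5f}")
```

Signals that are individually unremarkable but consistent across tests yield a combined P-value smaller than any single input, which is the behaviour the abstract relies on when it looks for loci supported by several methodologies.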

The Asian cycads are mostly allopatric, distributed in small population sizes. Hybridization between allopatric species provides clues in determining the mechanism of species divergence. Horticultural introduction provides the chance of interspecific gene flow between allopatric species. Two allopatric eastern Asian Cycas sect. Asiorientales species, C. revoluta and C. taitungensis, which are widely distributed in Ryukyus and Fujian Province and endemic to Taiwan, respectively, were planted in eastern Taiwan for horticultural reasons. Higher degrees of genetic admixture in cultivated samples than in wild populations of both cycad species were detected based on multilocus scans by neutral AFLP markers. Furthermore, bidirectional but asymmetric introgression by horticultural introduction of C. revoluta is evidenced by the reanalyses of species-associated loci, which are assumed to have diverged after species divergence. Partial introgression from the native cycad to the invaders was also detected at the loci of strong species association. Consistent results from all neutral loci and from the species-associated loci distinguish recent introgression from the sharing of ancestral polymorphisms. The introgression observed in cultivated cycads implies niche conservation between the two geographically isolated cycads, even though the habitats of the extant wild populations of the two species are distinct. PMID:23591840

Purpose: To explore the effects of single nucleotide polymorphisms (SNPs) on pancreatic cancer risk and overall survival. Experimental Design: The germline DNA of 531 pancreatic cancer cases and 305 healthy controls from a hospital-based study was genotyped at SNPs previously reported to be associated with pancreatic cancer risk or clinical outcome. We analyzed putative risk SNPs for replication of their reported effects on risk and tested for novel effects on overall survival (OS). Similarly, we analyzed putative survival-associated SNPs for replication of their reported effects on OS and tested for novel effects on risk. Lastly, we performed a genome-wide association study of OS using a subset of 252 cases, with two subsequent validation sets of 261 and 572 patients, respectively. Results: Among seven risk SNPs analyzed, two (rs505922, rs9543325) were associated with risk (p<0.05). Among 24 survival-associated SNPs analyzed, one (rs9350) was associated with OS (p<0.05). No putative risk SNPs or putative survival-associated SNPs were found to be associated with OS or risk, respectively. Further, our GWAS identified a novel SNP (rs1482426, combined stage 1 and 2 p = 1.7×10^-6, per-allele HR = 1.74, 95% CI 1.38–2.18) to be putatively associated with OS. Conclusions: The effects of SNPs on pancreatic cancer risk and overall survival were replicated in our study, though further work is necessary to understand the functional mechanisms underlying these effects. More importantly, the putative association with OS identified by GWAS suggests that GWAS may be useful in identifying SNPs associated with clinical outcome in pancreatic cancer. PMID:22665904

Age-related cognitive decline is likely promoted by accumulated brain injury due to chronic conditions of aging, including neurodegenerative and vascular disease. Since common neuronal mechanisms may mediate the adaptation to diverse cerebral insults, we hypothesized that susceptibility for age-related cognitive decline may be due in part to a shared genetic network. We have therefore performed a genome-wide association study using a quantitative measure of global cognitive decline slope, based on repeated measures of 17 cognitive tests in 749 subjects from the Religious Orders Study. Top results were evaluated in three independent replication cohorts, consisting of 2,279 additional subjects with repeated cognitive testing. As expected, we find that the Alzheimer’s disease (AD) susceptibility locus, APOE, is strongly associated with rate of cognitive decline (P_DISC = 5.6×10^-9; P_JOINT = 3.7×10^-27). We additionally discover a variant, rs10808746, which shows consistent effects in the replication cohorts and modestly improved evidence of association in the joint analysis (P_DISC = 6.7×10^-5; P_REP = 9.4×10^-3; P_JOINT = 2.3×10^-5). This variant influences the expression of two adjacent genes, PDE7A and MTFR1, which are potential regulators of inflammation and oxidative injury, respectively. Using aggregate measures of genetic risk, we find that known susceptibility loci for cardiovascular disease, type II diabetes, and inflammatory diseases are not significantly associated with cognitive decline in our cohort. Our results suggest that intermediate phenotypes, when coupled with larger sample sizes, may be a useful tool to dissect susceptibility loci for age-related cognitive decline and uncover shared molecular pathways with a role in neuronal injury. PMID:22054870

We report a genome-wide association scan for facial features in ∼6,000 Latin Americans. We evaluated 14 traits on an ordinal scale and found significant association (P values<5 × 10^-8) at single-nucleotide polymorphisms (SNPs) in four genomic regions for three nose-related traits: columella inclination (4q31), nose bridge breadth (6p21) and nose wing breadth (7p13 and 20p11). In a subsample of ∼3,000 individuals we obtained quantitative traits related to 9 of the ordinal phenotypes and, also, a measure of nasion position. Quantitative analyses confirmed the ordinal-based associations, identified SNPs in 2q12 associated with chin protrusion, and replicated the reported association of nasion position with SNPs in PAX3. Strongest association in 2q12, 4q31, 6p21 and 7p13 was observed for SNPs in the EDAR, DCHS2, RUNX2 and GLI3 genes, respectively. Associated SNPs in 20p11 extend to PAX1. Consistent with the effect of EDAR on chin protrusion, we documented alterations of mandible length in mice with modified Edar function. PMID:27193062

We conducted a genome-wide scan for visceral leishmaniasis in mixed-breed dogs from a highly endemic area in Brazil using 149,648 single nucleotide polymorphism (SNP) markers genotyped in 20 cases and 28 controls. Using a mixed model approach, we found two candidate loci on canine autosomes 1 and 2. The positional association on chromosome 2 mapped to a predicted DNAse sensitive site in CD14+ monocytes that serves as a cis-regulatory element for the expression of interleukin alpha receptors 2 (IL2RA) and 15 (IL15RA). Both interleukins were previously found to lead to protective T helper 1 cell (Th1) response against Leishmania spp. in humans and mice. The associated marker on chromosome 1 was located between two predicted transcription factor binding sites regulating the expression of the transducin-like enhancer of split 1 gene (TLE1), an important player in Notch signaling. This pathway is critical for macrophage activity and CD4+ T cell differentiation into Th1 and T helper 2. Together, these findings suggest that the human and mouse model for protective response against Leishmania spp., which involves Th1 and macrophage modulation by interleukins 2, 15, gamma interferon and Notch signaling, may also hold for the canine model. PMID:26348501

Genome-wide association studies in human type 2 diabetes (T2D) have renewed interest in the pancreatic islet as a contributor to T2D risk. Chronic low-grade inflammation resulting from obesity is a risk factor for T2D and a possible trigger of β-cell failure. In this study, microarray data were collected from mouse islets after overnight treatment with cytokines at concentrations consistent with the chronic low-grade inflammation in T2D. Genes with a cytokine-induced change of >2-fold were then examined for associations between single nucleotide polymorphisms and the acute insulin response to glucose (AIRg) using data from the Genetics Underlying Diabetes in Hispanics (GUARDIAN) Consortium. Significant evidence of association was found between AIRg and single nucleotide polymorphisms in Arap3 (5q31.3), F13a1 (6p25.3), Klhl6 (3q27.1), Nid1 (1q42.3), Pamr1 (11p13), Ripk2 (8q21.3), and Steap4 (7q21.12). To assess the potential relevance to islet function, mouse islets were exposed to conditions modeling low-grade inflammation, mitochondrial stress, endoplasmic reticulum (ER) stress, glucotoxicity, and lipotoxicity. RT-PCR revealed that one or more forms of stress significantly altered expression levels of all genes except Arap3. Thapsigargin-induced ER stress up-regulated both Pamr1 and Klhl6. Three genes confirmed microarray predictions of significant cytokine sensitivity: F13a1 was down-regulated 3.3-fold by cytokines, Ripk2 was up-regulated 1.5- to 3-fold by all stressors, and Steap4 was profoundly cytokine sensitive (167-fold up-regulation). Three genes were thus closely associated with low-grade inflammation in murine islets and also with a marker for islet function (AIRg) in a diabetes-prone human population. This islet-targeted genome-wide association scan identified several previously unrecognized candidate genes related to islet dysfunction during the development of T2D. PMID:26018251

Identifying the genes that influence levels of pro-inflammatory molecules can help to elucidate the mechanisms underlying this process. We first conducted a two-stage genome-wide association scan (GWAS) for the key inflammatory biomarkers Interleukin-6 (IL-6), the general measure of inflammation erythrocyte sedimentation rate (ESR), monocyte chemotactic protein-1 (MCP-1), and high-sensitivity C-reactive protein (hsCRP) in a large cohort of individuals from the founder population of Sardinia. By analysing 731,213 autosomal or X chromosome SNPs and an additional ∼1.9 million imputed variants in 4,694 individuals, we identified several SNPs associated with the selected quantitative trait loci (QTLs) and replicated all the top signals in an independent sample of 1,392 individuals from the same population. Next, to increase power to detect and resolve associations, we further genotyped the whole cohort (6,145 individuals) for 293,875 variants included on the ImmunoChip and MetaboChip custom arrays. Overall, our combined approach led to the identification of 9 genome-wide significant novel independent signals (5 of which were identified only with the custom arrays) and provided confirmatory evidence for an additional 7. Novel signals include: for IL-6, in the ABO gene (rs657152, p = 2.13×10^-29); for ESR, at the HBB (rs4910472, p = 2.31×10^-11) and UNC119B/SPPL3 (rs11829037, p = 8.91×10^-10) loci; for MCP-1, near its receptor CCR2 (rs17141006, p = 7.53×10^-13) and in CADM3 (rs3026968, p = 7.63×10^-13); for hsCRP, within the CRP gene (rs3093077, p = 5.73×10^-21), near DARC (rs3845624, p = 1.43×10^-10), UNC119B/SPPL3 (rs11829037, p = 1.50×10^-14), and ICOSLG/AIRE (rs113459440, p = 1.54×10^-8) loci. Confirmatory evidence was found for IL-6 in the IL-6R gene (rs4129267); for ESR at CR1 (rs12567990) and TMEM57 (rs10903129); for MCP-1 at DARC (rs12075); and for hsCRP at CRP (rs1205), HNF1A (rs225918), and APOC-I (rs

A genome-wide nonparametric linkage screen was performed to localize Bipolar Disorder (BP) susceptibility loci in a sample of 3757 individuals of Latino ancestry. The sample included 963 individuals with BP phenotype (704 relative pairs) from 686 families recruited from the US, Mexico, Costa Rica, and Guatemala. Non-parametric analyses were performed over a 5 cM grid with an average genetic coverage of 0.67 cM. Multipoint analyses were conducted across the genome using non-parametric Kong & Cox LOD scores along with Sall statistics for all relative pairs. Suggestive and significant genome-wide thresholds were calculated based on 1000 simulations. Single-marker association tests in the presence of linkage were performed assuming a multiplicative model with a population prevalence of 2%. We identified two genome-wide significant susceptibility loci for BP at 8q24 and 14q32, and a third suggestive locus at 2q13-q14. Within these three linkage regions, the top associated single marker (rs1847694, P = 2.40 × 10^-5) is located 195 kb upstream of DPP10 on chromosome 2. DPP10 is prominently expressed in brain neuronal populations, where it has been shown to bind and regulate Kv4-mediated A-type potassium channels. Taken together, these results provide additional evidence that 8q24, 14q32, and 2q13-q14 are susceptibility loci for BP and these regions may be involved in the pathogenesis of BP in the Latino population. PMID:25044503

This paper explores the genetic structure and signatures of natural selection in different sub-populations from the Island of Sardinia, exploiting information from nearly 700 000 autosomal SNPs genotyped with the Affymetrix Genome-Wide Human SNP 6.0 Array. The genetic structure of the Sardinian population and its position within the context of other Mediterranean and European human groups were investigated in depth by comparing our data with publicly available data sets. Principal components and admixture analyses suggest a clustering of the examined samples in two significantly differentiated sub-populations (Ogliastra and Southern Sardinia), as confirmed by AMOVA (FST=0.011; P<0.001). Differentiation of these sub-populations was still evident when they were pooled together with supplementary Sardinian samples from HGDP and compared with several other European, North-African and Near Eastern populations, confirming the uniqueness of the Sardinian genetic background. Moreover, by applying several statistical approaches aimed at assessing differences at the SNP level, the highest differentiated genomic regions between Ogliastra and Southern Sardinia were thus investigated via an extended haplotype homozygosity (EHH)-based test to point out potential selective sweeps. Using this approach, 40 genomic regions were detected, with significant differences between Ogliastra and Southern Sardinia. These regions were subsequently investigated using a long-range haplotype test, which found significant REHH values for SNPs rs11070188 and rs11070192 in the Ogliastra sub-population. In the light of these results and the overlap of the different computed statistics, the region encompassing these loci can be considered a strong candidate to have undergone selective pressure in Ogliastra. PMID:22535185

While susceptibility to hypersensitive reactions is a common problem amongst humans and animals alike, the population structure of certain animal species and breeds provides a more advantageous route to better understanding the biology underpinning these conditions. The current study uses Exmoor ponies, a highly inbred breed of horse known to frequently suffer from insect bite hypersensitivity, to identify genomic regions associated with a type I and type IV hypersensitive reaction. A total of 110 cases and 170 controls were genotyped on the 670K Axiom Equine Genotyping Array. Quality control resulted in 452,457 SNPs and 268 individuals being tested for association. Genome-wide association analyses were performed using the GenABEL package in R and resulted in the identification of two regions of interest on Chromosome 8. The first region contained the most significant SNP identified, which was located in an intron of the DCC netrin 1 receptor gene. The second region identified contained multiple top SNPs and encompassed the PIGN, KIAA1468, TNFRSF11A, ZCCHC2, and PHLPP1 genes. Although additional studies will be needed to validate the importance of these regions in horses and the relevance of these regions in other species, the knowledge gained from the current study has the potential to be a step forward in unraveling the complex nature of hypersensitive reactions. PMID:27070818

Dyslexia is one of the most common learning disorders affecting about 5% of all school-aged children. It has been shown that event-related potential measurements reveal differences between dyslexic children and age-matched controls. This holds particularly true for mismatch negativity (MMN), which reflects automatic speech deviance processing and is altered in dyslexic children. We performed a whole-genome association analysis in 200 dyslexic children, focusing on MMN measurements. We identified rs4234898, a marker located on chromosome 4q32.1, to be significantly associated with the late MMN component. This association could be replicated in an independent second sample of 186 dyslexic children, reaching genome-wide significance in the combined sample (P = 5.14e-08). We also found an association between the late MMN component and a two-marker haplotype of rs4234898 and rs11100040, one of its neighboring single nucleotide polymorphisms (SNPs). In the combined sample, this marker combination withstands correction for multiple testing (P = 6.71e-08). Both SNPs lie in a region devoid of any protein-coding genes; however, they both show significant association with mRNA-expression levels of SLC2A3 on chromosome 12, the predominant facilitative glucose transporter in neurons. Our results suggest a possible trans-regulation effect on SLC2A3, which might lead to glucose deficits in dyslexic children and could explain their attenuated MMN in passive listening tasks. PMID:19786962

Job-related exhaustion is the core dimension of burnout, a work-related stress syndrome that has several negative health consequences. In this study, we explored the molecular genetic background of job-related exhaustion. A genome-wide analysis of job-related exhaustion was performed in the GENMETS subcohort (n = 1256) of the Finnish population-based Health 2000 study. Replication analyses included an analysis of the strongest associations in the rest of the Health 2000 sample (n = 1660 workers) and in three independent populations (the FINRISK population cohort, n = 10 753; two occupational cohorts, total n = 1451). Job-related exhaustion was ascertained using a standard self-administered questionnaire (the Maslach Burnout Inventory (MBI)-GS exhaustion scale in the Health 2000 sample and the occupational cohorts) or a single question (FINRISK). A variant located in an intron of UST, uronyl-2-sulfotransferase (rs13219957), gave the strongest statistical evidence in the initial genome-wide study (P = 1.55 × 10^-7), and was associated with job-related exhaustion in all the replication sets (P < 0.05; P = 6.75 × 10^-7 from the meta-analysis). Consistent with studies of mood disorders, individual common genetic variants did not have any strong effect on job-related exhaustion. However, the nominally significant signals from the allelic variant of UST in four separate samples suggest that this variant might be a weak risk factor for job-related exhaustion. Together with the previously reported associations of other dermatan/chondroitin sulfate genes with mood disorders, these results indicate a potential molecular pathway for stress-related traits and mark a candidate region for further studies of job-related and general exhaustion. PMID:23620144

Abnormal lipid levels are important risk factors for cardiovascular diseases. We conducted genome-wide variance component linkage analyses to search for loci influencing total cholesterol (TC), LDL, HDL and triglyceride in families residing in American Samoa and Samoa as well as in a combined sample from the two polities. We adjusted the traits for a number of environmental covariates, such as smoking, alcohol consumption, physical activity, and material lifestyle. We found suggestive univariate linkage with log of the odds (LOD) scores > 3 for LDL on 6p21-p12 (LOD 3.13) in Samoa and on 12q21-q23 (LOD 3.07) in American Samoa. Furthermore, in American Samoa on 12q21, we detected genome-wide linkage (LODeq 3.38) to the bivariate trait TC-LDL. Telomeric of this region, on 12q24, we found suggestive bivariate linkage to TC-HDL (LODeq 3.22) in the combined study sample. In addition, we detected suggestive univariate linkage (LOD 1.9–2.93) on chromosomes 4p-q, 6p, 7q, 9q, 11q, 12q, 13q, 15q, 16p, 18q, 19p, 19q and Xq23 and suggestive bivariate linkage (LODeq 2.05–2.62) on chromosomes 6p, 7q, 12p, 12q, and 19p-q. In conclusion, chromosome 6p and 12q may host promising susceptibility loci influencing lipid levels; however, the low degree of overlap between the three study samples strongly encourages further studies of the lipid-related traits. PMID:18594117

Background New strategies to combat the global scourge of schistosomiasis may be revealed by increased understanding of the mechanisms by which the obligate snail host can resist the schistosome parasite. However, few molecular markers linked to resistance have been identified and characterized in snails. Methodology/Principal Findings Here we test six independent genetic loci for their influence on resistance to Schistosoma mansoni strain PR1 in the 13-16-R1 strain of the snail Biomphalaria glabrata. We first identify a genomic region, RADres, showing the highest differentiation between susceptible and resistant inbred lines among 1611 informative restriction-site associated DNA (RAD) markers, and show that it significantly influences resistance in an independent set of 439 outbred snails. The additive effect of each RADres resistance allele is 2-fold, similar to that of the previously identified resistance gene sod1. The data fit a model in which both loci contribute independently and additively to resistance, such that the odds of infection in homozygotes for the resistance alleles at both loci (13% infected) is 16-fold lower than the odds of infection in snails without any resistance alleles (70% infected). Genome-wide linkage disequilibrium is high, with both sod1 and RADres residing on haplotype blocks >2Mb, and with other markers in each block also showing significant effects on resistance; thus the causal genes within these blocks remain to be demonstrated. Other candidate loci had no effect on resistance, including the Guadeloupe Resistance Complex and three genes (aif, infPhox, and prx1) with immunological roles and expression patterns tied to resistance, which must therefore be trans-regulated. Conclusions/Significance The loci RADres and sod1 both have strong effects on resistance to S. mansoni. Future approaches to control schistosomiasis may benefit from further efforts to characterize and harness this natural genetic variation. PMID:26372103
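The 16-fold figure in the record above can be checked with a short calculation. This is only an illustrative sketch: the function names are hypothetical, the infection rates are the rounded percentages quoted in the abstract, and the assumption that the 2-fold per-allele effect multiplies the odds is ours rather than a stated detail of the study's model.

```python
def odds(probability: float) -> float:
    """Convert a probability of infection into odds."""
    return probability / (1.0 - probability)


def odds_ratio(p_reference: float, p_comparison: float) -> float:
    """Odds in the reference group divided by odds in the comparison group."""
    return odds(p_reference) / odds(p_comparison)


# Rounded infection rates quoted in the abstract.
p_no_resistance_alleles = 0.70  # snails carrying no resistance alleles
p_double_homozygote = 0.13      # homozygous for resistance alleles at both loci

observed_fold = odds_ratio(p_no_resistance_alleles, p_double_homozygote)

# Assumption: each resistance allele halves the odds of infection, and the
# two loci act independently, so four alleles (two per locus) multiply out.
per_allele_fold = 2.0
n_resistance_alleles = 4
expected_fold = per_allele_fold ** n_resistance_alleles

print(f"observed odds ratio: {observed_fold:.1f}")
print(f"expected under the additive model: {expected_fold:.0f}")
```

Because the inputs are rounded percentages, the observed ratio only approximately matches the model's prediction.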

It is recognized that genetic factors contribute to human longevity. Besides the hypothesis of existence of longevity genes, another suggests that a lower frequency of risk alleles decreases the incidence of age-related diseases in the long-lived people. However, the latter finds no support from recent genetic studies. Considering the crucial role of epigenetic modification in gene regulation, we then hypothesize that suppressing disease-related genes in longevity individuals is likely achieved by epigenetic modification, e.g. DNA methylation. To test this hypothesis, we investigated the genome-wide methylation profile in 4 Chinese female centenarians and 4 middle-aged controls using methyl-DNA immunoprecipitation sequencing. 626 differentially methylated regions (DMRs) were observed between both groups. Interestingly, genes with these DMRs were enriched in age-related diseases, including type-2 diabetes, cardiovascular disease, stroke and Alzheimer’s disease. This pattern remains rather stable after including methylomes of two white individuals. Further analyses suggest that the observed DMRs likely have functional roles in regulating disease-associated gene expressions, with some genes [e.g. caspase 3 (CASP3)] being down-regulated whereas the others [i.e. interleukin 1 receptor, type 2 (IL1R2)] up-regulated. Therefore, our study suggests that suppressing the disease-related genes via epigenetic modification is an important contributor to human longevity. PMID:25793257

Genome-wide linkage studies have been used to localize rare and highly penetrant prostate cancer (PRCA) susceptibility genes. Linkage studies performed in different ethnic backgrounds and populations have been somewhat disparate, resulting in multiple, often irreproducible signals because of genetic heterogeneity and high sporadic background of the disease. Our first genome-wide linkage study and subsequent fine-mapping study of Finnish hereditary prostate cancer (HPC) families gave evidence of linkage to one region. Here, we conducted subsequent scans with microsatellites and SNPs in a total of 69 Finnish HPC families. GENEHUNTER-PLUS was used for parametric and non-parametric analyses. Our microsatellite genome-wide linkage study provided evidence of linkage to 17q12-q23, with a heterogeneity LOD (HLOD) score of 3.14 in a total of 54 of the 69 families. Genome-wide SNP analysis of 59 of the 69 families gave a highest HLOD score of 3.40 at 2q37.3 under a dominant high penetrance model. Analyzing all 69 families by combining microsatellite and SNP maps also yielded HLOD scores of > 3.3 in two regions (2q37.3 and 17q12-q21.3). These significant linkage peaks on chromosome 2 and 17 confirm previous linkage evidence of a locus on 17q from other populations and provide a basis for continued research into genetic factors involved in PRCA. Fine-mapping analysis of these regions is ongoing and candidate genes at linked loci are currently under analysis. PMID:21207418

Crohn's disease (CD) and celiac disease (CelD) are chronic intestinal inflammatory diseases, involving genetic and environmental factors in their pathogenesis. The two diseases can co-occur within families, and studies suggest that CelD patients have a higher risk of developing CD than the general population. These observations suggest that CD and CelD may share common genetic risk loci. Two such shared loci, IL18RAP and PTPN2, have already been identified independently in these two diseases. The aim of our study was to explicitly identify shared risk loci for these diseases by combining results from genome-wide association study (GWAS) datasets of CD and CelD. Specifically, GWAS results from CelD (768 cases, 1,422 controls) and CD (3,230 cases, 4,829 controls) were combined in a meta-analysis. Nine independent regions had nominal association p-value <1.0×10^-5 in this meta-analysis and showed evidence of association to the individual diseases in the original scans (p-value <1×10^-2 in CelD and <1×10^-3 in CD). These include the two previously reported shared loci, IL18RAP and PTPN2, with p-values of 3.37×10^-8 and 6.39×10^-9, respectively, in the meta-analysis. The other seven had not been reported as shared loci and thus were tested in additional CelD (3,149 cases and 4,714 controls) and CD (1,835 cases and 1,669 controls) cohorts. Two of these loci, TAGAP and PUS10, showed significant evidence of replication (Bonferroni corrected p-values <0.0071) in the combined CelD and CD replication cohorts and were firmly established as shared risk loci of genome-wide significance, with overall combined p-values of 1.55×10^-10 and 1.38×10^-11 respectively. Through a meta-analysis of GWAS data from CD and CelD, we have identified four shared risk loci: PTPN2, IL18RAP, TAGAP, and PUS10. The combined analysis of the two datasets provided the power, lacking in the individual GWAS for single diseases, to detect shared loci with a relatively small effect. PMID:21298027
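The replication threshold of 0.0071 in the record above is consistent with a Bonferroni correction over the seven loci carried into replication. The sketch below shows that arithmetic; the family-wise error rate of 0.05 is an assumption (the abstract does not state it), and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: a conventional family-wise alpha of 0.05, spread over the
# seven candidate shared loci that were tested in the replication cohorts.
replication_threshold = bonferroni_threshold(alpha=0.05, n_tests=7)

print(f"per-locus threshold: {replication_threshold:.4f}")
```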

This protocol provides a rapid, streamlined and scalable strategy to systematically scan genomic regions for the presence of transcriptional regulatory regions that are active in a specific cell type. It creates genomic tiles spanning a region of interest that are subsequently cloned by recombination into a luciferase reporter vector containing the simian virus 40 promoter. Tiling clones are transfected into specific cell types to test for the presence of transcriptional regulatory regions. The protocol includes testing of different single-nucleotide polymorphism (SNP) alleles to determine their effect on regulatory activity. This procedure provides a systematic framework for identifying candidate functional SNPs within a locus during functional analysis of genome-wide association studies. This protocol adapts and combines previous well-established molecular biology methods to provide a streamlined strategy, based on automated primer design and recombinational cloning, allowing one to rapidly go from a genomic locus to a set of candidate functional SNPs in 8 weeks. PMID:26658467

The present protocol provides a rapid, streamlined and scalable strategy to systematically scan genomic regions for the presence of transcriptional regulatory regions active in a specific cell type. It creates genomic tiles spanning a region of interest that are subsequently cloned by recombination into a luciferase reporter vector containing the Simian Virus 40 promoter. Tiling clones are transfected into specific cell types to test for the presence of transcriptional regulatory regions. The protocol includes testing of different SNP (single nucleotide polymorphism) alleles to determine their effect on regulatory activity. This procedure provides a systematic framework to identify candidate functional SNPs within a locus during functional analysis of genome-wide association studies. This protocol adapts and combines previous well-established molecular biology methods to provide a streamlined strategy, based on automated primer design and recombinational cloning, to rapidly go from a genomic locus to a set of candidate functional SNPs in eight weeks. PMID:26658467
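The tiling step described in the protocol can be pictured as splitting a region of interest into consecutive windows. The following is a minimal sketch of that idea only: the abstract does not give a tile size, say whether tiles overlap, or describe its primer-design software, so the function name, the coordinates and both parameters are hypothetical.

```python
from typing import List, Tuple


def tile_region(start: int, end: int, tile_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Split the half-open interval [start, end) into tiles of at most tile_size.

    Consecutive tiles share `overlap` positions; the last tile is truncated
    so that it stops exactly at `end`.
    """
    if tile_size <= 0 or not 0 <= overlap < tile_size:
        raise ValueError("need tile_size > 0 and 0 <= overlap < tile_size")
    if end <= start:
        raise ValueError("end must be greater than start")

    step = tile_size - overlap
    tiles: List[Tuple[int, int]] = []
    position = start
    while True:
        tile_end = min(position + tile_size, end)
        tiles.append((position, tile_end))
        if tile_end == end:
            break
        position += step
    return tiles


# Hypothetical 5 kb locus cut into 2 kb tiles that overlap by 500 bp.
tiles = tile_region(start=0, end=5000, tile_size=2000, overlap=500)
for tile_start, tile_end in tiles:
    print(tile_start, tile_end)
```

In this picture each tile would correspond to one reporter clone, so a SNP of interest can be assigned to the tile or tiles whose interval contains its position.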

Genome-wide polygenic scores (GPS), which aggregate the effects of thousands of DNA variants from genome-wide association studies (GWAS), have the potential to make genetic predictions for individuals. We conducted a systematic investigation of associations between GPS and many behavioral traits, the behavioral phenome. For 3152 unrelated 16-year-old individuals representative of the United Kingdom, we created 13 GPS from the largest GWAS for psychiatric disorders (for example, schizophrenia, depression and dementia) and cognitive traits (for example, intelligence, educational attainment and intracranial volume). The behavioral phenome included 50 traits from the domains of psychopathology, personality, cognitive abilities and educational achievement. We examined phenome-wide profiles of associations for the entire distribution of each GPS and for the extremes of the GPS distributions. The cognitive GPS yielded stronger predictive power than the psychiatric GPS in our UK-representative sample of adolescents. For example, education GPS explained variation in adolescents' behavior problems (~0.6%) and in educational achievement (~2%) but psychiatric GPS were associated with neither. Despite the modest effect sizes of current GPS, quantile analyses illustrate the ability to stratify individuals by GPS and opportunities for research. For example, the highest and lowest septiles for the education GPS yielded a 0.5 s.d. difference in mean math grade and a 0.25 s.d. difference in mean behavior problems. We discuss the usefulness and limitations of GPS based on adult GWAS to predict genetic propensities earlier in development. PMID:26303664
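A polygenic score of the kind described above is, in its simplest form, a weighted sum of an individual's allele counts, with weights taken from GWAS effect estimates. The sketch below illustrates that definition. The variant weights and genotypes are made-up toy values, the function name is hypothetical, and real pipelines add steps (variant selection, linkage-disequilibrium handling, standardization) that are not shown.

```python
from typing import Sequence


def polygenic_score(allele_counts: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele counts (0, 1 or 2 per variant)."""
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Toy example: three variants with made-up GWAS effect sizes.
weights = [0.5, -0.25, 1.0]
person_a = [2, 0, 1]  # effect-allele counts for one individual
person_b = [0, 2, 0]

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)

print(score_a, score_b)
```

Ranking individuals by such scores is what allows the quantile comparisons mentioned in the abstract, such as contrasting the highest and lowest septiles.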

Transmembrane proteins allow cells to extensively communicate with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest in therapeutic intervention, as the majority of pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures calls for computational approaches. Here we review to what extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describe membrane proteins at a genome-wide scale. In particular we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

Knowledge of the inherited risk for cancer is an important component of preventive oncology. In addition to well-established syndromes of cancer predisposition, much remains to be discovered about the genetic variation underlying susceptibility to common malignancies. Increased knowledge about the human genome and advances in genotyping technology have made possible genome-wide association studies (GWAS) of human diseases. These studies have identified many important regions of genetic variation associated with an increased risk for human traits and diseases including cancer. Understanding the principles, major findings, and limitations of GWAS is becoming increasingly important for oncologists as dissemination of genomic risk tests directly to consumers is already occurring through commercial companies. GWAS have contributed to our understanding of the genetic basis of cancer and will shed light on biologic pathways and possible new strategies for targeted prevention. To date, however, the clinical utility of GWAS-derived risk markers remains limited. PMID:20585100

Genome-wide association studies, which analyze hundreds of thousands of single-nucleotide polymorphisms to identify disease susceptibility genes, are challenging because the work involves intensive computation and complex modeling. We propose a two-stage genome-wide association scanning procedure, consisting of a single-locus association scan for the first stage and a gene-based association scan for the second stage. Marginal effects of single-nucleotide polymorphisms are examined by using the exact Armitage trend test or logistic regression, and gene effects are examined by using a p-value combination method. Compared with some existing single-locus and multilocus methods, the proposed method has the following merits: 1) convenient for definition of biologically meaningful regions, 2) powerful for detection of minor-effect genes, 3) helpful for alleviation of a multiple-testing problem, and 4) convenient for result interpretation. The method was applied to study Genetic Analysis Workshop 16 Problem 1 rheumatoid arthritis data, and strong association signals were found. The results show that the human major histocompatibility complex region is the most important genomic region associated with rheumatoid arthritis. Moreover, previously reported genes including PTPN22, C5, and IL2RB were confirmed; novel genes including HLA-DRA, BTNL2, C6orf10, NOTCH4, TAP2, and TNXB were identified by our analysis. PMID:20018002
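The two-stage idea above (a per-SNP test followed by a gene-level combination of p-values) can be sketched as follows. This is not the authors' implementation. The abstract names an exact Armitage trend test and an unspecified p-value combination method; here the asymptotic Cochran-Armitage trend test and Fisher's combination method are swapped in as stand-ins, and all function names and genotype counts are hypothetical. Fisher's method also assumes independent p-values, which SNPs within one gene generally are not.

```python
import math
from typing import Sequence


def trend_test_p(cases: Sequence[int], controls: Sequence[int],
                 scores: Sequence[float] = (0, 1, 2)) -> float:
    """Asymptotic Cochran-Armitage trend test p-value (chi-square, 1 df).

    cases/controls hold counts per genotype (0, 1 or 2 copies of an allele).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))
    denominator = n_cases * n_controls * (n * sum_x2n - sum_xn ** 2)
    if denominator == 0:
        return 1.0  # no variation in genotype or phenotype: nothing to test
    chi_square = n * (n * sum_xr - n_cases * sum_xn) ** 2 / denominator
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2.0))


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df under the null."""
    if not p_values:
        raise ValueError("need at least one p-value")
    half_stat = -sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for an even number of df.
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Stage 1: per-SNP trend tests on made-up genotype counts for one gene.
snp_p_values = [
    trend_test_p(cases=[30, 50, 20], controls=[50, 40, 10]),
    trend_test_p(cases=[40, 45, 15], controls=[45, 42, 13]),
]
# Stage 2: combine the SNP-level p-values into one gene-level p-value.
gene_p_value = fisher_combined_p(snp_p_values)
print(snp_p_values, gene_p_value)
```

Testing one combined p-value per gene rather than one per SNP reduces the number of hypotheses, which is the multiple-testing benefit the abstract mentions.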

Ancient DNA makes it possible to observe natural selection directly by analysing samples from populations before, during and after adaptation events. Here we report a genome-wide scan for selection using ancient DNA, capitalizing on the largest ancient DNA data set yet assembled: 230 West Eurasians who lived between 6500 and 300 BC, including 163 with newly reported data. The new samples include, to our knowledge, the first genome-wide ancient DNA from Anatolian Neolithic farmers, whose genetic material we obtained by extracting from petrous bones, and who we show were members of the population that was the source of Europe's first farmers. We also report a transect of the steppe region in Samara between 5600 and 300 BC, which allows us to identify admixture into the steppe from at least two external sources. We detect selection at loci associated with diet, pigmentation and immunity, and two independent episodes of selection on height. PMID:26595274

Ancient DNA makes it possible to directly witness natural selection by analyzing samples from populations before, during and after adaptation events. Here we report the first scan for selection using ancient DNA, capitalizing on the largest genome-wide dataset yet assembled: 230 West Eurasians dating to between 6500 and 1000 BCE, including 163 with newly reported data. The new samples include the first genome-wide data from the Anatolian Neolithic culture whose genetic material we extracted from the DNA-rich petrous bone and who we show were members of the population that was the source of Europe’s first farmers. We also report a complete transect of the steppe region in Samara between 5500 and 1200 BCE that allows us to recognize admixture from at least two external sources into steppe populations during this period. We detect selection at loci associated with diet, pigmentation and immunity, and two independent episodes of selection on height. PMID:26595274

Structural information on interacting proteins is important for understanding life processes at the molecular level. Genome-wide docking database is an integrated resource for structural studies of protein–protein interactions on the genome scale, which combines the available experimental data with models obtained by docking techniques. Current database version (August 2009) contains 25 559 experimental and modeled 3D structures for 771 organisms spanned over the entire universe of life from viruses to humans. Data are organized in a relational database with user-friendly search interface allowing exploration of the database content by a number of parameters. Search results can be interactively previewed and downloaded as PDB-formatted files, along with the information relevant to the specified interactions. The resource is freely available at http://gwidd.bioinformatics.ku.edu. PMID:19900970

The discovery and prioritization of heritable phenotypes is a computational challenge in a variety of settings, including neuroimaging genetics and analyses of the vast phenotypic repositories in electronic health record systems and population-based biobanks. Classical estimates of heritability require twin or pedigree data, which can be costly and difficult to acquire. Genome-wide complex trait analysis is an alternative tool to compute heritability estimates from unrelated individuals, using genome-wide data that are increasingly ubiquitous, but is computationally demanding and becomes difficult to apply in evaluating very large numbers of phenotypes. Here we present a fast and accurate statistical method for high-dimensional heritability analysis using genome-wide SNP data from unrelated individuals, termed massively expedited genome-wide heritability analysis (MEGHA) and accompanying nonparametric sampling techniques that enable flexible inferences for arbitrary statistics of interest. MEGHA produces estimates and significance measures of heritability with several orders of magnitude less computational time than existing methods, making heritability-based prioritization of millions of phenotypes based on data from unrelated individuals tractable for the first time to our knowledge. As a demonstration of application, we conducted heritability analyses on global and local morphometric measurements derived from brain structural MRI scans, using genome-wide SNP data from 1,320 unrelated young healthy adults of non-Hispanic European ancestry. We also computed surface maps of heritability for cortical thickness measures and empirically localized cortical regions where thickness measures were significantly heritable. Our analyses demonstrate the unique capability of MEGHA for large-scale heritability-based screening and high-dimensional heritability profile construction. PMID:25675487

Genome-wide association (GWA) studies for pharmacogenomics-related traits are increasingly being performed to identify loci that affect either drug response or susceptibility to adverse drug reactions. Until now, only the largest effects have been detected, partly because of the challenges of obtaining large numbers of cases for pharmacogenomic studies. Since 2007, a range of pharmacogenomics GWA studies have been published that have identified several interesting and novel associations between drug responses or reactions and clinically relevant loci, showing the value of this approach. PMID:20300088

Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic underpinnings of human disease. In this article, we briefly review the role and findings of GWAS in common neurological diseases, including stroke, Alzheimer’s disease, Parkinson’s disease, epilepsy, multiple sclerosis, migraine, amyotrophic lateral sclerosis, frontotemporal lobar degeneration, restless legs syndrome, intracranial aneurysm, human prion diseases and moyamoya disease. We then discuss the present and future implications of these findings with regard to disease prediction, uncovering basic biology, and the development of potential therapeutic agents. PMID:25568877

There have been nearly 400 genome-wide association studies published since 2005. The GWAS approach has been exceptionally successful in identifying common genetic variants that predispose to a variety of complex human diseases and biochemical and anthropometric traits. Although this approach is relatively new, there are many excellent reviews of different aspects of the GWAS method. Here, we provide a primer, an annotated overview of the GWAS method with particular reference to psychiatric genetics. We dissect the GWAS methodology into its components and provide a brief description with citations and links to reviews that cover the topic in detail. PMID:19895722

We describe a PCA-based genome scan approach to analyze genome-wide admixture structure, and introduce wavelet transform analysis as a method for estimating the time of admixture. We test the wavelet transform method with simulations and apply it to genome-wide SNP data from eight admixed human populations. The wavelet transform method offers better resolution than existing methods for dating admixture, and can be applied to either SNP or sequence data from humans or other species. PMID:21352535

We present a prospective genome-wide regulatory element database for the sea urchin embryo and the modified chromosome capture-related methodology used to create it. The method we developed is termed GRIP-seq for genome-wide regulatory element immunoprecipitation and combines features of chromosome conformation capture, chromatin immunoprecipitation, and paired-end next-generation sequencing with molecular steps that enrich for active cis-regulatory elements associated with basal transcriptional machinery. The first GRIP-seq database, available to the community, comes from S. purpuratus 24 hpf embryos and takes advantage of the extremely well-characterized cis-regulatory elements in this system for validation. In addition, using the GRIP-seq database, we identify and experimentally validate a novel, intronic cis-regulatory element at the onecut locus. We find GRIP-seq signal sensitively identifies active cis-regulatory elements with a high signal-to-noise ratio for both distal and intronic elements. This promising GRIP-seq protocol has the potential to address a rate-limiting step in resolving comprehensive, predictive network models in all systems. PMID:27389984

The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as “Prakriti”. To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10^-5) were significantly different between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which demonstrates their power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India’s traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine. PMID:26511157

We initiated a genome-wide search for genes predisposing to schizophrenia by ascertaining 9 families, each containing three to five cases of schizophrenia. The 9 pedigrees were initially genotyped with 329 polymorphic DNA loci distributed throughout the genome. Assuming either autosomal dominant or recessive inheritance, 254 DNA loci yielded lod scores less than -2.0 at θ = 0.0, 101 DNA markers gave lod scores less than -2.0 at θ = 0.05, while 5 DNA loci produced maximum lod scores greater than 1: D4S35, D14S17, D15S1, D22S84, and D22S55. Of the DNA markers yielding lod scores greater than 1, D4S35 and D22S55 also were suggestive of linkage when the Affected-Pedigree-Member method was used. The families were then genotyped with four highly polymorphic simple sequence repeat markers; possible linkage diminished with DNA markers mapping near D4S35, while suggestive evidence of linkage remained with loci in the region of D22S55. Although follow-up investigation of these chromosomal regions may be warranted, our linkage results should be viewed as preliminary observations, as 35 unaffected persons are not past the age of risk. 90 refs., 3 tabs.
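
For readers unfamiliar with the lod scores quoted in the abstract above, a minimal sketch of the simplest textbook case may help. It assumes phase-known, fully informative meioses, which is far simpler than the multi-pedigree likelihoods a study like this would actually use; the function name `two_point_lod` and the example counts are hypothetical. The lod score compares the likelihood of the data at recombination fraction theta with the likelihood under free recombination (theta = 0.5), on a log10 scale, and a value below -2 is the conventional threshold for excluding linkage at that theta.

```python
from math import inf, log10


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    Illustrative simplification only (an assumption, not the study's method):
    lod(theta) = log10( theta^R * (1 - theta)^(N - R) / 0.5^N )
    where R is the number of recombinants among N meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible at theta = 0.
        if recombinants > 0:
            return -inf
        return meioses * log10(2.0)

    return (
        recombinants * log10(theta)
        + non_recombinants * log10(1.0 - theta)
        - meioses * log10(0.5)
    )


if __name__ == "__main__":
    # Hypothetical counts: 4 recombinants in 10 meioses, evaluated on a theta grid.
    for theta in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(f"theta={theta:.2f}  lod={two_point_lod(4, 10, theta):.3f}")
```

In this simplified model a single recombinant drives the score to minus infinity at theta = 0, whereas real pedigree analyses with unknown phase and incomplete penetrance yield finite negative values such as those reported.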

The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci, and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1000 Genomes Project and detect signatures of positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to argue that the observed signatures require a high rate of strongly adaptive substitutions near amino acid changes. We further demonstrate that the observed signatures of positive selection correlate better with the presence of regulatory sequences, as predicted by the ENCODE Project Consortium, than with the positions of amino acid substitutions. Our results suggest that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson that adaptive divergence is primarily driven by regulatory changes. PMID:24619126

Background We performed a genome-wide scan of 27,578 CpG loci covering 14,475 genes to identify differentially methylated loci (DML) in colorectal carcinoma (CRC). Methods We used Illumina's Infinium methylation assay in paired DNA samples extracted from 24 fresh frozen CRC tissues and their corresponding normal colon tissues from 24 consecutively diagnosed patients at a tertiary medical center. Results We found a total of 627 DML in CRC covering 513 genes, of which 535 are novel DML covering 465 genes. We also validated the Illumina Infinium methylation data for top-ranking genes by non-bisulfite conversion q-PCR-based methyl profiler assay in a subset of the same samples. We also carried out integration of genome-wide copy number and expression microarray along with methylation profiling to see the functional effect of methylation. Gene Set Enrichment Analysis (GSEA) showed that among the major "gene sets" that are hypermethylated in CRC are the sets: "inhibition of adenylate cyclase activity by G-protein signaling", "Rac guanyl-nucleotide exchange factor activity", "regulation of retinoic acid receptor signaling pathway" and "estrogen receptor activity". Two-level nested cross validation showed that DML-based predictive models may offer reasonable sensitivity (around 89%), specificity (around 95%), positive predictive value (around 95%) and negative predictive value (around 89%), suggesting that these markers may have potential clinical application. Conclusion Our genome-wide methylation study in CRC clearly supports most of the previous findings; additionally we found a large number of novel DML in CRC tissue. If confirmed in future studies, these findings may lead to identification of genomic markers for potential clinical application. PMID:21699707

Aberrant cytosine 5-methylation underlies many deregulated elements of cancer. Among paired non-small cell lung cancers (NSCLC), we sought to profile DNA 5-methyl-cytosine features which may underlie genome-wide deregulation. In one of the more dense interrogations of the methylome, we sampled 1.2 million CpG sites from twenty-four NSCLC tumor (T)-non-tumor (NT) pairs using a methylation-sensitive restriction enzyme-based HELP-microarray assay. We found 225,350 differentially methylated (DM) sites in adenocarcinomas versus adjacent non-tumor tissue that vary in frequency across genomic compartment, particularly notable in gene bodies (GB; p<2.2E-16). Further, when DM was coupled to differential transcriptome (DE) in the same samples, 37,056 differential loci in adenocarcinoma emerged. Approximately 90% of the DM-DE relationships were non-canonical; for example, promoter DM associated with DE in the same direction. Of the canonical changes noted, promoter (PR) DM loci with reciprocal changes in expression in adenocarcinomas included HBEGF, AGER, PTPRM, DPT, CST1, MELK; DM GB loci with concordant changes in expression included FOXM1, FERMT1, SLC7A5, and FAP genes. IPA analyses showed adenocarcinoma-specific promoter DMxDE overlay identified familiar lung cancer nodes [tP53, Akt] as well as less familiar nodes [HBEGF, NQO1, GRK5, VWF, HPGD, CDH5, CTNNAL1, PTPN13, DACH1, SMAD6, LAMA3, AR]. The unique findings from this study include the discovery of numerous candidate methylation sites in both PR and GB regions not previously identified in NSCLC, and many non-canonical relationships to gene expression. These DNA methylation features could potentially be developed as risk or diagnostic biomarkers, or as candidate targets for newer methylation locus-targeted preventive or therapeutic agents. PMID:26683690

We propose to use sparsely sampled line scans with a sparsity-based reconstruction method to obtain images in a wide field of view (WFOV) multifocal scanning microscope. In the WFOV microscope, we used a holographically generated irregular focus grid to scan the sample in one dimension and then reconstructed the sample image from line scans by measuring the transmission of the foci through the sample during scanning. The line scans were randomly spaced with average spacing larger than the Nyquist sampling requirement, and the image was recovered with sparsity-based reconstruction techniques. With this scheme, the amount of acquired data can be significantly reduced and the restriction to equally spaced foci positions can be removed, which simplifies the experimental requirements. We built a prototype system and demonstrated the effectiveness of the reconstruction by recovering microscopic images of a U.S. Air Force target and an onion skin cell microscope slide with 40, 60, and 80% missing data with respect to the Nyquist sampling requirement.

The structure of the human brain is highly heritable, and is thought to be influenced by many common genetic variants, many of which are currently unknown. Recent advances in neuroimaging and genetics have allowed collection of both highly detailed structural brain scans and genome-wide genotype information. This wealth of information presents a new opportunity to find the genes influencing brain structure. Here we explore the relation between 448,293 single nucleotide polymorphisms in each of 31,622 voxels of the entire brain across 740 elderly subjects (mean age±s.d.: 75.52±6.82 years; 438 male) including subjects with Alzheimer's disease, Mild Cognitive Impairment, and healthy elderly controls from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We used tensor-based morphometry to measure individual differences in brain structure at the voxel level relative to a study-specific template based on healthy elderly subjects. We then conducted a genome-wide association at each voxel to identify genetic variants of interest. By studying only the most associated variant at each voxel, we developed a novel method to address the multiple comparisons problem and computational burden associated with the unprecedented amount of data. No variant survived the strict significance criterion, but several genes worthy of further exploration were identified, including CSMD2 and CADPS2. These genes have high relevance to brain structure. This is the first voxelwise genomewide association study to our knowledge, and offers a novel method to discover genetic influences on brain structure. PMID:20171287

Schizophrenia is one of the major psychiatric disorders. It is a disorder of complex inheritance, involving both heritable and environmental factors. DNA methylation is an inheritable epigenetic modification that stably alters gene expression. We reasoned that genetic modifications that are a result of environmental stimuli could also make a contribution. We have performed 26 high-resolution genome-wide methylation array analyses to determine the methylation status of 27,627 CpG islands and compared the data between patients and healthy controls. Methylation profiles of DNAs were analyzed in six pools: 220 schizophrenia patients; 220 age-matched healthy controls; 110 female schizophrenia patients; 110 age-matched healthy females; 110 male schizophrenia patients; 110 age-matched healthy males. We also investigated the methylation status of 20 individual patient DNA samples (eight females and 12 males). We found significant differences in the methylation profile between schizophrenia and control DNA pools. We found new candidate genes that principally participate in apoptosis, synaptic transmission and nervous system development (GABRA2, LIN7B, CASP3). Methylation profiles differed between the genders. In females, the most important genes participate in apoptosis and synaptic transmission (XIAP, GABRD, OXT, KRT7), whereas in the males, the implicated genes in the molecular pathology of the disease were DHX37, MAP2K2, FNDC4 and GIPC1. Data from the individual methylation analyses confirmed the results from the gender-specific pools. Our data revealed major differences in methylation profiles between schizophrenia patients and controls and between male and female patients. The dysregulated activity of the candidate genes could play a role in schizophrenia pathogenesis. PMID:25937794

A proposed telescope would afford high resolution over a narrow field of view (<0.10°) while scanning over a total field of view nominally 16° wide without need to slew the entire massive telescope structure. The telescope design enables resolution of a 1-m-wide object in a 50-km-wide area of the surface of the Earth as part of a 200-km-wide area field of view monitored from an orbit at an altitude of 700 km. The conceptual design of this telescope could also be adapted to other applications both terrestrial and extraterrestrial in which there are requirements for telescopes that afford both wide- and narrow-field capabilities. In the proposed telescope, the scanning would be effected according to a principle similar to that of the Arecibo radio telescope, in which the primary mirror is stationary with respect to the ground and a receiver is moved across the focal surface of the primary mirror. The proposed telescope would comprise (1) a large spherical primary mirror that would afford high resolution over a narrow field of view and (2) a small displaceable optical relay segment that would be pivoted about the center of an aperture stop to effect the required scanning (see figure). Taken together, both comprise a scanning narrow-angle telescope that does not require slewing the telescope structure. In normal operation, the massive telescope structure would stare at a fixed location on the ground. The inner moveable relay optic would be pivoted to scan the narrower field of view over the wider one, making it possible to retain a fixed telescope orientation, while obtaining high-resolution images over multiple target areas during an interval of 3 to 4 minutes in the intended orbit. The pivoting relay segment of the narrow-angle telescope would include refractive and reflective optical elements, including two aspherical mirrors, to counteract the spherical aberration of the primary mirror. Overall, the combination of the primary mirror and the smaller relay optic

An effort to develop large-aperture, wide-angle-scanning reflectarray antennas for microwave radar and communication systems is underway. In an antenna of this type as envisioned, scanning of the radiated or incident microwave beam would be effected through mechanical rotation of the passive (reflective) patch antenna elements, using microelectromechanical systems (MEMS) stepping rotary actuators typified by piezoelectric micromotors. It is anticipated that the cost, mass, and complexity of such an antenna would be less than, and the reliability greater than, those of an electronically scanned phased-array antenna of comparable beam-scanning capability and angular resolution. In the design and operation of a reflectarray, one seeks to position and orient an array of passive patch elements in a geometric pattern such that, through constructive interference of the reflections from them, they collectively act as an efficient single reflector of radio waves within a desired frequency band. Typically, the patches lie in a common plane and radiation is incident upon them from a feed horn.

Gene duplication is a key factor contributing to phenotype diversity across and within species. Although the availability of complete genomes has led to the extensive study of genomic duplications, the dynamics and variability of gene duplications mediated by retrotransposition are not well understood. Here, we predict mRNA retrotransposition and use comparative genomics to investigate their origin and variability across primates. Analyzing seven anthropoid primate genomes, we found a similar number of mRNA retrotranspositions (∼7,500 retrocopies) in Catarrhini (Old World Monkeys, including humans), but a surprisingly large number of retrocopies (∼10,000) in Platyrrhini (New World Monkeys), which may be a by-product of higher long interspersed nuclear element 1 activity in these genomes. By inferring retrocopy orthology, we dated most of the primate retrocopy origins, and estimated a decrease in the fixation rate in recent primate history, implying a smaller number of species-specific retrocopies. Moreover, using RNA-Seq data, we identified approximately 3,600 expressed retrocopies. As expected, most of these retrocopies are located near or within known genes, present tissue-specific and even species-specific expression patterns, and show no expression correlation to their parental genes. Taken together, our results provide further evidence that mRNA retrotransposition is an active mechanism in primate evolution and suggest that retrocopies may not only introduce great genetic variability between lineages but also create a large reservoir of potentially functional new genomic loci in primate genomes. PMID:26224704

It is well known that organisms form larger and more complex multimodular (composite or chimeric) and mostly multi-functional proteins through gene fusion of two or more individual genes which have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and of cellular functions, and thus may give the host organism advantages in adaptation to environments. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in context with cellular functions, to trace back the evolutionary processes from the ancient smaller and uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestor proteins that existed in the last common ancestor for all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, technical difficulties for such analysis also exist due to the complexity of this biological and evolutionary process. We report from this study a new strategy to computationally identify multimodular proteins using completed genome sequences and the results surveyed from 22 organisms with the data from over 40 organisms to be presented during the meeting. Additional information is contained in the original extended abstract.

A near-field Cassegrain reflector (NFCR) is an effective way to magnify a small phased array into a much larger aperture antenna for limited scan applications. Traditionally, the pattern analysis of NFCR is based on a plane wave approach. This approach simplifies the computation tremendously, but fails to provide design information about the most critical component of the whole antenna system, namely, the feed array. Here, each element in the feed array is considered individually and its diffraction pattern from the subreflector is computed by GTD. The field contributions from all elements are superimposed at the curved main reflector surface, and a physical optics integration is performed to obtain the secondary pattern. Beam-waveguide-fed Cassegrain reflector (BFCR) antennas are increasingly being used in space communication applications. Using a shooting and bouncing ray approach based on geometrical optics and aperture integration, the far-field pattern of the BFCR is calculated. This method is computationally efficient and is not restricted by the number of reflecting surfaces in the antenna configuration. The diffraction loss in the beam waveguide structure is calculated separately by the conventional near-field physical optics integration. The segmented mirror antenna is designed for the radiometer application on the planned NASA Earth Science Geostationary Platforms in the 1990s. The antenna consists of two parts: a regular parabolic dish of 5 m in diameter which converts the radiation from feeds into a collimated beam, and a movable mirror that redirects the beam to a prescribed scan direction. The mirror is composed of 28 segmented planar conducting plates, mostly one square meter in size. Based on a physical optics analysis, we have analyzed the secondary pattern of the antenna. For frequencies between 50 and 230 GHz, and for a scan range of ±8° (270 beamwidths scan at 230 GHz), the worst calculated beam efficiency is 95%. To cover such a wide

By offering images with high spatial resolution and unique optical absorption contrast, optical-resolution photoacoustic microscopy (OR-PAM) has gained increasing attention in biomedical research. Recent developments in OR-PAM have improved its imaging speed, but have sacrificed either the detection sensitivity or field of view or both. We have developed a wide-field fast-scanning OR-PAM by using a water-immersible MEMS scanning mirror (MEMS-ORPAM). Made of silicon with a gold coating, the MEMS mirror plate can reflect both optical and acoustic beams. Because it uses an electromagnetic driving force, the whole MEMS scanning system can be submerged in water. In MEMS-ORPAM, the optical and acoustic beams are confocally configured and simultaneously steered, which ensures uniform detection sensitivity. A B-scan imaging speed as high as 400 Hz can be achieved over a 3 mm scanning range. A diffraction-limited lateral resolution of 2.4 μm in water and a maximum imaging depth of 1.1 mm in soft tissue have been experimentally determined. Using the system, we imaged the flow dynamics of both red blood cells and carbon particles in a mouse ear in vivo. By using Evans blue dye as the contrast agent, we also imaged the flow dynamics of lymphatic vessels in a mouse tail in vivo. The results show that MEMS-ORPAM could be a powerful tool for studying highly dynamic and time-sensitive biological phenomena.

We describe modifications of a pulsed rotating supersonic beam source that improve performance, particularly increasing the beam density and sharpening the pulse profiles. As well as providing the familiar virtues of a supersonic molecular beam (high intensity, narrowed velocity distribution, and drastic cooling of rotation and vibration), the rotating source enables scanning the translational velocity over a wide range. Thereby, beams of any atom or molecule available as a gas can be slowed or speeded. Using Xe beams in the slowing mode, we have obtained lab speeds down to about 40 ± 5 m/s with density near 10^11 cm^-3 and in the speeding mode lab speeds up to about 660 m/s and density near 10^14 cm^-3. We discuss some congenial applications. Providing low lab speeds can markedly enhance experiments using electric or magnetic fields to deflect, steer, or further slow polar or paramagnetic molecules. The capability to scan molecular speeds facilitates merging velocities with a codirectional partner beam, enabling study of collisions at very low relative kinetic energies, without requiring either beam to be slow.

A surface analysis system has been newly developed by combining an ultrahigh-vacuum scanning electron microscope (SEM) with a wide-movable scanning tunneling microscope (STM). The basic performance is experimentally demonstrated. The SEM and STM images are clear enough to reveal details of surface structures. The STM unit moves horizontally over several millimeters by the sliding motion of PZT actuators. The motion resolution is shown to be submicrometer. The STM tip, mounted on another PZT scanner, can be guided to a specific object on the sample surface during SEM observation. In the observation of a Si(111) surface rapidly cooled from high temperature, the STM tip was accurately guided to an isolated atomic step and moved slightly along it during SEM observation. The STM observation shows an asymmetry of the (7x7)-transformed region along the step between the upper and lower terraces. (7x7) bands formed continuously along the edge of the terraces, while (7x7) domains were distributed on the terraces slightly away from the step. These experiments show that the wide-movable STM unit bridges the gap in observation area between SEM and STM, and that the system enables a specific object found in the SEM image to be observed easily by STM.

To increase plumage color uniformity and understand the genetic background of Korean chickens, we performed a genome-wide association study of different plumage colors in Korean native chickens. We analyzed 60K SNP chips on 279 chickens with GEMMA methods for GWAS and estimated the genetic heritability for plumage color. The estimated heritability suggests that plumage coloration is a polygenic trait. We found new loci associated with feather pigmentation at the genome-wide level and infer from the results that there are additional genetic effects on plumage color. The results will be used for selecting and breeding chickens for plumage color uniformity. PMID:25049737

Background: Epigenome-wide association scans (EWAS) are under way for many complex human traits, but EWAS power has not been fully assessed. We investigate power of EWAS to detect differential methylation using case-control and disease-discordant monozygotic (MZ) twin designs with genome-wide DNA methylation arrays. Methods and Results: We performed simulations to estimate power under the case-control and discordant MZ twin EWAS study designs, under a range of epigenetic risk effect sizes and conditions. For example, to detect a 10% mean methylation difference between affected and unaffected subjects at a genome-wide significance threshold of P = 1 × 10^-6, 98 MZ twin pairs were required to reach 80% EWAS power, and 112 cases and 112 controls were needed in the case-control design. We also estimated the minimum sample size required to reach 80% EWAS power under both study designs. Our analyses highlighted several factors that significantly influenced EWAS power, including sample size, epigenetic risk effect size, the variance of DNA methylation at the locus of interest and the correlation in DNA methylation patterns within the twin sample. Conclusions: We provide power estimates for array-based DNA methylation EWAS under case-control and disease-discordant MZ twin designs, and explore multiple factors that impact on EWAS power. Our results can help guide EWAS experimental design and interpretation for future epigenetic studies. PMID:25972603
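As a rough illustration of how sample size, effect size, methylation variance and within-pair correlation enter an EWAS power calculation, the following sketch uses a closed-form normal approximation rather than the simulations described in the abstract. The function names, the standard deviation argument and the correlation parameter `rho` are assumptions made for this example; it is not expected to reproduce the 98-pair and 112-versus-112 figures reported above.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def _two_sided_power(shift, alpha):
    """Power of a two-sided z-test whose statistic has mean `shift`."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    return (1 - _Z.cdf(z_crit - shift)) + _Z.cdf(-z_crit - shift)


def case_control_power(n_per_group, mean_diff, sd, alpha=1e-6):
    """Approximate power with n cases and n unrelated controls.

    Assumes equal variance in both groups (illustrative assumption).
    """
    se = sd * sqrt(2.0 / n_per_group)
    return _two_sided_power(mean_diff / se, alpha)


def discordant_twin_power(n_pairs, mean_diff, sd, rho, alpha=1e-6):
    """Approximate power of a paired test on n discordant twin pairs.

    rho is the assumed within-pair correlation in methylation; a positive
    rho shrinks the variance of the within-pair difference.
    """
    se = sd * sqrt(2.0 * (1.0 - rho) / n_pairs)
    return _two_sided_power(mean_diff / se, alpha)


if __name__ == "__main__":
    for n in (50, 100, 200):
        cc = case_control_power(n, mean_diff=0.10, sd=0.15)
        tw = discordant_twin_power(n, mean_diff=0.10, sd=0.15, rho=0.5)
        print(f"n={n}: case-control {cc:.3f}, twins {tw:.3f}")
```

Under these assumptions the paired design gains power as `rho` grows, which is consistent with the abstract's point that within-twin correlation is one of the factors influencing EWAS power.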

Genome-wide association studies (GWAS) have been extensively used to study common complex diseases such as coronary artery disease (CAD), revealing 153 suggestive CAD loci, of which at least 46 have been validated as having genome-wide significance. However, these loci collectively explain <10% of the genetic variance in CAD. Thus, we must address the key question of what factors constitute the remaining 90% of CAD heritability. We review possible limitations of GWAS, and contextually consider some candidate CAD loci identified by this method. Looking ahead, we propose systems genetics as a complementary approach to unlocking the CAD heritability and etiology. Systems genetics builds network models of relevant molecular processes by combining genetic and genomic datasets to ultimately identify key “drivers” of disease. By leveraging systems-based genetic approaches, we can help reveal the full genetic basis of common complex disorders, enabling novel diagnostic and therapeutic opportunities. PMID:25720628

High-throughput sequencing technologies have allowed many gene locus-level molecular biology assays to become genome-wide profiling methods. DNA-cleaving enzymes such as DNase I have been used to probe accessible chromatin. The accessible regions contain functional regulatory sites, including promoters, insulators and enhancers. Deep sequencing of DNase-seq libraries and computational analysis of the cut profiles have been used to infer protein occupancy in the genome at the nucleotide level, a method introduced as 'digital genomic footprinting'. The approach has been proposed as an attractive alternative to the analysis of transcription factors (TFs) by chromatin immunoprecipitation followed by sequencing (ChIP-seq), and in theory it should overcome antibody issues, poor resolution and batch effects. Recent reports point to limitations of the DNase-based genomic footprinting approach and call into question the scope of detectable protein occupancy, especially for TFs with short-lived chromatin binding. The genomics community is grappling with issues concerning the utility of genomic footprinting and is reassessing the proposed approaches in terms of robust deliverables. Here we summarize the consensus as well as different views emerging from recent reports, and we describe the remaining issues and hurdles for genomic footprinting. PMID:26914206

The development of the dorsal vessel in Drosophila is one of the first systems in which key mechanisms regulating cardiogenesis have been defined in great detail at the genetic and molecular level. Due to evolutionary conservation, these findings have also provided major inputs into studies of cardiogenesis in vertebrates. Many of the major components that control Drosophila cardiogenesis were discovered based on candidate gene approaches and their functions were defined by employing the outstanding genetic tools and molecular techniques available in this system. More recently, approaches have been taken that aim to interrogate the entire genome in order to identify novel components and describe genomic features that are pertinent to the regulation of heart development. Apart from classical forward genetic screens, the availability of the thoroughly annotated Drosophila genome sequence made new genome-wide approaches possible, which include the generation of massive numbers of RNA interference (RNAi) reagents that were used in forward genetic screens, as well as studies of the transcriptomes and proteomes of the developing heart under normal and experimentally manipulated conditions. Moreover, genome-wide chromatin immunoprecipitation experiments have been performed with the aim to define the full set of genomic binding sites of the major cardiogenic transcription factors, their relevant target genes, and a more complete picture of the regulatory network that drives cardiogenesis. This review will give an overview on these genome-wide approaches to Drosophila heart development and on computational analyses of the obtained information that ultimately aim to provide a description of this process at the systems level. PMID:27294102

We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects. The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new (1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32-q24.33) and two of which have been previously implicated (6p21.32-p22.1 and 18q21.2). The strongest new finding (P = 1.6 × 10^-11) was with rs1625579 within an intron of a putative primary transcript for MIR137 (microRNA 137), a known regulator of neuronal development. Four other schizophrenia loci achieving genome-wide significance contain predicted targets of MIR137, suggesting MIR137-mediated dysregulation as a previously unknown etiologic mechanism in schizophrenia. In a joint analysis with a bipolar disorder sample (16,374 affected individuals and 14,044 controls), three loci reached genome-wide significance: CACNA1C (rs4765905, P = 7.0 × 10^-9), ANK3 (rs10994359, P = 2.5 × 10^-8) and the ITIH3-ITIH4 region (rs2239547, P = 7.8 × 10^-9). PMID:21926974

Genome-Wide Association Studies shed light on the identification of genes underlying human diseases and agriculturally important traits. This potential has been shadowed by false positive findings. The Mixed Linear Model (MLM) method is flexible enough to simultaneously incorporate population struct...

The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP–RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses. PMID:25958398

Soybean aphid is the most damaging insect pest of soybean in the Upper Midwest and is primarily controlled by insecticides. Soybean aphid resistance (i.e., Rag genes) has been documented in some soybean lines at chromosomes 6, 7, 13, and 16, but more sources of resistance are needed. Genome-wide ass...

MicroRNAs (miRNAs) are small non-coding RNAs that play essential roles in plant growth and development. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary structure modeling ident...

Natural selection can act on all the expressed genes of an individual, leaving signatures of genetic differentiation or diversity at many loci across the genome. New power to assay these genome-wide effects of selection comes from associating multi-locus patterns of polymorphism with gene expression and function. Here, we performed one of the first genome-wide surveys in a marine species, comparing purple sea urchins, Strongylocentrotus purpuratus, from two distant locations along the species' wide latitudinal range. We examined 9112 polymorphic loci from upstream non-coding and coding regions of genes for signatures of selection with respect to gene function and tissue- and ontogenetic gene expression. We found that genetic differentiation (F_ST) varied significantly across functional gene classes. The strongest enrichment occurred in the upstream regions of E3 ligase genes, enzymes known to regulate protein abundance during development and environmental stress. We found enrichment for high heterozygosity in genes directly involved in immune response, particularly NALP genes, which mediate pro-inflammatory signals during bacterial infection. We also found higher heterozygosity in immune genes in the southern population, where disease incidence and pathogen diversity are greater. Similar to the major histocompatibility complex in mammals, balancing selection may enhance genetic diversity in the innate immune system genes of this invertebrate. Overall, our results show how genome-wide polymorphism data coupled with growing databases on gene function and expression can combine to detect otherwise hidden signals of selection in natural populations. PMID:21993504
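For readers unfamiliar with the differentiation measure used above, the sketch below computes a simple heterozygosity-based F_ST for one biallelic locus sampled in two populations. It is an illustrative textbook form, (H_T - H_S) / H_T with equal population weights; the abstract does not state which estimator the authors used, and the function name and example frequencies are assumptions.

```python
def fst_two_populations(p1, p2):
    """F_ST for one biallelic locus from allele frequencies p1 and p2.

    H_T is the expected heterozygosity of the pooled population and H_S
    the mean expected heterozygosity within populations (equal weights).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # Locus is fixed for the same allele everywhere: no differentiation.
        return 0.0
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies in a northern and a southern sample.
    print(round(fst_two_populations(0.2, 0.8), 3))
```

Identical frequencies give 0 and alternately fixed alleles give 1, so values near the upper end flag loci that differ strongly between the two sampling locations.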

Genome-wide linkage analysis using multiple traits and statistical software packages is a tedious process which requires a significant amount of manual file manipulation. Different linkage analysis programs require different input file formats, making the task of analyzing data with multiple methods even more time-consuming. We have developed a software tool, AUTOGSCAN, that automates file formatting, the running of statistical analyses, and the summarizing of resulting statistics for whole genome scans with a push of a button, using several independent, and often idiosyncratic, statistical software packages such as MERLIN, SOLAR and GENEHUNTER. We also describe a program, ANALYZE, designed to run qualitative linkage analysis with several different statistical strategies and programs to efficiently screen for linkage and linkage disequilibrium for a given discrete trait. The ANALYZE program can also be used by AUTOGSCAN in a genome-wide sense. PMID:15836805

Here we present a genome-wide method for de novo identification of enhancer regions. This approach enables massively parallel empirical investigation of DNA sequences that mediate transcriptional activation and provides a platform for discovery of regulatory modules capable of driving context-specific gene expression. The method links fragmented genomic DNA to the transcription of randomer molecule identifiers and measures the functional enhancer activity of the library by massively parallel sequencing. We transfected a Drosophila melanogaster library into S2 cells in normoxia and hypoxia, and assayed 4,599,881 genomic DNA fragments in parallel. The locations of the enhancer regions strongly correlate with genes up-regulated after hypoxia and previously described enhancers. Novel enhancer regions were identified and integrated with RNAseq data and transcription factor motifs to describe the hypoxic response on a genome-wide basis as a complex regulatory network involving multiple stress-response pathways. This work provides a novel method for high-throughput assay of enhancer activity and the genome-scale identification of 31 hypoxia-activated enhancers in Drosophila. PMID:26713262

Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
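The selection step described above, keeping genes annotated to at least two distinct biological processes, can be sketched as follows. This is a minimal illustration that treats every differently named process label as distinct; the published approach presumably handles relationships between annotation terms, which this sketch ignores. The gene and process names are hypothetical.

```python
def multifunctional_genes(annotations, min_processes=2):
    """Return genes annotated to at least `min_processes` distinct processes.

    annotations maps a gene identifier to an iterable of process labels.
    Repeated labels for the same gene are counted once.
    """
    return sorted(
        gene
        for gene, processes in annotations.items()
        if len(set(processes)) >= min_processes
    )


if __name__ == "__main__":
    # Hypothetical annotation table, for illustration only.
    toy = {
        "geneA": ["DNA repair", "cell cycle"],
        "geneB": ["translation", "translation"],
        "geneC": ["apoptosis", "immune response", "signaling"],
    }
    print(multifunctional_genes(toy))
```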

Alternative splicing is a highly regulated process which generates transcriptome and proteome diversity through the skipping or inclusion of exons within gene loci. Identification of aberrant alternative splicing associated with human diseases has become feasible with the development of new genomic technologies and powerful bioinformatics. We have previously reported genome-wide gene alterations in the neocortex of a well-characterized cohort of Alzheimer's disease (AD) patients and matched elderly controls using a commercial exon microarray platform [1]. Here, we provide detailed description of analyses aimed at identifying differential alternative splicing events associated with AD. PMID:26484111

The genome structure, ancestry and instability of the brewing yeast strains have received considerable attention. The hybrid nature of brewing lager yeast strains provides adaptive potential but yields genome instability which can adversely affect fermentation performance. The requirement to differentiate between production strains and assess master cultures for genomic instability has led to significant adoption of specialized molecular tool kits by the industry. Furthermore, the development of genome-wide transcriptional and protein expression technologies has generated significant interest from brewers. The opportunity presented to explore, and the concurrent requirement to understand, both the constraints and the potential of their strains to generate existing and new products during fermentation is discussed. PMID:17879324

The plant metabolome is the readout of plant physiological status and is regarded as the bridge between the genome and the phenome of plants. Unraveling the natural variation and the underlying genetic basis of plant metabolism has received increasing interest from plant biologists. Enabled by the recent advances in high-throughput profiling and genotyping technologies, metabolite-based genome-wide association study (mGWAS) has emerged as a powerful alternative forward genetics strategy to dissect the genetic and biochemical bases of metabolism in model and crop plants. In this review, recent progress and applications of mGWAS in understanding the genetic control of plant metabolism and in interactive functional genomics and metabolomics are presented. Further directions and perspectives of mGWAS in plants are also discussed. PMID:25637954

Determination of cellular DNA damage has so far been limited to global assessment of genome integrity whereas nucleotide-level mapping has been restricted to specific loci by the use of specific primers. Therefore, only limited DNA sequences can be studied and novel regions of genomic instability can hardly be discovered. Using a well-characterized yeast model, we describe a straightforward strategy to map genome-wide DNA strand breaks without compromising nucleotide-level resolution. This technique, termed “damaged DNA immunoprecipitation” (dDIP), uses immunoprecipitation and the terminal deoxynucleotidyl transferase-mediated dUTP-biotin end-labeling (TUNEL) to capture DNA at break sites. When used in combination with microarray or next-generation sequencing technologies, dDIP will allow researchers to map genome-wide DNA strand breaks as well as other types of DNA damage and to establish a clear profiling of altered genes and/or intergenic sequences in various experimental conditions. This mapping technique could find several applications for instance in the study of aging, genotoxic drug screening, cancer, meiosis, radiation and oxidative DNA damage. PMID:21364894

Predicting resource allocation between cell processes is the primary step towards decoding the evolutionary constraints governing bacterial growth under various conditions. Quantitative prediction at genome-scale remains a computational challenge as current methods are limited by the tractability of the problem or by simplifying hypotheses. Here, we show that the constraint-based modeling method Resource Balance Analysis (RBA), calibrated using genome-wide absolute protein quantification data, accurately predicts resource allocation in the model bacterium Bacillus subtilis for a wide range of growth conditions. The regulation of most cellular processes is consistent with the objective of growth rate maximization except for a few suboptimal processes which likely integrate more complex objectives such as coping with stressful conditions and survival. As a proof of principle by using simulations, we illustrated how calibrated RBA could aid rational design of strains for maximizing protein production, offering new opportunities to investigate design principles in prokaryotes and to exploit them for biotechnological applications. PMID:26498510

Background: Phenomena such as incomplete lineage sorting, horizontal gene transfer, gene duplication and subsequent sub- and neo-functionalisation can result in distinct local phylogenetic relationships that are discordant with species phylogeny. In order to assess the possible biological roles for these subdivisions, they must first be identified and characterised, preferably on a large scale and in an automated fashion. Results: We developed Saguaro, a combination of a Hidden Markov Model (HMM) and a Self Organising Map (SOM), to characterise local phylogenetic relationships among aligned sequences using cacti, matrices of pair-wise distance measures. While the HMM determines the genomic boundaries from aligned sequences, the SOM hypothesises new cacti in an unsupervised and iterative fashion based on the regions that were modelled least well by existing cacti. After testing the software on simulated data, we demonstrate the utility of Saguaro by testing two different data sets: (i) 181 Dengue virus strains, and (ii) 5 primate genomes. Saguaro identifies regions under lineage-specific constraint for the first set, and genomic segments that we attribute to incomplete lineage sorting in the second dataset. Intriguingly for the primate data, Saguaro also classified an additional ~3% of the genome as most incompatible with the expected species phylogeny. A substantial fraction of these regions was found to overlap genes associated with both the innate and adaptive immune systems. Conclusions: Saguaro detects distinct cacti describing local phylogenetic relationships without requiring any a priori hypotheses. We have successfully demonstrated Saguaro’s utility with two contrasting data sets, one containing many members with short sequences (Dengue viral strains: n = 181, genome size = 10,700 nt), and the other with few members but complex genomes (related primate species: n = 5, genome size = 3 Gb), suggesting that the software is applicable to a wide variety of

Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use. PMID:25296770

Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max L. merr). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set. PMID:27317786

Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. The recent application of GWAS to clinic-based cohorts has also yielded genetic predictors of clinical outcomes. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. With each new dataset, new realities are discovered about GWAS data and best practices continue to be developed. The Genomics Workgroup of the National Human Genome Research Institute (NHGRI) funded electronic Medical Records and Genomics (eMERGE) network has invested considerable effort in developing strategies for QC of these data. The lessons learned by this group will be valuable for other investigators dealing with large scale genomic datasets. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the eMERGE network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. In this protocol we discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research. PMID:21234875
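Of the QC issues listed above, marker quality is the simplest to illustrate in code. The sketch below computes the call rate and minor allele frequency for one SNP and applies two filters. The genotype coding (0, 1 or 2 copies of the alternate allele, `None` for a missing call) and the thresholds of 0.98 and 0.01 are assumptions chosen for the example, not values taken from the eMERGE protocol.

```python
def marker_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Summarise one SNP and decide whether it passes two basic filters.

    genotypes: list with 0/1/2 alternate-allele counts, None when missing.
    Thresholds are illustrative defaults, not protocol values.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if called:
        alt_freq = sum(called) / (2 * len(called))
        maf = min(alt_freq, 1.0 - alt_freq)
    else:
        maf = 0.0  # nothing was called, so no allele frequency is available
    passed = call_rate >= min_call_rate and maf >= min_maf
    return {"call_rate": call_rate, "maf": maf, "pass": passed}


if __name__ == "__main__":
    # 90 homozygous-reference and 10 heterozygous samples, none missing.
    print(marker_qc([0] * 90 + [1] * 10))
```

A full pipeline would add checks such as Hardy-Weinberg equilibrium, sample relatedness and batch effects, which the abstract also mentions; dedicated tools are normally used for those steps.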

Background: Methylation of CpG dinucleotides is a fundamental mechanism of epigenetic regulation in eukaryotic genomes. Development of methods for rapid genomewide methylation profiling will greatly facilitate both hypothesis- and discovery-driven research in the field of epigenetics. In this regard, a single molecule approach to methylation profiling offers several unique advantages that include elimination of chemical DNA modification steps and PCR amplification. Results: A single molecule approach is presented for the discernment of methylation profiles, based on optical mapping. We report results from a series of pilot studies demonstrating the capabilities of optical mapping as a platform for methylation profiling of whole genomes. Optical mapping was used to discern the methylation profile from both an engineered and wild type Escherichia coli. Furthermore, the methylation status of selected loci within the genome of human embryonic stem cells was profiled using optical mapping. Conclusion: The optical mapping platform effectively detects DNA methylation patterns. Due to single molecule detection, optical mapping offers significant advantages over other technologies. This advantage stems from obviation of DNA modification steps, such as bisulfite treatment, and the ability of the platform to assay repeat dense regions within mammalian genomes inaccessible to techniques using array-hybridization technologies. PMID:18667073

A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because: the two populations determined by a genetic variant may have very different sizes; and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for any size populations. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens-hundreds of likely false positive associations as more significant than these known associations. PMID:25950620
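To make the small-group problem concrete, the sketch below computes an exact two-sided p-value for the log-rank observed-minus-expected statistic by enumerating every possible assignment of patients to the variant group. This is a brute-force permutation illustration that is only feasible for tiny samples; it is not the ExaLT algorithm, whose details are not given in the abstract, and all names and data here are hypothetical.

```python
from itertools import combinations


def logrank_o_minus_e(times, events, in_group):
    """Observed minus expected events in the marked group (log-rank numerator)."""
    total = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        dying = [i for i in at_risk if times[i] == t and events[i]]
        d = len(dying)
        d1 = sum(1 for i in dying if in_group[i])
        total += d1 - d * n1 / n
    return total


def exact_permutation_pvalue(times, events, group_indices):
    """Two-sided exact p-value over all groups of the same size."""
    n = len(times)
    k = len(group_indices)

    def stat(indices):
        members = set(indices)
        flags = [i in members for i in range(n)]
        return abs(logrank_o_minus_e(times, events, flags))

    observed = stat(group_indices)
    hits = 0
    total = 0
    for candidate in combinations(range(n), k):
        total += 1
        if stat(candidate) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return hits / total


if __name__ == "__main__":
    # Eight hypothetical patients, all with observed events, no censoring.
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1] * 8
    # Variant carried only by the two longest survivors.
    print(exact_permutation_pvalue(times, events, (6, 7)))
```

With a variant group of only two patients there are just 28 possible assignments, so the smallest attainable p-value is 1/28. An asymptotic approximation ignores this discreteness, which is one way to see the imbalance problem that the abstract describes.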

Schizophrenia is a common, clinically heterogeneous disorder associated with lifelong morbidity and early mortality. Several genetic variants associated with schizophrenia have been identified, but the majority of the heritability remains unknown. In this study, we report on a case-control sample of Ashkenazi Jews (AJ), a founder population that may provide additional insights into the genetic etiology of schizophrenia. We performed a genome-wide association analysis (GWAS) of 592 cases and 505 controls of AJ ancestry ascertained in the US. Subsequently, we performed a meta-analysis with an Israeli AJ sample of 913 cases and 1640 controls, followed by a meta-analysis and polygenic risk scoring using summary results from the Psychiatric GWAS Consortium 2 (PGC2) schizophrenia study. The U.S. AJ sample showed strong evidence of polygenic inheritance (pseudo-R(2) ∼9.7%) and a SNP-heritability estimate of 0.39 (P = 0.00046). We found no genome-wide significant associations in the U.S. sample or in the combined US/Israeli AJ meta-analysis of 1505 cases and 2145 controls. The strongest AJ-specific associations (P-values in the 10(-6) to 10(-7) range) were in the 22q11.2 deletion region and included the genes TBX1, GLN1, and COMT. Supportive evidence (meta P genome-wide significant findings, including the HLA region, CNTN4, IMMP2L, and GRIN2A. The meta-analysis of the U.S. sample with the PGC2 results provided initial genome-wide significant evidence for six new loci. Among the novel potential susceptibility genes is PEPD, a gene involved in proline metabolism, which is associated with a Mendelian disorder characterized by developmental delay and cognitive deficits. PMID:26198764
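
Polygenic risk scoring, as mentioned above, is commonly computed as a weighted sum of risk-allele counts. The sketch below shows only that basic sum; the function name, the dosages, and the weights are hypothetical, and the abstract does not describe the SNP selection, thresholding, or software the authors used.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0 to 2 per SNP) for one individual.

    `weights` are per-SNP effect sizes (for example log odds ratios) taken
    from an independent discovery GWAS; both inputs here are illustrative.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical individual with three SNPs.
print(polygenic_score([2, 1, 0], [0.5, 0.2, -0.1]))
```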

This paper provides details on the steps needed to assess and control data quality in genome-wide association studies (GWAS) that use genotype information on a large number of genetic markers for a large number of individuals. Because study designs and genotyping platforms vary between sites and projects, and because genotyping errors can occur, it is important to ensure high-quality data. Scripts and directions are provided to help others through this process.
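
As an illustration of the kind of per-marker checks such a quality-control pipeline typically includes, here is a minimal sketch that filters markers by call rate and minor allele frequency. It is not the scripts the paper provides; the function name, the data layout, and the default thresholds are assumptions chosen for the example.

```python
def marker_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return indices of markers passing call-rate and minor-allele-frequency filters.

    `genotypes` is a list of markers; each marker is a list with one entry per
    individual holding the count of one allele (0, 1 or 2) or None if missing.
    """
    kept = []
    for index, calls in enumerate(genotypes):
        observed = [g for g in calls if g is not None]
        if not observed:
            continue  # marker failed in every sample
        call_rate = len(observed) / len(calls)
        allele_freq = sum(observed) / (2 * len(observed))
        maf = min(allele_freq, 1 - allele_freq)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept.append(index)
    return kept

# Hypothetical toy matrix: marker 1 is monomorphic, marker 2 is half missing.
toy = [[0, 1, 2, 1], [0, 0, 0, 0], [1, None, None, 2]]
print(marker_qc(toy, min_call_rate=0.75))
```

Real pipelines add further checks, such as per-sample call rate and tests of Hardy-Weinberg equilibrium, which this sketch leaves out.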

Pathological shifts of the human microbiome are characteristic of many diseases, including chronic periodontitis. To date, there is limited evidence on host genetic risk loci associated with periodontal pathogen colonization. We conducted a genome-wide association (GWA) study among 1,020 white participants of the Atherosclerosis Risk in Communities Study, whose periodontal diagnosis ranged from healthy to severe chronic periodontitis, and for whom "checkerboard" DNA-DNA hybridization quantification of 8 periodontal pathogens was performed. We examined 3 traits: "high red" and "high orange" bacterial complexes, and "high" Aggregatibacter actinomycetemcomitans (Aa) colonization. Genotyping was performed on the Affymetrix 6.0 platform. Imputation to 2.5 million markers was based on HapMap II-CEU, and a multiple-test correction was applied (genome-wide threshold of p < 5 × 10(-8)). We detected no genome-wide significant signals. However, 13 loci, including KCNK1, FBXO38, UHRF2, IL33, RUNX2, TRPS1, CAMTA1, and VAMP3, provided suggestive evidence (p < 5 × 10(-6)) of association. All associations reported for "red" and "orange" complex microbiota, but not for Aa, had the same effect direction in a second sample of 123 African-American participants. None of these polymorphisms was associated with periodontitis diagnosis. Investigations replicating these findings may lead to an improved understanding of the complex nature of host-microbiome interactions that characterizes states of health and disease. PMID:22699663

Background: Depression is a heritable trait that exists on a continuum of varying severity and duration. Yet, the search for genetic variants associated with depression has had few successes. We exploit the entire continuum of depression to find common variants for depressive symptoms. Methods: In this genome-wide association study, we combined the results of 17 population-based studies assessing depressive symptoms with the Center for Epidemiological Studies Depression Scale. Replication of the independent top hits (p < 1 × 10(-5)) was performed in five studies assessing depressive symptoms with other instruments. In addition, we performed a combined meta-analysis of all 22 discovery and replication studies. Results: The discovery sample comprised 34,549 individuals (mean age of 66.5) and no loci reached genome-wide significance (lowest p = 1.05 × 10(-7)). Seven independent single nucleotide polymorphisms were considered for replication. In the replication set (n = 16,709), we found suggestive association of one single nucleotide polymorphism with depressive symptoms (rs161645, 5q21, p = 9.19 × 10(-3)). This 5q21 region reached genome-wide significance (p = 4.78 × 10(-8)) in the overall meta-analysis combining discovery and replication studies (n = 51,258). Conclusions: The results suggest that only a large sample comprising more than 50,000 subjects may be sufficiently powered to detect genes for depressive symptoms. PMID:23290196
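
Combining per-study results for one SNP is often done with a fixed-effect, inverse-variance-weighted meta-analysis. The sketch below shows that calculation under the assumption that each study reports an effect estimate and its standard error; the abstract does not say which meta-analysis method was used, and the function name and inputs are hypothetical.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted pooled effect, its standard error, and a two-sided p-value."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided normal tail probability: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value

# Hypothetical example: two studies with identical estimates.
print(fixed_effect_meta([0.1, 0.1], [0.1, 0.1]))
```

Pooling shrinks the standard error as studies are added, which is how a signal that is only suggestive in the discovery and replication sets separately can cross a genome-wide threshold in the combined analysis.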

Objective: To report the genome-wide significant and/or replicable risk variants for alcohol dependence and explore their potential biological functions. Methods: We searched in PubMed for all genome-wide association studies (GWASs) of alcohol dependence. The following three types of the results were extracted: (1) genome-wide significant associations in an individual sample, the combined samples, or the meta-analysis (p<5×10(-8)); (2) top-ranked associations in an individual sample (p<10(-5)) that were nominally replicated in other samples (p<0.05); and (3) nominally replicable associations across at least three independent GWAS samples (p<0.05). These results were meta-analyzed. cis-eQTLs in human, RNA expression in rat and mouse brain and bioinformatics properties of all of these risk variants were analyzed. Results: The variants located within ADH cluster were significantly associated with alcohol dependence at genome-wide level (p<5×10(-8)) in at least one sample. Some associations with the ADH cluster were replicable across six independent GWAS samples. The variants located within or near SERINC2, KIAA0040, MREG-PECR or PKNOX2 were significantly associated with alcohol dependence at genome-wide level (p<5×10(-8)) in meta-analysis or combined samples, and these associations were replicable across at least one sample. The associations with the variants within NRD1, GPD1L-CMTM8 or MAP3K9-PCNX were suggestive (5×10(-8)

Background: Even before having its genome sequence published in 2004, Kluyveromyces lactis had long been considered a model organism for studies in genetics and physiology. Research on Kluyveromyces lactis is quite advanced and this yeast species is one of the few with which it is possible to perform formal genetic analysis. Nevertheless, until now, no complete metabolic functional annotation has been performed for the proteins encoded in the Kluyveromyces lactis genome. Results: In this work, a new metabolic genome-wide functional re-annotation of the proteins encoded in the Kluyveromyces lactis genome was performed, resulting in the annotation of 1759 genes with metabolic functions, and the development of a methodology supported by merlin (software developed in-house). The new annotation includes novelties, such as the assignment of transporter superfamily numbers to genes identified as transporter proteins. Thus, the genes annotated with metabolic functions could be exclusively enzymatic (1410 genes), transporter proteins encoding genes (301 genes) or have both metabolic activities (48 genes). The new annotation produced by this work largely surpassed the currently available Kluyveromyces lactis annotations. A comparison with KEGG’s annotation revealed a match with 844 (~90%) of the genes annotated by KEGG, while adding 850 new gene annotations. Moreover, there are 32 genes with annotations different from KEGG. Conclusions: The methodology developed throughout this work can be used to re-annotate any yeast or, with a little tweak of the reference organism, the proteins encoded in any sequenced genome. The new annotation provided by this study offers basic knowledge which might be useful for the scientific community working on this model yeast, because new functions have been identified for the so-called metabolic genes. Furthermore, it served as the basis for the reconstruction of a compartmentalized, genome-scale metabolic model of Kluyveromyces lactis, which is

Over the past two decades, a significant number of studies have used animal growth traits to examine animal genetic mechanisms because these traits are easy to measure and highly heritable. The chicken, which has a significant impact on fundamental biology, is a major source of protein worldwide, making it an ideal model for examining animal growth trait development. The genetic mechanisms of chicken growth traits have been studied using quantitative trait loci mapping through genome-scan and candidate gene approaches, genome-wide association studies (GWAS), comparative genomic strategies, microRNA (miRNA) regulation of growth development analysis, and epigenomic analysis. This review focuses on chicken GWAS and miRNA regulation of growth traits. Several recently published GWAS reports showed that most genome-wide significant single nucleotide polymorphisms are located on chromosomes 1 and 4 in chickens. Chicken growth, particularly skeletal muscle growth and development, is greatly regulated by miRNA. Using dwarf and normal chickens, let-7b was found to be involved in determining chicken dwarf phenotypes by regulating growth hormone receptor gene expression. PMID:24082823

In the analysis of DNA sequences on related individuals, most methods strive to incorporate as much information as possible, with little or no attention paid to the issue of statistical significance. For example, a modern workstation can easily handle the computations needed to perform a large-scale genome-wide inheritance-by-descent (IBD) scan, but accurate assessment of the significance of that scan is often hindered by inaccurate approximations and computationally intensive simulation. To address these issues, we developed gLOD—a test of co-segregation that, for large samples, models chromosome-specific IBD statistics as a collection of stationary Gaussian processes. With this simple model, the parametric bootstrap yields an accurate and rapid assessment of significance—the genome-wide corrected P-value. Furthermore, we show that (i) under the null hypothesis, the limiting distribution of the gLOD is the standard Gumbel distribution; (ii) our parametric bootstrap simulator is approximately 40 000 times faster than gene-dropping methods, and it is more powerful than methods that approximate the adjusted P-value; and, (iii) the gLOD has the same statistical power as the widely used maximum Kong and Cox LOD. Thus, our approach gives researchers the ability to determine quickly and accurately the significance of most large-scale IBD scans, which may contain multiple traits, thousands of families and tens of thousands of DNA sequences. PMID:27245422
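
Result (i) above implies that, in the large-sample limit, a tail probability can be read from the standard Gumbel distribution, whose upper tail is 1 - exp(-exp(-x)). The sketch below evaluates that expression; it assumes the statistic passed in is already on the scale to which the standard Gumbel limit applies, and it is not the authors' parametric bootstrap. The function name is hypothetical.

```python
import math

def gumbel_upper_tail(x):
    """P(G >= x) for a standard Gumbel variable G, i.e. 1 - exp(-exp(-x))."""
    # expm1 keeps precision when the tail probability is very small.
    return -math.expm1(-math.exp(-x))

for value in (0.0, 3.0, 10.0):
    print(value, gumbel_upper_tail(value))
```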

Tourette's syndrome (TS) is a developmental disorder that has one of the highest familial recurrence rates among neuropsychiatric diseases with complex inheritance. However, the identification of definitive TS susceptibility genes remains elusive. Here, we report the first genome-wide association study (GWAS) of TS in 1285 cases and 4964 ancestry-matched controls of European ancestry, including two European-derived population isolates, Ashkenazi Jews from North America and Israel and French Canadians from Quebec, Canada. In a primary meta-analysis of GWAS data from these European ancestry samples, no markers achieved a genome-wide threshold of significance (P<5 × 10(-8)); the top signal was found in rs7868992 on chromosome 9q32 within COL27A1 (P=1.85 × 10(-6)). A secondary analysis including an additional 211 cases and 285 controls from two closely related Latin American population isolates from the Central Valley of Costa Rica and Antioquia, Colombia also identified rs7868992 as the top signal (P=3.6 × 10(-7) for the combined sample of 1496 cases and 5249 controls following imputation with 1000 Genomes data). This study lays the groundwork for the eventual identification of common TS susceptibility variants in larger cohorts and helps to provide a more complete understanding of the full genetic architecture of this disorder. PMID:22889924

To identify novel psoriasis susceptibility loci, we carried out a meta-analysis of two recent genome-wide association studies (refs. 1, 2), yielding a discovery sample of 1,831 cases and 2,546 controls. A total of 102 of the most promising loci in the discovery analysis were followed up in a three-stage replication study using 4,064 cases and 4,685 controls from Michigan, Toronto, Newfoundland, and Germany. Association at a genome-wide level of significance for the combined discovery and replication samples was found for three genomic regions. One contains NOS2 (rs4795067, p = 4 × 10(-11)), another contains FBXL19 (rs10782001, p = 9 × 10(-10)), and a third contains PSMA6 and NFKBIA (rs12586317, p = 2 × 10(-8)). All three loci were also strongly associated with the subphenotypes of psoriatic arthritis and purely cutaneous psoriasis. Finally, we confirmed a recently identified (ref. 3) association signal near RNF114. PMID:20953189

Human longevity and healthy aging show moderate heritability (20–50%). We conducted a meta-analysis of genome-wide association studies from nine studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium for two outcomes: a) all-cause mortality and b) survival free of major disease or death. No single nucleotide polymorphism (SNP) was a genome-wide significant predictor of either outcome (p < 5 × 10(-8)). We found fourteen independent SNPs that predicted risk of death, and eight SNPs that predicted event-free survival (p < 10(-5)). These SNPs are in or near genes that are highly expressed in the brain (HECW2, HIP1, BIN2, GRIA1), genes involved in neural development and function (KCNQ4, LMO4, GRIA1, NETO1) and autophagy (ATG4C), and genes that are associated with risk of various diseases including cancer and Alzheimer’s disease. In addition to considerable overlap between the traits, pathway and network analysis corroborated these findings. These findings indicate that variation in genes involved in neurological processes may be an important factor in regulating aging free of major disease and achieving longevity. PMID:21782286

It is not well known whether genetic markers identified through genome-wide association studies (GWAS) confer similar or different risks across people of different ancestry. We screened a regularly updated catalog of all published GWAS curated at the NHGRI website for GWAS-identified associations that had reached genome-wide significance (p ≤ 5 × 10(-8)) in at least one major ancestry group (European, Asian, African) and for which replication data were available for comparison in at least two different major ancestry groups. These groups were compared for the correlation between and differences in risk allele frequencies and genetic effects' estimates. Data on 108 eligible GWAS-identified associations with a total of 900 datasets (European, n = 624; Asian, n = 217; African, n = 60) were analyzed. Risk-allele frequencies were modestly correlated between ancestry groups, with >10% absolute differences in 75-89% of the three pairwise comparisons of ancestry groups. Genetic effect (odds ratio) point estimates between ancestry groups correlated modestly (pairwise comparisons' correlation coefficients: 0.20-0.33) and point estimates of risks were opposite in direction or differed more than twofold in 57%, 79%, and 89% of the European versus Asian, European versus African, and Asian versus African comparisons, respectively. The modest correlations, differing risk estimates, and considerable between-association heterogeneity suggest that differential ancestral effects can be anticipated and genomic risk markers may need separate further evaluation in different ancestry groups. PMID:22183176

We propose a minimal protocol for exhaustive genome-wide association interaction analysis that involves screening for epistasis over large-scale genomic data, combining the strengths of different methods and statistical tools. The different steps of this protocol are illustrated on a real-life data application for Alzheimer's disease (AD) (2259 patients and 6017 controls from France). In particular, in the exhaustive genome-wide epistasis screening we identified an AD-associated interacting SNP pair from chromosome 6q11.1 (rs6455128, the KHDRBS2 gene) and 13q12.11 (rs7989332, the CRYL1 gene) (p = 0.006, corrected for multiple testing). A replication analysis in the independent AD cohort from Germany (555 patients and 824 controls) confirmed the discovered epistasis signal (p = 0.036). This signal was also supported by a meta-analysis approach in 5 independent AD cohorts that was applied in the context of epistasis for the first time. Transcriptome analysis revealed negative correlation between expression levels of KHDRBS2 and CRYL1 in both the temporal cortex (β = −0.19, p = 0.0006) and cerebellum (β = −0.23, p < 0.0001) brain regions. This is the first time a replicable epistasis associated with AD was identified using a hypothesis-free screening approach. PMID:24958192

Asperger Syndrome (AS) is a neurodevelopmental condition characterized by impairments in social interaction and communication, alongside the presence of unusually repetitive, restricted interests and stereotyped behaviour. Individuals with AS have no delay in cognitive and language development. It is a subset of Autism Spectrum Conditions (ASC), which are highly heritable and have a population prevalence of approximately 1%. Few studies have investigated the genetic basis of AS. To address this gap in the literature, we performed a genome-wide pooled DNA association study to identify candidate loci in 612 individuals (294 cases and 318 controls) of Caucasian ancestry, using the Affymetrix GeneChip Human Mapping version 6.0 array. We identified 11 SNPs that had a p-value below 1 × 10(-5). These SNPs were independently genotyped in the same sample. Three of the SNPs (rs1268055, rs7785891 and rs2782448) were nominally significant, though none remained significant after Bonferroni correction. Two of our top three SNPs (rs7785891 and rs2782448) lie in loci previously implicated in ASC. However, investigation of the three SNPs in the ASC genome-wide association dataset from the Psychiatric Genomics Consortium indicated that these three SNPs were not significantly associated with ASC. The effect sizes of the variants were modest, indicating that our study was not sufficiently powered to identify causal variants with precision. PMID:26176695
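
The gap between nominal significance and Bonferroni-corrected significance is simple arithmetic: with 11 tests and a family-wise level of 0.05, each p-value must fall below 0.05 / 11, roughly 0.0045. The sketch below applies that rule; the p-values are invented for illustration and are not the study's results, and the 0.05 level is an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a pass/fail flag for each p-value."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]

# Hypothetical p-values for 11 follow-up SNPs: three are below 0.05,
# but none is below the corrected threshold.
example = [0.012, 0.03, 0.041, 0.2, 0.35, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
threshold, flags = bonferroni(example)
print(threshold, sum(p < 0.05 for p in example), sum(flags))
```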

Domestication of soybeans occurred under intense human-directed selection aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied by a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean. PMID:26856884

High-throughput genetic screens have exponentially increased the functional annotation of the genome over the past 10 years. Likewise, genome-scale efforts to map DNA methylation, chromatin state and occupancy, messenger RNA expression patterns, and disease-associated genetic polymorphisms, and proteome-wide efforts to map protein-protein interactions, have also created vast resources of data. An emerging trend involves combining multiple types of data, referred to as integrative screening. Examples include papers that report integrated data generated from large-scale RNA interference screens on the Wnt/beta-catenin pathway with either genotypic or proteomic data in colorectal cancer. These studies demonstrate the power of data integration to generate focused, validated data sets and to identify high-confidence candidate genes for follow-up experiments. We present the ongoing evolution and new strategies for the integrative screening approach with respect to understanding and treating human disease. PMID:19436058

We determined genome-wide nucleosome occupancies in mouse embryonic stem cells and their neural progenitor and embryonic fibroblast counterparts to assess features associated with nucleosome positioning during lineage commitment. Cell-type- and protein-specific binding preferences of transcription factors to sites with either low (Myc, Klf4 and Zfx) or high (Nanog, Oct4 and Sox2) nucleosome occupancy as well as complex patterns for CTCF were identified. Nucleosome-depleted regions around transcription start and transcription termination sites were broad and more pronounced for active genes, with distinct patterns for promoters classified according to CpG content or histone methylation marks. Throughout the genome, nucleosome occupancy was correlated with certain histone methylation or acetylation modifications. In addition, the average nucleosome repeat length increased during differentiation by 5-7 base pairs, with local variations for specific regions. Our results reveal regulatory mechanisms of cell differentiation that involve nucleosome repositioning. PMID:23085715

The success of modern maize breeding has been demonstrated by remarkable increases in productivity over the last four decades. However, the underlying genetic changes correlated with these gains remain largely unknown. We report here the sequencing of 278 temperate maize inbred lines from different stages of breeding history, including deep resequencing of 4 lines with known pedigree information. The results show that modern breeding has introduced highly dynamic genetic changes into the maize genome. Artificial selection has affected thousands of targets, including genes and non-genic regions, leading to a reduction in nucleotide diversity and an increase in the proportion of rare alleles. Genetic changes during breeding happen rapidly, with extensive variation (SNPs, indels and copy-number variants (CNVs)) occurring, even within identity-by-descent regions. Our genome-wide assessment of genetic changes during modern maize breeding provides new strategies as well as practical targets for future crop breeding and biotechnology. PMID:22660547

Genomic DNA methylation functions to repress gene expression by interfering with transcription factor binding and/or recruiting repressive chromatin machinery. Recent data support a contribution of regulated DNA methylation to embryonic pluripotency, development, and tissue differentiation; this important epigenetic mark is chemically stable yet enzymatically reversible, and heritable through the germline. Importantly, all the major components involved in dynamic DNA methylation are conserved in zebrafish, including the factors that "write, read, and erase" this mark. Therefore, the zebrafish has become an excellent model for studying most biological processes associated with DNA methylation in mammals. Here we briefly review the zebrafish model for studying DNA methylation and describe a series of methods for performing genome-wide DNA methylation analysis. We address and provide methods for methylated DNA immunoprecipitation followed by sequencing (MeDIP-Seq), bisulfite sequencing (BS-Seq), and reduced representation bisulfite sequencing (RRBS-Seq). PMID:27443935

Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research. PMID:21234875

We performed a multistage genome-wide association study of melanoma. In a discovery cohort of 1804 melanoma cases and 1026 controls, we identified loci at chromosomes 15q13.1 (HERC2/OCA2 region) and 16q24.3 (MC1R region) that reached genome-wide significance within this study and also found strong evidence for genetic effects on susceptibility to melanoma from markers on chromosome 9p21.3 in the p16/ARF region and on chromosome 1q21.3 (ARNT/LASS2/ANXA9 region). The most significant single-nucleotide polymorphisms (SNPs) in the 15q13.1 locus (rs1129038 and rs12913832) lie within a genomic region that has profound effects on eye and skin color; notably, 50% of variability in eye color is associated with variation in the SNP rs12913832. Because eye and skin colors vary across European populations, we further evaluated the associations of the significant SNPs after carefully adjusting for European substructure. We also evaluated the top 10 most significant SNPs by using data from three other genome-wide scans. Additional in silico data provided replication of the findings from the most significant region on chromosome 1q21.3 rs7412746 (P = 6 × 10(-10)). Together, these data identified several candidate genes for additional studies to identify causal variants predisposing to increased risk for developing melanoma. PMID:21926416

Understanding how genomes encode complex cellular and organismal behaviors has become the outstanding challenge of modern genetics. Unlike classical screening methods, analysis of genetic variation that occurs naturally in wild populations can enable rapid, genome-scale mapping of genotype to phenotype with a medium-throughput experimental design. Here we describe the results of the first genome-wide association study (GWAS) used to identify novel loci underlying trait variation in a microbial eukaryote, harnessing wild isolates of the filamentous fungus Neurospora crassa. We genotyped each of a population of wild Louisiana strains at 1 million genetic loci genome-wide, and we used these genotypes to map genetic determinants of microbial communication. In N. crassa, germinated asexual spores (germlings) sense the presence of other germlings, grow toward them in a coordinated fashion, and fuse. We evaluated germlings of each strain for their ability to chemically sense, chemotropically seek, and undergo cell fusion, and we subjected these trait measurements to GWAS. This analysis identified one gene, NCU04379 (cse-1, encoding a homolog of a neuronal calcium sensor), at which inheritance was strongly associated with the efficiency of germling communication. Deletion of cse-1 significantly impaired germling communication and fusion, and two genes encoding predicted interaction partners of CSE1 were also required for the communication trait. Additionally, mining our association results for signaling and secretion genes with a potential role in germling communication, we validated six more previously unknown molecular players, including a secreted protease and two other genes whose deletion conferred a novel phenotype of increased communication and multi-germling fusion. Our results establish protein secretion as a linchpin of germling communication in N. crassa and shed light on the regulation of communication molecules in this fungus. Our study demonstrates the power

While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new genes that have

Our current understanding of speciation is often based on considering a relatively small number of genes, sometimes in isolation from one another. Here, we describe a possible emergent genome process involving the aggregate effect of many genes contributing to the evolution of reproductive isolation across the speciation continuum. When a threshold number of divergently selected mutations of modest to low fitness effects accumulate between populations diverging with gene flow, nonlinear transitions can occur in which levels of adaptive differentiation, linkage disequilibrium, and reproductive isolation dramatically increase. In effect, the genomes of the populations start to "congeal" into distinct entities representing different species. At this stage, reproductive isolation changes from being a characteristic of specific, divergently selected genes to a property of the genome. We examine conditions conducive to such genome-wide congealing (GWC), describe how to empirically test for GWC, and highlight a putative empirical example involving Rhagoletis fruit flies. We conclude with cautious optimism that the models and concepts discussed here, once extended to large numbers of neutral markers, may provide a framework for integrating information from genome scans, selection experiments, quantitative trait loci mapping, association studies, and natural history to develop a deeper understanding of the genomics of speciation. PMID:25149256

The covalent DNA modification of cytosine at position 5 (5-methylcytosine; 5mC) has emerged as an important epigenetic mark most commonly present in the context of CpG dinucleotides in mammalian cells. In pluripotent stem cells and plants, it is also found in non-CpG and CpNpG contexts, respectively. 5mC has important implications in a diverse set of biological processes, including transcriptional regulation. Aberrant DNA methylation has been shown to be associated with a wide variety of human ailments and thus is the focus of active investigation. Methods used for detecting DNA methylation have revolutionized our understanding of this epigenetic mark and provided new insights into its role in diverse biological functions. Here we describe recent technological advances in genome-wide DNA methylation analysis and discuss their relative utility and drawbacks, providing specific examples from studies that have used these technologies for genome-wide DNA methylation analysis to address important biological questions. Finally, we discuss a newly identified covalent DNA modification, 5-hydroxymethylcytosine (5hmC), and speculate on its possible biological function, as well as describe a new methodology that can distinguish 5hmC from 5mC. PMID:20964631

Metabolic syndrome (METS) is a disorder of energy utilization and storage that increases the risk of developing cardiovascular disease and diabetes. To identify the genetic risk factors of METS, we carried out a genome-wide association study (GWAS) of 2,657 cases and 5,917 controls in Korean populations. We identified 2 single nucleotide polymorphisms (SNPs) with genome-wide significant p-values (p < 5 × 10(-8)), 8 SNPs with genome-wide suggestive p-values (5 × 10(-8) ≤ p < 1 × 10(-5)), and 2 SNPs of more functional variants with borderline p-values (5 × 10(-5) ≤ p < 1 × 10(-4)). The multiple-testing correction criteria of conventional GWASs exclude false-positive loci but also discard many true-positive loci. To reconsider the discarded true-positive loci, we attempted to include the functional variants (nonsynonymous SNPs [nsSNPs] and expression quantitative trait loci [eQTL]) among the top 5,000 SNPs based on the proportion of phenotypic variance explained by genotypic variance. In total, 159 eQTLs and 18 nsSNPs were present in the top 5,000 SNPs. Although they should be replicated in other independent populations, 6 eQTLs and 2 nsSNP loci were located in the molecular pathways of LPL, APOA5, and CHRM2, which were the significant or suggestive loci in the METS GWAS. In conclusion, our approach, which combines conventional GWAS with reconsideration of functional variants and pathway-based interpretation, offers a useful method for understanding the GWAS results of complex traits and can be extended to other genome-wide association studies. PMID:25705157
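
The p-value tiers quoted in this record can be written as a small lookup. The following is a minimal Python sketch, not the authors' pipeline; the function name `classify_p_value` and the label strings are illustrative assumptions. The abstract does not assign the interval between 1 × 10(-5) and 5 × 10(-5) to any tier, so the sketch returns "unclassified" there.

```python
def classify_p_value(p: float) -> str:
    """Map a GWAS p-value to the tiers quoted in the abstract (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 5e-8:
        return "genome-wide significant"
    if p < 1e-5:
        return "genome-wide suggestive"
    if 5e-5 <= p < 1e-4:
        return "borderline"
    # Covers both 1e-5 <= p < 5e-5 (not assigned in the abstract) and p >= 1e-4.
    return "unclassified"
```

A value exactly at a boundary falls into the weaker tier, matching the half-open intervals given in the abstract.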

Obsessive-compulsive disorder (OCD) is a common, debilitating neuropsychiatric illness with complex genetic etiology. The International OCD Foundation Genetics Collaborative (IOCDF-GC) is a multi-national collaboration established to discover the genetic variation predisposing to OCD. A set of individuals affected with DSM-IV OCD, a subset of their parents, and unselected controls, were genotyped with several different Illumina SNP microarrays. After extensive data cleaning, 1465 cases, 5557 ancestry-matched controls and 400 complete trios remained, with a common set of 469,410 autosomal and 9657 X-chromosome single nucleotide polymorphisms (SNPs). Ancestry-stratified case-control association analyses were conducted for three genetically-defined subpopulations and combined in two meta-analyses, with and without the trio-based analysis. In the case-control analysis, the lowest two P-values were located within DLGAP1 (P=2.49 × 10(-6) and P=3.44 × 10(-6)), a member of the neuronal postsynaptic density complex. In the trio analysis, rs6131295, near BTBD3, exceeded the genome-wide significance threshold with a P-value=3.84 × 10(-8). However, when trios were meta-analyzed with the case-control samples, the P-value for this variant was 3.62 × 10(-5), losing genome-wide significance. Although no SNPs were identified to be associated with OCD at a genome-wide significant level in the combined trio-case-control sample, a significant enrichment of methylation QTLs (P<0.001) and frontal lobe expression quantitative trait loci (eQTLs) (P=0.001) was observed within the top-ranked SNPs (P<0.01) from the trio-case-control analysis, suggesting these top signals may have a broad role in gene expression in the brain, and possibly in the etiology of OCD. PMID:22889921

Objective: To examine genome-wide 5hmC distribution in osteoarthritic (OA) and normal chondrocytes to investigate the effect on OA-specific gene expression. Methods: Cartilage was obtained from OA patients undergoing total knee arthroplasty or control patients undergoing anterior cruciate ligament reconstruction. Genome-wide sequencing of 5hmC-enriched DNA (5hmC-seq) was performed for a small cohort of normal and OA chondrocytes to identify differentially hydroxymethylated regions (DhMRs) in OA chondrocytes. 5hmC-seq data were intersected with global OA gene expression data to define subsets of genes and pathways potentially affected by increased 5hmC levels in OA chondrocytes. Results: 70591 DhMRs were identified in OA chondrocytes compared to normal chondrocytes, 44288 (63%) of which were increased in OA chondrocytes. The majority of DhMRs (66%) were gained in gene bodies. Increased DhMRs were observed in ~50% of genes previously implicated in OA pathology including MMP3, LRP5, GDF5 and COL11A1. Furthermore, analyses of gene expression data revealed that gene body gain of 5hmC appears to be preferentially associated with activated but not repressed genes in OA chondrocytes. Conclusion: This study provides the first genome-wide profiling of 5hmC distribution in OA chondrocytes. We had previously reported a global increase in 5hmC levels in OA chondrocytes. Gain of 5hmC in the gene body is found to be characteristic of activated genes in OA chondrocytes, highlighting the influence of 5hmC as an epigenetic mark in OA. In addition, this study identifies multiple OA-associated genes that are potentially regulated either singularly by gain of DNA hydroxymethylation or in combination with loss of DNA methylation. PMID:25940674

Human fertility is a complex trait determined by gene-environment interactions in which genetic factors represent a significant component. To better understand inter-individual variability in fertility, we performed one of the first genome-wide association studies (GWAS) of common fertility phenotypes, lifetime number of pregnancies and number of children in a developing country population. The fertility phenotype data and DNA samples were obtained at baseline recruitment from individuals participating in a large prospective cohort study in Bangladesh. GWAS analyses of fertility phenotypes were conducted among 1,686 married women. One SNP on chromosome 4 was non-significantly associated with number of children at P <10(-7) and number of pregnancies at P <10(-6). This SNP is located in a region without a gene within 1 Mb. One SNP on chromosome 6 was non-significantly associated with extreme number of children at P <10(-6). The closest gene to this SNP is HDGFL1, a hepatoma-derived growth factor. When we excluded hormonal contraceptive users, a SNP on chromosome 5 was non-significantly associated at P <10(-5) for number of children and number of pregnancies. This SNP is located near C5orf64, an open reading frame, and ZSWIM6, a zinc ion binding gene. We also estimated the heritability of these phenotypes from our genotype data using GCTA (Genome-wide Complex Trait Analysis) for number of children (hg2 = 0.149, SE = 0.24, p-value = 0.265) and number of pregnancies (hg2 = 0.007, SE = 0.22, p-value = 0.487). Our genome-wide association study and heritability estimates of number of pregnancies and number of children in Bangladesh did not confer strong evidence of common variants for parity variation. However, our results suggest that future studies may want to consider the role of 3 notable SNPs in their analysis. PMID:25742292

Personality traits are complex phenotypes related to psychosomatic health. Individually, various gene finding methods have not achieved much success in finding genetic variants associated with personality traits. We performed a meta-analysis of four genome-wide linkage scans (N=6149 subjects) of five basic personality traits assessed with the NEO Five-Factor Inventory. We compared the significant regions from the meta-analysis of linkage scans with the results of a meta-analysis of genome-wide association studies (GWAS) (N∼17 000). We found significant evidence of linkage of neuroticism to chromosome 3p14 (rs1490265, LOD=4.67) and to chromosome 19q13 (rs628604, LOD=3.55); of extraversion to 14q32 (ATGG002, LOD=3.3); and of agreeableness to 3p25 (rs709160, LOD=3.67) and to two adjacent regions on chromosome 15, including 15q13 (rs970408, LOD=4.07) and 15q14 (rs1055356, LOD=3.52) in the individual scans. In the meta-analysis, we found strong evidence of linkage of extraversion to 4q34, 9q34, 10q24 and 11q22, openness to 2p25, 3q26, 9p21, 11q24, 15q26 and 19q13 and agreeableness to 4q34 and 19p13. Significant evidence of association in the GWAS was detected between openness and rs677035 at 11q24 (P-value=2.6 × 10(-06), KCNJ1). The findings of our linkage meta-analysis and those of the GWAS suggest that 11q24 is a susceptible locus for openness, with KCNJ1 as the possible candidate gene. PMID:23211697
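
For readers comparing the LOD scores in this record with the GWAS p-values, a common textbook conversion is sketched below in Python. It rests on an assumption that is not stated in the abstract: that 2 ln(10) × LOD behaves asymptotically as a chi-square statistic with 1 degree of freedom, tested one-sided. The result is a pointwise (single-location) p-value, not a genome-wide corrected one, and the function name `lod_to_p_value` is illustrative.

```python
import math


def lod_to_p_value(lod: float) -> float:
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumes 2 * ln(10) * LOD follows a chi-square distribution with
    1 degree of freedom under the null hypothesis of no linkage.
    """
    if lod <= 0:
        return 0.5  # no evidence for linkage under the one-sided test
    chi_square = 2.0 * math.log(10.0) * lod
    # For 1 d.f., P(X > x) = erfc(sqrt(x / 2)); halve it for the one-sided test.
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))
```

Under this approximation a LOD score of 3 corresponds to a pointwise p-value of roughly 1 × 10(-4), which is why LOD thresholds for genome-wide significance are set higher than 3 in dense scans.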

Adolescent idiopathic scoliosis (AIS) is a polygenic disease. Genome-wide association studies (GWASs) have been performed for many polygenic diseases. For AIS, we conducted a GWAS and identified the first AIS locus near LBX1. After this discovery, we extended our study by increasing the numbers of subjects and SNPs. In total, our Japanese GWAS has identified four susceptibility genes. GWASs for AIS have also been performed in the USA and China, which identified one and three susceptibility genes, respectively. Here we review GWASs in Japan and abroad, as well as functional analyses aimed at clarifying the pathomechanism of AIS. PMID:27013625

Genome-wide association (GWA) studies have identified a large number of single-nucleotide polymorphisms (SNPs) associated with disease phenotypes. As most GWA studies have been performed primarily in populations of European descent, this review examines the issues involved in extending consideration of GWA studies to diverse worldwide populations. Although challenges exist with such issues as imputation, admixture, and replication, investigation of diverse populations in GWA studies has significant potential to advance the project of mapping the genetic determinants of complex diseases for the human population as a whole. PMID:20395969

The number of obese patients is increasing in Japan due to the westernization of lifestyle. Obesity, especially visceral fat obesity, is important for the development of metabolic syndrome. Genetic as well as environmental factors are important for the development of obesity, and genetic factors have also been reported to influence fat distribution. Recent genome-wide association studies (GWASs) have revealed polymorphisms related to obesity and fat distribution. GWAS will contribute to a better understanding of the molecular mechanisms underlying the regulation of obesity and the distribution of body fat. PMID:23631198

Over the past several years, the field of reproductive medicine has witnessed great advances in genome-wide association studies (GWASs) of polycystic ovary syndrome (PCOS), leading to identification of several promising genes involved in hormone action, type 2 diabetes, and cell proliferation. This review summarizes the key findings and discusses their potential implications with regard to genetic mechanisms of PCOS. Limitations of GWAS are evaluated, emphasizing the understanding of the reasons for variability in results between individual studies. Root causes of misinterpretations of GWASs are also addressed. Finally, the impact of GWAS on future directions of multi- and interdisciplinary studies is discussed. PMID:27513023

Human gait is a complex neurological and musculoskeletal function, of which the genetic basis remains largely unknown. To determine the influence of common genetic variants on gait parameters, we studied 2,946 participants of the Rotterdam Study, a population-based cohort of unrelated elderly individuals. We assessed 30 gait parameters using an electronic walkway, which yielded seven independent gait domains after principal component analysis. Genotypes of participants were imputed to the 1000 Genomes reference panel for generating genetic relationship matrices to estimate heritability of gait parameters, and for subsequent genome-wide association scans (GWASs) to identify specific variants. Gait domains with the highest age- and sex-adjusted heritability were Variability (h² = 61%), Rhythm (37%), and Tandem (32%). For other gait domains, heritability estimates attenuated after adjustment for height and weight. Genome-wide association scans identified a variant on 1p22.3 that was significantly associated with single support time, a variable from the Rhythm domain (rs72953990; N = 2,946; β (SE) = 0.0069 (0.0012), p = 2.30 × 10(-8)). This variant did not replicate in an independent sample (N = 362; p = .78). In conclusion, human gait has highly heritable components that are explained by common genetic variation, which are partly attributed to height and weight. Collaborative efforts are needed to identify robust single variant associations for the heritable parameters. PMID:26219847
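
The relationship between an effect estimate, its standard error, and a p-value can be illustrated with a Wald test. The Python sketch below assumes a two-sided test under a normal approximation; the abstract does not say which test the authors used, and `wald_test` is an illustrative name. Because the published β and SE are rounded, recomputing from them gives a p-value of the same order of magnitude as the reported one but not the identical figure.

```python
import math


def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return the z statistic and two-sided normal p-value for beta / se."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Rounded values quoted in the abstract for rs72953990.
z, p = wald_test(0.0069, 0.0012)
print(f"z = {z:.2f}, p = {p:.2e}")
```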

Over the past decade, genome-wide association studies (GWAS) have considerably improved our understanding of the genetic basis of kidney function and disease. Population-based studies, used to investigate traits that define chronic kidney disease (CKD), have identified >50 genomic regions in which common genetic variants associate with estimated glomerular filtration rate or urinary albumin-to-creatinine ratio. Case-control studies, used to study specific CKD aetiologies, have yielded risk loci for specific kidney diseases such as IgA nephropathy and membranous nephropathy. In this Review, we summarize important findings from GWAS and clinical and experimental follow-up studies. We also compare risk allele frequency, effect sizes, and specificity in GWAS of CKD-defining traits and GWAS of specific CKD aetiologies and the implications for study design. Genomic regions identified in GWAS of CKD-defining traits can contain causal genes for monogenic kidney diseases. Population-based research on kidney function traits can therefore generate insights into more severe forms of kidney diseases. Experimental follow-up studies have begun to identify causal genes and variants, which are potential therapeutic targets, and suggest mechanisms underlying the high allele frequency of causal variants. GWAS are thus a useful approach to advance knowledge in nephrology. PMID:27477491

We surveyed gene–gene interactions (epistasis) in human body mass index (BMI) in four European populations (n<1200) via exhaustive pair-wise genome scans where interactions were computed as F ratios by testing a linear regression model fitting two single-nucleotide polymorphisms (SNPs) with interactions against the one without. Before the association tests, BMI was corrected for sex and age, normalised and adjusted for relatedness. Neither single SNPs nor SNP interactions were genome-wide significant in any cohort based on the consensus threshold (P=5.0E−08) and a Bonferroni corrected threshold (P=1.1E−12), respectively. Next we compared sub genome-wide significant SNP interactions (P<5.0E−08) across cohorts to identify common epistatic signals, where SNPs were annotated to genes to test for gene ontology (GO) enrichment. Among the epistatic genes contributing to the commonly enriched GO terms, 19 were shared across study cohorts of which 15 are previously published genome-wide association loci, including CDH13 (cadherin 13) associated with height and SORCS2 (sortilin-related VPS10 domain containing receptor 2) associated with circulating insulin-like growth factor 1 and binding protein 3. Interactions between the 19 shared epistatic genes and those involving BMI candidate loci (P<5.0E−08) were tested across cohorts, and eight were found to replicate at the SNP level (P<0.05) in at least one cohort; these were further tested and showed limited replication in a separate European population (n>5000). We conclude that genome-wide analysis of epistasis in multiple populations is an effective approach to provide new insights into the genetic regulation of BMI but requires additional efforts to confirm the findings. PMID:22333899
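
The much stricter threshold for interactions follows from the number of SNP pairs tested in an exhaustive scan. The Python sketch below shows how a Bonferroni threshold for pairwise tests can be computed; the abstract does not report the number of SNPs, so the value of 300,000 used here is a hypothetical input chosen only because it yields a threshold of similar magnitude, and `pairwise_bonferroni_threshold` is an illustrative name.

```python
import math


def pairwise_bonferroni_threshold(n_snps: int, alpha: float = 0.05) -> float:
    """Bonferroni-corrected threshold for testing every unordered SNP pair."""
    n_tests = math.comb(n_snps, 2)  # number of distinct SNP pairs
    if n_tests == 0:
        raise ValueError("need at least two SNPs to form a pair")
    return alpha / n_tests


# Hypothetical SNP count; not taken from the abstract.
print(f"{pairwise_bonferroni_threshold(300_000):.1e}")  # prints 1.1e-12
```

The quadratic growth in the number of pairs is the reason exhaustive epistasis scans have far less power per test than single-SNP scans of the same data.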

DNA methylation has a crucial role in cancer biology. In the present study, a genome-wide analysis of DNA methylation in hepatoblastoma (HB) tissues was performed to verify differential methylation levels between HB and normal tissues. As alpha-fetoprotein (AFP) has a critical role in HB, AFP methylation levels were also detected using pyrosequencing. Normal and HB liver tissue samples (frozen tissue) were obtained from patients with HB. Genome-wide analysis of DNA methylation in these tissues was performed using an Infinium HumanMethylation450 BeadChip, and the results were confirmed with reverse transcription-quantitative polymerase chain reaction. The Infinium HumanMethylation450 BeadChip demonstrated distinctively less methylation in HB tissues than in non-tumor tissues. In addition, methylation enrichment was observed in positions near the transcription start site of AFP, which exhibited lower methylation levels in HB tissues than in non-tumor liver tissues. Lastly, a significant negative correlation was observed between AFP messenger RNA expression and DNA methylation percentage, using linear Pearson's R correlation coefficients. The present results demonstrate differential methylation levels between HB and normal tissues, and imply that aberrant methylation of AFP in HB could reflect HB development. Expansion of these findings could provide useful insight into HB biology. PMID:27446465

Domesticated Asian rice (Oryza sativa) is one of the oldest domesticated crop species in the world, having fed more people than any other plant in human history. We report the patterns of DNA sequence variation in rice and its wild ancestor, O. rufipogon, across 111 randomly chosen gene fragments, and use these to infer the evolutionary dynamics that led to the origins of rice. There is a genome-wide excess of high-frequency derived single nucleotide polymorphisms (SNPs) in O. sativa varieties, a pattern that has not been reported for other crop species. We developed several alternative models to explain contemporary patterns of polymorphisms in rice, including a (i) selectively neutral population bottleneck model, (ii) bottleneck plus migration model, (iii) multiple selective sweeps model, and (iv) bottleneck plus selective sweeps model. We find that a simple bottleneck model, which has been the dominant demographic model for domesticated species, cannot explain the derived nucleotide polymorphism site frequency spectrum in rice. Instead, a bottleneck model that incorporates selective sweeps, or a more complex demographic model that includes subdivision and gene flow, are more plausible explanations for patterns of variation in domesticated rice varieties. If selective sweeps are indeed the explanation for the observed nucleotide data of domesticated rice, it suggests that strong selection can leave its imprint on genome-wide polymorphism patterns, contrary to expectations that selection results only in a local signature of variation. PMID:17907810

The dynamic interplay between DNA-binding proteins and nucleosomes underlies essential nuclear processes such as transcription, replication, and DNA repair. Manifestations of this interplay include the assembly, eviction, and replacement of nucleosomes. Hence, measurements of nucleosome turnover kinetics can lead to insights into the regulation of dynamic chromatin processes. In this chapter, we describe a genome-wide method for measuring nucleosome turnover that uses metabolic labeling followed by capture of newly synthesized histones, which we have termed Covalent Attachment of Tagged Histones to Capture and Identify Turnover (CATCH-IT). Although CATCH-IT can be used with any genome-wide mapping procedure, high-resolution profiling is attainable using paired-end sequencing of native chromatin. Our protocol also includes an efficient Solexa DNA sequencing library preparation protocol that can be used for single base-pair resolution mapping of both nucleosome and subnucleosomal particles. We not only describe the use of these protocols in the context of a Drosophila cell line but also provide the necessary changes for adaptation to other model systems. PMID:22929769

The need for research investigating DNA methylation (DNAm) in clinical studies has increased, leading to the evolution of new analytic methods to improve accuracy and reproducibility of the interpretation of results from these studies. The purpose of this article is to provide clinical researchers with a summary of the major data processing steps routinely applied in clinical studies investigating genome-wide DNAm using the Illumina HumanMethylation 450K BeadChip. In most studies, the primary goal of employing DNAm analysis is to identify differential methylation at CpG sites among phenotypic groups. Experimental design considerations are crucial at the onset to minimize bias from factors related to sample processing and avoid confounding experimental variables with non-biological batch effects. Although there are currently no de facto standard methods for analyzing these data, we review the major steps in processing DNAm data recommended by several research studies. We describe several variations available for clinical researchers to process, analyze, and interpret DNAm data. These insights are applicable to most types of genome-wide DNAm array platforms and will be applicable for the next generation of DNAm array technologies (e.g., the 850K array). Selection of the DNAm analytic pipeline followed by investigators should be guided by the research question and supported by recently published methods. PMID:27127542

This is the first report of whole-genome SNP scanning of snow sheep (Ovis nivicola). Samples of snow sheep (n = 18) were collected in six different regions of the Republic of Sakha (Yakutia) from 64° to 71° N. For SNP genotyping, we applied the Ovine 50K SNP BeadChip (Illumina, United States), designed for domestic sheep. The total number of genotyped SNPs (call rate 90%) was 47796 (88.1% of total SNPs), of which 1006 SNPs were polymorphic (2.1%). Principal component analysis (PCA) showed clear differentiation within the species O. nivicola: the studied individuals were distributed among five distinct arrays corresponding to the geographical locations of the sampling points. Our results demonstrate that the DNA chip designed for domestic sheep can be successfully used to study the allele pool and the genetic structure of snow sheep populations. PMID:27599514

Evolution is typically thought to proceed through divergence of genes, proteins and ultimately phenotypes. However, similar traits might also evolve convergently in unrelated taxa owing to similar selection pressures. Adaptive phenotypic convergence is widespread in nature, and recent results from several genes have suggested that this phenomenon is powerful enough to also drive recurrent evolution at the sequence level. Where homoplasious substitutions do occur these have long been considered the result of neutral processes. However, recent studies have demonstrated that adaptive convergent sequence evolution can be detected in vertebrates using statistical methods that model parallel evolution, although the extent to which sequence convergence between genera occurs across genomes is unknown. Here we analyse genomic sequence data in mammals that have independently evolved echolocation and show that convergence is not a rare process restricted to several loci but is instead widespread, continuously distributed and commonly driven by natural selection acting on a small number of sites per locus. Systematic analyses of convergent sequence evolution in 805,053 amino acids within 2,326 orthologous coding gene sequences compared across 22 mammals (including four newly sequenced bat genomes) revealed signatures consistent with convergence in nearly 200 loci. Strong and significant support for convergence among bats and the bottlenose dolphin was seen in numerous genes linked to hearing or deafness, consistent with an involvement in echolocation. Unexpectedly, we also found convergence in many genes linked to vision: the convergent signal of many sensory genes was robustly correlated with the strength of natural selection. This first attempt to detect genome-wide convergent sequence evolution across divergent taxa reveals the phenomenon to be much more pervasive than previously recognized. PMID:24005325

Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype–phenotype relationship that is characterized by significant heterogeneity and gene–gene and gene–environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods. Contact: jason.h.moore@dartmouth.edu PMID:20053841

A limitation of many genome-wide association studies (GWA) in animal breeding is that there are many loci with small effect sizes; thus, larger sample sizes (N) are required to guarantee suitable power of detection. To increase sample size, results from different GWA can be combined in a meta-analys...

Secondary contact between divergent populations or incipient species may result in the exchange and introgression of genomic material. We develop a simple DNA sequence measure, called Gmin, which is designed to identify genomic regions experiencing introgression in a secondary contact model. Gmin is defined as the ratio of the minimum between-population number of nucleotide differences in a genomic window to the average number of between-population differences. Although it is conceptually simple, one advantage of Gmin is that it is computationally inexpensive relative to model-based methods for detecting gene flow and it scales easily to the level of whole-genome analysis. We compare the sensitivity and specificity of Gmin to those of the widely used index of population differentiation, FST, and suggest a simple statistical test for identifying genomic outliers. Extensive computer simulations demonstrate that Gmin has both greater sensitivity and specificity for detecting recent introgression than does FST. Furthermore, we find that the sensitivity of Gmin is robust with respect to both the population mutation and recombination rates. Finally, a scan of Gmin across the X chromosome of Drosophila melanogaster identifies candidate regions of introgression between sub-Saharan African and cosmopolitan populations that were previously missed by other methods. These results show that Gmin is a biologically straightforward, yet powerful, alternative to FST, as well as to more computationally intensive model-based methods for detecting gene flow. PMID:25874895
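The definition of Gmin given in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes aligned, equal-length sequences held as plain strings, counts differences with a simple Hamming distance, and uses non-overlapping windows. The function names (`hamming`, `gmin`, `gmin_windows`) are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Count differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def gmin(pop1, pop2):
    """Ratio of the minimum to the mean between-population pairwise difference.

    pop1 and pop2 are lists of aligned sequences (one window) sampled from
    two populations. Returns NaN when the mean difference is zero.
    """
    diffs = [hamming(s, t) for s, t in product(pop1, pop2)]
    mean_diff = sum(diffs) / len(diffs)
    if mean_diff == 0:
        return float("nan")
    return min(diffs) / mean_diff


def gmin_windows(pop1, pop2, window):
    """Compute Gmin in consecutive non-overlapping windows (an assumption)."""
    length = len(pop1[0])
    return [
        gmin([s[i:i + window] for s in pop1], [t[i:i + window] for t in pop2])
        for i in range(0, length - window + 1, window)
    ]


if __name__ == "__main__":
    # Toy data: the second sequence in each population is identical,
    # so the minimum between-population difference is zero.
    african = ["AAAAAAAA", "AAAAAAAT"]
    cosmopolitan = ["TTTTAAAA", "AAAAAAAT"]
    print(gmin(african, cosmopolitan))
```

A value well below 1 means at least one between-population pair of sequences is far more similar than the average pair, which is the pattern the measure is designed to flag. The outlier test the abstract mentions is not reproduced here.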

Objective: Although twin and family studies have shown Attention Deficit/Hyperactivity Disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. As prior genome-wide association scans (GWAS) have not yielded significant results, we conducted a meta-analysis of existing studies to boost statistical power. Method: We used data from four projects: a) the Children’s Hospital of Philadelphia (CHOP), b) phase I of the International Multicenter ADHD Genetics project (IMAGE), c) phase II of IMAGE (IMAGE II), and d) the Pfizer funded study from the University of California, Los Angeles, Washington University and the Massachusetts General Hospital (PUWMa). The final sample size consisted of 2,064 trios, 896 cases and 2,455 controls. For each study, we imputed HapMap SNPs, computed association test statistics and transformed them to Z-scores, and then combined weighted Z-scores in a meta-analysis. Results: No genome-wide significant associations were found, although an analysis of candidate genes suggests they may be involved in the disorder. Conclusions: Given that ADHD is a highly heritable disorder, our negative results suggest that the effects of common ADHD risk variants must, individually, be very small or that other types of variants, e.g. rare ones, account for much of the disorder’s heritability. PMID:20732625
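The step of combining weighted Z-scores across studies can be illustrated with the standard weighted Stouffer formula, Z = Σ wᵢzᵢ / √(Σ wᵢ²). The abstract does not say how the weights were chosen. Weighting each study by the square root of its sample size, as below, is a common convention and is used here only as an assumption. The per-study Z-scores and sample sizes in the example are invented for illustration.

```python
import math
from statistics import NormalDist


def combine_weighted_z(z_scores, weights):
    """Weighted Stouffer combination: sum(w*z) / sqrt(sum(w^2))."""
    if not z_scores or len(z_scores) != len(weights):
        raise ValueError("need one weight per Z-score and at least one study")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided P-value for a standard-normal Z statistic."""
    return 2 * NormalDist().cdf(-abs(z))


if __name__ == "__main__":
    # Hypothetical per-study Z-scores for one SNP and hypothetical sample sizes.
    z_per_study = [1.8, 2.1, -0.3, 1.2]
    n_per_study = [900, 1200, 700, 1500]
    w = [math.sqrt(n) for n in n_per_study]  # assumed weighting scheme
    z_meta = combine_weighted_z(z_per_study, w)
    print(z_meta, two_sided_p(z_meta))
```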

The optic nerve head is involved in many ophthalmic disorders, including common diseases such as myopia and open-angle glaucoma. Two of the most important parameters are the size of the optic disc area and the vertical cup-disc ratio (VCDR). Both are highly heritable but genetically largely undetermined. We performed a meta-analysis of genome-wide association (GWA) data to identify genetic variants associated with optic disc area and VCDR. The gene discovery included 7,360 unrelated individuals from the population-based Rotterdam Study I and Rotterdam Study II cohorts. These cohorts revealed two genome-wide significant loci for optic disc area, rs1192415 on chromosome 1p22 (p = 6.72×10−19) within 117 kb of the CDC7 gene and rs1900004 on chromosome 10q21.3-q22.1 (p = 2.67×10−33) within 10 kb of the ATOH7 gene. They revealed two genome-wide significant loci for VCDR, rs1063192 on chromosome 9p21 (p = 6.15×10−11) in the CDKN2B gene and rs10483727 on chromosome 14q22.3-q23 (p = 2.93×10−10) within 40 kbp of the SIX1 gene. Findings were replicated in two independent Dutch cohorts (Rotterdam Study III and Erasmus Rucphen Family study; N = 3,612), and the TwinsUK cohort (N = 843). Meta-analysis with the replication cohorts confirmed the four loci and revealed a third locus at 16q12.1 associated with optic disc area, and four other loci at 11q13, 13q13, 17q23 (borderline significant), and 22q12.1 for VCDR. ATOH7 was also associated with VCDR independent of optic disc area. Three of the loci were marginally associated with open-angle glaucoma. The protein pathways in which the loci of optic disc area are involved overlap with those identified for VCDR, suggesting a common genetic origin. PMID:20548946

The pathophysiology of antisocial personality disorder (ASPD) remains unclear. Although the most consistent biological finding is reduced grey matter volume in the frontal cortex, about 50% of the total liability to developing ASPD has been attributed to genetic factors. The contributing genes remain largely unknown. Therefore, we sought to study the genetic background of ASPD. We conducted a genome-wide association study (GWAS) and a replication analysis of Finnish criminal offenders fulfilling DSM-IV criteria for ASPD (N=370, N=5850 for controls, GWAS; N=173, N=3766 for controls and replication sample). The GWAS resulted in suggestive associations of two clusters of single-nucleotide polymorphisms at 6p21.2 and at 6p21.32 at the human leukocyte antigen (HLA) region. Imputation of HLA alleles revealed an independent association with DRB1*01:01 (odds ratio (OR)=2.19 (1.53-3.14), P=1.9 × 10(-5)). Two polymorphisms at 6p21.2 LINC00951-LRFN2 gene region were replicated in a separate data set, and rs4714329 reached genome-wide significance (OR=1.59 (1.37-1.85), P=1.6 × 10(-9)) in the meta-analysis. The risk allele also associated with antisocial features in the general population conditioned for severe problems in childhood family (β=0.68, P=0.012). Functional analysis in brain tissue in open access GTEx and Braineac databases revealed eQTL associations of rs4714329 with LINC00951 and LRFN2 in cerebellum. In humans, LINC00951 and LRFN2 are both expressed in the brain, especially in the frontal cortex, which is intriguing considering the role of the frontal cortex in behavior and the neuroanatomical findings of reduced gray matter volume in ASPD. To our knowledge, this is the first study showing genome-wide significant and replicable findings on genetic variants associated with any personality disorder. PMID:27598967

Eukaryotic chromosomes initiate DNA synthesis from multiple replication origins. The machinery that initiates DNA synthesis is highly conserved, but the sites where the replication initiation proteins bind have diverged significantly. Functional comparative genomics is an obvious approach to study the evolution of replication origins. However, to date, the Saccharomyces cerevisiae replication origin map is the only genome map available. Using an iterative approach that combines computational prediction and functional validation, we have generated a high-resolution genome-wide map of DNA replication origins in Kluyveromyces lactis. Unlike other yeasts or metazoans, K. lactis autonomously replicating sequences (KlARSs) contain a 50 bp consensus motif suggestive of a dimeric structure. This motif is necessary and largely sufficient for initiation and was used to dependably identify 145 of the up to 156 non-repetitive intergenic ARSs projected for the K. lactis genome. Though similar in genome sizes, K. lactis has half as many ARSs as its distant relative S. cerevisiae. Comparative genomic analysis shows that ARSs in K. lactis and S. cerevisiae preferentially localize to non-syntenic intergenic regions, linking ARSs with loci of accelerated evolutionary change. PMID:20485513

Parent-of-origin effects (POE) such as genomic imprinting influence growth and body composition in livestock, rodents, and humans. Here, we report the results of a genomescan to detect quantitative trait loci (QTL) with POE on growth and carcass traits in Angus × Brahman cattle crossbreds. We identified 24 POE-QTL on 15 Bos taurus autosomes (BTAs) of which six were significant at 5% genome-wide (GW) level and 18 at the 5% chromosome-wide (CW) significance level. Six QTL were paternally expressed while 15 were maternally expressed. Three QTL influencing post-weaning growth map to the proximal end of BTA2 (linkage region of 0-9 cM; genomic region of 5.0-10.8 Mb), for which only one imprinted ortholog is known so far in the human and mouse genomes, and therefore may potentially represent a novel imprinted region. The detected QTL individually explained 1.4 ∼ 5.1% of each trait's phenotypic variance. Comparative in silico analysis of bovine genomic locations show that 32 out of 1,442 known mammalian imprinted genes from human and mouse homologs map to the identified QTL regions. Although several of the 32 genes have been associated with quantitative traits in cattle, only two (GNAS and PEG3) have experimental proof of being imprinted in cattle. These results lend additional support to recent reports that POE on quantitative traits in mammals may be more common than previously thought, and strengthen the need to identify and experimentally validate cattle orthologs of imprinted genes so as to investigate their effects on quantitative traits. PMID:22303340

De novo mutations (DNMs) are important in Autism Spectrum Disorder (ASD), but so far analyses have mainly been on the ~1.5% of the genome encoding genes. Here, we performed whole genome sequencing (WGS) of 200 ASD parent-child trios and characterized germline and somatic DNMs. We confirmed that the majority of germline DNMs (75.6%) originated from the father, and these increased significantly with paternal age only (p=4.2×10−10). However, when clustered DNMs (those within 20kb) were found in ASD, not only did they mostly originate from the mother (p=7.7×10−13), but they could also be found adjacent to de novo copy number variations (CNVs) where the mutation rate was significantly elevated (p=2.4×10−24). By comparing DNMs detected in controls, we found a significant enrichment of predicted damaging DNMs in ASD cases (p=8.0×10−9; OR=1.84), of which 15.6% (p=4.3×10−3) and 22.5% (p=7.0×10−5) were in non-coding or genic non-coding regions, respectively. The non-coding elements most enriched for DNM were untranslated regions of genes, boundaries involved in exon-skipping and DNase I hypersensitive regions. Using microarrays and a novel outlier detection test, we also found aberrant methylation profiles in 2/185 (1.1%) of ASD cases. These same individuals carried independently identified DNMs in the ASD-risk and epigenetic genes DNMT3A and ADNP. Our data begin to characterize different genome-wide DNMs and highlight the contribution of non-coding variants to the etiology of ASD. PMID:27525107

The bacterial composition of the human fecal microbiome is influenced by many lifestyle factors, notably diet. It is less clear, however, what role host genetics plays in dictating the composition of bacteria living in the gut. In this study, we examined the association of ~200K host genotypes with the relative abundance of fecal bacterial taxa in a founder population, the Hutterites, during two seasons (n = 91 summer, n = 93 winter, n = 57 individuals collected in both). These individuals live and eat communally, minimizing variation due to environmental exposures, including diet, which could potentially mask small genetic effects. Using a GWAS approach that takes into account the relatedness between subjects, we identified at least 8 bacterial taxa whose abundances were associated with single nucleotide polymorphisms in the host genome in each season (at genome-wide FDR of 20%). For example, we identified an association between a taxon known to affect obesity (genus Akkermansia) and a variant near PLD1, a gene previously associated with body mass index. Moreover, we replicate a previously reported association from a quantitative trait locus (QTL) mapping study of fecal microbiome abundance in mice (genus Lactococcus, rs3747113, P = 3.13 x 10−7). Finally, based on the significance distribution of the associated microbiome QTLs in our study with respect to chromatin accessibility profiles, we identified tissues in which host genetic variation may be acting to influence bacterial abundance in the gut. PMID:26528553
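To make the idea of a genome-wide FDR threshold of 20% concrete, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of P-values. The abstract does not state which FDR method the study used, so Benjamini-Hochberg is an assumption chosen for illustration. The function name and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return a list of booleans marking tests declared significant.

    Benjamini-Hochberg step-up rule: sort the P-values, find the largest
    rank k with p_(k) <= k * fdr / m, and reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical association P-values, one per taxon-SNP test.
    pvals = [0.01, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(pvals, fdr=0.20))
```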

Summary: High-throughput genotyping and sequencing technologies facilitate studies of complex genetic traits and provide new research opportunities. The increasing popularity of genome-wide association studies (GWAS) leads to the discovery of new associated loci and a better understanding of the genetic architecture underlying not only diseases, but also other monogenic and complex phenotypes. Several software packages are available for performing GWAS analyses, the R environment being one of them. Results: We present cgmisc, an R package that enables enhanced data analysis and visualization of results from GWAS. The package contains several utilities and modules that complement and enhance the functionality of the existing software. It also provides several tools for advanced visualization of genomic data and utilizes the power of the R language to aid in preparation of publication-quality figures. Some of the package functions are specific to domestic dog (Canis familiaris) data. Availability and implementation: The package is operating system-independent and is available from: https://github.com/cgmisc-team/cgmisc Contact: marcin.kierczak@imbim.uu.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26249815

Motivation: As genomics moves into the clinic, there has been much interest in using this medical data for research. At the same time the use of such data raises many privacy concerns. These circumstances have led to the development of various methods to perform genome-wide association studies (GWAS) on patient records while ensuring privacy. In particular, there has been growing interest in applying differentially private techniques to this challenge. Unfortunately, up until now all methods for finding high scoring SNPs in a differentially private manner have had major drawbacks in terms of either accuracy or computational efficiency. Results: Here we overcome these limitations with a substantially modified version of the neighbor distance method for performing differentially private GWAS, and thus are able to produce a more viable mechanism. Specifically, we use input perturbation and an adaptive boundary method to overcome accuracy issues. We also design and implement a convex analysis based algorithm to calculate the neighbor distance for each SNP in constant time, overcoming the major computational bottleneck in the neighbor distance method. It is our hope that methods such as ours will pave the way for more widespread use of patient data in biomedical research. Availability and implementation: A python implementation is available at http://groups.csail.mit.edu/cb/DiffPriv/. Contact: bab@csail.mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26769317

Will genome-wide association studies (GWAS) ‘work’ for pharmacogenetics research? This question was the topic of a staged debate, with pro and con sides, aimed to bring out the strengths and weaknesses of GWAS for pharmacogenetics studies. After a full day of seminars at the Fifth Statistical Analysis Workshop of the Pharmacogenetics Research Network, the lively debate was held – appropriately – at Goonies Comedy Club in Rochester (MN, USA). The pro side emphasized that the many GWAS successes for identifying genetic variants associated with disease risk show that it works; that the current genotyping platforms are efficient, with good imputation methods to fill in missing data; that its global assessment is always a success even if no significant associations are detected; and that genetic effects are likely to be large because humans have not evolved in a drug-therapy environment. By contrast, the con side emphasized that we have limited knowledge of the complexity of the genome; limited clinical phenotypes compromise studies; the likely multifactorial nature of drug response clouding the small genetic effects; and limitations of sample size and replication studies in pharmacogenetic studies. Lively and insightful discussions emphasized further research efforts that might benefit GWAS in pharmacogenetics. PMID:20235786

Human metapneumovirus (HMPV) has been described as an important etiologic agent of upper and lower respiratory tract infections, especially in young children and the elderly. Most school-aged children may have been exposed to HMPVs, and exacerbation by viral or bacterial super-infection is common. However, our understanding of the molecular evolution of HMPVs remains limited. To address the comprehensive evolutionary dynamics of HMPVs, we report a genome-wide analysis of the eight genes (N, P, M, F, M2, SH, G, and L) using 103 complete genome sequences. Phylogenetic reconstruction revealed that the eight genes from one HMPV strain grouped into the same genetic group among the five distinct lineages (A1, A2a, A2b, B1, and B2). A few exceptions of phylogenetic incongruence might suggest past recombination events, and we detected possible recombination breakpoints in the F, SH, and G coding regions. The five genetic lineages of HMPVs shared quite remote common ancestors, ranging from more than 220 to 470 years in age, with the most recent origin for the A2b sublineage. Purifying selection was common, but most protein genes except the F and M2-2 coding regions also appeared to experience episodic diversifying selection. Taken together, these results suggest that the five lineages of HMPVs maintain their individual evolutionary dynamics and that recombination and selection forces might shape the genetic diversity of HMPVs. PMID:27046055

Although the analysis of linkage disequilibrium (LD) plays a central role in many areas of population genetics, the sampling variance of LD is known to be very large with high sensitivity to numbers of nucleotide sites and individuals sampled. Here we show that a genome-wide analysis of the distribution of heterozygous sites within a single diploid genome can yield highly informative patterns of LD as a function of physical distance. The proposed statistic, the correlation of zygosity, is closely related to the conventional population-level measure of LD, but is agnostic with respect to allele frequencies and hence likely less prone to outlier artifacts. Application of the method to several vertebrate species leads to the conclusion that >80% of recombination events are typically resolved by gene-conversion-like processes unaccompanied by crossovers, with the average lengths of conversion patches being on the order of one to several kilobases in length. Thus, contrary to common assumptions, the recombination rate between sites does not scale linearly with distance, often even up to distances of 100 kb. In addition, the amount of LD between sites separated by <200 bp is uniformly much greater than can be explained by the conventional neutral model, possibly because of the nonindependent origin of mutations within this spatial scale. These results raise questions about the application of conventional population-genetic interpretations to LD on short spatial scales and also about the use of spatial patterns of LD to infer demographic histories. PMID:24948778

Cytosine DNA methylation (CDM) is a highly abundant, heritable but reversible chemical modification to the genome. Herein, a machine learning approach was applied to analyze the accumulation of epigenetic marks in methylomes of 152 ecotypes and 85 silencing mutants of Arabidopsis thaliana. In an information-thermodynamics framework, two measurements were used: (1) the amount of information gained/lost with the CDM changes IR and (2) the uncertainty of not observing a SNP LCR. We hypothesize that epigenetic marks are chromosomal footprints accounting for different ontogenetic and phylogenetic histories of individual populations. A machine learning approach is proposed to verify this hypothesis. Results support the hypothesis by the existence of discriminatory information (DI) patterns of CDM able to discriminate between individuals and between individual subpopulations. The statistical analyses revealed a strong association between the topologies of the structured population of Arabidopsis ecotypes based on IR and on LCR, respectively. A statistical-physical relationship between IR and LCR was also found. Results to date imply that the genome-wide distribution of CDM changes is not only part of the biological signal created by the methylation regulatory machinery, but ensures the stability of the DNA molecule, preserving the integrity of the genetic message under continuous stress from thermal fluctuations in the cell environment. PMID:27322251

The genome-wide association study (GWAS) has become an established scientific method that provides an unbiased screen for genetic loci potentially associated with phenotypes of clinical interest, such as chronic kidney disease (CKD). Thus, GWAS provides opportunities to gain new perspectives regarding the genetic architecture of CKD progression by identifying new candidate genes and targets for intervention. As such, it has become an important arm of translational science providing a complementary line of investigation to identify novel therapeutics to treat CKD. In this review, we describe the method and the challenges of performing GWAS in the pediatric CKD population. We also provide an overview of successful GWAS for kidney disease, and we discuss the established pediatric CKD cohorts in North America and Europe that are poised to identify genetic risk variants associated with CKD progression. PMID:26490952

Summary: The use of ultrafast laser pulses in surgery has allowed for unprecedented precision with minimal collateral damage to surrounding tissues. For these reasons, ultrafast laser nanosurgery, as an injury model, has gained tremendous momentum in experimental biology, ranging from in-vitro manipulations of subcellular structures to in-vivo studies in whole living organisms. For example, femtosecond laser nanosurgery on model organisms such as the nematode Caenorhabditis elegans (C. elegans) has opened new opportunities for in-vivo nerve regeneration studies. Meanwhile, the development of novel microfluidic devices has brought control of the experimental environment to the level required for precise nanosurgery in various animal models. Merging microfluidics and laser nanosurgery has recently improved the specificity and increased the speed of laser surgeries, enabling fast genome-wide screenings that can more readily decode the genetic map of various biological processes. PMID:19278850

Domestic animals are invaluable resources for study of the molecular architecture of complex traits. Although the mapping of quantitative trait loci (QTL) responsible for economically important traits in domestic animals has achieved remarkable results in recent decades, not all of the genetic variation in the complex traits has been captured because of the low density of markers used in QTL mapping studies. The genome-wide association study (GWAS), which utilizes high-density single-nucleotide polymorphisms (SNPs), provides a new way to tackle this issue. Encouraging achievements in dissection of the genetic mechanisms of complex diseases in humans have resulted from the use of GWAS. At present, GWAS has been applied to the field of domestic animal breeding and genetics, and some advances have been made. Many genes or markers that affect economic traits of interest in domestic animals have been identified. In this review, advances in the use of GWAS in domestic animals are described. PMID:22958308

The comb, as a secondary sexual character, is an important trait in chickens. Indicators of comb length (CL), comb height (CH), and comb weight (CW) are often selected in production. DNA-based marker-assisted selection could help chicken breeders to accelerate genetic improvement for comb or related economic characters by early selection. Although a number of quantitative trait loci (QTL) and candidate genes have been identified with advances in molecular genetics, candidate genes underlying comb traits are limited. The aim of the study was to use genome-wide association (GWA) studies with 600 K Affymetrix chicken SNP arrays to detect genes that are related to comb traits, using an F2 resource population. All comb characters exhibited high SNP-based heritability estimates (0.61–0.69). Chromosome 1 explained 20.80% of the genetic variance, while chromosome 4 explained 6.89%. Independent univariate genome-wide screens for each character identified 127, 197, and 268 novel significant SNPs for CL, CH, and CW, respectively. Three candidate genes, VPS36, AR, and WNT11B, were determined to have a plausible function in all comb characters. These genes are important to the initiation of follicle development, gonadal growth, and dermal development, respectively. The current study provides the first GWA analysis for comb traits. Identification of the genetic basis as well as promising candidate genes will help us understand the underlying genetic architecture of comb development and has practical significance in breeding programs for the selection of comb as an index for sexual maturity or reproduction. PMID:27427764

Genome-wide association studies (GWAS) have emerged as the method of choice for identifying common variants affecting complex disease. In a GWAS, particular attention is placed, for obvious reasons, on single-nucleotide polymorphisms (SNPs) that exceed stringent genome-wide significance thresholds. However, it is expected that many SNPs with only nominal evidence of association (e.g., P < 0.05) truly influence disease. Efforts to extract additional biological information from entire GWAS datasets have primarily focused on pathway-enrichment analyses. However, these methods suffer from a number of limitations and typically fail to lead to testable hypotheses. To evaluate alternative approaches, we performed a systems-level analysis of GWAS data using weighted gene coexpression network analysis. A weighted gene coexpression network was generated for 1918 genes harboring SNPs that displayed nominal evidence of association (P ≤ 0.05) from a GWAS of bone mineral density (BMD) using microarray data on circulating monocytes isolated from individuals with extremely low or high BMD. Thirteen distinct gene modules were identified, each comprising coexpressed and highly interconnected GWAS genes. Through the characterization of module content and topology, we illustrate how network analysis can be used to discover disease-associated subnetworks and characterize novel interactions for genes with a known role in the regulation of BMD. In addition, we provide evidence that network metrics can be used as a prioritizing tool when selecting genes and SNPs for replication studies. Our results highlight the advantages of using systems-level strategies to add value to and inform GWAS. PMID:23316444

Contemporary Jews comprise an aggregate of ethno-religious communities whose worldwide members identify with each other through various shared religious, historical and cultural traditions. Historical evidence suggests common origins in the Middle East, followed by migrations leading to the establishment of communities of Jews in Europe, Africa and Asia, in what is termed the Jewish Diaspora. This complex demographic history imposes special challenges in attempting to address the genetic structure of the Jewish people. Although many genetic studies have shed light on Jewish origins and on diseases prevalent among Jewish communities, including studies focusing on uniparentally and biparentally inherited markers, genome-wide patterns of variation across the vast geographic span of Jewish Diaspora communities and their respective neighbours have yet to be addressed. Here we use high-density bead arrays to genotype individuals from 14 Jewish Diaspora communities and compare these patterns of genome-wide diversity with those from 69 Old World non-Jewish populations, of which 25 have not previously been reported. These samples were carefully chosen to provide comprehensive comparisons between Jewish and non-Jewish populations in the Diaspora, as well as with non-Jewish populations from the Middle East and north Africa. Principal component and structure-like analyses identify previously unrecognized genetic substructure within the Middle East. Most Jewish samples form a remarkably tight subcluster that overlies Druze and Cypriot samples but not samples from other Levantine populations or paired Diaspora host populations. In contrast, Ethiopian Jews (Beta Israel) and Indian Jews (Bene Israel and Cochini) cluster with neighbouring autochthonous populations in Ethiopia and western India, respectively, despite a clear paternal link between the Bene Israel and the Levant. These results cast light on the variegated genetic architecture of the Middle East, and trace the origins

Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2907 cases with AN from 14 countries (15 sites) and 14 860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery data sets. Seventy-six (72 independent) single nucleotide polymorphisms were taken forward for in silico (two data sets) or de novo (13 data sets) replication genotyping in 2677 independent AN cases and 8629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication data sets comprised 5551 AN cases and 21 080 controls. AN subtype analyses (1606 AN restricting; 1445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01 × 10(-7)) in SOX2OT and rs17030795 (P=5.84 × 10(-6)) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76 × 10(-6)) between CUL3 and FAM124B and rs1886797 (P=8.05 × 10(-6)) near SPATA13. Comparing discovery with replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4 × 10(-6)), strongly suggesting that true findings exist but our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field. PMID:24514567
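
The direction-of-effect comparison in the abstract above can be illustrated with a one-sided exact binomial (sign) test. This is only a sketch: the abstract does not say which test the authors used or how many SNPs entered the comparison, so the counts below (55 of 72, roughly 76%) and the function name `sign_test_p` are assumptions made for illustration.

```python
from math import comb


def sign_test_p(same_direction: int, total: int) -> float:
    """One-sided exact binomial tail P(X >= same_direction), assuming each
    effect is equally likely to point in either direction (p = 0.5)."""
    tail = sum(comb(total, i) for i in range(same_direction, total + 1))
    return tail / 2 ** total


# Hypothetical counts (not taken from the abstract):
# 55 of 72 independent SNPs with concordant effect direction.
p_value = sign_test_p(55, 72)
print(f"one-sided sign-test P = {p_value:.2e}")
```

Under these assumed counts the tail probability is very small, which is the sense in which concordance well above 50% is unlikely to arise by chance even when no single SNP reaches genome-wide significance.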

The recent successes of genome-wide association studies (GWAS) have renewed interest in genome-environment-wide interaction studies (GEWIS) to discover genetic factors that modulate the penetrance of environmental exposures in human diseases. Indeed, gene-environment interactions (GxE), which have not been emphasized in the GWAS era, could be a source contributing to the missing heritability, a major bottleneck limiting continuing GWAS successes. In this manuscript, we describe a design and analytic strategy to focus on GxE using only exposed subjects, dubbed e-GEWIS. Operationally, an e-GEWIS analysis is equivalent to a GWAS analysis on exposed subjects only, and it has actually been used in some earlier GWAS without being explicitly identified as such. Through both analytics and simulations, e-GEWIS is shown to have better efficiency than the usual cross-product-based analysis of GxE interaction with both cases and controls (cc-GEWIS), and comparable efficiency to case-only analysis of GxE (c-GEWIS), with potentially smaller sample sizes. The formalization of e-GEWIS here provides a theoretical basis to legitimize this framework for routine investigation of GxE, for more efficient GxE study designs, and for improvement of reproducibility in replicating GEWIS findings. As an illustration, we apply e-GEWIS to a lung cancer GWAS dataset to perform a GEWIS, focusing on gene and smoking interaction. The e-GEWIS analysis successfully uncovered positive genetic associations on chromosome 15 among current smokers, suggesting a gene-smoking interaction. While this signal was detected earlier, the current finding serves as a positive control in support of the e-GEWIS strategy. PMID:25694100

Genetic studies of autism over the past decade suggest a complex landscape of multiple genes. In the face of this heterogeneity, studies that include large extended pedigrees may offer valuable insights, as the relatively few susceptibility genes within single large families may be more easily discerned. This genome-wide screen of 70 families includes 20 large extended pedigrees of 6-9 generations, 6 moderate-sized families of 4-5 generations and 44 smaller families of 2-3 generations. The Center for Inherited Disease Research (CIDR) provided genotyping using the Illumina Linkage Panel 12, a 6K single-nucleotide polymorphism (SNP) platform. Results from 192 subjects with an autism spectrum disorder (ASD) and 461 of their relatives revealed genome-wide significance on chromosome 15q, with three possibly distinct peaks: 15q13.1-q14 (heterogeneity LOD (HLOD)=4.09 at 29 459 872 bp); 15q14-q21.1 (HLOD=3.59 at 36 837 208 bp); and 15q21.1-q22.2 (HLOD=5.31 at 55 629 733 bp). Two of these peaks replicate earlier findings. There were additional suggestive results on chromosomes 2p25.3-p24.1 (HLOD=1.87), 7q31.31-q32.3 (HLOD=1.97) and 13q12.11-q12.3 (HLOD=1.93). Affected subjects in families supporting the linkage peaks found in this study did not reveal strong evidence for distinct phenotypic subgroups. PMID:19455147
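
For readers unfamiliar with the heterogeneity LOD (HLOD) reported above, the following is a minimal sketch of the standard admixture formulation, in which a proportion alpha of families is assumed to be linked. The per-family LOD scores are hypothetical and are treated as already evaluated at a fixed recombination fraction; full HLOD analyses also maximize over that fraction, and nothing here reproduces the software or settings used in the study.

```python
from math import log10


def hlod(family_lods, steps=1000):
    """Grid-search the admixture parameter alpha in [0, 1] to maximize
    sum_i log10(alpha * 10**lod_i + 1 - alpha).

    Returns (best_score, best_alpha). Assumes finite per-family LODs."""
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for k in range(steps + 1):
        alpha = k / steps
        score = sum(log10(alpha * 10 ** z + 1 - alpha) for z in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


# Hypothetical per-family LOD scores: two families support linkage, two do not.
score, alpha = hlod([2.0, 1.5, -1.0, -2.0])
print(f"HLOD = {score:.2f} at alpha = {alpha:.2f}")
```

Because alpha = 1 recovers the plain sum of family LODs and alpha = 0 gives zero, the HLOD is never below either value; it rises above the summed LOD when only a subset of families carries the linkage signal.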

Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genomewide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2,907 cases with AN from 14 countries (15 sites) and 14,860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery datasets. Seventy-six (72 independent) SNPs were taken forward for in silico (two datasets) or de novo (13 datasets) replication genotyping in 2,677 independent AN cases and 8,629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication datasets comprised 5,551 AN cases and 21,080 controls. AN subtype analyses (1,606 AN restricting; 1,445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01×10(-7)) in SOX2OT and rs17030795 (P=5.84×10(-6)) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76×10(-6)) between CUL3 and FAM124B and rs1886797 (P=8.05×10(-6)) near SPATA13. Comparing discovery to replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4×10(-6)), strongly suggesting that true findings exist but that our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field. PMID:21079607

Aims: To understand the role of ancestral genomic background in substance dependence (SD) genome-wide association studies (GWAS), we analyzed population diversity at genetic loci associated with SD traits and evaluated its effect on GWAS outcomes. Materials & methods: We investigated 24 genes with variants associated with SD by GWAS, and 82 loci with putative subordinate roles with respect to SD-associated genes. Results: We observed high ancestry-related frequency differences in common functional alleles in GWAS-relevant genes and their interactive partners. Common functional alleles with high frequency differences demonstrated significant effects on the GWAS outcomes. Conclusion: Population differences in SD GWAS outcomes seem not to be influenced by general variation across the genome, but by ancestry-related local haplotype structures at SD-associated loci. PMID:26267224

Background: To explain the vastly different phenotypes exhibited by the same organism under different conditions, it is essential that we understand how the organism's genes are coordinately regulated. While there are many excellent tools for predicting sequences encoding proteins or RNA genes, few algorithms exist to predict regulatory sequences on a genomewide scale with no prior information. Results: To identify motifs involved in the control of transcription, an algorithm was developed that searches upstream of operons for improbably frequent dimers. The algorithm was applied to the B. subtilis genome, which is predicted to encode approximately 200 DNA binding proteins. The dimers found to be over-represented could be clustered into 317 distinct groups, each thought to represent a class of motifs uniquely recognized by some transcription factor. For each cluster of dimers, a representative weight matrix was derived and scored over the regions upstream of the operons to predict the sites recognized by the cluster's factor, and a putative regulon of the operons immediately downstream of the sites was inferred. The distribution in number of operons per predicted regulon is comparable to that for well characterized transcription factors. The most highly over-represented dimers matched σA, the T-box, and σW sites. We have evidence to suggest that at least 52 of our clusters of dimers represent actual regulatory motifs, based on the groups' weight matrix matches to experimentally characterized sites, the functional similarity of the component operons of the groups' regulons, and the positional biases of the weight matrix matches. All predictions are assigned a significance value, and thresholds are set to avoid false positives. Where possible, we examine our false negatives, drawing examples from known regulatory motifs and regulons inferred from RNA expression data. Conclusions: We have demonstrated that in the case of B. subtilis our algorithm allows for the

Gilles de la Tourette Syndrome (TS) is a familial, neuropsychiatric disorder characterized by chronic, intermittent motor and vocal tics. In addition to tics, affected individuals frequently display symptoms such as attention-deficit hyperactivity disorder and/or obsessive compulsive disorder. Genetic analyses of family data have suggested that susceptibility to the disorder is most likely due to a single genetic locus with a dominant mode of transmission and reduced penetrance. In the search for genetic linkage for TS, we have collected well-characterized pedigrees with multiple affected individuals on whom extensive diagnostic evaluations have been done. The first stage of our study is to scan the genome systematically using a panel of uniformly spaced (10 to 20 cM), highly polymorphic microsatellite markers on 5 families segregating TS. To date, 290 markers have been typed and 3,660 non-overlapping cM of the genome have been excluded for possible linkage under the assumption of genetic homogeneity. Because of the possibility of locus heterogeneity, overall summed exclusion is not considered tantamount to absolute exclusion of a disease locus in that region. The results from each family are carefully evaluated, and a positive lod score in a single family is followed up by typing closely linked markers. Linkage to TS was examined by two-point analysis using the following genetic model: a single autosomal dominant gene with gene frequency .003 and maximum penetrance of .99. An age-of-onset correction is included using a linear function increasing from age 2 years to 21 years. A small rate of phenocopies is also incorporated into the model. Only individuals with TS or CMT according to DSM III-R criteria were regarded as affected for the purposes of this summary. Additional markers are being tested to provide coverage at 5 cM intervals. Moreover, we are currently analyzing the data non-parametrically using the Affected-Pedigree-Member Method of linkage analysis.

The overdominant model of heterosis explains the superior phenotype of hybrids by synergistic allelic interaction within heterozygous loci. To map such genetic variation in yeast, we used a population doubling time dataset of a Saccharomyces cerevisiae 16 × 16 diallel and searched for major contributing heterotic trait loci (HTL). Heterosis was observed for the majority of hybrids, as they surpassed their best parent's growth rate. However, most of the local heterozygous loci identified by the genome scan were surprisingly underdominant, i.e., they reduced growth. We speculated that in these loci adverse effects on growth resulted from incompatible allelic interactions. To test this assumption, we eliminated these allelic interactions by creating hybrids with local hemizygosity for the underdominant HTLs, as well as for control random loci. Growth was indeed elevated for most hybrids hemizygous for HTL genes but not for those hemizygous for control genes, hence validating the results of our genome scan. Assessing the consequences of local heterozygosity by reciprocal hemizygosity and allele replacement assays revealed the influence of genetic background on the underdominant effects of HTLs. Overall, this genome-wide study on a multi-parental hybrid population provides a strong argument against single gene overdominance as a major contributor to heterosis, and favors the dominance complementation model. PMID:26967146

A genome-wide association study (GWAS) is typically focused on detecting marginal genetic effects. However, many complex traits are likely to be the result of the interplay of genes and environmental factors. Some SNPs may have a weak marginal effect and thus be unlikely to be detected from a scan of marginal effects, but may be detectable in a gene-environment (G × E) interaction analysis. However, a genome-wide interaction scan (GWIS) using a standard test of G × E interaction is known to have low power, particularly when one corrects for testing multiple SNPs. Two 2-step methods for GWIS have been previously proposed, aimed at improving efficiency by prioritizing SNPs most likely to be involved in a G × E interaction using a screening step. For a quantitative trait, these include a method that screens on marginal effects [Kooperberg and Leblanc, 2008] and a method that screens on variance heterogeneity by genotype [Paré et al., 2010]. In this paper, we show that the Paré et al. approach has an inflated false-positive rate in the presence of an environmental marginal effect, and we propose an alternative that remains valid. We also propose a novel 2-step approach that combines the two screening approaches, and provide simulations demonstrating that the new method can outperform other GWIS approaches. Application of this method to a G × Hispanic-ethnicity scan for childhood lung function reveals a SNP near the MARCO locus that was not identified by previous marginal-effect scans. PMID:27230133
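
The general shape of a 2-step interaction scan, as described above, can be sketched as follows. The function takes precomputed P values rather than fitting any model, and the screening threshold `alpha1`, the function names, and the toy numbers are all assumptions for illustration; this is not the combined screening method proposed in the paper. The sketch also assumes the screening statistic is independent of the interaction test, which is what allows the second step to correct only for the SNPs that pass.

```python
def one_step_gwis(interaction_p, alpha=0.05):
    """Standard scan: Bonferroni-correct the interaction test over all SNPs."""
    threshold = alpha / len(interaction_p)
    return [i for i, p in enumerate(interaction_p) if p < threshold]


def two_step_gwis(screen_p, interaction_p, alpha1=0.05, alpha=0.05):
    """Step 1: keep SNPs whose screening P value is below alpha1.
    Step 2: test G x E only in that subset, correcting for its size."""
    passed = [i for i, p in enumerate(screen_p) if p < alpha1]
    if not passed:
        return []
    threshold = alpha / len(passed)
    return [i for i in passed if interaction_p[i] < threshold]


# Toy P values for four SNPs (hypothetical).
screen_p = [0.001, 0.5, 0.01, 0.9]
interaction_p = [0.02, 1e-9, 0.2, 0.03]

print(one_step_gwis(interaction_p))            # SNPs significant at alpha / 4
print(two_step_gwis(screen_p, interaction_p))  # SNPs significant at alpha / 2
```

In this toy example SNP 0 is found only by the 2-step scan, because its interaction P value clears the relaxed second-step threshold, while SNP 1 is found only by the 1-step scan, because it fails the screen. That trade-off is why the choice of screening statistic matters.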

Purpose: Sharing study data within the research community generates tension between two important goods: promoting scientific goals and protecting the privacy interests of study participants. The present study was designed to explore the perceptions, beliefs, and attitudes of research participants and possible future participants regarding genome-wide association studies (GWAS) and repository-based research. Methods: Focus group sessions with (1) current research participants, (2) surrogate decision-makers, and (3) three age-defined cohorts (18–34 years, 35–50, >50). Results: Participants expressed a variety of opinions about the acceptability of wide sharing of genetic and phenotypic information for research purposes through large, publicly accessible data repositories. Most believed that making de-identified study data available to the research community is a social good that should be pursued. Privacy and confidentiality concerns were common, though they would not necessarily preclude participation. Many participants voiced reservations about sharing data with for-profit organizations. Conclusions: Trust is central in participants’ views regarding GWAS data sharing. Further research is needed to develop governance models that enact the values of stewardship. PMID:20535021

Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the fields of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having physical linkage information expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching of the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. PMID:26562485

Susceptibility genes for Alzheimer's disease are proving to be highly challenging to detect and verify. Population heterogeneity may be a significant confounding factor contributing to this difficulty. To increase the power for disease susceptibility gene detection we conducted a genome-wide genetic linkage screen using individuals from the relatively isolated, genetically homogeneous, Amish population. Our genome linkage analysis used a 407 microsatellite marker map (average density 7 cM) to search for autosomal genes linked to dementia in five Amish families from four Midwestern U.S. counties. Our highest two-point lod score (3.01) was observed at marker D4S1548 on chromosome 4q31. Five other regions (10q22, 3q28, 11p13, 4q28, 19p13) also demonstrated suggestive linkage with markers having two-point lod scores >2.0. While two of these regions are novel (4q31 and 11p13), the other regions lie close to regions identified in previous genome scans in other populations. Our results identify regions of the genome that may harbor genes involved in a subset of dementia patients, in particular the North American Amish community. PMID:16389594
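
As background for the two-point lod scores quoted above, here is a minimal sketch of the textbook calculation for the simplest case: fully informative, phase-known meioses in which recombinants can be counted directly. The function name and the counts are hypothetical, and real pedigree analyses (with penetrance, phenocopies, and unknown phase) require dedicated linkage software rather than this closed form.

```python
from math import log10


def two_point_lod(theta: float, recombinants: int, meioses: int) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses,
    where L(theta) = theta**r * (1 - theta)**(n - r)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant rules out complete linkage.
        if recombinants > 0:
            return float("-inf")
        return meioses * log10(2)
    return (recombinants * log10(theta)
            + nonrecombinants * log10(1 - theta)
            - meioses * log10(0.5))


# Hypothetical data: 1 recombinant among 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(two_point_lod(theta, 1, 10), 3))
```

The score is zero at theta = 0.5 (no linkage) by construction, and a lod of 3 corresponds to odds of 1000:1 in favor of linkage at the tested recombination fraction.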

The capacity to identify immunogens for vaccine development by genome-wide screening has been markedly enhanced by the availability of complete microbial genome sequences coupled to rapid proteomic and bioinformatic analysis. Critical to this genome-wide screening is in vivo testing in the context o...

Polycomb group (PcG) complexes PRC1 and PRC2 are well known for silencing specific developmental genes. PRC2 is a methyltransferase targeting histone H3K27 and producing H3K27me3, essential for stable silencing. Less well known but quantitatively much more important is the genome-wide role of PRC2 that dimethylates ∼70% of total H3K27. We show that H3K27me2 occurs in inverse proportion to transcriptional activity in most non-PcG target genes and intergenic regions and is governed by opposing roaming activities of PRC2 and complexes containing the H3K27 demethylase UTX. Surprisingly, loss of H3K27me2 results in global transcriptional derepression proportionally greatest in silent or weakly transcribed intergenic and genic regions and accompanied by an increase of H3K27ac and H3K4me1. H3K27me2 therefore sets a threshold that prevents random, unscheduled transcription all over the genome and even limits the activity of highly transcribed genes. PRC1-type complexes also have global roles. Unexpectedly, we find a pervasive distribution of histone H2A ubiquitylated at lysine 118 (H2AK118ub) outside of canonical PcG target regions, dependent on the RING/Sce subunit of PRC1-type complexes. We show, however, that H2AK118ub does not mediate the global PRC2 activity or the global repression and is predominantly produced by a new complex involving L(3)73Ah, a homolog of mammalian PCGF3. PMID:25986499

Genome-wide motif searches identified 1134 genes in the lettuce reference genome of cv. Salinas that are potentially involved in pathogen recognition, of which 385 were predicted to encode nucleotide binding-leucine rich repeat receptor (NLR) proteins. Using a maximum-likelihood approach, we grouped the NLRs into 25 multigene families and 17 singletons. Forty-one percent of these NLR-encoding genes belong to three families, the largest being RGC16 with 62 genes in cv. Salinas. The majority of NLR-encoding genes are located in five major resistance clusters (MRCs) on chromosomes 1, 2, 3, 4, and 8 and cosegregate with multiple disease resistance phenotypes. Most MRCs contain primarily members of a single NLR gene family but a few are more complex. MRC2 spans 73 Mb and contains 61 NLRs of six different gene families that cosegregate with nine disease resistance phenotypes. MRC3, which is 25 Mb, contains 22 RGC21 genes and colocates with Dm13. A library of 33 transgenic RNA interference tester stocks was generated for functional analysis of NLR-encoding genes that cosegregated with disease resistance phenotypes in each of the MRCs. Members of four NLR-encoding families, RGC1, RGC2, RGC21, and RGC12, were shown to be required for 16 disease resistance phenotypes in lettuce. The general composition of MRCs is conserved across different genotypes; however, the specific repertoire of NLR-encoding genes varied, particularly for the rapidly evolving Type I genes. These tester stocks are valuable resources for future analyses of additional resistance phenotypes. PMID:26449254

According to current estimates there exist about 20,000 pseudogenes in a mammalian genome. The vast majority of these are disabled and nonfunctional copies of protein-coding genes which, therefore, evolve neutrally. Recent findings that a Makorin1 pseudogene, residing on mouse Chromosome 5, is, indeed, in vivo vital and also evolutionarily preserved, encouraged us to conduct a genome-wide survey for other functional pseudogenes in human, mouse, and chimpanzee. We identify to our knowledge the first examples of conserved pseudogenes common to human and mouse, originating from one duplication predating the human-mouse species split and having evolved as pseudogenes since the species split. Functionality is one possible way to explain the apparently contradictory properties of such pseudogene pairs, i.e., high conservation and ancient origin. The hypothesis of functionality is tested by comparing expression evidence and synteny of the candidates with proper test sets. The tests suggest potential biological function. Our candidate set includes a small set of long-lived pseudogenes whose unknown potential function is retained since before the human-mouse species split, and also a larger group of primate-specific ones found from human-chimpanzee searches. Two processed sequences are notable, their conservation since the human-mouse split being as high as most protein-coding genes; one is derived from the protein Ataxin 7-like 3 (ATX7NL3), and one from the Spinocerebellar ataxia type 1 protein (ATX1). Our approach is comparative and can be applied to any pair of species. It is implemented by a semi-automated pipeline based on cross-species BLAST comparisons and maximum-likelihood phylogeny estimations. To separate pseudogenes from protein-coding genes, we use standard methods, utilizing in-frame disablements, as well as a probabilistic filter based on Ka/Ks ratios. PMID:16680195

We report on a 6-month-old girl with two apparent cell lines; one with trisomy 21, and the other with paternal genome-wide uniparental isodisomy (GWUPiD), identified using single nucleotide polymorphism (SNP) based microarray and microsatellite analysis of polymorphic loci. The patient has Beckwith-Wiedemann syndrome (BWS) due to paternal uniparental disomy (UPD) at chromosome location 11p15 (UPD 11p15), which was confirmed through methylation analysis. Hyperinsulinemic hypoglycemia is present, which is associated with paternal UPD 11p15.5; and she likely has medullary nephrocalcinosis, which is associated with paternal UPD 20, although this was not biochemically confirmed. Angelman syndrome (AS) analysis was negative but this testing is not completely informative; she has no specific features of AS. Clinical features of this patient include: dysmorphic features consistent with trisomy 21, tetralogy of Fallot, hemihypertrophy, swirled skin hyperpigmentation, hepatoblastoma, and Wilms tumor. Her karyotype is 47,XX,+21[19]/46,XX[4], and microarray results suggest that the cell line with trisomy 21 is biparentally inherited and represents 40-50% of the genomic material in the tested specimen. The difference in the level of cytogenetically detected mosaicism versus the level of mosaicism observed via microarray analysis is likely caused by differences in the test methodologies. While a handful of cases of mosaic paternal GWUPiD have been reported, this patient is the only reported case that also involves trisomy 21. Other GWUPiD patients have presented with features associated with multiple imprinted regions, as does our patient. PMID:26219535

RNA molecules of all types fold into complex secondary and tertiary structures that are important for their function and regulation. Structural and catalytic RNAs such as ribosomal RNA (rRNA) and transfer RNA (tRNA) are central players in protein synthesis, and only function through their proper folding into intricate three-dimensional structures. Studies of messenger RNA (mRNA) regulation have also revealed that structural elements embedded within these RNA species are important for the proper regulation of their total level in the transcriptome. More recently, the discovery of microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) has shed light on the importance of RNA structure to genome, transcriptome, and proteome regulation. Due to the relatively small number, high conservation, and importance of structural and catalytic RNAs to all life, much early work in RNA structure analysis mapped out a detailed view of these molecules. Computational and physical methods were used in concert with enzymatic and chemical structure probing to create high-resolution models of these fundamental biological molecules. However, the recent expansion in our knowledge of the importance of RNA structure to coding and regulatory RNAs has left the field in need of faster and scalable methods for high-throughput structural analysis. To address this, nuclease and chemical RNA structure probing methodologies have been adapted for genome-wide analysis. These methods have been deployed to globally characterize thousands of RNA structures in a single experiment. Here, we review these experimental methodologies for high-throughput RNA structure determination and discuss the insights gained from each approach. PMID:27256381

Background Endothelial growth factors including angiopoietin-2 (Ang-2), its soluble receptor Tie-2 (sTie-2) and hepatocyte growth factor (HGF) play important roles in angiogenesis, vascular remodeling, local tumor growth and metastatic potential of various cancers. Circulating levels of these biomarkers have a heritable component (between 13% and 56%), but the underlying genetic variation influencing these biomarker levels is largely unknown. Methods and Results We performed a genome-wide association study for circulating Ang-2, sTie-2, and HGF in 3571 Framingham Heart Study (FHS) participants and assessed replication of the top hits for Ang-2 and sTie-2 in 3184 participants of the Study of Health in Pomerania (SHIP). In multivariable-adjusted models, sTie-2 and HGF concentrations were associated with single nucleotide polymorphisms (SNPs) in the genes encoding the respective biomarkers (top p=2.40×10^−65 [rs2273720] and 3.64×10^−19 [rs5745687], respectively). Likewise, rs2442517 in the MCPH1 gene (in which the Ang-2 gene is embedded) was associated with Ang-2 levels (p=5.05×10^−8 in FHS and 8.39×10^−5 in SHIP). Furthermore, SNPs in the ABO gene were associated with sTie-2 (top SNP rs8176693 with p=1.84×10^−33 in FHS; p=2.53×10^−30 in SHIP) and Ang-2 (rs8176746 with p=2.07×10^−8 in FHS; p=0.001 in SHIP) levels on a genome-wide significant level. The top genetic loci explained between 1.7% (Ang-2) and 11.2% (sTie-2) of the inter-individual variation in biomarker levels. Conclusions Genetic variation contributes to the inter-individual variation in growth factor levels and explains a modest proportion of circulating HGF, Ang-2, and Tie-2. This may potentially contribute to the familial susceptibility to cancer, a premise that warrants further studies. PMID:25552591

Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We

Genome-wide association studies were once a dream of geneticists, but they have now become a reality. Since the first paper reported, in 2005, a genetic variant contributing to human age-related macular degeneration identified by a genome-wide association study, a number of whole-genome studies have been published. The present paper reviews common issues in whole-genome association studies of complex diseases, including the achievements of genome-wide association studies on complex traits or diseases, principles of study design, selection of genetic markers across the genome, and comparisons of different commercial products for whole-genome association studies. Finally, a newly defined type of genetic variation, copy number variation, is briefly introduced. The paper also summarizes the shortcomings of current genome-wide association studies and perspectives on their future. PMID:18424408

The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identi...

An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al

Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both nucleosome map and methylation levels were confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrate how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics. PMID:24299735

Genome-wide association (GWA) studies based on GBLUP models are a common practice in animal breeding. However, effect sizes of GWA tests are small, requiring larger sample sizes to enhance power of detection of rare variants. Because of difficulties in increasing sample size in animal populations, one alternative is to implement a meta-analysis (MA), combining information and results from independent GWA studies. Although this methodology has been used widely in human genetics, implementation in animal breeding has been limited. Thus, we present methods to implement a MA of GWA, describing the proper approach to compute weights derived from multiple genomic evaluations based on animal-centric GBLUP models. Application to real datasets shows that MA increases power of detection of associations in comparison with population-level GWA, allowing for population structure and heterogeneity of variance components across populations to be accounted for. Another advantage of MA is that it does not require access to genotype data that is required for a joint analysis. Scripts related to the implementation of this approach, which consider the strength of association as well as the sign, are distributed and thus account for heterogeneity in association phase between QTL and SNPs. Thus, MA of GWA is an attractive alternative to summarizing results from multiple genomic studies, avoiding restrictions with genotype data sharing, definition of fixed effects and different scales of measurement of evaluated traits. PMID:26607299
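The idea of combining signed association results from independent GWA studies can be sketched with a generic weighted Z-score (Stouffer) meta-analysis. This is only an illustration: the function names are hypothetical, and the weights used here (square roots of assumed sample sizes) are a common default, not the GBLUP-derived weights described in the abstract.

```python
import math

def stouffer_meta(z_scores, weights):
    """Combine signed per-study z-scores into one meta-analysis z-score.

    Keeping the sign means that studies with opposite effect directions
    partially cancel instead of reinforcing each other.
    """
    if not z_scores or len(z_scores) != len(weights):
        raise ValueError("need one weight per z-score")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical example: three studies with assumed sample sizes 400, 900, 100.
study_z = [2.0, 1.5, -0.5]
study_w = [math.sqrt(n) for n in (400, 900, 100)]
meta_z = stouffer_meta(study_z, study_w)
print(f"meta z = {meta_z:.3f}, p = {two_sided_p(meta_z):.4f}")
```

Because only per-study test statistics enter the calculation, this kind of combination needs no access to the underlying genotype data, which is the practical advantage the abstract highlights.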

The San Antonio Family Diabetes/Gallbladder Study was initiated to identify susceptibility genes for type 2 diabetes. Evidence was previously reported of linkage to diabetes on 10q with suggestive evidence on 3p and 9p in a genome-wide scan of 440 individuals from 27 pedigrees ascertained through a single diabetic proband. Subsequently, the study was expanded to include 906 individuals from 39 extended Mexican-American pedigrees, two additional examination cycles approximately 5.3 and 7.6 years after baseline, and genotypes for a new set of genome-wide markers. Therefore, we completed a second genome-wide linkage scan. Using information from a participant's most recent exam, the prevalence of diabetes in nonprobands was 21.8%. We performed genome-wide variance components-based genetic analysis on the discrete trait diabetes using a liability model and on the quantitative Martingale residual obtained from modeling age of diabetes diagnosis using Cox proportional hazard models. Controlling for age and age², our strongest evidence for linkage to the trait diabetes and the quantitative Martingale residual was on chromosome 3p at marker D3S2406 with multipoint empirical logarithm of odds scores of 1.87 and 3.76, respectively. In summary, we report evidence for linkage to diabetes on chromosome 3p in a region previously identified in at least three independent populations. PMID:16123354
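For readers who want to relate the LOD scores above to more familiar quantities, a LOD score is the base-10 logarithm of a likelihood ratio, so it converts to a likelihood-ratio statistic by multiplying by 2 ln 10. The sketch below also gives an asymptotic pointwise p-value under the assumption, standard for variance-components linkage tests, of a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom. The abstract reports empirical LOD scores, so these asymptotic values are only an approximation, and the function names are hypothetical.

```python
import math

LN10 = math.log(10.0)

def lod_to_lrt(lod):
    """Convert a LOD score (log10 likelihood ratio) to 2 * ln(likelihood ratio)."""
    return 2.0 * LN10 * lod

def lod_to_pointwise_p(lod):
    """Asymptotic pointwise p-value, assuming a 0.5*chi2(0) + 0.5*chi2(1) null."""
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    x = lod_to_lrt(lod)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(x / 2.0))

for lod in (1.87, 3.76):
    print(f"LOD {lod}: approximate pointwise p = {lod_to_pointwise_p(lod):.2e}")
```

Under this approximation a LOD of 3 corresponds to a pointwise p-value of roughly 1e-4, which is the classical rule of thumb.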

In the poultry industry, aggressive behaviour is a large animal welfare issue all over the world. To date, little is known about the underlying genetics of the aggressive behaviour. Here, we performed a genome-wide association study (GWAS) to explore the genetic mechanism associated with aggressive behaviour in chickens. The GWAS results showed that a total of 33 SNPs were associated with aggressive behaviour traits (P

Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing associated loci to triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset. PMID:27292110
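The statement that a target summary statistic can be imputed as a weighted linear combination of related-phenotype summary statistics can be illustrated as follows. This is a sketch under stated assumptions, not the authors' estimator: the weights are taken as given (the abstract does not specify them), the z-scores of the related phenotypes are assumed to have null correlation equal to the supplied matrix, and the function name is hypothetical. The combination is rescaled so that it has unit variance under the null.

```python
import math

def impute_z(z_related, weights, corr):
    """Weighted linear combination of related-phenotype z-scores.

    z_related: z-scores of one SNP for each related phenotype
    weights:   assumed weights, one per related phenotype
    corr:      assumed correlation matrix of the z-scores under the null
    Returns the combination scaled to unit null variance: w.z / sqrt(w' R w).
    """
    n = len(z_related)
    if len(weights) != n or len(corr) != n or any(len(row) != n for row in corr):
        raise ValueError("dimension mismatch")
    combo = sum(w * z for w, z in zip(weights, z_related))
    variance = sum(weights[i] * corr[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("weights and correlation give non-positive variance")
    return combo / math.sqrt(variance)

# Hypothetical example with two related phenotypes correlated at 0.3.
z_hat = impute_z([2.5, 1.0], [0.6, 0.2], [[1.0, 0.3], [0.3, 1.0]])
print(f"imputed z = {z_hat:.3f}")
```

Since only z-scores and a correlation matrix are required, a calculation of this shape works without raw genotypes, which matches the summary-statistics use case described in the abstract.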

The Roma people, living throughout Europe and West Asia, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1,000–1,500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry, derived from a combination of European and South Asian sources, and that the date of admixture of South Asian and European ancestry was about 850 years before present. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which appears to have been followed by a major demographic expansion after the arrival in Europe. PMID:23516520

Yeast RNA polymerase II (Pol II) terminates transcription of coding transcripts through the polyadenylation (pA) pathway and non-coding transcripts through the non-polyadenylation (non-pA) pathway. We have used PAR-CLIP to map the position of Pol II genome-wide in living yeast cells after depletion of components of either the pA or non-pA termination complexes. We show here that Ysh1, responsible for cleavage at the pA site, is required for efficient removal of Pol II from the template. Depletion of Ysh1 from the nucleus does not, however, lead to readthrough transcription. In contrast, depletion of the termination factor Nrd1 leads to widespread runaway elongation of non-pA transcripts. Depletion of Sen1 also leads to readthrough at non-pA terminators, but in contrast to Nrd1, this readthrough is less processive, or more susceptible to pausing. The data presented here provide delineation of in vivo Pol II termination regions and highlight differences in the sequences that signal termination of different classes of non-pA transcripts. PMID:25299594

Background Commercial sheep raised for mutton grow faster than traditional Chinese sheep breeds. Here, we aimed to evaluate genetic selection among three different types of sheep breed: two well-known commercial mutton breeds and one indigenous Chinese breed. Results We first combined locus-specific branch length and d_i statistical methods to detect candidate regions targeted by selection in the three different populations. The results showed that the genetic distances reached at least medium divergence for each pairwise combination. We found that these two methods were highly correlated, and identified many growth-related candidate genes undergoing artificial selection. For production traits, APOBR and FTO are associated with body mass index. For meat traits, ALDOA, STK32B and FAM190A are related to marbling. For reproduction traits, CCNB2 and SLC8A3 affect oocyte development. We also found that two well-known genes, GHR (which affects meat production and quality) and EDAR (associated with hair thickness), were associated with German mutton merino sheep. Furthermore, four genes (POL, RPL7, MSL1 and SHISA9) were associated with pre-weaning gain in our previous genome-wide association study. Conclusions Our results indicate that combining locus-specific branch length and d_i statistical approaches can narrow the search range for specific selection. We also obtained many credible candidate genes, which not only confirm the results of previous reports but also provide a suite of novel candidate genes in defined breeds to guide hybridization breeding. PMID:26083354

Genome-Wide Association Studies shed light on the identification of genes underlying human diseases and agriculturally important traits. This potential has been shadowed by false positive findings. The Mixed Linear Model (MLM) method is flexible enough to simultaneously incorporate population structure and cryptic relationships to reduce false positives. However, its intensive computational burden is prohibitive in practice, especially for large samples. The newly developed algorithm, FaST-LMM, solved the computational problem, but requires that the number of SNPs be less than the number of individuals to derive a rank-reduced relationship. This restriction potentially leads to less statistical power when compared to using all SNPs. We developed a method to extract a small subset of SNPs and use them in FaST-LMM. This method not only retains the computational advantage of FaST-LMM, but also remarkably increases statistical power even when compared to using the entire set of SNPs. We named the method SUPER (Settlement of MLM Under Progressively Exclusive Relationship) and made it available within an implementation of the GAPIT software package. PMID:25247812

Selenium (Se) is an essential trace element in human nutrition, but its role in certain health conditions, particularly among Se sufficient populations, is controversial. A genome-wide association study (GWAS) of blood Se concentrations previously identified a locus at 5q14 near BHMT. We performed a GW meta-analysis of toenail Se concentrations, which reflect a longer duration of exposure than blood Se concentrations, including 4162 European descendants from four US cohorts. Toenail Se was measured using neutron activation analysis. We identified a GW-significant locus at 5q14 (P < 1 × 10^−16), the same locus identified in the published GWAS of blood Se based on independent cohorts. The lead single-nucleotide polymorphism (SNP) explained ∼1% of the variance in toenail Se concentrations. Using GW-summary statistics from both toenail and blood Se, we observed statistical evidence of polygenic overlap (P < 0.001) and meta-analysis of results from studies of either trait (n = 9639) yielded a second GW-significant locus at 21q22.3, harboring CBS (P < 4 × 10^−8). Proteins encoded by genes at 5q14 and 21q22.3 function in homocysteine (Hcy) metabolism, and index SNPs for each have previously been associated with betaine and Hcy levels in GWAS. Our findings show evidence of a genetic link between Se and Hcy pathways, both involved in cardiometabolic disease. PMID:25343990

The aim of this study was to evaluate a genome-wide association study (GWAS) approach to identify single nucleotide polymorphisms (SNPs) associated with fertility traits (early puberty) in Nellore cattle (Bos indicus). Fifty-five Nellore cows were selected from a herd monitored for early puberty onset (positive pregnancy at 18 months of age). Extremes of this phenotype were selected; 30 and 25 individuals were pregnant and non-pregnant, respectively, at that age. DNA samples were genotyped using a high-density SNP chip (>777,000 SNPs). GWAS using a case-control strategy highlighted a number of significant markers based on their proximity to the Bonferroni correction line. Results indicated that chromosomes 5, 6, 9, 10, and 22 were associated with the traits of interest. The most significant SNPs on these chromosomes were rs133039577, rs110013280, rs134702839, rs109551605, and rs41639155. Candidate genes, as well as quantitative trait loci (QTL) previously reported in the Ensembl and Cattle QTLdb databases, were further investigated. Analysis of the regions close to the SNPs on chromosomes 9 and 10 revealed that four QTL had been previously classified under the reproduction category. In conclusion, we have identified SNPs in close proximity to genes associated with reproductive traits. Moreover, U6 spliceosomal RNA was present on three different chromosomes, which is possibly associated with age at first calving, suggesting that it might be a strong candidate for future studies. PMID:26909970
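The "Bonferroni correction line" mentioned above is the per-test significance threshold obtained by dividing the family-wise error rate by the number of tests. A minimal sketch follows, assuming a family-wise alpha of 0.05 (not stated in the abstract) and using 777,000 as a lower bound for the number of SNPs on the chip; the function name is hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Assumed values: alpha = 0.05, and 777,000 SNPs as a lower bound for the chip.
threshold = bonferroni_threshold(0.05, 777_000)
line_height = -math.log10(threshold)  # position of the line on a Manhattan plot
print(f"threshold = {threshold:.2e}, -log10 = {line_height:.2f}")
```

With only 55 animals, few markers can be expected to cross a line this strict, which may be why the abstract speaks of markers chosen by their proximity to it.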

Alterations in DNA methylation frequently occur in hepatocellular cancer (HCC). We have previously demonstrated that hypermethylation in candidate genes can be detected in plasma DNA prior to HCC diagnosis. To identify with a genome-wide approach additional genes hypermethylated in HCC that could be used for more accurate analysis of plasma DNA for early diagnosis, we analyzed tumor and adjacent non-tumor tissues from 62 Taiwanese HCC cases using Illumina methylation arrays that screen 26,486 autosomal CpG sites. After Bonferroni adjustment, a total of 2,324 CpG sites significantly differed in methylation level, with 684 CpG sites significantly hypermethylated and 1,640 hypomethylated in tumor compared to non-tumor tissues. Array data were validated with pyrosequencing in a subset of 5 of these genes; correlation coefficients ranged from 0.92 to 0.97. Analysis of plasma DNA from 38 cases demonstrated that 37% to 63% of cases had detectable hypermethylated DNA (≥5% methylation) for these 5 genes individually. At least one of these genes was hypermethylated in 87% of cases, suggesting that measurement of DNA methylation in plasma samples is feasible. The panel of methylated genes identified in the current study will be further tested in a large cohort of prospectively collected samples to determine their utility as early biomarkers of hepatocellular carcinoma. PMID:22234943

Epigenetic changes as well as genetic changes are mechanisms of tumorigenesis. We aimed to identify novel genes that are silenced by DNA hypermethylation in hepatocellular carcinoma (HCC). We screened for genes with promoter DNA hypermethylation using a genome-wide methylation microarray analysis in primary HCC (the discovery set). The microarray analysis revealed that there were 2,670 CpG sites that significantly differed in regards to the methylation level between the tumor and non-tumor liver tissues; 875 were significantly hypermethylated and 1,795 were significantly hypomethylated in the HCC tumors compared to the non‑tumor tissues. Further analyses using methylation-specific PCR, combined with expression analysis, in the validation set of primary HCC showed that, in addition to three known tumor-suppressor genes (APC, CDKN2A, and GSTP1), eight genes (AKR1B1, GRASP, MAP9, NXPE3, RSPH9, SPINT2, STEAP4, and ZNF154) were significantly hypermethylated and downregulated in the HCC tumors compared to the non-tumor liver tissues. Our results suggest that epigenetic silencing of these genes may be associated with HCC. PMID:26883180

Genome-wide association studies (GWASs) have been a significant technological advance in our ability to evaluate the genetic architecture of complex diseases such as primary biliary cirrhosis (PBC). To date, six large-scale studies have been performed that have identified 27 risk loci in addition to human leukocyte antigen (HLA) associated with PBC. The identified risk variants emphasize important disease concepts; namely, that disturbances in immunoregulatory pathways are important in the pathogenesis of PBC and that such perturbations are shared among a diverse number of autoimmune diseases, suggesting the risk architecture may confer a generalized propensity to autoimmunity not necessarily specific to PBC. Furthermore, the impact of non-HLA risk variants, particularly in genes involved with interleukin-12 signaling, and ethnic variation in conferring susceptibility to PBC have been highlighted. Although GWASs have been a critical stepping stone in understanding common genetic variation contributing to PBC, limitations pertaining to power, sample availability, and strong linkage disequilibrium across genes have left us with an incomplete understanding of the genetic underpinnings of disease pathogenesis. Future efforts to gain insight into this missing heritability, the genetic variation that contributes to important disease outcomes, and the functional consequences of associated variants will be critical if practical clinical translation is to be realized. PMID:26676814

Because human epidermal melanocytes (HEMs) provide critical protection against skin cancer, sunburn, and photoaging, a genome-wide perspective of gene expression in these cells is vital to understanding human skin physiology. In this study we performed high throughput sequencing of HEMs to obtain a complete data set of transcript sizes, abundances, and splicing. As expected, we found that melanocyte specific genes that function in pigmentation were among the highest expressed genes. We analyzed receptor, ion channel and transcription factor gene families to get a better understanding of the cell signalling pathways used by melanocytes. We also performed a comparative transcriptomic analysis of lightly versus darkly pigmented HEMs and found 16 genes differentially expressed in the two pigmentation phenotypes; of those, only one putative melanosomal transporter (SLC45A2) has known function in pigmentation. In addition, we found 166 genes with splice isoforms expressed exclusively in one pigmentation phenotype, 17 of which are genes involved in signal transduction. Our melanocyte transcriptome study provides a comprehensive view and may help identify novel pigmentation genes and potential pharmacological targets. PMID:25451175

The performance of the GeneScan algorithm for gene identification has been improved by incorporation of a directed iterative scanning procedure. Application is made here to the cases of bacterial and organellar genomes. The sensitivity of gene identification was 100% in the Plasmodium falciparum plastid-like genome (35 kb) and 98% in the Mycoplasma genitalium genome (approximately 580 kb) and the Haemophilus influenzae Rd genome (approximately 1.8 Mb). Sensitivity was found to improve both in the Open Reading Frames (ORFs) that have been identified as genes (by homology or by other methods) and in those that are classified as hypothetical. False positive assignments (at the nucleotide level) were 0.25% in the H. influenzae genome and 0.3% in M. genitalium. There were no false positive assignments in the plastid-like genome. The agreement between the GeneScan predictions and GeneMark predictions of putative ORFs was 97% in the M. genitalium genome and 86% in the H. influenzae genome. In terms of an exact match between predicted genes/ORFs and the annotation in the databank, GeneScan performance was evaluated to be between 72% and 90% in different genomes. We predict five putative ORFs that were not annotated earlier in the GenBank files for both M. genitalium and H. influenzae genomes. Our preliminary analysis of the newly sequenced G + C rich genome of Mycobacterium tuberculosis H37Rv also shows comparable sensitivity (99%). PMID:10353188
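The nucleotide-level figures quoted in the record above can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the function name `nucleotide_metrics`, the toy coordinates, and the choice of genome length as the denominator for the false positive fraction are all hypothetical, and the abstract does not say which denominator GeneScan's authors used.

```python
def nucleotide_metrics(annotated, predicted, genome_length):
    """Compare predicted coding positions with annotated ones.

    annotated, predicted: sets of 0-based nucleotide positions.
    Returns (sensitivity, false_positive_fraction).
    Assumption: false positives are expressed relative to genome length.
    """
    true_positives = len(annotated & predicted)
    false_positives = len(predicted - annotated)
    sensitivity = true_positives / len(annotated)
    false_positive_fraction = false_positives / genome_length
    return sensitivity, false_positive_fraction


# Hypothetical toy genome of 100 nt with one annotated gene at 10..49
# and a prediction shifted by two nucleotides (12..51).
annotated_positions = set(range(10, 50))
predicted_positions = set(range(12, 52))
sens, fp_frac = nucleotide_metrics(annotated_positions, predicted_positions, 100)
```

In this toy case 38 of the 40 annotated positions are recovered and 2 predicted positions fall outside the annotation.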

A mammalian cell utilizes DNA methylation to modulate gene expression in response to environmental changes during development and differentiation. Aberrant DNA methylation changes as a correlate to diseased states like cancer, neurodegenerative conditions and cardiovascular diseases have been documented. Here we show genome-wide DNA methylation changes in macrophages infected with the pathogen M. tuberculosis. The majority of the affected genomic loci were hypermethylated in M. tuberculosis-infected THP1 macrophages. Hotspots of differential DNA methylation were enriched in genes involved in immune response and chromatin reorganization. Importantly, DNA methylation changes were observed predominantly for cytosines present in a non-CpG dinucleotide context. This observation was consistent with our previous finding that the mycobacterial DNA methyltransferase, Rv2966c, targets non-CpG dinucleotides in the host DNA during M. tuberculosis infection and reiterates the hypothesis that pathogenic bacteria use non-canonical epigenetic strategies during infection. PMID:27112593

The reproductive performance of bulls has a high impact on the beef cattle industry. Scrotal circumference (SC) is the most recorded reproductive trait in beef herds, and is used as a major selection criterion to improve precocity and fertility. The characterization of genomic regions affecting SC can contribute to the identification of diagnostic markers for reproductive performance and uncover molecular mechanisms underlying complex aspects of bovine reproductive biology. In this paper, we report a genome-wide scan for chromosome segments explaining differences in SC, using data of 861 Nellore bulls (Bos indicus) genotyped for over 777,000 single nucleotide polymorphisms. Loci that stood out from the genomic background were identified on chromosomes 4, 6, 7, 10, 14, 18 and 21. The majority of these regions were previously found to be associated with reproductive and body size traits in cattle. The signal on chromosome 14 replicates the pleiotropic quantitative trait locus encompassing PLAG1 that affects male fertility in cattle and stature in several species. Based on intensive literature mining, SP4, MAGEL2, SH3RF2, PDE5A and SNAI2 are proposed as novel candidate genes for SC, as they affect growth and testicular size in other animal models. These findings contribute to linking reproductive phenotypes to gene functions, and may offer new insights on the molecular biology of male fertility. PMID:24558400

Genome-wide association studies (GWAS) have been developed as a practical method to identify genetic loci associated with disease by scanning multiple markers across the genome. Significant advances in the genetics of complex diseases have been made owing to advances in genotyping technologies, the progress of projects such as HapMap and 1000G and the emergence of genetics as a collaborative discipline. Because of its great potential to be used in parallel by multiple collaborators, it is important to adhere to strict protocols assuring data quality and analyses. Quality control analyses must be applied to each sample and each single-nucleotide polymorphism (SNP). The software package PLINK is capable of performing the whole range of necessary quality control tests. Genotype imputation has also been developed to substantially increase the power of GWAS methodology. Imputation permits the investigation of associations at genetic markers that are not directly genotyped. Results of individual GWAS reports can be combined through meta-analysis. Finally, next-generation sequencing (NGS) has gained popularity in recent years through its capacity to analyse a much greater number of markers across the genome. Although NGS platforms are capable of examining a higher number of SNPs compared with GWA studies, the results obtained by NGS require careful interpretation, as their biological correlation is incompletely understood. In this article, we will discuss the basic features of such protocols. PMID:25709812
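The per-SNP quality control step mentioned in the record above can be sketched in a few lines. This is not PLINK and does not reproduce its behaviour; it is a minimal illustration assuming genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and the function name `snp_passes_qc` and both default thresholds are hypothetical choices made here for the example.

```python
def snp_passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return True if a SNP passes two simple filters.

    genotypes: list of 0/1/2 alternate-allele counts, None = missing call.
    min_call_rate, min_maf: illustrative thresholds (assumptions).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False  # no usable genotype calls at all
    call_rate = len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))  # two alleles per sample
    maf = min(alt_freq, 1 - alt_freq)           # minor allele frequency
    return call_rate >= min_call_rate and maf >= min_maf


# Hypothetical genotype vectors for ten samples.
well_typed = [0, 1, 2, 1, 0, 0, 1, 2, 0, 1]
monomorphic = [0] * 10
poorly_called = [0, 1, None, None, 1, 0, 1, 2, 0, 1]
```

A real pipeline would add per-sample filters and a Hardy-Weinberg test, which are omitted here to keep the sketch short.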

Laboratory red blood cell (RBC) measurements are clinically important, heritable and differ among ethnic groups. To identify genetic variants that contribute to RBC phenotypes in African Americans (AAs), we conducted a genome-wide association study in up to ∼16 500 AAs. The alpha-globin locus on chromosome 16pter [lead SNP rs13335629 in the ITFG3 gene; P < 1E−13 for hemoglobin (Hgb), RBC count, mean corpuscular volume (MCV), MCH and MCHC] and the G6PD locus on Xq28 [lead SNP rs1050828; P < 1E−13 for Hgb, hematocrit (Hct), MCV, RBC count and red cell distribution width (RDW)] were each associated with multiple RBC traits. At the alpha-globin region, both the common African 3.7 kb deletion and common single nucleotide polymorphisms (SNPs) appear to contribute independently to RBC phenotypes among AAs. In the 2p21 region, we identified a novel variant of PRKCE distinctly associated with Hct in AAs. In a genome-wide admixture mapping scan, local European ancestry at the 6p22 region containing HFE and LRRC16A was associated with higher Hgb. LRRC16A has been previously associated with the platelet count and mean platelet volume in AAs, but not with Hgb. Finally, we extended to AAs the findings of association of erythrocyte traits with several loci previously reported in Europeans and/or Asians, including CD164 and HBS1L-MYB. In summary, this large-scale genome-wide analysis in AAs has extended the importance of several RBC-associated genetic loci to AAs and identified allelic heterogeneity and pleiotropy at several previously known genetic loci associated with blood cell traits in AAs. PMID:23446634

Skeletal and cardiac myocytes cease division within weeks of birth. Although skeletal muscle retains limited capacity for regeneration through recruitment of satellite cells, resident populations of adult myocardial stem cells have not been identified. Because cell cycle withdrawal accompanies myocyte differentiation, we hypothesized that C2C12 cells, a mouse myoblast cell line previously used to characterize myocyte differentiation, also would provide a model for studying cell cycle withdrawal during differentiation. C2C12 cells were differentiated in culture medium containing horse serum and harvested at various time points to characterize the expression profiles of known cell cycle and myogenic regulatory factors by immunoblot analysis. BrdU incorporation decreased dramatically in confluent cultures 48 hr after addition of horse serum, as cells started to form myotubes. This finding was preceded by up-regulation of MyoD, followed by myogenin, and activation of Bcl-2. Cyclin D1 was expressed in proliferating cultures and became undetectable in cultures containing 40 percent fused myotubes, as levels of p21(WAF1/Cip1) increased and alpha-actin became detectable. Because C2C12 myoblasts withdraw from the cell cycle during myocyte differentiation following a course that recapitulates this process in vivo, we performed a genome-wide screen to identify other gene products involved in this process. Using microarrays containing approximately 10,000 minimally redundant mouse sequences that map to the UniGene database of the National Center for Biotechnology Information, we compared gene expression profiles between proliferating, differentiating, and differentiated C2C12 cells and verified candidate genes demonstrating differential expression by RT-PCR. Cluster analysis of differentially expressed genes revealed groups of gene products involved in cell cycle withdrawal, muscle differentiation, and apoptosis. In addition, we identified several genes, including DDAH2 and Ly

Background Sleep is a highly conserved behavior, yet its duration and pattern vary extensively among species and between individuals within species. The genetic basis of natural variation in sleep remains unknown. Results We used the Drosophila Genetic Reference Panel (DGRP) to perform a genome-wide association (GWA) study of sleep in D. melanogaster. We identified candidate single nucleotide polymorphisms (SNPs) associated with differences in the mean as well as the environmental sensitivity of sleep traits; these SNPs typically had sex-specific or sex-biased effects, and were generally located in non-coding regions. The majority of SNPs (80.3%) affecting sleep were at low frequency and had moderately large effects. Additive models incorporating multiple SNPs explained as much as 55% of the genetic variance for sleep in males and females. Many of these loci are known to interact physically and/or genetically, enabling us to place them in candidate genetic networks. We confirmed the role of seven novel loci on sleep using insertional mutagenesis and RNA interference. Conclusions We identified many SNPs in novel loci that are potentially associated with natural variation in sleep, as well as SNPs within genes previously known to affect Drosophila sleep. Several of the candidate genes have human homologues that were identified in studies of human sleep, suggesting that genes affecting variation in sleep are conserved across species. Our discovery of genetic variants that influence environmental sensitivity to sleep may have a wider application to all GWA studies, because individuals with highly plastic genotypes will not have consistent phenotypes. PMID:23617951

Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at the 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function. PMID:17396164
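The assumption stated in the record above, that graph neighbors are more likely to share function, can be illustrated with a simple neighbor vote. This sketch is deliberately much simpler than the probabilistic model the abstract describes and is not the authors' method; the function name `predict_by_neighbors`, the protein labels, and the placeholder terms `term_A` and `term_B` are hypothetical.

```python
from collections import Counter, defaultdict


def predict_by_neighbors(edges, annotations):
    """Assign each unannotated protein its neighbors' most common term.

    edges: iterable of (protein, protein) pairs in a functional linkage graph.
    annotations: dict mapping annotated proteins to a set of terms.
    Returns a dict of predictions for proteins with no annotation.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    predictions = {}
    for protein, linked in neighbors.items():
        if protein in annotations:
            continue  # already has an assigned function
        votes = Counter(
            term for n in linked for term in annotations.get(n, ())
        )
        if votes:
            predictions[protein] = votes.most_common(1)[0][0]
    return predictions


# Hypothetical toy graph: P4 and P5 have no assigned function.
toy_edges = [("P1", "P4"), ("P2", "P4"), ("P3", "P4"), ("P3", "P5")]
toy_annotations = {"P1": {"term_A"}, "P2": {"term_A"}, "P3": {"term_B"}}
toy_predictions = predict_by_neighbors(toy_edges, toy_annotations)
```

Here P4 receives two votes for `term_A` and one for `term_B`, while P5 has a single annotated neighbor. Ties are not handled in any principled way, which is one reason a probabilistic treatment such as the one in the abstract is preferable.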

Despite elevated incidence and recurrence rates for Primary Spontaneous Pneumothorax (PSP), little is known about its etiology, and the genetics of idiopathic PSP remains unexplored. To identify genetic variants contributing to sporadic PSP risk, we conducted the first PSP genome-wide association study. Two replicate pools of 92 Portuguese PSP cases and of 129 age- and sex-matched controls were allelotyped in triplicate on the Affymetrix Human SNP Array 6.0 arrays. Markers passing quality control were ranked by relative allele score difference between cases and controls (|RASdiff|), by a novel cluster method and by a combined Z-test. 101 single nucleotide polymorphisms (SNPs) were selected using these three approaches for technical validation by individual genotyping in the discovery dataset. 87 out of 94 successfully tested SNPs were nominally associated in the discovery dataset. Replication of the 87 technically validated SNPs was then carried out in an independent replication dataset of 100 Portuguese cases and 425 controls. The intergenic rs4733649 SNP in chromosome 8 (between LINC00824 and LINC00977) was associated with PSP in the discovery (P = 4.07E-03, ORC[95% CI] = 1.88[1.22-2.89]), replication (P = 1.50E-02, ORC[95% CI] = 1.50[1.08-2.09]) and combined datasets (P = 8.61E-05, ORC[95% CI] = 1.65[1.29-2.13]). This study identified for the first time one genetic risk factor for sporadic PSP, but future studies are warranted to further confirm this finding in other populations and uncover its functional role in PSP pathogenesis. PMID:27203581
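For readers unfamiliar with how an odds ratio and its 95% confidence interval are reported, the following sketch computes an allelic odds ratio with a Wald interval from a 2×2 table of allele counts. It is an illustration only: the counts are invented, the function name `odds_ratio_ci` is hypothetical, and the abstract does not state which estimator underlies its "ORC" values, so this should not be read as the authors' calculation.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Wald confidence interval.

    Arguments are allele counts in cases and controls; z = 1.96 gives
    an approximate 95% interval. Returns (odds_ratio, lower, upper).
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not taken from the study.
or_value, ci_lower, ci_upper = odds_ratio_ci(30, 70, 20, 80)
```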

Despite the well-established role of the frontal and posterior perisylvian cortices in many facets of human-cognitive specializations, including language, little is known about the developmental patterning of these regions in the human brain. We performed a genome-wide analysis of human cerebral patterning during midgestation, a critical epoch in cortical regionalization. A total of 345 genes were identified as differentially expressed between superior temporal gyrus (STG) and the remaining cerebral cortex. Gene ontology categories representing transcription factors were enriched in STG, whereas cell-adhesion and extracellular matrix molecules were enriched in the other cortical regions. Quantitative RT-PCR or in situ hybridization was performed to validate differential expression in a subset of 32 genes, most of which were confirmed. LIM domain-binding 1 (LDB1), which we show to be enriched in the STG, is a recently identified interactor of LIM domain only 4 (LMO4), a gene known to be involved in the asymmetric patterning of the perisylvian region in the developing human brain. Protocadherin 17 (PCDH17), a neuronal cell adhesion molecule, was highly enriched in focal regions of the human prefrontal cortex. Contactin associated protein-like 2 (CNTNAP2), in which mutations are known to cause autism, epilepsy, and language delay, showed a remarkable pattern of anterior-enriched cortical expression in human that was not observed in mouse or rat. These data highlight the importance of expression analysis of human brain and the utility of cross-species comparisons of gene expression. Genes identified here provide a foundation for understanding molecular aspects of human-cognitive specializations and the disorders that disrupt them. PMID:17978184

Background Community samples suggest that approximately 1 in 20 children and adults exhibit clinically significant anger, hostility, and aggression. Individuals with dysregulated emotional control have a greater lifetime burden of psychiatric morbidity, severe impairment in role functioning, and premature mortality due to cardiovascular disease. Methods With publicly available data secured from dbGaP, we conducted a genome-wide association study of proneness to anger using the Spielberger State-Trait Anger Scale in the Atherosclerosis Risk in Communities (ARIC) study (n = 8,747). Results Subjects were, on average, 54 (range 45–64) years old at baseline enrollment, 47% (n = 4,117) were male, and all were of European descent by self-report. The mean Angry Temperament and Angry Reaction scores were 5.8±1.8 and 7.6±2.2. We observed a nominally significant finding (p = 2.9E-08, λ = 1.027; corrected pgc = 2.2E-07, λ = 1.0015) on chromosome 6q21 in the gene coding for the non-receptor protein-tyrosine kinase, Fyn. Conclusions Fyn interacts with NMDA receptors and inositol-1,4,5-trisphosphate (IP3)-gated channels to regulate calcium influx and intracellular release in the post-synaptic density. These results suggest that signaling pathways regulating intracellular calcium homeostasis, which are relevant to memory, learning, and neuronal survival, may in part underlie the expression of Angry Temperament. PMID:24489884

Motivation: Although genome-wide association studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family-wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available from all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the ‘spuriously correlated’ SNP merely happens to be correlated with the ‘truly causal’ SNP. In addition, the method offers a data-driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. Availability and implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153677
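The "high significance threshold needed to correct for multiple testing" in one-SNP-at-a-time analysis can be made concrete with a Bonferroni correction, which controls the family-wise error rate by dividing the target level by the number of tests. The sketch below illustrates only this marginal baseline, not the hierarchical procedure the abstract proposes; the function names and the example P-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the FWER at or below alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Indices of tests whose P-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


# With one million single-SNP tests, the per-SNP threshold becomes
# very small, which is the loss of power the abstract refers to.
genome_wide_level = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical P-values for four SNPs; the corrected threshold is 0.0125.
hits = bonferroni_significant([0.001, 0.02, 0.2, 0.04])
```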

Background Short RNAs, and in particular microRNAs, are important regulators of gene expression both within defined regulatory pathways and at the epigenetic scale. We investigated the short RNA (sRNA) population (18-24 nt) of the transcriptome of green leaves from the sequenced Populus trichocarpa using a concatenation strategy in combination with 454 sequencing. Results The most abundant size class of sRNAs was 24 nt. Long Terminal Repeats were particularly associated with 24 nt sRNAs. Additionally, some repetitive elements were associated with 22 nt sRNAs. We identified an sRNA hot-spot on chromosome 19, overlapping a region containing both the proposed sex-determining locus and a major cluster of NBS-LRR genes. A number of phased siRNA loci were identified, a subset of which are predicted to target PPR and NBS-LRR disease resistance genes, classes of genes that have been significantly expanded in Populus. Additional loci enriched for sRNA production were identified and characterised. We identified 15 novel predicted microRNAs (miRNAs), including miRNA* sequences, and identified a novel locus that may encode a dual miRNA or a miRNA and short interfering RNAs (siRNAs). Conclusions The short RNA population of P. trichocarpa is at least as complex as that of Arabidopsis thaliana. We provide a first genome-wide view of short RNA production for P. trichocarpa and identify new, non-conserved miRNAs. PMID:20021695

Mitosis entails global alterations to chromosome structure and nuclear architecture, concomitant with transient silencing of transcription. How cells transmit transcriptional states through mitosis remains incompletely understood. While many nuclear factors dissociate from mitotic chromosomes, the observation that certain nuclear factors and chromatin features remain associated with individual loci during mitosis originated the hypothesis that such mitotically retained molecular signatures could provide transcriptional memory through mitosis. To understand the role of chromatin structure in mitotic memory, we performed the first genome-wide comparison of DNase I sensitivity of chromatin in mitosis and interphase, using a murine erythroblast model. Despite chromosome condensation during mitosis visible by microscopy, the landscape of chromatin accessibility at the macromolecular level is largely unaltered. However, mitotic chromatin accessibility is locally dynamic, with individual loci maintaining none, some, or all of their interphase accessibility. Mitotic reduction in accessibility occurs primarily within narrow, highly DNase hypersensitive sites that frequently coincide with transcription factor binding sites, whereas broader domains of moderate accessibility tend to be more stable. In mitosis, proximal promoters generally maintain their accessibility more strongly, whereas distal regulatory elements tend to lose accessibility. Large domains of DNA hypomethylation mark a subset of promoters that retain accessibility during mitosis and across many cell types in interphase. Erythroid transcription factor GATA1 exerts site-specific changes in interphase accessibility that are most pronounced at distal regulatory elements, but has little influence on mitotic accessibility. We conclude that features of open chromatin are remarkably stable through mitosis, but are modulated at the level of individual genes and regulatory elements. PMID:25373146

To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenoma and ACC gene expression profiles, with more than four-fold expression differences and an adjusted P-value < 0.05, revealed no major differences in normal versus adrenocortical adenoma, whereas there were 808 and 1085 dysregulated genes in ACC versus adrenocortical adenoma and ACC versus normal, respectively. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found that there are more alterations in ACC versus normal than in ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin M signaling induces caspase 3-dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is a higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin M signaling as a plausible druggable pathway for therapeutics. PMID:26446994

Endometriosis is a heritable, complex chronic inflammatory disease, for which much of the causal pathogenic mechanism remains unknown. Genome-wide association studies (GWAS) to date have identified 12 single nucleotide polymorphisms at 10 independent genetic loci associated with endometriosis. Most of these were more strongly associated with revised American Fertility Society stage III/IV, rather than stage I/II. The loci are almost all located in intergenic regions that are known to play a role in the regulation of expression of target genes yet to be identified. To identify the target genes and pathways perturbed by the implicated variants, studies are required involving functional genomic annotation of the surrounding chromosomal regions, in terms of transcription factor binding, epigenetic modification (e.g., DNA methylation and histone modification) sites, as well as their correlation with RNA transcription. These studies need to be conducted in tissue types relevant to endometriosis, in particular endometrium. In addition, to allow biologically and clinically relevant interpretation of molecular profiling data, they need to be combined and correlated with detailed, systematically collected phenotypic information (surgical and clinical). The WERF Endometriosis Phenome and Biobanking Harmonisation Project is a global standardization initiative that has produced consensus data and sample collection protocols for endometriosis research. These now pave the way for collaborative studies integrating phenomic with genomic data, to identify informative subtypes of endometriosis that will enhance understanding of the pathogenic mechanisms of the disease and discovery of novel, targeted treatments. PMID:27513026

A genome-wide scan was performed on a total of 2093 Italian Holstein proven bulls genotyped with 50K single nucleotide polymorphisms (SNPs), with the objective of identifying loci associated with fertility-related traits and testing their effects on milk production traits. The analysis was carried out using estimated breeding values for the aggregate fertility index and for each trait contributing to the index: angularity, calving interval, non-return rate at 56 days, days to first service, and 305-day first-parity lactation. In addition, two production traits not included in the aggregate fertility index were analysed: fat yield and protein yield. Analyses were carried out with all SNPs treated separately; in a further analysis, the most significant marker on BTA14 associated with milk quality, located in the DGAT1 region, was treated as a fixed effect. Genome-wide association analysis identified 61 significant SNPs and 75 significant marker-trait associations. Eight additional SNP associations were detected when the SNP located near DGAT1 was included as a fixed effect. As there were no obvious common SNPs between the traits analyzed independently in this study, a network analysis was carried out to identify unforeseen relationships that may link production and fertility traits. PMID:24265800

The recent series of large genome-wide association studies in European and Japanese cohorts established that Parkinson disease (PD) has a substantial genetic component. To further investigate the genetic landscape of PD, we performed a genome-wide scan in the largest to date Ashkenazi Jewish cohort of 1130 Parkinson patients and 2611 pooled controls. Motivated by the reduced disease allele heterogeneity and a high degree of identical-by-descent (IBD) haplotype sharing in this founder population, we conducted a haplotype association study based on mapping of shared IBD segments. We observed significant haplotype association signals at three previously implicated Parkinson loci: LRRK2 (OR = 12.05, P = 1.23 × 10⁻⁵⁶), MAPT (OR = 0.62, P = 1.78 × 10⁻¹¹) and GBA (multiple distinct haplotypes, OR > 8.28, P = 1.13 × 10⁻¹¹ and OR = 2.50, P = 1.22 × 10⁻⁹). In addition, we identified a novel association signal on chr2q14.3 coming from a rare haplotype (OR = 22.58, P = 1.21 × 10⁻¹⁰) and replicated it in a secondary cohort of 306 Ashkenazi PD cases and 2583 controls. Our results highlight the power of our haplotype association method, particularly useful in studies of founder populations, and reaffirm the benefits of studying complex diseases in Ashkenazi Jewish cohorts. PMID:24842889

We conducted a multi-stage, genome-wide association study of bladder cancer with a primary scan of 591,637 SNPs in 3,532 affected individuals (cases) and 5,120 controls of European descent from five studies followed by a replication strategy, which included 8,382 cases and 48,275 controls from 16 studies. In a combined analysis, we identified three new regions associated with bladder cancer on chromosomes 22q13.1, 19q12 and 2q37.1: rs1014971, (P = 8 × 10⁻¹²) maps to a non-genic region of chromosome 22q13.1, rs8102137 (P = 2 × 10⁻¹¹) on 19q12 maps to CCNE1 and rs11892031 (P = 1 × 10⁻⁷) maps to the UGT1A cluster on 2q37.1. We confirmed four previously identified genome-wide associations on chromosomes 3q28, 4p16.3, 8q24.21 and 8q24.3, validated previous candidate associations for the GSTM1 deletion (P = 4 × 10⁻¹¹) and a tag SNP for NAT2 acetylation status (P = 4 × 10⁻¹¹), and found interactions with smoking in both regions. Our findings on common variants associated with bladder cancer risk should provide new insights into the mechanisms of carcinogenesis. PMID:20972438

Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes). Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous ‘battleship’-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine. Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/ Contact: grabherr@broadinstitute.org PMID:20208069
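The cross-correlation strategy named in item (i) can be illustrated with a toy example. The sketch below is not Satsuma's implementation; it is a minimal pure-Python illustration, assuming each base is encoded as a 0/1 indicator signal and that the match count at every shift is obtained from the inverse transform of one spectrum times the conjugate of the other. The function names `fft` and `match_counts_by_shift` are hypothetical.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out


def match_counts_by_shift(target, query):
    """For each shift s, count positions i where query[i] == target[i + s]."""
    size = 1
    while size < len(target) + len(query):
        size *= 2  # zero-pad so the circular correlation does not wrap around
    totals = [0.0] * size
    for base in "ACGT":
        t = [1.0 if c == base else 0.0 for c in target] + [0.0] * (size - len(target))
        q = [1.0 if c == base else 0.0 for c in query] + [0.0] * (size - len(query))
        spectrum = [x * y.conjugate() for x, y in zip(fft(t), fft(q))]
        correlation = fft(spectrum, invert=True)
        for s in range(size):
            totals[s] += correlation[s].real / size  # normalize the inverse FFT
    return [round(v) for v in totals[: len(target) - len(query) + 1]]


scores = match_counts_by_shift("ACGTTGCAAC", "TTGC")
best_shift = max(range(len(scores)), key=scores.__getitem__)
```

Here `best_shift` is the offset at which the query agrees with the target at the most positions. The transform costs O(n log n) rather than the O(nm) of comparing every shift directly, which is the general reason an FFT is attractive for this task.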

A genome-wide linkage scan was conducted in a Northern-European multigenerational pedigree with nine of 40 related members affected with concomitant strabismus. Twenty-seven members of the pedigree including all affected individuals were genotyped using a SNP array interrogating > 300,000 common SNPs. We conducted parametric and non-parametric linkage analyses assuming segregation of an autosomal dominant mutation, yet allowing for incomplete penetrance and phenocopies. We detected two chromosome regions with near-suggestive evidence for linkage, respectively on chromosomes 8 and 18. The chromosome 8 linkage implied a penetrance of 0.80 and a rate of phenocopy of 0.11, while the chromosome 18 linkage implied a penetrance of 0.64 and a rate of phenocopy of 0. Our analysis excludes a simple genetic determinism of strabismus in this pedigree. PMID:24376720

Spots of blood are routinely collected from newborn babies onto filter paper called Guthrie cards and used to screen for metabolic and genetic disorders. The archived dried blood spots are an important and precious resource for genomic research. Whole genome amplification of dried blood spot DNA has been used to provide DNA for genome-wide SNP genotyping. Here we describe a 96 well format procedure to extract DNA from a portion of a dried blood spot that provides sufficient unamplified genomic DNA for genome-wide single nucleotide polymorphism (SNP) genotyping. We show that SNP genotyping of the unamplified DNA is more robust than genotyping amplified dried blood spot DNA, is comparable in cost, and can be done with thousands of samples. This procedure can be used for genome-wide association studies and other large-scale genomic analyses that require robust, high-accuracy genotyping of dried blood spot DNA. PMID:23737996

Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of “promising” SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design to obtain comparable levels of overall type I error and power by using about half the sample for stage I and carrying about 0.1% of SNPs forward to the second stage, the optimal design depending primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed ORs of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a “replication” panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly-significant associations has been discovered, a truly independent “exact replication” study is needed in a similar population of the same promising SNPs using similar methods. This can then be followed by (1) “generalizability” studies to assess the full scope of replicated associations across different races, different endpoints, different interactions, etc.; (2) fine-mapping or re-sequencing to try to identify the causal variant; and (3) experimental studies of the biological function of these genes. Multistage sampling designs may be more useful at this stage, say for selecting subsets of subjects for deep re

Variability at microsatellite loci has been used widely to infer the extent of genetic diversity among related plant taxa. However, typically, only the most polymorphic loci in the genome were analyzed that may result in a biased, and generally overestimated picture of genome-wide microsatellite div...

Objective: Although twin and family studies have shown attention-deficit/hyperactivity disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. Thus additional genome-wide association studies (GWAS) are needed. Method: We used case-control analyses of 896 cases…

Objective: Although twin and family studies have shown attention-deficit/hyperactivity disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. As prior genome-wide association studies (GWAS) have not yielded significant results, we conducted a meta-analysis of…

A major objective of genomic research in dairy cattle at present is to identify, map, and characterize individual quantitative trait loci (QTL) that affect production traits. A genome scan was conducted in the US Jersey population to identify QTL affecting milk, fat and protein production. Data use...

Sorghum bicolor, a member of the grass family, is an attractive model plant for genome studies because of features such as its small genome size. In this research, we performed a comprehensive investigation of alternative splicing in S. bicolor and of the ontology of the genes that undergo these events. We used homology-based alignments between gene-rich transcripts, represented by tentative consensus (TC) transcript sequences, and genomic scaffolds to deduce the structure of genes and identify alternatively spliced transcripts in sorghum. Using homology mapping of assembled expressed sequence tags with genomic data, we identified 2,137 alternative splicing events in S. bicolor. Our study showed that complex events and intron retention are the main types of alternative splicing events in S. bicolor and highlights the prevalence of splicing site recognition for definition of introns in this plant. Annotations of the alternatively spliced genes revealed that they represent diverse biological processes and molecular functions, suggesting a fundamental role for alternative splicing in affecting the development and physiology of S. bicolor. PMID:25049459

CGEMS identifies common inherited genetic variations associated with a number of cancers, including breast and prostate. Data from these genome-wide association studies (GWAS) are available through the Division of Cancer Epidemiology & Genetics website.

As more and more genomic DNAs are sequenced to characterize human genetic variations, the demand for a very fast and accurate method to genomically position these DNA sequences is high. We have developed a new mapping method that does not require sequence alignment. In this method, we first identified DNA fragments of 15 bp in length that are unique in the human genome and then used them to position single nucleotide polymorphism (SNP) sequences. By use of four desktop personal computers with AMD K7 (1 GHz) processors, our new method mapped more than 1.6 million SNP sequences in 20 hr and achieved a very good agreement with mapping results from alignment-based methods. PMID:12097348
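The alignment-free idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' program: it assumes a single-strand reference held in memory, exact matching only, and hypothetical function names (`unique_kmer_index`, `position_sequence`). The abstract uses 15 bp fragments; the usage lines use `k=4` only so that the toy reference stays short.

```python
from collections import Counter


def unique_kmer_index(reference, k=15):
    """Map every k-mer that occurs exactly once in the reference to its start position."""
    last_start = len(reference) - k
    counts = Counter(reference[i:i + k] for i in range(last_start + 1))
    return {
        reference[i:i + k]: i
        for i in range(last_start + 1)
        if counts[reference[i:i + k]] == 1
    }


def position_sequence(sequence, index, k=15):
    """Return the inferred reference start of `sequence`, or None if no unique k-mer hits."""
    for offset in range(len(sequence) - k + 1):
        hit = index.get(sequence[offset:offset + k])
        if hit is not None:
            # The unique k-mer sits `offset` bases into the query.
            return hit - offset
    return None


# Toy data: "ACGT" occurs twice in the reference, so it cannot anchor a query.
toy_reference = "ACGTACGTTTGACCA"
toy_index = unique_kmer_index(toy_reference, k=4)
mapped_start = position_sequence("ACGTT", toy_index, k=4)
```

Because the lookup is a dictionary hit rather than an alignment, positioning a query costs time proportional to its length, which is consistent with the speed the abstract reports. A practical version would presumably also index the reverse complement; that is left out here.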

Ticks and tick-borne diseases are among the main causes of economic loss in the South African cattle industry through high morbidity and mortality rates. Concerns of the general public regarding chemical residues may tarnish their perceptions of food safety and environmental health when the husbandry of cattle includes frequent use of acaricides to manage ticks. The primary objective of this study was to identify single nucleotide polymorphism (SNP) markers associated with host resistance to ticks in South African Nguni cattle. Tick count data were collected monthly from 586 Nguni cattle reared in four herds under natural grazing conditions over a period of two years. The counts were recorded for six species of ticks attached in eight anatomical locations on the animals and were summed by species and anatomical location. This gave rise to 63 measured phenotypes or traits, with results for 12 of these traits being reported here. Tick count (x) data were transformed using log10(x+1) and the resulting values were examined for normality. DNA was extracted from hair and blood samples and was genotyped using the Illumina BovineSNP50 assay. After quality control (call rate >90%, minor allele frequency >0.02), 40,436 SNPs were retained for analysis. Genetic parameters were estimated and association analysis for tick resistance was carried out using two approaches: a genome-wide association (GWA) analysis using the GenABEL package and a regional heritability mapping (RHM) analysis. The Bonferroni genome-wide (P<0.05) corrected significance threshold was 1.24×10⁻⁶, with 2.47×10⁻⁵ as the suggestive significance threshold (P<0.10) (i.e., one false positive per genome scan) in the GWA analysis. Likelihood ratio test (LRT) thresholds for genome-wide and suggestive significance were 13.5 and 9.15 for the RHM analysis. Six ixodid tick species were identified, with Amblyomma hebraeum (the vector for Heartwater disease) being the dominant species. Heritability estimates (h(2
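The two GWA thresholds quoted in this abstract can be reproduced from the number of retained SNPs. The short sketch below assumes that the genome-wide value is a plain Bonferroni correction (0.05 divided by the number of tests) and that the suggestive value is one expected false positive per scan (1 divided by the number of tests); the abstract does not spell out the formulas, and the function names are hypothetical. It also shows the log10(x+1) transform applied to the tick counts.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def suggestive_threshold(n_tests, expected_false_positives=1.0):
    """Per-test threshold giving the stated number of expected false positives per scan."""
    return expected_false_positives / n_tests


def log_transform(count):
    """log10(x + 1) transform, defined for zero counts as well."""
    return math.log10(count + 1)


n_snps = 40436  # SNPs retained after quality control, as stated in the abstract
genome_wide = bonferroni_threshold(0.05, n_snps)
suggestive = suggestive_threshold(n_snps)
```

Under these assumptions `genome_wide` and `suggestive` round to the 1.24×10⁻⁶ and 2.47×10⁻⁵ reported above.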

We compare scanned-mosaicking and blanket illumination schemes for wide-field photoacoustic tomography with potential applications to breast imaging. For each illumination, a locally high-SNR image patch is reconstructed then mosaicked with image patches from other illuminations. Because the beam is not diffused over the entire area, the fluence of the beam can be maximized, therefore maximizing the signal generated. Moreover, the imaging can potentially still be done fast enough within a breath-hold. A Monte Carlo simulation as a function of beam-spot size and depth is performed to quantify this signal gain. We experimentally test both schemes using a 256-element Imasonic ring array on a tissue-mimicking phantom. We were able to verify the simulated signal gain of 2.9x under 0.5 cm of tissue with the experimental data, and measured the signal gain decrease expected when imaging deeper into the tissue. We also measured the effectiveness of averaging the diffused beam versus the scanned-mosaicking approach, and observed that for the same scan times and limited laser power output, scanned-mosaicking was able to produce a higher SNR than the blanket illumination approach. We have shown that this technique will allow wide-area PAT to utilize the maximum SNR available from any system while minimizing the number of acquisitions to reach this SNR.

Abstract Regulatory sites that control gene expression are essential to the proper functioning of cells, and identifying them is critical for modeling regulatory networks. We have developed Magma (Multiple Aligner of Genomic Multiple Alignments), a software tool for multiple species, multiple gene motif discovery. Magma identifies putative regulatory sites that are conserved across multiple species and occur near multiple genes throughout a reference genome. Magma takes as input multiple alignments that can include gaps. It uses efficient clustering methods that make it about 70 times faster than PhyloNet, a previous program for this task, with slightly greater sensitivity. We ran Magma on all non-coding DNA conserved between Caenorhabditis elegans and five additional species, about 70 Mbp in total, in <4 h. We obtained 2,309 motifs with lengths of 6–20 bp, each occurring at least 10 times throughout the genome, which collectively covered about 566 kbp of the genomes, approximately 0.8% of the input. Predicted sites occurred in all types of non-coding sequence but were especially enriched in the promoter regions. Comparisons to several experimental datasets show that Magma motifs correspond to a variety of known regulatory motifs. PMID:22300316

The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...

We have developed a new, unified implementation of the adaptive optics scanning laser ophthalmoscope (AOSLO) incorporating a wide-field line-scanning ophthalmoscope (LSO) and a closed-loop optical retinal tracker. AOSLO raster scans are deflected by the integrated tracking mirrors so that direct AOSLO stabilization is automatic during tracking. The wide-field imager and large-spherical-mirror optical interface design, as well as a large-stroke deformable mirror (DM), enable the AOSLO image field to be corrected at any retinal coordinates of interest in a field of >25 deg. AO performance was assessed by imaging individuals with a range of refractive errors. In most subjects, image contrast was measurable at spatial frequencies close to the diffraction limit. Closed-loop optical (hardware) tracking performance was assessed by comparing sequential image series with and without stabilization. Though usually better than 10 μm rms, or 0.03 deg, tracking does not yet stabilize to single cone precision but significantly improves average image quality and increases the number of frames that can be successfully aligned by software-based post-processing methods. The new optical interface allows the high-resolution imaging field to be placed anywhere within the wide field without requiring the subject to re-fixate, enabling easier retinal navigation and faster, more efficient AOSLO montage capture and stitching. PMID:21045887

Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations leading to many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential in understanding diseases and in designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage on the whole genome scale, and are cost-wise inefficient. Another approach, whole exome genome sequencing (WEGS), is limited to discovering variations within exons. Thus, we are lacking efficient methods to detect genomic aberrations on the whole genome scale using next-generation sequencing technology. Here we present a method to identify genome-wide CNV, LOH and UPD for the human genome via selectively sequencing a small portion of genome termed Selected Target Regions (SeTRs). In our experiments, the SeTRs are covered by 99.73%~99.95% with sufficient depth. Our developed bioinformatics pipeline calls genome-wide CNVs with high confidence, revealing 8 credible events of LOH and 3 UPD events larger than 5M from 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected using a cost-effective SeTRs sequencing approach, and that LOH and UPD can be identified using just a sample grouping technique, without using a matched sample or familial information. PMID:25919136

The current trend in genome-wide association studies is to identify regions where the true disease-causing genes may lie by evaluating thousands of single-nucleotide polymorphisms (SNPs) across the whole genome. However, many challenges exist in detecting disease-causing genes among the thousands of SNPs. Examples include multicollinearity and multiple testing issues, especially when a large number of correlated SNPs are simultaneously tested. Multicollinearity can often occur when predictor variables in a multiple regression model are highly correlated, and can cause imprecise estimation of association. In this study, we propose a simple stepwise procedure that identifies disease-causing SNPs simultaneously by employing elastic-net regularization, a variable selection method that allows one to address multicollinearity. At Step 1, the single-marker association analysis was conducted to screen SNPs. At Step 2, the multiple-marker association was scanned based on the elastic-net regularization. The proposed approach was applied to the rheumatoid arthritis (RA) case-control data set of Genetic Analysis Workshop 16. While the selected SNPs at the screening step are located mostly on chromosome 6, the elastic-net approach identified putative RA-related SNPs on other chromosomes in an increased proportion. For some of those putative RA-related SNPs, we identified the interactions with sex, a well known factor affecting RA susceptibility. PMID:20018015

We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations. PMID:19465909

Asthma is a complex disease determined by the interaction of different genes and environmental factors. The first genetic investigations in asthma were candidate gene association studies and linkage studies. In recent years research has focused on association studies that scan the entire genome without any prior conditioning hypothesis: the so-called genome-wide association studies (GWAS). The first GWAS of asthma was published in 2007, and described a new locus associated with asthma on chromosome 17q12-q21, involving the ORMDL3, GSDMB and ZPBP2 genes (descriptions of the genes named in the manuscript are listed in Table 1). None of these genes would have been selected in a classical genetic association study since it was not known they could be implicated in asthma. To date, a number of GWAS in asthma have been conducted, with the identification of about 1000 candidate genes. Coordination of the different research groups in international consortiums and the application of new technologies such as next-generation sequencing will help discover new implicated genes and improve our understanding of the molecular mechanisms underlying the disease. PMID:25433770

DNA microarray technologies have advanced rapidly and had a profound impact on examining gene expression on a genomic scale in research. This review discusses the history and development of microarray and DNA chip devices, and specific microarrays are described along with their methods and applications. In particular, microarrays have detected many novel cancer-related genes by comparing cancer tissues and non-cancerous tissues in oncological research. Recently, new methods have been in development, such as the double-combination array and triple-combination array, which allow more effective analysis of gene expression and epigenetic changes. Analysis of gene expression alterations in precancerous regions compared with normal regions and array analysis in drug-resistance cancer tissues are also successfully performed. Compared with next-generation sequencing, a similar method of genome analysis, several important differences distinguish these techniques and their applications. Development of novel microarray technologies is expected to contribute to further cancer research.

Toothed whales and two groups of bats independently acquired echolocation, the ability to locate and identify objects by reflected sound. Echolocation requires physiologically complex and coordinated vocal, auditory, and neural functions, but the molecular basis of the capacity for echolocation is not well understood. A recent study suggested that convergent amino acid substitutions widespread in the proteins of echolocators underlay the convergent origins of mammalian echolocation. Here, we show that genomic signatures of molecular convergence between echolocating lineages are generally no stronger than those between echolocating and comparable nonecholocating lineages. The same is true for the group of 29 hearing-related proteins claimed to be enriched with molecular convergence. Reexamining the previous selection test reveals several flaws and invalidates the asserted evidence for adaptive convergence. Together, these findings indicate that the reported genomic signatures of convergence largely reflect the background level of sequence convergence unrelated to the origins of echolocation. PMID:25631925

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

Background: Laribacter hongkongensis is associated with community-acquired gastroenteritis and traveler's diarrhea. In this study, we performed an in-depth annotation of the genes and pathways of the general metabolism of L. hongkongensis and correlated them with its phenotypic characteristics. Results: The L. hongkongensis genome possesses the pentose phosphate and gluconeogenesis pathways and tricarboxylic acid and glyoxylate cycles, but incomplete Embden-Meyerhof-Parnas and Entner-Doudoroff pathways, in agreement with its asaccharolytic phenotype. It contains enzymes for biosynthesis and β-oxidation of saturated fatty acids, biosynthesis of all 20 universal amino acids and selenocysteine, the latter not observed in Neisseria gonorrhoeae, Neisseria meningitidis and Chromobacterium violaceum. The genome contains a variety of dehydrogenases, enabling it to utilize different substrates as electron donors. It encodes three terminal cytochrome oxidases for respiration using oxygen as the electron acceptor under aerobic and microaerophilic conditions and four reductases for respiration with alternative electron acceptors under anaerobic conditions. The presence of a complete tetrathionate reductase operon may confer a survival advantage in the mammalian host in association with diarrhea. The genome contains CDSs for incorporating sulfur and nitrogen by sulfate assimilation, ammonia assimilation and nitrate reduction. The existence of both glutamate dehydrogenase and glutamine synthetase/glutamate synthase pathways suggests that ammonia metabolism is important in the living environments that it may encounter. Conclusions: The L. hongkongensis genome possesses a variety of genes and pathways for carbohydrate, amino acid and lipid metabolism, respiratory chain and sulfur and nitrogen metabolism. These allow the bacterium to utilize various substrates for energy production and survive in different environmental niches. PMID:21711917

Leptospira interrogans is the most common cause of leptospirosis in humans and animals. Genetic analysis of L. interrogans has been severely hindered by a lack of tools for genetic manipulation. Recently we developed the mariner-based transposon Himar1 to generate the first defined mutants in L. interrogans. In this study, a total of 929 independent transposon mutants were obtained and the location of insertion determined. Of these mutants, 721 were located in the protein coding regions of 551 different genes. While sequence analysis of transposon insertion sites indicated that transposition occurred in an essentially random fashion in the genome, 25 unique transposon mutants were found to exhibit insertions into genes encoding 16S or 23S rRNAs, suggesting these genes are insertional hot spots in the L. interrogans genome. In contrast, loci containing notionally essential genes involved in lipopolysaccharide and heme biosynthesis showed few transposon insertions. The effect of gene disruption on the virulence of a selected set of defined mutants was investigated using the hamster model of leptospirosis. Two attenuated mutants with disruptions in hypothetical genes were identified, thus validating the use of transposon mutagenesis for the identification of novel virulence factors in L. interrogans. This library provides a valuable resource for the study of gene function in L. interrogans. Combined with the genome sequences of L. interrogans, this provides an opportunity to investigate genes that contribute to pathogenesis and will provide a better understanding of the biology of L. interrogans. PMID:19047402

Bacterial identification on the basis of the highly conserved 16S rRNA (rrs) gene is limited by its presence in multiple copies and a very high level of similarity among them. Other genes with unique characteristics are therefore needed as biomarkers. Fifty-one sequenced genomes belonging to 10 different Yersinia species were searched for genes common to all the genomes. Out of 304 common genes, 34 genes of sizes varying from 0.11 to 4.42 kb were selected and subjected to in silico digestion with 10 different restriction endonucleases (RE) (4-6 base cutters). Yersinia species have 6-7 copies of rrs per genome, which are difficult to distinguish by multiple sequence alignments or their RE digestion patterns. However, certain unique combinations of other common gene sequences (carB, fadJ, gluM, gltX, ileS, malE, nusA, ribD, and rlmL) and their RE digestion patterns can be used as markers for identifying 21 strains belonging to 10 Yersinia species: Y. aldovae, Y. enterocolitica, Y. frederiksenii, Y. intermedia, Y. kristensenii, Y. pestis, Y. pseudotuberculosis, Y. rohdei, Y. ruckeri, and Y. similis. This approach can be applied for rapid diagnostic applications. PMID:26543261
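
An in silico digestion of the kind described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the function name `digest`, the toy sequence, and the choice of EcoRI (recognition site GAATTC, cutting after the first G) are assumptions made for the example.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`; the cut falls `cut_offset` bases into the site."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Hypothetical 29-base sequence containing two EcoRI sites.
toy_gene = "ATGCGAATTCTTAGGCCGAATTCAAATGA"
fragments = digest(toy_gene, "GAATTC", 1)
print(fragments)
```

Comparing the sorted fragment-length lists of the same gene across strains gives a simple digestion pattern; two strains are distinguishable with a given enzyme when their lists differ.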

Genomic imprinting is an epigenetic mechanism by which alleles of some specific genes are expressed in a parent-of-origin manner. It has been observed in mammals and marsupials, but not in birds. Until now, only a few genes orthologous to mammalian imprinted ones have been analyzed in chicken and did not demonstrate any evidence of imprinting in this species. However, several published observations such as imprinted-like QTL in poultry or reciprocal effects keep the question open. Our main objective was thus to screen the entire chicken genome for parental-allele-specific differential expression on whole embryonic transcriptomes, using high-throughput sequencing. To identify the parental origin of each observed haplotype, two chicken experimental populations were used, as inbred and as genetically distant as possible. Two families were produced from two reciprocal crosses. Transcripts from 20 embryos were sequenced using NGS technology, producing ∼200 Gb of sequences. This allowed the detection of 79 potentially imprinted SNPs, through an analysis method that we validated by detecting imprinting from mouse data already published. However, out of 23 candidates tested by pyrosequencing, none could be confirmed. These results come together, without a priori, with previous statements and phylogenetic considerations assessing the absence of genomic imprinting in chicken. PMID:24452801

Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance. PMID:17002500
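
The trade-off between the exhaustive and two-stage searches is largely a matter of how many pairwise tests are run. The sketch below counts them and shows the corresponding Bonferroni thresholds; the panel size, the number of loci carried into stage two, and the use of a plain Bonferroni correction are all assumptions for illustration, not values taken from the study.

```python
from math import comb

def pairwise_tests(n_markers):
    """Number of distinct marker pairs in an exhaustive two-locus scan."""
    return comb(n_markers, 2)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests

n_markers = 500_000   # hypothetical genome-wide panel size
n_carried = 5_000     # hypothetical number of loci passing stage one

exhaustive = pairwise_tests(n_markers)
two_stage = pairwise_tests(n_carried)

print(f"exhaustive: {exhaustive:,} tests, "
      f"threshold {bonferroni_threshold(0.05, exhaustive):.1e}")
print(f"two-stage:  {two_stage:,} tests, "
      f"threshold {bonferroni_threshold(0.05, two_stage):.1e}")
```

The two-stage search is far cheaper, but a pair in which neither locus shows a marginal effect never reaches stage two, which is the loss of power the abstract describes.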

High-throughput sequencing has been dramatically accelerating the discovery of microsatellite markers (also known as Simple Sequence Repeats). Both 454 and Illumina reads have been used directly in microsatellite discovery and primer design (the “Seq-to-SSR” approach). However, constraints of this approach include: 1) many microsatellite-containing reads do not have sufficient flanking sequences to allow primer design, and 2) difficulties in removing microsatellite loci residing in longer, repetitive regions. In the current study, we applied the novel “Seq-Assembly-SSR” approach to overcome these constraints in Anisogramma anomala. In our approach, Illumina reads were first assembled into a draft genome, and the latter was then used in microsatellite discovery. A. anomala is an obligate biotrophic ascomycete that causes eastern filbert blight disease of commercial European hazelnut. Little is known about its population structure or diversity. Approximately 26 M 146 bp Illumina reads were generated from a paired-end library of a fungal strain from Oregon. The reads were assembled into a draft genome of 333 Mb (excluding gaps), with contig N50 of 10,384 bp and scaffold N50 of 32,987 bp. A bioinformatics pipeline identified 46,677 microsatellite motifs at 44,247 loci, including 2,430 compound loci. Primers were successfully designed for 42,923 loci (97%). After removing 2,886 loci close to assembly gaps and 676 loci in repetitive regions, a genome-wide microsatellite database of 39,361 loci was generated for the fungus. In experimental screening of 236 loci using four geographically representative strains, 228 (96.6%) were successfully amplified and 214 (90.7%) produced single PCR products. Twenty-three (9.7%) were found to be perfect polymorphic loci. A small-scale population study using 11 polymorphic loci revealed considerable gene diversity. Clustering analysis grouped isolates of this fungus into two clades in accordance with their geographic origins
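
For readers unfamiliar with the N50 statistic quoted above, a minimal sketch follows. N50 is the length L such that contigs (or scaffolds) of length at least L together cover at least half of the assembly; the function name and the toy lengths are assumptions for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

toy_contigs = [80, 70, 50, 40, 30, 20]   # hypothetical lengths in bp
print(n50(toy_contigs))
```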

Surface topography is an important geometrical feature of a workpiece that influences its quality and functions such as friction, wear, lubrication and sealing. Precision measurement of surface topography is fundamental to characterizing and assuring product quality. Stylus scanning is a widely used method for surface topography measurement and is also regarded as the international standard method for 2-D surface characterization. Surface topography, including primary profile, waviness and roughness, can usually be measured precisely and efficiently by this method. However, when stylus scanning is used to measure curved surface topography, a nonlinear error is unavoidable for two reasons: the horizontal position of the actual measured point differs from the given sampling point, and the transformation from vertical displacement of the stylus tip to angular displacement of the stylus arm is nonlinear. The error grows as the measuring range increases. In this paper, a wide-range stylus scanning measurement system based on the cylindrical grating interference principle is constructed, the sources of the nonlinear error are analyzed, an error model is established, and a solution that reduces the nonlinear error by dynamically compensating the collected data is proposed.
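
The two error sources can be illustrated with a simple pivoting-arm geometry. This is a sketch under stated assumptions, not the error model of the paper: the stylus is taken to be a rigid arm of length `arm_length` rotating about a fixed pivot, so that a tip height z corresponds to an arm angle θ with z = L sin θ, and the 50 mm arm length is hypothetical.

```python
from math import asin, cos

def arm_angle(z, arm_length):
    """Arm rotation (radians) for a tip height z, assuming z = L * sin(theta)."""
    return asin(z / arm_length)

def horizontal_offset(z, arm_length):
    """Shift of the tip along the scan axis caused by the arm rotation."""
    return arm_length * (1 - cos(arm_angle(z, arm_length)))

def linearization_error(z, arm_length):
    """Height error if the angle reading is converted with z ~ L * theta."""
    return arm_length * arm_angle(z, arm_length) - z

L_MM = 50.0  # hypothetical arm length in millimetres
for z in (0.5, 1.0, 2.0, 5.0):
    print(f"z = {z:4.1f} mm  "
          f"dx = {horizontal_offset(z, L_MM):.5f} mm  "
          f"dz = {linearization_error(z, L_MM):.6f} mm")
```

Both terms vanish at z = 0 and grow faster than linearly with z, which is consistent with the statement that the error increases with the measuring range.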

Age-related changes in DNA methylation have been implicated in cellular senescence and longevity, yet the causes and functional consequences of these variants remain unclear. To elucidate the role of age-related epigenetic changes in healthy ageing and potential longevity, we tested for association between whole-blood DNA methylation patterns in 172 female twins aged 32 to 80 with age and age-related phenotypes. Twin-based DNA methylation levels at 26,690 CpG-sites showed evidence for mean genome-wide heritability of 18%, which was supported by the identification of 1,537 CpG-sites with methylation QTLs in cis at FDR 5%. We performed genome-wide analyses to discover differentially methylated regions (DMRs) for sixteen age-related phenotypes (ap-DMRs) and chronological age (a-DMRs). Epigenome-wide association scans (EWAS) identified age-related phenotype DMRs (ap-DMRs) associated with LDL (STAT5A), lung function (WT1), and maternal longevity (ARL4A, TBX20). In contrast, EWAS for chronological age identified hundreds of predominantly hyper-methylated age DMRs (490 a-DMRs at FDR 5%), of which only one (TBX20) was also associated with an age-related phenotype. Therefore, the majority of age-related changes in DNA methylation are not associated with phenotypic measures of healthy ageing in later life. We replicated a large proportion of a-DMRs in a sample of 44 younger adult MZ twins aged 20 to 61, suggesting that a-DMRs may initiate at an earlier age. We next explored potential genetic and environmental mechanisms underlying a-DMRs and ap-DMRs. Genome-wide overlap across cis-meQTLs, genotype-phenotype associations, and EWAS ap-DMRs identified CpG-sites that had cis-meQTLs with evidence for genotype–phenotype association, where the CpG-site was also an ap-DMR for the same phenotype. Monozygotic twin methylation difference analyses identified one potential environmentally-mediated ap-DMR associated with total cholesterol and LDL (CSMD1). Our results suggest that in a

Nuclear emulsion, a tracking detector with sub-micron position resolution, has played a successful role in the field of particle physics and the analysis speed has been substantially improved by the development of automated scanning systems. This paper describes a newly developed automated scanning system and its application to the analysis of nuclear fragments emitted almost isotropically in nuclear evaporation. This system is able to recognize tracks of nuclear fragments up to |tan θ| < 3.0 (where θ is the track angle with respect to the perpendicular to the emulsion film), while existing systems have an angular acceptance limited to |tan θ| < 0.6. The automatic scanning for such a large angle track in nuclear emulsion is the first trial. Furthermore the track recognition algorithm is performed by a powerful Graphics Processing Unit (GPU) for the first time. This GPU has a sufficient computing power to process large area scanning data with a wide angular acceptance and enough flexibility to allow the tuning of the recognition algorithm. This new system will in particular be applied in the framework of the OPERA experiment: the background in the sample of τ decay candidates due to hadronic interactions will be reduced by a better detection of the emitted nuclear fragments.
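
The gain from the wider angular acceptance can be estimated with a purely geometric sketch. It assumes perfectly isotropic emission and counts tracks going in either direction through the film, so the accepted fraction of directions is 1 − cos θ_max; detection efficiency and all other experimental effects are ignored, and the function names are hypothetical.

```python
from math import atan, cos, degrees

def max_angle_deg(tan_limit):
    """Largest accepted track angle, in degrees, for a cut |tan(theta)| < tan_limit."""
    return degrees(atan(tan_limit))

def isotropic_fraction(tan_limit):
    """Fraction of isotropically emitted tracks inside the cut (both directions)."""
    return 1 - cos(atan(tan_limit))

for limit in (0.6, 3.0):
    print(f"|tan(theta)| < {limit}: theta < {max_angle_deg(limit):.1f} deg, "
          f"geometric acceptance {isotropic_fraction(limit):.1%}")
```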

The male-to-female sex ratio at birth is constant across world populations with an average of 1.06 (106 male to 100 female live births) for populations of European descent. The sex ratio is considered to be affected by numerous biological and environmental factors and to have a heritable component. The aim of this study was to investigate whether modest effects of common alleles at autosomal and chromosome X variants could explain the observed sex ratio at birth. We conducted a large-scale genome-wide association scan (GWAS) meta-analysis across 51 studies, comprising overall 114 863 individuals (61 094 women and 53 769 men) of European ancestry and 2 623 828 common (minor allele frequency >0.05) single-nucleotide polymorphisms (SNPs). Allele frequencies were compared between men and women for directly-typed and imputed variants within each study. Forward-time simulations for unlinked, neutral, autosomal, common loci were performed under the demographic model for European populations with a fixed sex ratio and a random mating scheme to assess the probability of detecting significant allele frequency differences. We do not detect any genome-wide significant (P < 5 × 10^-8) common SNP differences between men and women in this well-powered meta-analysis. The simulated data provided results entirely consistent with these findings. This large-scale investigation across ∼115 000 individuals shows no detectable contribution from common genetic variants to the observed skew in the sex ratio. The absence of sex-specific differences is useful in guiding genetic association study design, for example when using mixed controls for sex-biased traits. PMID:22843499
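
A per-SNP comparison of allele frequencies between two groups can be sketched as a two-proportion z-test on allele counts. This is an illustrative stand-in, not the test used in the meta-analysis: it applies to autosomal SNPs only (two alleles per person in both sexes), and the allele counts below are invented; only the group sizes are taken from the abstract.

```python
from math import erfc, sqrt

GENOME_WIDE_ALPHA = 5e-8

def allele_freq_test(alt_a, n_a, alt_b, n_b):
    """Two-sided z-test for a difference in alternate-allele frequency.

    alt_* are alternate-allele counts, n_* are numbers of diploid individuals.
    Returns (z, p_value).
    """
    chrom_a, chrom_b = 2 * n_a, 2 * n_b
    freq_a, freq_b = alt_a / chrom_a, alt_b / chrom_b
    pooled = (alt_a + alt_b) / (chrom_a + chrom_b)
    se = sqrt(pooled * (1 - pooled) * (1 / chrom_a + 1 / chrom_b))
    z = (freq_a - freq_b) / se
    return z, erfc(abs(z) / sqrt(2))

# Hypothetical allele counts for one SNP in 61 094 women and 53 769 men.
z, p = allele_freq_test(36_656, 61_094, 32_400, 53_769)
print(f"z = {z:.3f}, p = {p:.3f}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```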

Bladder cancer is a complex disease with known environmental and genetic risk factors. We performed a genome-wide interaction study (GWAS) of smoking and bladder cancer risk based on primary scan data from 3002 cases and 4411 controls from the National Cancer Institute Bladder Cancer GWAS. Alternative methods were used to evaluate both additive and multiplicative interactions between individual single nucleotide polymorphisms (SNPs) and smoking exposure. SNPs with interaction P values < 5 × 10^-5 were evaluated further in an independent dataset of 2422 bladder cancer cases and 5751 controls. We identified 10 SNPs that showed association in a consistent manner with the initial dataset and in the combined dataset, providing evidence of interaction with tobacco use. Further, two of these novel SNPs showed strong evidence of association with bladder cancer in tobacco use subgroups that approached genome-wide significance. Specifically, rs1711973 (FOXF2) on 6p25.3 was a susceptibility SNP for never smokers [combined odds ratio (OR) = 1.34, 95% confidence interval (CI) = 1.20–1.50, P value = 5.18 × 10^-7]; and rs12216499 (RSPH3-TAGAP-EZR) on 6q25.3 was a susceptibility SNP for ever smokers (combined OR = 0.75, 95% CI = 0.67–0.84, P value = 6.35 × 10^-7). In our analysis of smoking and bladder cancer, the tests for multiplicative interaction seemed to more commonly identify susceptibility loci with associations in never smokers, whereas the additive interaction analysis identified more loci with associations among smokers, including the known smoking and NAT2 acetylation interaction. Our findings provide additional evidence of gene–environment interactions for tobacco and bladder cancer. PMID:24662972

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provi...

Aim: To identify genetic variants underlying biochemical traits (total cholesterol, low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, uric acid, albumin, and fibrinogen) in a genome-wide association study in an isolated population where rare variants of larger effect may be more easily identified. Methods: The study included 944 adult inhabitants of the island of Korčula, as a part of a larger DNA-based genetic epidemiological study in 2007. Biochemical measurements were performed in a single laboratory with stringent internal and external quality control procedures. Examinees were genotyped using the Illumina Human Hap370CNV chip, with a genome-wide scan containing 346 027 single nucleotide polymorphisms (SNPs). Results: A total of 31 SNPs were associated with 7 investigated traits at the level of P

Low von Willebrand factor (VWF) levels are associated with bleeding symptoms and are a diagnostic criterion for von Willebrand disease, the most common inherited bleeding disorder. To date, it is unclear which genetic loci are associated with reduced VWF levels. Therefore, we conducted a meta-analysis of genome-wide association studies to identify genetic loci associated with low VWF levels. For this meta-analysis, we included 31 149 participants of European ancestry from 11 community-based studies. VWF antigen (VWF:Ag) measurements and genome-wide single-nucleotide polymorphism (SNP) scans were available for all participants. Each study conducted analyses using logistic regression of SNPs on dichotomized VWF:Ag measures (lowest 5% for blood group O and non-O) with an additive genetic model adjusted for age and sex. An inverse-variance weighted meta-analysis was performed for VWF:Ag levels. A total of 97 SNPs exceeded the genome-wide significance threshold of 5 × 10^-8 and comprised five loci on four different chromosomes: 6q24 (smallest P-value 5.8 × 10^-10), 9q34 (2.4 × 10^-64), 12p13 (5.3 × 10^-22), 12q23 (1.2 × 10^-8) and 13q13 (2.6 × 10^-8). All loci were within or close to genes, including STXBP5 (Syntaxin Binding Protein 5) (6q24), STAB2 (stabilin-2) (12q23), ABO (9q34), VWF (12p13) and UFM1 (ubiquitin-fold modifier 1) (13q13). Of these, UFM1 has not been previously associated with VWF:Ag levels. Four genes that were previously associated with VWF levels (VWF, ABO, STXBP5 and STAB2) were also associated with low VWF levels, and, in addition, we identified a new gene, UFM1, that is associated with low VWF levels. These findings point to novel mechanisms for the occurrence of low VWF levels. PMID:26486471
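
The inverse-variance weighted step can be sketched as a standard fixed-effects combination of per-study estimates: each study's effect (here a log odds ratio) is weighted by the reciprocal of its squared standard error. The function name and the three per-study values are hypothetical and are not results from the meta-analysis.

```python
from math import erfc, sqrt

def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis.

    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    z = pooled / pooled_se
    return pooled, pooled_se, erfc(abs(z) / sqrt(2))

# Hypothetical per-study log odds ratios and standard errors for one SNP.
study_betas = [0.21, 0.15, 0.30]
study_ses = [0.08, 0.10, 0.12]
beta, se, p = ivw_meta(study_betas, study_ses)
print(f"pooled beta = {beta:.3f}, se = {se:.3f}, p = {p:.2e}")
```

The pooled standard error is always smaller than that of the most precise single study, which is where the gain in power from combining cohorts comes from.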

Microelectromechanical (MEMS) mirrors have extended vision capabilities onto small, low-power platforms. However, the field-of-view (FOV) of these MEMS mirrors is usually less than 90° and any increase in the MEMS mirror scanning angle has design and fabrication trade-offs in terms of power, size, speed and stability. Therefore, we need techniques to increase the scanning range while still maintaining a small form factor. In this paper we exploit our recent breakthrough that has enabled the immersion of MEMS mirrors in liquid. While allowing the MEMS to move, the liquid additionally provides a "Snell's window" effect and enables an enlarged FOV (≈ 150°). We present an optimized MEMS mirror design and use it to demonstrate applications in extreme wide-angle structured light. PMID:26907006
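
The field-of-view enlargement can be illustrated with Snell's law at a flat liquid-air interface, n_liquid · sin θ_liquid = n_air · sin θ_air. This is a simplified sketch, not the optical design of the paper: the flat interface, the refractive index of 1.33 and the 90° scan range inside the liquid are assumptions chosen for the example.

```python
from math import asin, degrees, radians, sin

def fov_in_air(fov_liquid_deg, n_liquid, n_air=1.0):
    """Full scan angle in air for a full scan angle inside the liquid.

    Assumes a flat interface perpendicular to the centre of the scan.
    """
    half_angle = radians(fov_liquid_deg / 2)
    s = n_liquid * sin(half_angle) / n_air
    if s >= 1:
        raise ValueError("beam is totally internally reflected at this angle")
    return 2 * degrees(asin(s))

# Hypothetical: 90 degree scan range inside a liquid with n = 1.33.
print(f"{fov_in_air(90.0, 1.33):.1f} degrees in air")
```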

Ilmenite (FeTiO3) is a wide bandgap semiconductor with an energy gap of about 2.5 eV. Initial radiation studies indicate that ilmenite has properties suited for radiation-tolerant applications, as well as a variety of other electronic applications. Two scanning probe microscopy methods have been used to characterize the surface of samples taken from Czochralski-grown single crystals. The two methods, atomic force microscopy (AFM) and scanning tunneling microscopy (STM), are based on different physical principles and therefore provide different information about the samples. AFM provides a direct, three-dimensional image of the surface of the samples, while STM gives a convolution of topographic and electronic properties of the surface. We will discuss the differences between the methods and present preliminary data from each method for ilmenite samples.

An ultrathin scanning fiber endoscope (SFE) has been developed for high resolution imaging of regions in the body that are commonly inaccessible. The SFE produces 500 line color images at 30 Hz frame rate while maintaining a 1.2-1.7 mm outer diameter. The distal tip of the SFE houses a 9 mm rigid scan engine attached to a highly flexible tether (minimum bend radius < 8 mm) comprised of optical fibers and electrical wires within a protective sheath. Unlike other ultrathin technologies, the unique characteristics of this system have allowed the SFE to navigate narrow passages without sacrificing image quality. To date, the SFE has been used for in vivo imaging of the bile duct, esophagus and peripheral airways. In this study, the standard SFE operation was tailored to capture wide field fluorescence images and spectra. Green (523 nm) and blue (440 nm) lasers were used as illumination sources, while the white balance gain values were adjusted to accentuate red fluorescence signal. To demonstrate wide field fluorescence imaging of small lumens, the SFE was inserted into a phantom model of a human pancreatobiliary tract and navigated to a custom fluorescent target. Both wide field fluorescence and standard color images of the target were captured to demonstrate multimodal imaging.

Background: Recent progress in genotyping technologies allows the generation of high-density genetic maps using hundreds of thousands of genetic markers for each DNA sample. The availability of this large amount of genotypic data facilitates the whole-genome search for the genetic basis of diseases. A suitable information management system is needed to efficiently manage the data flow produced by whole-genome genotyping and to make it available for further analyses. Results: We have developed an information system mainly devoted to the storage and management of SNP genotype data produced by the Illumina platform, from the raw outputs of genotyping into a relational database. The relational database can be accessed in order to import any existing data and export user-defined formats compatible with many different genetic analysis programs. Once family-based or case-control association analyses have been computed, the results can be imported into SNPLims. One of the main features is to allow the user to rapidly identify and annotate statistically relevant polymorphisms from the large volume of data analyzed. Results can be easily visualized graphically or exported as ASCII comma-separated files, which can be used as input to further analyses. Conclusions: The proposed infrastructure makes it possible to manage a relatively large number of genotypes for each sample and an arbitrary number of samples and phenotypes. Moreover, it enables users to control the quality of the data, to perform the most common screening analyses, and to identify genes that become “candidates” for the disease under consideration. PMID:18387201

Recent studies have reported that regions of homozygosity (ROH) in the genome are detectable in outbred populations and can be associated with an increased risk of malignancy. To examine whether homozygosity is associated with an increased risk of developing Hodgkin lymphoma (HL) we analysed 589 HL cases and 5,199 controls genotyped for 484,072 tag single nucleotide polymorphisms (SNPs). Across the genome the cumulative distribution of ROH was not significantly different between cases and controls. Seven ROH at 4q22.3, 4q32.2, 7p12.3–14.1, 7p22.2, 10p11.22–23, 19q13.12-2 and 19p13.2 were associated with HL risk at P

Most annexins are calcium-dependent, phospholipid-binding proteins with suggested functions in response to environmental stresses and signaling during plant growth and development. They have previously been identified and characterized in Arabidopsis and rice, and constitute a multigene family in plants. In this study, we performed a comparative analysis of annexin gene families in the sequenced genomes of Viridiplantae ranging from unicellular green algae to multicellular plants, and identified 149 genes. Phylogenetic studies of these deduced annexins classified them into nine different arbitrary groups. The occurrence and distribution of bona fide type II calcium binding sites within the four annexin domains were found to be different in each of these groups. Analysis of chromosomal distribution of annexin genes in rice, Arabidopsis and poplar revealed their localization on various chromosomes with some members also found on duplicated chromosomal segments leading to gene family expansion. Analysis of gene structure suggests sequential or differential loss of introns during the evolution of land plant annexin genes. Intron positions and phases are well conserved in annexin genes from representative genomes ranging from Physcomitrella to higher plants. The occurrence of alternative motifs such as K/R/HGD was found to be overlapping or at the mutated regions of the type II calcium binding sites indicating potential functional divergence in certain plant annexins. This study provides a basis for further functional analysis and characterization of annexin multigene families in the plant lineage. PMID:23133603

Genetic variation allows the malaria parasite Plasmodium falciparum to overcome chemotherapeutic agents, vaccines and vector control strategies and remain a leading cause of global morbidity and mortality. Here we describe an initial survey of genetic variation across the P. falciparum genome. We performed extensive sequencing of 16 geographically diverse parasites and identified 46,937 SNPs, demonstrating rich diversity among P. falciparum parasites (π = 1.16 × 10^-3) and strong correlation with gene function. We identified multiple regions with signatures of selective sweeps in drug-resistant parasites, including a previously unidentified 160-kb region with extremely low polymorphism in pyrimethamine-resistant parasites. We further characterized 54 worldwide isolates by genotyping SNPs across 20 genomic regions. These data begin to define population structure among African, Asian and American groups and illustrate the degree of linkage disequilibrium, which extends over relatively short distances in African parasites but over longer distances in Asian parasites. We provide an initial map of genetic diversity in P. falciparum and demonstrate its potential utility in identifying genes subject to recent natural selection and in understanding the population genetics of this parasite. PMID:17159979
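
The diversity statistic π quoted above is the average number of nucleotide differences per site between two sequences drawn from the sample. A minimal sketch for aligned, equal-length haploid sequences follows; the function name and the three toy sequences are assumptions for the example, and real data would also need handling of gaps and missing calls.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average pairwise differences per site across aligned sequences."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return differences / (len(pairs) * length)

# Hypothetical aligned haploid sequences, 10 sites each.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f}")
```

A region swept by recent selection would show a value of π well below the genome-wide average, which is the kind of low-polymorphism signal the abstract reports in pyrimethamine-resistant parasites.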

Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120-S127, 2002) uses a reformulation of the Cox likelihood to a Poisson type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches, as it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables like disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, obtaining a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised version outperform other Cox PLS methods. PMID:18188699

Meta-analyses of genome-wide association study data have begun to lead to promising new discoveries for behavioral and psychiatrically relevant phenotypes (e.g., schizophrenia, educational attainment). We outline how this methodology can similarly lead to novel discoveries in genomic studies of substance use disorders, and discuss challenges that will need to be overcome to accomplish this goal. We illustrate our approach with the work of the newly established Substance Use Disorders workgroup of the Psychiatric Genomics Consortium. PMID:27588522

Background: Quantitative genetic data from our group indicates that antisocial behaviour (AB) is strongly heritable when coupled with psychopathic, callous-unemotional (CU) personality traits. We have also demonstrated that the genetic influences for AB and CU overlap considerably. We conducted a genome-wide association scan that capitalises on…

Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. PMID:26510793

The small East African Shorthorn Zebu is the main indigenous cattle across East Africa. A recent genomewide SNPs analysis has revealed their ancient stable African taurine x Asian zebu admixture. Here, we assess the presence of candidate signature of positive selection in their genome, with the aim...

Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the compu...

Next-generation sequencing and related technologies have facilitated the creation of enormous public databases that catalogue genomic variation. These databases have facilitated a variety of approaches to discover new genes that regulate normal biology as well as disease. Genomewide association (...

Background Autism Spectrum Disorders (ASD) are phenotypically heterogeneous, characterized by impairments in the development of communication and social behaviour and the presence of repetitive behaviour and restricted interests. Dissecting the genetic complexity of ASD may require phenotypic data reflecting more detail than is offered by a categorical clinical diagnosis. Such data are available from the Social Responsiveness Scale (SRS) which is a continuous, quantitative measure of social ability giving scores that range from significant impairment to above average ability. Methods We present genome-wide results for 64 multiplex and extended families ranging from two to nine generations. SRS scores were available from 518 genotyped pedigree subjects, including affected and unaffected relatives. Genotypes from the Illumina 6 k single nucleotide polymorphism panel were provided by the Center for Inherited Disease Research. Quantitative and qualitative analyses were done using MCLINK, a software package that uses Markov chain Monte Carlo (MCMC) methods to perform multilocus linkage analysis on large extended pedigrees. Results When analysed as a qualitative trait, linkage occurred in the same locations as in our previous affected-only genomescan of these families, with findings on chromosomes 7q31.1-q32.3 [heterogeneity logarithm of the odds (HLOD) = 2.91], 15q13.3 (HLOD = 3.64), and 13q12.3 (HLOD = 2.23). Additional positive qualitative results were seen on chromosomes 6 and 10 in regions that may be of interest for other neuropsychiatric disorders. When analysed as a quantitative trait, results replicated a peak found in an independent sample using quantitative SRS scores on chromosome 11p15.1-p15.4 (HLOD = 2.77). Additional positive quantitative results were seen on chromosomes 7, 9, and 19. Conclusions The SRS linkage peaks reported here substantially overlap with peaks found in our previous affected-only genomescan of clinical diagnosis. In addition, we

Pairwise whole-genome alignment involves the creation of a homology map, capable of performing a near complete transformation of one genome into another. For multiple genomes this problem is generalized to finding a set of consistent homology maps for converting each genome in the set of aligned genomes into any of the others. The problem can be divided into two principal stages. First, the partitioning of the input genomes into a set of colinear segments, a process which essentially deals with the complex processes of rearrangement. Second, the generation of a base pair level alignment map for each colinear segment. We have developed a new genome-wide segmentation program, Enredo, which produces colinear segments from extant genomes handling rearrangements, including duplications. We have then applied the new alignment program Pecan, which makes the consistency alignment methodology practical at a large scale, to create a new set of genome-wide mammalian alignments. We test both Enredo and Pecan using novel and existing assessment analyses that incorporate both real biological data and simulations, and show that both independently and in combination they outperform existing programs. Alignments from our pipeline are publicly available within the Ensembl genome browser. PMID:18849524

Imputation is an in silico method that can increase the power of association studies by inferring missing genotypes, harmonizing data sets for meta-analyses, and increasing the overall number of markers available for association testing. This unit provides an introductory overview of the imputation method and describes a two-step imputation approach that consists of the phasing of the study genotypes and the imputation of reference panel genotypes into the study haplotypes. Detailed steps for data preparation and quality control illustrate how to run the computationally intensive two-step imputation with the high-density reference panels of the 1000 Genomes Project, which currently integrates more than 39 million variants. Additionally, the influence of reference panel selection, input marker density, and imputation settings on imputation quality are demonstrated with a simulated data set to give insight into crucial points of successful genotype imputation. PMID:23853078

Polycomb Group (PcG) complexes are multiprotein assemblages that bind to chromatin and establish chromatin states leading to epigenetic silencing. PcG proteins regulate homeotic genes in flies and vertebrates but little is known about other PcG targets and the role of the PcG in development, differentiation and disease. We have determined the distribution of the PcG proteins PC, E(Z) and PSC and of histone H3K27 trimethylation in the Drosophila genome. At more than 200 PcG target genes, binding sites for the three PcG proteins colocalize to presumptive Polycomb Response Elements (PREs). In contrast, H3 me3K27 forms broad domains including the entire transcription unit and regulatory regions. PcG targets are highly enriched in genes encoding transcription factors but receptors, signaling proteins, morphogens and regulators representing all major developmental pathways are also included.

Glioblastoma (GBM) is a very aggressive and lethal brain tumor with poor prognosis. Despite new treatment strategies, patients’ median survival is still less than 1 year in most cases. Few studies have focused exclusively on this disease in children and most of our understanding of the disease process and its clinical outcome has come from studies on malignant gliomas in childhood, combining children with the diagnosis of GBM with other pediatric patients harboring high grade malignant tumors other than GBM. In this study we investigated, using array-CGH platforms, children (median age of 9 years) affected by GBM (WHO-grade IV). We identified recurrent Copy Number Alterations demonstrating that different chromosome regions are involved, in various combinations. These observations suggest a condition of strong genomic instability. Since cancer is an acquired disease and inherited factors play a significant role, we compared for the first time the constitutional Copy Number Variations with the Copy Number Alterations found in tumor biopsy. We speculate that genes included in the recurrent 9p21.3 and 16p13.3 deletions and 1q32.1-q44 duplication play a crucial role for tumorigenesis and/or progression. In particular we suggest that the A2BP1 gene (16p13.3) is one possible culprit of the disease. Given the rarity of the disease, the poor quality and quantity of bioptic material and the scarcity of data in the literature, our findings may better elucidate the genomic background of these tumors. The recognition of candidate genes underlying this disease could then improve treatment strategies for this devastating tumor. PMID:24959384

The bladder exstrophy-epispadias complex (BEEC) represents the severe end of the uro-rectal malformation spectrum, and is thought to result from aberrant embryonic morphogenesis of the cloacal membrane and the urorectal septum. The most common form of BEEC is isolated classic bladder exstrophy (CBE). To identify susceptibility loci for CBE, we performed a genome-wide association study (GWAS) of 110 CBE patients and 1,177 controls of European origin. Here, an association was found with a region of approximately 220 kb on chromosome 5q11.1. This region harbors the ISL1 (ISL LIM homeobox 1) gene. Multiple markers in this region showed evidence for association with CBE, including 84 markers with genome-wide significance. We then performed a meta-analysis using data from a previous GWAS by our group of 98 CBE patients and 526 controls of European origin. This meta-analysis also implicated the 5q11.1 locus in CBE risk. A total of 138 markers at this locus reached genome-wide significance in the meta-analysis, and the most significant marker (rs9291768) achieved a P value of 2.13 × 10⁻¹². No other locus in the meta-analysis achieved genome-wide significance. We then performed murine expression analyses to follow up this finding. Here, Isl1 expression was detected in the genital region within the critical time frame for human CBE development. Genital regions with Isl1 expression included the peri-cloacal mesenchyme and the urorectal septum. The present study identified the first genome-wide significant locus for CBE at chromosomal region 5q11.1, and provides strong evidence for the hypothesis that ISL1 is the responsible candidate gene in this region. PMID:25763902

The kuruma prawn, Marsupenaeus japonicus, is one of the most cultivated and consumed species of shrimp. However, very few molecular genetic/genomic resources are publicly available for it. Thus, the characterization and distribution of simple sequence repeats (SSRs) remains ambiguous and the use of SSR markers in genomic studies and marker-assisted selection is limited. The goal of this study is to characterize and develop genome-wide SSR markers in M. japonicus by genome survey sequencing for application in comparative genomics and breeding. A total of 326 945 perfect SSRs were identified, among which dinucleotide repeats were the most frequent class (44.08%), followed by mononucleotides (29.67%), trinucleotides (18.96%), tetranucleotides (5.66%), hexanucleotides (1.07%), and pentanucleotides (0.56%). In total, 151 541 SSR loci primers were successfully designed. A subset of 30 SSR primer pairs were synthesized and tested in 42 individuals from a wild population, of which 27 loci (90.0%) were successfully amplified with specific products and 24 (80.0%) were polymorphic. For the amplified polymorphic loci, the alleles ranged from 5 to 17 (with an average of 9.63), and the average PIC value was 0.796. A total of 58 256 SSR-containing sequences had significant Gene Ontology annotation; these are good functional molecular marker candidates for association studies and comparative genomic analysis. The newly identified SSRs significantly contribute to the M. japonicus genomic resources and will facilitate a number of genetic and genomic studies, including high density linkage mapping, genome-wide association analysis, marker-aided selection, comparative genomics analysis, population genetics, and evolution.
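
The abstract above does not say how perfect SSRs were detected. As an illustration of what such a scan can look like, here is a minimal regular-expression sketch for motifs of 1 to 6 bp. The minimum repeat counts in `MIN_REPEATS`, the function name and the toy sequence are all assumptions for this example and are not taken from the study.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_perfect_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA"),
            # so each run is reported under its shortest motif only.
            if any(
                motif == motif[:d] * (k // d)
                for d in range(1, k)
                if k % d == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence: a (AC)7 run and an (A)12 run (hypothetical data).
toy_seq = "TTG" + "AC" * 7 + "GGT" + "A" * 12 + "CGT"
ssrs = find_perfect_ssrs(toy_seq)
print(ssrs)
```

Counting hits by motif length over a whole assembly would give class frequencies comparable in form to the percentages reported above, though the exact numbers depend on the thresholds chosen.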

At airports, security screening can cause long delays. To speed up screening, a solution is required that avoids passengers having to remove their shoes for X-ray scanning. To detect threats or contraband items hidden within the shoe, a method of screening using frequency-swept signals from 15 to 40 GHz has been developed, where the scan is carried out whilst the shoes are being worn. Most footwear is transparent to microwaves to some extent in this band. The scans, data processing and interpretation of the 2D image of the cross section of the shoe are completed in a few seconds. Using safe low power UWB radar, scattered signals from the shoe can be observed which are caused by changes in material properties such as cavities, dielectric or metal objects concealed within the shoe. By moving the transmission horn along the length of the shoe a 2D image corresponding to a cross section through the footwear is built up, which can be interpreted by the user, or automatically, to reveal the presence of a concealed threat within the shoe. A prototype system with a resolution of 6 mm or less has been developed and results obtained for a wide range of commonly worn footwear, some modified by the inclusion of concealed material. Clear differences between the measured images of modified and unmodified shoes are seen. Procedures for enhancing the image through electronic image synthesis techniques and image processing methods are discussed and preliminary performance data presented.

Ebola virus (EBOV) is a member of the family Filoviridae and its genome consists of a 19-kb, single-stranded, negative sense RNA. EBOV is subdivided into five distinct species with different pathogenicities, with Zaire ebolavirus (ZEBOV) being the most lethal species. The interplay of codon usage among viruses and their hosts is expected to affect overall viral survival, fitness, evasion from the host's immune system and evolution. In the present study, we performed comprehensive analyses of codon usage and composition of ZEBOV. The effective number of codons (ENC) indicates that the overall codon usage among ZEBOV strains is slightly biased. Different codon preferences in ZEBOV genes in relation to codon usage of human genes were found. Highly preferred codons are all A-ending triplets, which strongly suggests that mutational bias is a main force shaping codon usage in ZEBOV. Dinucleotide composition also plays a role in the overall pattern of ZEBOV codon usage. ZEBOV does not seem to use the most abundant tRNAs present in human cells for most of its preferred codons. PMID:25445348

The TCP family is a transcription factor family, members of which are extensively involved in plant growth and development as well as in signal transduction in the response against many physiological and biochemical stimuli. In the present study, 61 TCP genes were identified in the tobacco (Nicotiana tabacum) genome. Bioinformatic methods were employed for predicting and analyzing the gene structure, gene expression, phylogenetic analysis, and conserved domains of TCP proteins in tobacco. The 61 NtTCP genes were divided into three diverse groups, based on the division of TCP genes in tomato and Arabidopsis, and the results of the conserved domain and sequence analyses further confirmed the classification of the NtTCP genes. The expression pattern of NtTCP also demonstrated that the majority of these genes play important roles in all the tissues, while some special genes exercise their functions only in specific tissues. In brief, the comprehensive and thorough study of the TCP family in other plants provides sufficient resources for studying the structure and functions of TCPs in tobacco. PMID:27323069

Rice growth is severely affected by toxic concentrations of the nonessential heavy metal cadmium (Cd). To elucidate the molecular basis of the response to Cd stress, we performed mRNA sequencing of rice following our previous study on exposure to high concentrations of Cd (Oono et al., 2014). In this study, rice plants were hydroponically treated with low concentrations of Cd and approximately 211 million sequence reads were mapped onto the IRGSP-1.0 reference rice genome sequence. Many genes, including some identified under high Cd concentration exposure in our previous study, were found to be responsive to low Cd exposure, with an average of about 11,000 transcripts from each condition. However, genes expressed constitutively across the developmental course responded only slightly to low Cd concentrations, in contrast to their clear response to high Cd concentration, which causes fatal damage to rice seedlings according to phenotypic changes. The expression of metal ion transporter genes tended to correlate with Cd concentration, suggesting the potential of the RNA-Seq strategy to reveal novel Cd-responsive transporters by analyzing gene expression under different Cd concentrations. This study could help to develop novel strategies for improving tolerance to Cd exposure in rice and other cereal crops. PMID:27034955

DNA methylation plays a key role in regulating eukaryotic gene expression. Although mitotically heritable and stable over time, patterns of DNA methylation frequently change in response to cell differentiation, disease and environmental influences. Several methods have been developed to map DNA methylation on a genomic scale. Here, we benchmark four of these approaches by analyzing two human embryonic stem cell lines derived from genetically unrelated embryos and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. Our analysis reveals that methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq), reduced representation bisulfite sequencing (RRBS) and the Infinium HumanMethylation27 assay all produce accurate DNA methylation data. However, these methods differ in their ability to detect differentially methylated regions between pairs of samples. We highlight strengths and weaknesses of the four methods and give practical recommendations for the design of epigenomic case-control studies. PMID:20852634

OBJECTIVE We performed a whole-genome expression study to clarify the nature of the biological processes mediating between inherited genetic variations and cognitive dysfunction in schizophrenia. METHOD Gene expression was assayed from peripheral blood mononuclear cells using Illumina Human WG6 v3.0 chips in twins discordant for schizophrenia or bipolar disorder and control twins. After quality control, expression levels of 18,559 genes were screened for association with California Verbal Learning Test (CVLT) performance, and any memory-related probes were then evaluated for variation by diagnostic status in the discovery sample (N = 190), and in an independent replication sample (N = 73). Heritability of gene expression using the twin design was also assessed. RESULTS After Bonferroni correction (p < 2.69 × 10⁻⁶), CVLT performance was significantly related to expression levels for 76 genes, 43 of which were differentially expressed in schizophrenia patients, with comparable effect sizes in the same direction in the replication sample. For 41 of these 43 transcripts, expression levels were heritable. Nearly all identified genes contain common or de novo mutations associated with schizophrenia in prior studies. CONCLUSION Genes increasing risk for schizophrenia appear to do so in part via effects on signaling cascades influencing memory. The genes implicated in these processes are enriched for those related to RNA processing and DNA replication and include genes influencing G-protein coupled signal transduction, cytokine signaling, and oligodendrocyte function. PMID:26710095
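
The Bonferroni threshold quoted in the abstract above is consistent with dividing a family-wise error rate by the number of genes screened. A minimal sketch of that arithmetic follows; the value alpha = 0.05 is an assumption (the abstract does not state it), and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 18,559 genes were screened according to the abstract; alpha = 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 18559)
print(f"per-test threshold = {threshold:.3g}")
```

Under that assumed alpha the result agrees with the reported cutoff of 2.69 × 10⁻⁶ to three significant figures.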

We performed whole-transcriptome sequencing and whole-genome sequencing on nine pairs of hepatocellular carcinoma (HCC) tumors and matched adjacent tissues to identify RNA editing events. Using an improved bioinformatics pipeline, we identified a mean of 26,982 editing sites per sample, of which a mean of 89.5% were canonical A→G edits. The editing rate was significantly higher in tumors than in adjacent normal tissues. Comparing the difference between tumor and normal tissues of each patient, we found 7 non-synonymous tissue-specific editing events, including 4 tumor-specific edits and 3 normal-specific edits in the coding region, as well as 292 edits varying in editing degree. Significant expression changes of 150 genes associated with RNA editing were found in tumors, with 3 of the 4 most significant genes being cancer related. Our results show that editing might be related to higher gene expression. These findings indicate that RNA editing modification may play an important role in the development of HCC. PMID:25462863

High-accuracy aspherical mirrors and lenses with large dimensions are widely used in large telescopes and other industry fields. However, the measurement methods for large aspherical optical surfaces are not well established. Scanning deflectometry is used for measuring optical signals near flat surfaces with uncertainties on subnanometer scales. A critical issue regarding scanning deflectometry is that high-accuracy autocollimators (AC) have narrow angular measuring ranges and are not suitable for measuring surfaces with large slopes and angular changes. The goal of our study is to measure the profile of large aspherical optical surfaces with an accuracy of approximately 10 nm. We have proposed a new method to measure optical surfaces with large aspherical dimensions and large angular changes by using a scanning deflectometry method. A rotating AC was used to increase the allowable measuring range. Error analysis showed that the rotating AC reduces the accuracy of the measurements. In this study, we developed a new AC with complementary metal-oxide semiconductor (CMOS) as a light-receiving element (CMOS-type AC). The CMOS-type AC can measure wider ranges of angular changes, with a maximum range of 21 500 µrad (4500 arcsec) and a stability (standard deviation) of 0.1 µrad (0.02 arcsec). We conducted an experiment to verify the effectivity of the wide measuring range AC by the measurement of a spherical mirror with a curvature radius of 500 mm. Furthermore, we conducted an experiment to measure an aspherical optical surface (an off-axis parabolic mirror) and found an angular change of 0.07 rad (4 arcdegrees). The repeatability (average standard deviation) for ten measurements of the off-axis parabolic mirror was less than 4 nm.

Successful reverse engineering of mutants that have been obtained by nontargeted strain improvement has long presented a major challenge in yeast biotechnology. This paper reviews the use of genome-wide approaches for analysis of Saccharomyces cerevisiae strains originating from evolutionary engineering or random mutagenesis. On the basis of an evaluation of the strengths and weaknesses of different methods, we conclude that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis. Key advantages of this technique over gene expression analysis include the independency of genome sequences on experimental context and the possibility to directly and precisely reproduce the identified changes in naive strains. The predictive value of genome-wide analysis of strains with industrially relevant characteristics can be further improved by classical genetics or simultaneous analysis of strains derived from parallel, independent strain improvement lineages. PMID:22152095

Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres. PMID:21081712

The primary objective of this study was to determine genetic and genomic parameters among swine farrowing traits. Genetic parameters were obtained by using MTDFREML and genomic parameters were obtained using GenSel. Genetic and residual variances obtained from MTDFREML were used as priors for the ...

RNA silencing at the transcriptional and posttranscriptional levels regulates endogenous gene expression, controls invading transposable elements (TEs), and protects the cell against viruses. Key components of the mechanism are small RNAs (sRNAs) of 21–24 nt that guide the silencing machinery to their nucleic acid targets in a nucleotide sequence-specific manner. Transcriptional gene silencing is associated with 24-nt sRNAs and RNA-directed DNA methylation (RdDM) at cytosine residues in three DNA sequence contexts (CG, CHG, and CHH). We previously demonstrated that 24-nt sRNAs are mobile from shoot to root in Arabidopsis thaliana and confirmed that they mediate DNA methylation at three sites in recipient cells. In this study, we extend this finding by demonstrating that RdDM of thousands of loci in root tissues is dependent upon mobile sRNAs from the shoot and that mobile sRNA-dependent DNA methylation occurs predominantly in non-CG contexts. Mobile sRNA-dependent non-CG methylation is largely dependent on the DOMAINS REARRANGED METHYLTRANSFERASES 1/2 (DRM1/DRM2) RdDM pathway but is independent of the CHROMOMETHYLASE (CMT)2/3 DNA methyltransferases. Specific superfamilies of TEs, including those typically found in gene-rich euchromatic regions, lose DNA methylation in a mutant lacking 22- to 24-nt sRNAs (dicer-like 2, 3, 4 triple mutant). Transcriptome analyses identified a small number of genes whose expression in roots is associated with mobile sRNAs and connected to DNA methylation directly or indirectly. Finally, we demonstrate that sRNAs from shoots of one accession move across a graft union and target DNA methylation de novo at normally unmethylated sites in the genomes of root cells from a different accession. PMID:26787884

Brachial circumference (BC), also known as upper arm or mid arm circumference, can be used as an indicator of muscle mass and fat tissue, which are distributed differently in men and women. Analysis of anthropometric measures of peripheral fat distribution such as BC could help in understanding the complex pathophysiology behind overweight and obesity. The purpose of this study is to identify genetic variants associated with BC through a large-scale genome-wide association scan (GWAS) meta-analysis. We used fixed-effects meta-analysis to synthesise summary results across 14 GWAS discovery and 4 replication cohorts comprising overall 22,376 individuals (12,031 women and 10,345 men) of European ancestry. Individual analyses were carried out for men, women, and combined across sexes using linear regression and an additive genetic model: adjusted for age and adjusted for age and BMI. We prioritised signals for follow-up in two-stages. We did not detect any signals reaching genome-wide significance. The FTO rs9939609 SNP showed nominal evidence for association (p<0.05) in the age-adjusted strata for men and across both sexes. In this first GWAS meta-analysis for BC to date, we have not identified any genome-wide significant signals and do not observe robust association of previously established obesity loci with BC. Large-scale collaborations will be necessary to achieve higher power to detect loci underlying BC. PMID:22479309
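
The abstract above mentions fixed-effects meta-analysis of per-cohort summary results. One common form is inverse-variance weighting of per-cohort effect estimates; whether this exact form was used in the study is an assumption. A minimal sketch, with hypothetical function name and toy inputs:

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    `betas` and `ses` are per-cohort effect estimates and standard errors.
    Returns the pooled beta, its standard error, the z score and the
    two-sided p-value under a normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal length")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Two toy cohorts with equal precision (hypothetical numbers).
beta, se, z, p = fixed_effects_meta([0.10, 0.20], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

In a genome-wide setting this pooling would be repeated per SNP, and the resulting p-values compared against a genome-wide significance threshold.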

In the post-GWAS (Genome-Wide Association Scan) era, the interpretation of GWAS results is crucial to screen for highly relevant phenotype-genotype association pairs. Based on the single genotype-phenotype association test and a pathway enrichment analysis, we propose a Metabolite-pathway-based Phenome-Wide Association Scan (M-PheWAS) to analyze the key metabolite-SNP pairs in rice and determine the regulatory relationship by assessing similarities in the changes of enzymes and downstream products in a pathway. Two SNPs, sf0315305925 and sf0315308337, were selected using this approach, and their molecular function and regulatory relationship with Enzyme EC:5.5.1.6 and with flavonoids, a significant downstream regulatory metabolite product, were demonstrated. Moreover, a total of 105 crucial SNPs were screened using M-PheWAS, which may be important for metabolite associations. PMID:26640468

Summary Clear evidence exists for heritability of human longevity, and much interest is focused on identifying genes associated with longer lives. To identify such longevity alleles, we performed the largest genome-wide linkage scan thus far reported. Linkage analyses included 2118 nonagenarian Caucasian sibling pairs that have been enrolled in fifteen study centers of eleven European countries as part of the Genetics of Healthy Ageing (GEHA) project. In the joint linkage analyses we observed four regions that show linkage with longevity: chromosome 14q11.2 (LOD=3.47), chromosome 17q12-q22 (LOD=2.95), chromosome 19p13.3-p13.11 (LOD=3.76) and chromosome 19q13.11-q13.32 (LOD=3.57). To fine map these regions linked to longevity, we performed association analysis using GWAS data in a subgroup of 1,228 unrelated nonagenarians and 1,907 geographically matched controls. Using a fixed effect meta-analysis approach, rs4420638 at the TOMM40/APOE/APOC1 gene locus showed significant association with longevity (p-value=9.6 × 10^-8). By combined modeling of linkage and association we showed that association of longevity with APOEε4 and APOEε2 alleles explains the linkage at 19q13.11-q13.32 with p-value=0.02 and p-value=1.0 × 10^-5, respectively. In the largest linkage scan thus far performed for human familial longevity, we confirm that the APOE locus is a longevity gene and that additional longevity loci may be identified at 14q11.2, 17q12-q22 and 19p13.3-p13.11. Since the latter linkage results are not explained by common variants, we suggest that rare variants play an important role in human familial longevity. PMID:23286790

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future. PMID:24163254

Pork quality plays an important role in the meat processing industry, thus different methodologies have been implemented to elucidate the genetic architecture of traits affecting meat quality. One of the most common and widely used approaches is to perform genome-wide association (GWA) studies. Howe...

A genome scan was conducted to identify QTL affecting milk yield in a Brazilian Gyr population of progeny test bulls (N=319). Data used in this study was derived from traditional genetic evaluation records computed by the Embrapa Dairy Cattle and released in May/2009 (http://www.cnpgl.embrapa.br/nova...

Reading and language abilities are heritable traits that are likely to share some genetic influences with each other. To identify pleiotropic genetic variants affecting these traits, we first performed a genome-wide association scan (GWAS) meta-analysis using three richly characterized datasets comprising individuals with histories of reading or language problems, and their siblings. GWAS was performed in a total of 1862 participants using the first principal component computed from several quantitative measures of reading- and language-related abilities, both before and after adjustment for performance IQ. We identified novel suggestive associations at the SNPs rs59197085 and rs5995177 (uncorrected P ≈ 10^-7 for each SNP), located respectively at the CCDC136/FLNC and RBFOX2 genes. Each of these SNPs then showed evidence for effects across multiple reading and language traits in univariate association testing against the individual traits. FLNC encodes a structural protein involved in cytoskeleton remodelling, while RBFOX2 is an important regulator of alternative splicing in neurons. The CCDC136/FLNC locus showed association with a comparable reading/language measure in an independent sample of 6434 participants from the general population, although involving distinct alleles of the associated SNP. Our datasets will form an important part of on-going international efforts to identify genes contributing to reading and language skills. PMID:25065397

Colorectal cancer (CRC) is a complex disease with an estimated heritability of approximately 35%. However, known CRC-related common single nucleotide polymorphisms (SNPs) can only explain ~0.65% of the heritability. This “missing heritability” may be explained partially by rare copy number variants (CNVs). In this study, we performed a genome-wide scan using the Illumina Human-Omni Express BeadChip; 694 sporadic CRC cases and 1641 controls were eventually included in our analysis after quality control. The global burden analysis revealed a 1.53-fold excess of rare CNVs in CRC cases compared with controls (P < 1 × 10^-6), with the difference being more pronounced for genic rare CNVs and CNVs overlapping coding regions (1.65-fold and 1.84-fold, respectively, both P < 1 × 10^-6). Interestingly, cases in both the lowest and middle tertiles of age carried a higher burden of rare CNVs than those in the highest tertile. Furthermore, 639 CNV-disrupted genes exclusive to CRC cases were found to be significantly enriched in gene ontology (GO) terms concerning nucleosome assembly and olfactory receptor activity. Our study was the first to evaluate the burden of rare CNVs in sporadic CRC and suggested that rare CNVs contributed to the missing heritability of CRC. PMID:26315111

The vast amount of recent progress made on the sequence of the human genome has allowed an unprecedented examination of cis-regulatory networks. These networks consist of functional elements such as promoters, enhancers, silencers, and insulators, and their coordinated activity is responsible for regulation of gene expression. Recent studies surveyed the entire genome, identifying novel elements and evaluating functional differences in respect to development. These investigations present the first steps towards a global regulatory map for expression in the human genome. PMID:19560550

Genetic polymorphisms, particularly single nucleotide polymorphisms (SNPs), have been widely used to advance quantitative, functional and evolutionary genomics. Ideally, all genetic variants among individuals should be discovered when next generation sequencing (NGS) technologies and platforms are used for whole genome sequencing or resequencing. In order to improve the cost-effectiveness of the process, however, the research community has mainly focused on developing genome-wide sampling sequencing (GWSS) methods, a collection of reduced genome complexity sequencing, reduced genome representation sequencing and selective genome target sequencing. Here we review the major steps involved in library preparation, the types of adapters used for ligation and the primers designed for amplification of ligated products for sequencing. Unfortunately, currently available GWSS methods have their drawbacks, such as inconsistency in the number of reads per sample library, the number of sites/targets per individual, and the number of reads per site/target, all of which result in missing data. Suggestions are proposed here to improve library construction, genotype calling accuracy, genome-wide marker density and read mapping rate. In brief, optimized GWSS library preparation should generate a unique set of target sites with dense distribution along chromosomes and even coverage per site across all individuals. PMID:26722221

Aluminum (Al) toxicity is an important limitation to food security in the tropical and subtropical regions. High Al saturation in acid soils limits root development and its ability to uptake water and nutrients. In this study, we present a genome scan for Al tolerance loci with over 50,000 GBS-based...

Processing speed is an important cognitive function that is compromised in psychiatric illness (e.g., schizophrenia, depression) and old age; it shares genetic background with complex cognition (e.g., working memory, reasoning). To find genes influencing speed we performed a genome-wide association scan in up to three cohorts: Brisbane (mean age 16 years; N = 1659); LBC1936 (mean age 70 years, N = 992); LBC1921 (mean age 82 years, N = 307); and HBCS (mean age 64 years, N = 1080). Meta-analysis of the common measures highlighted various suggestively significant (p < 1.21 × 10^-5) SNPs and plausible candidate genes (e.g., TRIB3). A biological pathways analysis of the speed factor identified two common pathways from the KEGG database (cell junction, focal adhesion) in two cohorts, while a pathway analysis linked to the GO database revealed common pathways across pairs of speed measures (e.g., receptor binding, cellular metabolic process). These highlighted genes and pathways will be able to inform future research, including results for psychiatric disease. PMID:21130836

The term epistasis refers to interactions between multiple genetic loci. Genetic epistasis is important in regulating biological function and is considered to explain part of the ‘missing heritability,’ which involves marginal genetic effects that cannot be accounted for in genome-wide association studies. Thus, the study of epistasis is of great interest to geneticists. However, estimating epistatic effects for quantitative traits is challenging due to the large number of interaction effects that must be estimated, thus significantly increasing computing demands. Here, we present a new web server-based tool, the Pipeline for estimating EPIStatic genetic effects (PEPIS), for analyzing polygenic epistatic effects. The PEPIS software package is based on a new linear mixed model that has been used to predict the performance of hybrid rice. The PEPIS includes two main sub-pipelines: the first for kinship matrix calculation, and the second for polygenic component analyses and genome scanning for main and epistatic effects. To accommodate the demand for high-performance computation, the PEPIS utilizes C/C++ for mathematical matrix computing. In addition, the modules for kinship matrix calculations and main and epistatic-effect genome scanning employ parallel computing technology that effectively utilizes multiple computer nodes across our networked cluster, thus significantly improving the computational speed. For example, when analyzing the same immortalized F2 rice population genotypic data examined in a previous study, the PEPIS returned identical results at each analysis step with the original prototype R code, but the computational time was reduced from more than one month to about five minutes. These advances will help overcome the bottleneck frequently encountered in genome-wide epistatic genetic effect analysis and enable accommodation of the high computational demand. The PEPIS is publicly available at http://bioinfo.noble.org/PolyGenic_QTL/. PMID:27224861

Background Prolificacy is the most important trait influencing the reproductive efficiency of pig production systems. The low heritability and sex-limited expression of prolificacy have hindered to some extent the improvement of this trait through artificial selection. Moreover, the relative contributions of additive, dominant and epistatic QTL to the genetic variance of pig prolificacy remain to be defined. In this work, we have addressed this issue by performing one-dimensional and bi-dimensional genome scans for number of piglets born alive (NBA) and total number of piglets born (TNB) in a three generation Iberian by Meishan F2 intercross. Results The one-dimensional genome scan for NBA and TNB revealed the existence of two genome-wide highly significant QTL located on SSC13 (P < 0.001) and SSC17 (P < 0.01) with effects on both traits. This relative paucity of significant results contrasted very strongly with the wide array of highly significant epistatic QTL that emerged in the bi-dimensional genome-wide scan analysis. As many as 18 epistatic QTL were found for NBA (four at P < 0.01 and five at P < 0.05) and TNB (three at P < 0.01 and six at P < 0.05), respectively. These epistatic QTL were distributed in multiple genomic regions, which covered 13 of the 18 pig autosomes, and they had small individual effects that ranged from 3 to 4% of the phenotypic variance. Different patterns of interactions (a × a, a × d, d × a and d × d) were found amongst the epistatic QTL pairs identified in the current work. Conclusions The complex inheritance of prolificacy traits in pigs has been evidenced by identifying multiple additive (SSC13 and SSC17), dominant and epistatic QTL in an Iberian × Meishan F2 intercross. Our results demonstrate that a significant fraction of the phenotypic variance of swine prolificacy traits can be attributed to first-order gene-by-gene interactions emphasizing that the phenotypic effects of alleles might be strongly modulated by the

Microsatellites are a ubiquitous component of the eukaryote genome and constitute one of the most popular sources of molecular markers for genetic studies. However, no data are currently available regarding microsatellites across the entire genome in oysters, despite their importance to the aquaculture industry. We present the first genome-wide investigation of microsatellites in the Pacific oyster Crassostrea gigas by analysis of the complete genome, resequencing, and expression data. The Pacific oyster genome is rich in microsatellites. A total of 604 653 repeats were identified, an average of one locus per 815 base pairs (bp). A total of 12 836 genes had coding repeats, and 7 332 were expressed normally, including genes with a wide range of molecular functions. Compared with 20 different species of animals, microsatellites in the oyster genome typically exhibited 1) an intermediate overall frequency; 2) relatively uniform contents of (A)n and (C)n repeats and abundant long (C)n repeats (≥24 bp); 3) large average length of (AG)n repeats; and 4) scarcity of trinucleotide repeats. The microsatellite-flanking regions exhibited a high degree of polymorphism with a heterozygosity rate of around 2.0%, but there was no correlation between heterozygosity and microsatellite abundance. A total of 19 462 polymorphic microsatellites were discovered, and dinucleotide repeats were the most active, with over 26% of loci found to harbor allelic variations. In all, 7 451 loci with high potential for marker development were identified. Better knowledge of the microsatellites in the oyster genome will provide information for the future design of a wide range of molecular markers and contribute to further advancements in the field of oyster genetics, particularly for molecular-based selection and breeding.
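To make the idea of a genome-wide microsatellite search concrete, here is a minimal sketch that scans a DNA string for perfect tandem repeats such as (A)n or (AG)n. The abstract does not state the detection tool or thresholds used, so the function name, the 1 to 6 bp motif range and the minimum repeat counts below are all assumptions chosen for illustration.

```python
import re

# Assumed minimum number of repeat units per motif length (not from the abstract).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_microsatellites(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats in `seq`."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least n_min - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif,
            # e.g. "AA" is reported as (A)n, not as (AA)n.
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical test sequence containing one (AG)8 and one (A)12 repeat.
example_seq = "TTGAC" + "AG" * 8 + "CCGTT" + "A" * 12 + "GCT"
example_hits = find_microsatellites(example_seq)
```

Dividing the sequence length by the number of hits gives a density comparable in kind to the "one locus per 815 bp" figure, although imperfect repeats, which real tools usually tolerate, are ignored here.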

We report a novel wide field-of-view (FOV) scanning endoscope, the AnCam, which is based on contact image sensor (CIS) technology used in commercialized business card scanners. The AnCam can capture the whole image of the anal canal within 10 seconds with a resolution of 89 μm, a maximum FOV of 100 mm × 120 mm, and a depth-of-field (DOF) of 0.65 mm at 5.9 line pairs per mm (lp/mm). We demonstrate the performance of the AnCam by imaging the entire anal canal of pigs and tracking the dynamics of acetowhite testing. We believe the AnCam can potentially be a simple and convenient solution for screening of the anal canal for dysplasia and for surveillance in patients following treatment for anal cancer. PMID:25780750

Electronically scanned pressure (ESP) modules have been developed that can operate in ambient and in cryogenic environments, particularly Langley's National Transonic Facility (NTF). Because they can operate directly in a cryogenic environment, their use eliminates many of the operational problems associated with using conventional modules at low temperatures. To ensure the accuracy of these new instruments, calibration was conducted in a laboratory simulating the environmental conditions of NTF. This paper discusses the calibration process by means of the simulation laboratory, the system inputs and outputs and the analysis of the calibration data. Calibration results of module M4, a wide temperature ESP module with 16 ports and a pressure range of +/- 4 psid are given.

Genome scans have been used in the studies of ecological speciation to find genomic regions ('outlier loci') showing reduced gene flow between divergent populations/species. High-throughput sequencing ('454') offers new opportunities in this field via transcriptome sequencing. Divergent ecotypes of the marine gastropod Littorina saxatilis represent a good example of incipient ecological speciation. We performed a 454-based genome scan between H and M ecotypes of L. saxatilis from the British Isles using cDNA of pooled individuals. Allele frequencies were calculated for 2454 single nucleotide polymorphisms (SNPs), within 572 contigs, and 7% of loci were detected as outliers. Functional annotation of the contigs containing outlier SNPs showed that they included shell matrix and muscle proteins (lithostathine, mucin, titin), proteins involved in energetic metabolism (arginine kinase, NADH dehydrogenase) and reverse transcriptases. Follow-up investigations into these proteins and unannotated outliers will be a promising route in the study of ecological speciation in L. saxatilis. PMID:20695960
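An outlier scan of this kind compares allele frequencies between two groups locus by locus and flags the most differentiated loci. The abstract does not say which statistic or cutoff was used, so the sketch below assumes a simple two-population FST computed from expected heterozygosities and a rank-based cutoff; the function names, the default fraction and the example frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """FST = (H_T - H_S) / H_T for one biallelic SNP with allele frequencies p1 and p2."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # heterozygosity of the pooled sample
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-group heterozygosity
    if h_total == 0:
        return 0.0  # locus is fixed for the same allele in both groups
    return (h_total - h_within) / h_total


def top_outliers(freqs, fraction=0.07):
    """Return locus names in the top `fraction` of FST values (at least one locus)."""
    scored = sorted(freqs, key=lambda name: fst_two_populations(*freqs[name]), reverse=True)
    n_keep = max(1, int(len(scored) * fraction))
    return scored[:n_keep]


# Hypothetical allele frequencies (ecotype H, ecotype M) for four SNPs.
example_freqs = {
    "snp_1": (0.50, 0.52),
    "snp_2": (0.10, 0.90),
    "snp_3": (0.30, 0.35),
    "snp_4": (0.80, 0.75),
}
example_outliers = top_outliers(example_freqs, fraction=0.25)
```

A rank-based cutoff always returns some loci, even for purely neutral data, so published scans usually compare observed values against a simulated or modelled neutral distribution instead.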

In the age of whole-genome population genetics, so-called genomic scan studies often conclude with a long list of putatively selected loci. These lists are then further scrutinized to annotate these regions by gene function, corresponding biological processes, expression levels, or gene networks. Such annotations are often used to assess and/or verify the validity of the genome scan and the statistical methods that have been used to perform the analyses. Furthermore, these results are frequently considered to validate "true-positives" if the identified regions make biological sense a posteriori. Here, we show that this approach can be potentially misleading. By simulating neutral evolutionary histories, we demonstrate that it is possible not only to obtain an extremely high false-positive rate but also to make biological sense out of the false-positives and construct a sensible biological narrative. Results are compared with a recent polymorphism data set from Drosophila melanogaster. PMID:22617950

We present a new technique for calibrating the primary beam of a wide-field, drift-scanning antenna element. Drift-scan observing is not compatible with standard beam calibration routines, and the situation is further complicated by difficult-to-parameterize beam shapes and, at low frequencies, the sparsity of accurate source spectra to use as calibrators. We overcome these challenges by building up an interrelated network of source 'crossing points', locations where the primary beam is sampled by multiple sources. Using the single assumption that a beam has 180° rotational symmetry, we can achieve significant beam coverage with only a few tens of sources. The resulting network of crossing points allows us to solve for both a beam model and source flux densities referenced to a single calibrator source, circumventing the need for a large sample of well-characterized calibrators. We illustrate the method with actual and simulated observations from the Precision Array for Probing the Epoch of Reionization.

Background Trypanosoma brucei is a eukaryotic pathogen which causes African trypanosomiasis. It is notable for its variant surface glycoprotein (VSG) coat, which undergoes antigenic variation enabled by a large suite of VSG pseudogenes, allowing for persistent evasion of host adaptive immunity. While Trypanosoma brucei rhodesiense (Tbr) and T. b. gambiense (Tbg) are human infective, related T. b. brucei (Tbb) is cleared by human sera. A single gene, the Serum Resistance Associated (SRA) gene, confers Tbr its human infectivity phenotype. Potential genetic recombination of this gene between Tbr and non-human infective Tbb strains has significant epidemiological consequences for Human African Trypanosomiasis outbreaks. Results Using long and short read whole genome sequencing, we generated a hybrid de novo assembly of a Tbr strain, producing 4,210 scaffolds totaling approximately 38.8 megabases, which comprise a significant proportion of the Tbr genome, and thus represent a valuable tool for comparative genomics analyses among human and non-human infective T. brucei and future complete genome assembly. We detected 5,970 putative genes, of which two, an alcohol oxidoreductase and a pentatricopeptide repeat-containing protein, were members of gene families common to all T. brucei subspecies, but were variants specific to the Tbr strain sequenced in this study. Our findings confirmed the extremely high level of genomic similarity between the two parasite subspecies found in other studies. Conclusions We confirm at the whole genome level high similarity between the two Tbb and Tbr strains studied. The discovery of extremely minor genomic differentiation between Tbb and Tbr suggests that the transference of the SRA gene via genetic recombination could potentially result in novel human infective strains, thus all genetic backgrounds of T. brucei should be considered potentially human infective in regions where Tbr is prevalent. PMID:26910229

RNA interference (RNAi) is a process in which double-stranded RNA (dsRNA) molecules mediate the inhibition of gene expression. RNAi in C. elegans can be achieved by simply feeding animals with bacteria expressing dsRNA against the gene of interest. This "feeding" method has made it possible to conduct genome-wide RNAi experiments for the systematic knockdown and subsequent investigation of almost every single gene in the genome. Historically, these genome-scale RNAi screens have been labor and time intensive. However, recent advances in automated, high-throughput methodologies have allowed the development of more rapid and efficient screening protocols. In this report, we describe a fast and efficient, liquid-based method for genome-wide RNAi screening. PMID:27581291

This tutorial is a learning resource that outlines the basic process and provides specific software tools for implementing a complete genome-wide association analysis. Approaches to post-analytic visualization and interrogation of potentially novel findings are also presented. Applications are illustrated using the free and open-source R statistical computing and graphics software environment, Bioconductor software for bioinformatics and the UCSC Genome Browser. Complete genome-wide association data on 1401 individuals across 861,473 typed single nucleotide polymorphisms from the PennCATH study of coronary artery disease are used for illustration. All data and code, as well as additional instructional resources, are publicly available through the Open Resources in Statistical Genomics project: http://www.stat-gen.org. PMID:26343929
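One step common to genome-wide association analyses like the one described here is correcting for the number of SNPs tested. The tutorial itself works in R and the abstract does not say which correction it applies; the Python sketch below assumes a plain Bonferroni correction over the 861,473 typed SNPs, and the SNP names and p-values are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_snps(p_values, n_tests, alpha=0.05):
    """Return the SNPs whose p-value is below the Bonferroni threshold."""
    threshold = bonferroni_threshold(n_tests, alpha)
    return {snp: p for snp, p in p_values.items() if p < threshold}


N_TYPED_SNPS = 861_473  # number of typed SNPs reported in the abstract

threshold = bonferroni_threshold(N_TYPED_SNPS)

# Hypothetical association p-values for three SNPs.
example_p_values = {"snp_a": 3.0e-9, "snp_b": 4.2e-6, "snp_c": 0.03}
hits = significant_snps(example_p_values, n_tests=N_TYPED_SNPS)
```

With this many tests the threshold falls just under 6 × 10^-8, close to the conventional genome-wide significance level of 5 × 10^-8.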

To determine which genomic features promote homologous recombination, we created a genome-wide map of gene targeting sites. An adeno-associated virus vector was used to target identical loci introduced as transcriptionally active retroviral vector proviruses. A comparison of ~2,000 targeted and untargeted sites showed that targeting occurred throughout the human genome and was not influenced by the presence of nearby CpG islands, sequence repeats, or DNase I hypersensitive sites. Targeted sites were preferentially found within transcription units, especially when the target loci were transcribed in the opposite orientation to their surrounding chromosomal genes. The impact of DNA replication was determined by mapping replication forks, which revealed a preference for recombination at target loci transcribed towards an incoming fork. Our results constitute the first genome-wide screen of gene targeting in mammalian cells, and they demonstrate a strong recombinogenic effect of colliding polymerases. PMID:25282150

New technologies allow for genome-scale measurement of DNA methylation. In an effort to increase the clinical utility of DNA methylation as a biomarker, we have adapted a commercial bisulfite epigenotyping assay for genome-wide methylation profiling in archival formalin-fixed paraffin-embedded pathology specimens. This chapter takes the reader step by step through a biomarker discovery experiment to identify phenotype-correlated DNA methylation signatures in routine pathology specimens. PMID:22081342

Objective: Survival of patients with pancreatic adenocarcinoma is limited and few prognostic factors are known. We conducted a two-stage genome-wide association study (GWAS) to identify germline variants associated with survival in patients with pancreatic adenocarcinoma. Design: We analyzed overall survival in relation to single nucleotide polymorphisms (SNPs) among 1,005 patients from two large GWAS datasets, PanScan I and ChinaPC. Cox proportional hazards regression was used in an additive genetic model with adjustment for age, sex, clinical stage and the top four principal components of population stratification. The first stage included 642 cases of European ancestry (PanScan), from which the top SNPs (P ≤ 10⁻⁵) were advanced to a joint analysis with 363 additional patients from China (ChinaPC). Results: In the first stage of cases of European descent, the top-ranked loci were at chromosomes 11p15.4, 18p11.21, and 1p36.13, tagged by rs12362504 (P = 1.63×10⁻⁷), rs981621 (P = 1.65×10⁻⁷), and rs16861827 (P = 3.75×10⁻⁷), respectively. One hundred thirty-one SNPs with P ≤ 10⁻⁵ were advanced to a joint analysis with cases from the ChinaPC study. In the joint analysis, the top-ranked SNP was rs10500715 (minor allele frequency, 0.37; P = 1.72×10⁻⁷) on chromosome 11p15.4, which is intronic to the SET binding factor 2 (SBF2) gene. The hazard ratio (95% CI) for death was 0.74 (0.66–0.84) in PanScan I, 0.79 (0.65–0.97) in ChinaPC, and 0.76 (0.68–0.84) in the joint analysis. Conclusion: Germline genetic variation in the SBF2 locus was associated with overall survival in patients with pancreatic adenocarcinoma of European and Asian ancestry. This association should be investigated in additional large patient cohorts. PMID:23180869
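
The three hazard ratios reported above can be sanity-checked with a little arithmetic. The abstract does not say how the joint estimate was computed, so the sketch below is an illustration only: it assumes a fixed-effect, inverse-variance combination of the two stage estimates, with standard errors recovered from the published 95% confidence intervals. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return math.log(hr), se

def pool_fixed_effect(estimates):
    """Inverse-variance weighted average of (hr, lower, upper) tuples."""
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, lower, upper in estimates:
        log_hr, se = log_hr_and_se(hr, lower, upper)
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_hr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * pooled_se),
            math.exp(pooled_log + Z_95 * pooled_se))

# Stage estimates quoted in the abstract: PanScan I, then ChinaPC.
stage_estimates = [(0.74, 0.66, 0.84), (0.79, 0.65, 0.97)]
pooled_hr, pooled_lower, pooled_upper = pool_fixed_effect(stage_estimates)
print(f"pooled HR {pooled_hr:.3f} ({pooled_lower:.3f} to {pooled_upper:.3f})")
```

Under these assumptions the pooled hazard ratio comes out at roughly 0.75, close to the reported joint estimate of 0.76. The small gap is unsurprising, since the published analysis may have pooled individual-level data rather than summary estimates.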

Quantitative traits in plants are controlled by a large number of genes and their interaction with the environment. To disentangle the genetic architecture of such traits, natural variation within species can be explored by studying genotype-phenotype relationships. Genome-wide association studies that link phenotypes to thousands of single nucleotide polymorphism markers are nowadays common practice for such analyses. In many cases, however, the identified individual loci cannot fully explain the heritability estimates, suggesting missing heritability. We analyzed 349 Arabidopsis accessions and found extensive variation and high heritabilities for different morphological traits. The number of significant genome-wide associations was, however, very low. The application of genomic prediction models that take into account the effects of all individual loci may greatly enhance the elucidation of the genetic architecture of quantitative traits in plants. Here, genomic prediction models revealed different genetic architectures for the morphological traits. Integrating genomic prediction and association mapping enabled the assignment of many plausible candidate genes explaining the observed variation. These genes were analyzed for functional and sequence diversity, and good indications that natural allelic variation in many of these genes contributes to phenotypic variation were obtained. For ACS11, an ethylene biosynthesis gene, haplotype differences explaining variation in the ratio of petiole and leaf length could be identified. PMID:26869705

Members of the genus Curtovirus (family Geminiviridae) are important pathogens of many wild and cultivated plant species. Until recently, relatively few full curtovirus genomes have been characterised. However, with the 19 full genome sequences now available in public databases, we revisit the proposed curtovirus species and strain classification criteria. Using pairwise identities coupled with phylogenetic evidence, revised species and strain demarcation guidelines have been instituted. Specifically, we have established 77% genome-wide pairwise identity as a species demarcation threshold and 94% genome-wide pairwise identity as a strain demarcation threshold. Hence, whereas curtovirus sequences with >77% genome-wide pairwise identity would be classified as belonging to the same species, those sharing >94% identity would be classified as belonging to the same strain. We provide step-by-step guidelines to facilitate the classification of newly discovered curtovirus full genome sequences and a set of defined criteria for naming new species and strains. The revision yields three curtovirus species: Beet curly top virus (BCTV), Spinach severe curly top virus (SpSCTV) and Horseradish curly top virus (HrCTV). PMID:24463952
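
The two demarcation thresholds amount to a simple decision rule, sketched below. This is a minimal illustration and not the authors' step-by-step guideline: the function name is hypothetical, the pairwise identity is assumed to have been computed already, and the abstract does not say how a value exactly equal to a threshold should be treated (this sketch treats it as falling below).

```python
SPECIES_THRESHOLD = 77.0  # percent genome-wide pairwise identity
STRAIN_THRESHOLD = 94.0   # percent genome-wide pairwise identity

def classify_pair(identity_percent):
    """Classify two curtovirus genomes from their pairwise identity."""
    if identity_percent > STRAIN_THRESHOLD:
        return "same strain"
    if identity_percent > SPECIES_THRESHOLD:
        return "same species, different strain"
    return "different species"

for identity in (96.2, 85.0, 70.5):
    print(identity, classify_pair(identity))
```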

Genomic evolution can be highly heterogeneous. Here, we introduce a new framework to simulate genome-wide sequence evolution under a variety of substitution models that may change along the genome and the phylogeny, following complex multispecies coalescent histories that can include recombination, demographics, longitudinal sampling, population subdivision/species history, and migration. A key aspect of our simulation strategy is that the heterogeneity of the whole evolutionary process can be parameterized according to statistical prior distributions specified by the user. We used this framework to carry out a study of the impact of variable codon frequencies across genomic regions on the estimation of the genome-wide nonsynonymous/synonymous ratio. We found that both variable codon frequencies across genes and rate variation among sites and regions can lead to severe underestimation of the global dN/dS values. The program SGWE (Simulation of Genome-Wide Evolution) is freely available from http://code.google.com/p/sgwe-project/, including extensive documentation and detailed examples. PMID:24557445

Detecting gene losses is a novel aspect of evolutionary genomics that has been made feasible by whole-genome sequencing. However, research to date has concentrated on elucidating evolutionary patterns of genomic components shared between species, rather than identifying disparities between genomes. In this study, we searched for gene losses in the lineage leading to eutherian mammals. First, as a pilot analysis, we selected five gene families (Wnt, Fgf, Tbx, TGFβ, and Frizzled) for molecular phylogenetic analyses, and identified mammalian lineage-specific losses of Wnt11b, Tbx6L/VegT/tbx16, Nodal-related, ADMP1, ADMP2, Sizzled, and Crescent. Second, automated genome-wide phylogenetic screening was implemented based on this pilot analysis. As a result, we detected 147 chicken genes without eutherian orthologs, which resulted from 141 gene loss events. Our inventory contained a group of regulatory genes governing early embryonic axis formation, such as Noggins, and multiple members of the opsin and prolactin-releasing hormone receptor ("PRLHR") gene families. Our findings highlight the potential of genome-wide gene phylogeny ("phylome") analysis in detecting possible rearrangement of gene networks and the importance of identifying losses of ancestral genomic components in analyzing the molecular basis underlying phenotypic evolution. PMID:22094861

In mammals the regulation of genomic instability plays a key role in tumor suppression and also controls genome plasticity, which is important for recombination during the processes of immunity and meiosis. Most studies to identify regulators of genomic instability have been performed in cells in culture or in systems that report on gross rearrangements of the genome, yet subtle differences in the level of genomic instability can contribute to whole organism phenotypes such as tumor predisposition. Here we performed a genome-wide association study in a population of 1379 outbred Crl:CFW(SW)-US_P08 mice to dissect the genetic landscape of micronucleus formation, a biomarker of chromosomal breaks, whole chromosome loss, and extranuclear DNA. Variation in micronucleus levels is a complex trait with a genome-wide heritability of 53.1%. We identify seven loci influencing micronucleus formation (false discovery rate <5%), and define candidate genes at each locus. Intriguingly at several loci we find evidence for sexual dimorphism in micronucleus formation, with a locus on chromosome 11 being specific to males. PMID:27233670

The white-tailed deer (Odocoileus virginianus) represents one of the most successful and widely distributed large mammal species within North America, yet very little nucleotide sequence information is available. We utilized massively parallel pyrosequencing of a reduced representation library (RRL) and a random shotgun library (RSL) to generate a complete mitochondrial genome sequence and identify a large number of putative single nucleotide polymorphisms (SNPs) distributed throughout the white-tailed deer nuclear and mitochondrial genomes. A SNP validation study designed to test specific classes of putative SNPs provides evidence for as many as 10,476 genome-wide SNPs in the current dataset. Based on cytogenetic evidence for homology between cow (Bos taurus) and white-tailed deer chromosomes, we demonstrate that a divergent genome may be used for estimating the relative distribution and density of de novo sequence contigs as well as putative SNPs for species without draft genome assemblies. Our approach demonstrates that bioinformatic tools developed for model or agriculturally important species may be leveraged to support next-generation research programs for species of biological, ecological and evolutionary importance. We also provide a functional annotation analysis for the de novo sequence contigs assembled from white-tailed deer pyrosequencing reads, a mitochondrial phylogeny involving 13,722 nucleotide positions for 10 unique species of Cervidae, and a median joining haplotype network as a putative representation of mitochondrial evolution in O. virginianus. The results of this study are expected to provide a detailed template enabling genome-wide sequence-based studies of threatened, endangered or conservationally important non-model organisms. PMID:21283515

Background: Specific genetic contributions for preeclampsia (PE) are currently unknown. This genome-wide association study (GWAS) aims to identify maternal single nucleotide polymorphisms (SNPs) and copy-number variants (CNVs) involved in the etiology of PE. Methods: A genome-wide scan was performed on 177 PE cases (diagnosed according to National Heart, Lung and Blood Institute guidelines) and 116 normotensive controls. White female study subjects from Iowa were genotyped on Affymetrix SNP 6.0 microarrays. CNV calls made using a combination of four detection algorithms (Birdseye, Canary, PennCNV, and QuantiSNP) were merged using CNVision and screened with stringent prioritization criteria. Due to limited DNA quantities and the deleterious nature of copy-number deletions, it was decided a priori that only deletions would be selected for assay on the entire case-control dataset using quantitative real-time PCR. Results: The top four SNP candidates had an allelic or genotypic p-value between 10⁻⁵ and 10⁻⁶; however, none surpassed the Bonferroni-corrected significance threshold. Three recurrent rare deletions meeting prioritization criteria detected in multiple cases were selected for targeted genotyping. A locus of particular interest was found showing an enrichment of case deletions in 19q13.31 (5/169 cases and 1/114 controls), which encompasses the PSG11 gene contiguous to a highly plastic genomic region. All algorithm calls for these regions were assay confirmed. Conclusions: CNVs may confer risk for PE and represent interesting regions that warrant further investigation. Top SNP candidates identified from the GWAS, although not genome-wide significant, may be useful to inform future studies in PE genetics. PMID:22748001
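
To see why p-values of this size fall short, it helps to work out a Bonferroni threshold explicitly. The sketch below is illustrative only: the abstract does not state how many SNPs were tested after quality control, so `N_TESTS` is a hypothetical round number, and the 0.05 family-wise error rate is likewise an assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests

ALPHA = 0.05       # assumed family-wise error rate
N_TESTS = 900_000  # hypothetical number of SNPs tested, not given in the abstract

threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"Bonferroni threshold: {threshold:.2e}")

# Even the lower end of the reported range is far above the threshold.
best_reported_p = 1e-6
print("genome-wide significant:", best_reported_p < threshold)
```

With these placeholder values, the threshold is more than an order of magnitude below the smallest p-value in the reported range.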

Many human diseases are multifactorial, involving multiple genetic and environmental factors impacting on one or more biological pathways. Much of the environmental effect is believed to be mediated through epigenetic changes. Although many genome-wide genetic and epigenetic association studies have been conducted for different diseases and traits, it is still far from clear to what extent the genomic loci and biological pathways identified in the genetic and epigenetic studies are shared. There is also a lack of statistical tools to assess these important aspects of disease mechanisms. In the present study, we describe a protocol for the integrated analysis of genome-wide genetic and epigenetic data based on permutation of a sum statistic for the combined effects in a locus or pathway. The method was then applied to published type 1 diabetes (T1D) genome-wide- and epigenome-wide-association studies data to identify genomic loci and biological pathways that are associated with T1D genetically and epigenetically. Through combined analysis, novel loci and pathways were also identified, which could add to our understanding of disease mechanisms of T1D as well as complex diseases in general. PMID:24071862
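
The idea of permuting a sum statistic can be sketched in a few lines. This is a generic illustration, not the published protocol, and it rests on several assumptions: individual-level data are available, each row holds one subject's values for the features in a locus or pathway (genotype dosages, methylation levels, or both), the per-feature statistic is a squared difference in group means, and the null distribution comes from shuffling case/control labels. All names are hypothetical.

```python
import random

def sum_statistic(rows, labels):
    """Sum over features of the squared case-control mean difference."""
    cases = [r for r, y in zip(rows, labels) if y == 1]
    controls = [r for r, y in zip(rows, labels) if y == 0]
    total = 0.0
    for j in range(len(rows[0])):
        mean_case = sum(r[j] for r in cases) / len(cases)
        mean_ctrl = sum(r[j] for r in controls) / len(controls)
        total += (mean_case - mean_ctrl) ** 2
    return total

def permutation_p_value(rows, labels, n_perm=999, seed=0):
    """Empirical p-value of the observed sum statistic under label shuffling."""
    rng = random.Random(seed)
    observed = sum_statistic(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum_statistic(rows, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)

# Toy locus: one genotype dosage and one methylation value per subject.
toy_rows = [[2, 0.9]] * 10 + [[0, 0.1]] * 10
toy_labels = [1] * 10 + [0] * 10
print(permutation_p_value(toy_rows, toy_labels, n_perm=199))
```

Summing across features lets several modest genetic and epigenetic signals in the same locus reinforce one another, and the permutation step avoids having to assume a closed-form null distribution for the combined statistic.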

We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes which are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark, but are not identified by either of the nonconsensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have bench-marked this program as well as GLIMMER using 3 complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene-identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan as compared to GLIMMER, but in a significant number of cases, similar results are provided by the two techniques. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail. PMID:11927773
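
The abstract does not define the Fourier measure, so the following is a sketch under a stated assumption: that it refers to the commonly used period-3 signal, where coding regions tend to show a peak in the DNA power spectrum at frequency 1/3. The function below computes that spectral power from base indicator sequences. It is not GeneScan's actual implementation, and the name `period3_power` is hypothetical.

```python
import cmath

def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base indicators."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient at f = 1/3 of the 0/1 indicator sequence for this base.
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, s in enumerate(seq) if s == base)
        total += abs(coeff) ** 2
    return total

periodic = "ATG" * 10    # perfectly codon-periodic toy sequence
homopolymer = "A" * 30   # no period-3 structure
print(period3_power(periodic), period3_power(homopolymer))
```

A sequence with strong three-base periodicity gives a large value, while a sequence without it gives a value near zero. A practical gene finder would compare this power against the background level of the spectrum within a sliding window, which this sketch omits.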

Riemerella anatipestifer is a well-described pathogen of waterfowl and other avian species that can cause septicemic and exudative diseases. In this study, we sequenced the complete genome of R. anatipestifer strain Yb2 and analyzed it against the published genomic sequences of R. anatipestifer strains DSM15868, RA-GD, RA-CH-1, and RA-CH-2. The Yb2 genome contains one circular chromosome of 2,184,066 bp with a 35.73% GC content and no plasmid. The genome has 2,021 open reading frames that occupy 90.88% of the genome. A comparative genomic analysis revealed that genome organization is highly conserved among R. anatipestifer strains, except for four inversions of a sequence segment in Yb2. A phylogenetic analysis found that the closest neighbor of Yb2 is RA-GD. Furthermore, we constructed a library of 3,175 mutants by random transposon mutagenesis, and 100 mutants exhibiting more than 100-fold-attenuated virulence were obtained by animal screening experiments. Southern blot analysis and genetic characterization of the mutants led to the identification of 49 virulence genes. Of these, 25 encode cytoplasmic proteins, 6 encode cytoplasmic membrane proteins, 4 encode outer membrane proteins, and the subcellular localization of the remaining 14 gene products is unknown. The functional classification of orthologous-group clusters revealed that 16 genes are associated with metabolism, 6 are associated with cellular processing and signaling, and 4 are associated with information storage and processing. The functions of the other 23 genes are poorly characterized or unknown. This genome-wide study identified genes important to the virulence of R. anatipestifer. PMID:26002892

Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity. PMID:19383880

Crop improvement always involves selection of specific alleles at genes controlling traits of agronomic importance, likely resulting in detectable signatures within the genome of modern soybean. The identification of these signatures is meaningful from the perspective of evolutionary biology, and fo...

Whole-genome radiation hybrid mapping has been applied extensively to human and certain animal species but little to plants. We recently demonstrated an alternative mapping approach in cotton (Gossypium hirsutum L.) based on segmentation by 5-krad gamma-irradiation and derivation of wild-cross whol...

Common bean (Phaseolus vulgaris) is the single most important grain legume for human consumption and, due to its ability to fix atmospheric nitrogen via symbioses with soil-borne microorganisms, has a valuable place in sustainable agriculture. We assembled 473 Mb of the common bean genome and geneti...

At TIGR, the human Bacterial Artificial Chromosome (BAC) end sequencing and trimming were performed with an overall sequencing success rate of 65%. CalTech human BAC libraries A, B, C and D as well as Roswell Park Cancer Institute's library RPCI-11 were used. To date, we have generated >300,000 end sequences from >186,000 human BAC clones with an average read length of ~460 bp for a total of 141 Mb covering ~4.7% of the genome. Over sixty percent of the clones have BAC end sequences (BESs) from both ends, representing over five-fold coverage of the genome by the paired-end clones. The average phred Q20 length is ~400 bp. This high accuracy makes our BESs match the human finished sequences with an average identity of 99% and a match length of 450 bp, and a frequency of one match per 12.8 kb contig sequence. Our sample tracking has ensured a clone tracking accuracy of >90%, which gives researchers high confidence in (1) retrieving the right clone from the BAC libraries based on the sequence matches; and (2) building a minimum tiling path of sequence-ready clones across the genome and genome assembly scaffolds.

It has long been speculated that common genetic variation influences the development of B-cell malignancy; however, until recently evidence for this assertion was lacking. The advent of genome-wide association studies (GWAS) has allowed the search for this class of susceptibility allele to be conducted on a genome-wide basis. Recent GWAS of chronic lymphocytic leukemia (CLL) and acute lymphoblastic leukemia (ALL) have identified novel disease genes for CLL and ALL and underscore the importance of polymorphic variation in B-cell development genes as determinants of leukemia risk. PMID:21307401

Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps. PMID:17110489

Longevity is an important economic trait in dairy production. Improvements in longevity could increase the average number of lactations per cow, thereby affecting the profitability of the dairy cattle industry. Improved longevity for cows reduces the replacement cost of stock and enables animals to achieve the highest production period. Moreover, longevity is an indirect indicator of animal welfare. Using whole-genome sequencing variants in 3 dairy cattle breeds, we carried out an association study and identified 7 genomic regions in Holstein and 5 regions in Red Dairy Cattle that were associated with longevity. Meta-analyses of 3 breeds revealed 2 significant genomic regions, located on chromosomes 6 (META-CHR6-88MB) and 18 (META-CHR18-58MB). META-CHR6-88MB overlaps with 2 known genes: neuropeptide G-protein coupled receptor (NPFFR2; 89,052,210-89,059,348 bp) and vitamin D-binding protein precursor (GC; 88,695,940-88,739,180 bp). The NPFFR2 gene was previously identified as a candidate gene for mastitis resistance. META-CHR18-58MB overlaps with zinc finger protein 717 (ZNF717; 58,130,465-58,141,877 bp) and zinc finger protein 613 (ZNF613; 58,115,782-58,117,110 bp), which have been associated with calving difficulties. Information on longevity-associated genomic regions could be used to find causal genes/variants influencing longevity and exploited to improve the reliability of genomic prediction. PMID:27289149

Background: The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypotheses generation. Results: In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. Conclusions: This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations. PMID:23244446

The application of genome-wide expression analysis to a large-scale, multicentered program in critically ill patients poses a number of theoretical and technical challenges. We describe here an analytical and organizational approach to a systematic evaluation of the variance associated with genome-wide expression analysis specifically tailored to study human disease. We analyzed sources of variance in genome-wide expression analyses performed with commercial oligonucleotide arrays. In addition, variance in gene expression in human blood leukocytes caused by repeated sampling in the same subject, among different healthy subjects, among different leukocyte subpopulations, and the effect of traumatic injury, were also explored. We report that analytical variance caused by sample processing was acceptably small. Blood leukocyte gene expression in the same individual over a 24-h period was remarkably constant. In contrast, genome-wide expression varied significantly among different subjects and leukocyte subpopulations. Expectedly, traumatic injury induced dramatic changes in apparent gene expression that were greater in magnitude than the analytical noise and interindividual variance. We demonstrate that the development of a nation-wide program for gene expression analysis with careful attention to analytical details can reduce the variance in the clinical setting to a level where patterns of gene expression are informative among different healthy human subjects, and can be studied with confidence in human disease. PMID:15781863

The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). Our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways. PMID:25415302

We previously conducted a genome-wide search for inflammatory bowel disease susceptibility loci in a large European cohort. Results from this study demonstrated suggestive evidence of linkage to loci at chromosomes 1q, 6p, and 10p and replicated linkages on chromosomes 12 and 16. Recently, NOD2/CARD15 on chromosome 16q12 has been found to be strongly associated with Crohn's disease. To determine whether other loci in the genome interact with the three associated functional variants in CARD15 (R702W, G908R, 1007fs), we stratified our large inflammatory bowel disease genome scan cohort by dividing pedigrees into two groups according to CARD15 variant genotype. The two pedigree groups were analysed using non-parametric allele sharing methods. The group of pedigrees that contained one of the three CARD15 variants had two suggestive linkage results, occurring in 6p (lod = 3.06 at D6S197, IBD phenotype) and 10p (lod = 2.29 at D10S197, CD phenotype). In addition, at 16q12 where CARD15 is located, the original genome scan had a peak lod score of 2.18 at D16S415 (CD phenotype). The stratified pedigree cohort containing one of the three CARD15 variants had a peak lod score of 0.90 at D16S415 (CD phenotype), accounting for less than half of the genetic evidence for linkage at this locus. This result is in agreement with the existence of a substantial number of private variants at the NOD2/CARD15 locus. Interaction with NOD2/CARD15 needs to be considered in future gene identification efforts on chromosomes 6 and 10. PMID:13680363

SCAN+ is a software application specifically designed to control the positioning of a gamma spectrometer by a two-dimensional translation system above spent fuel bundles located in a sealed spent fuel cask. The gamma spectrometer collects gamma spectrum information for the purpose of verifying the fuel loading of the spent fuel cask. SCAN+ performs manual and automatic gamma spectrometer positioning functions as well as controlling the gamma spectrometer's data acquisition functions. Cask configuration files are used to determine the positions of spent fuel bundles. Cask scanning files are used to determine the desired scan paths for scanning a spent fuel cask, allowing for automatic unattended cask scanning that may take several hours.

Objective: For 3,670 stroke patients from the United Kingdom, United States, Australia, Belgium, and Italy, we performed a genome-wide meta-analysis of white matter hyperintensity volumes (WMHV) on data imputed to the 1000 Genomes reference dataset to provide insights into disease mechanisms. Methods: We first sought to identify genetic associations with white matter hyperintensities in a stroke population, and then examined whether genetic loci previously linked to WMHV in community populations are also associated in stroke patients. Having established that genetic associations are shared between the 2 populations, we performed a meta-analysis testing which associations with WMHV in stroke-free populations are associated overall when combined with stroke populations. Results: There were no associations at genome-wide significance with WMHV in stroke patients. All previously reported genome-wide significant associations with WMHV in community populations shared direction of effect in stroke patients. In a meta-analysis of the genome-wide significant and suggestive loci (p < 5 × 10^-6) from community populations (15 single nucleotide polymorphisms in total) and from stroke patients, 6 independent loci were associated with WMHV in both populations. Four of these are novel associations at the genome-wide level (rs72934505 [NBEAL1], p = 2.2 × 10^-8; rs941898 [EVL], p = 4.0 × 10^-8; rs962888 [C1QL1], p = 1.1 × 10^-8; rs9515201 [COL4A2], p = 6.9 × 10^-9). Conclusions: Genetic associations with WMHV are shared in otherwise healthy individuals and patients with stroke, indicating common genetic susceptibility in cerebral small vessel disease. PMID:26674333
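The abstract does not say which meta-analysis model was used. As a hedged sketch of one common choice, the code below combines per-cohort effect estimates for a single SNP with inverse-variance fixed-effect weighting and compares the pooled p-value with the conventional genome-wide threshold of 5 × 10^-8. The function name `fixed_effect_meta` and the effect sizes and standard errors are hypothetical, not values from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis for one variant.

    betas: per-cohort effect estimates
    ses:   per-cohort standard errors (same order as betas)
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical estimates for one SNP in a community cohort and a stroke cohort.
GENOME_WIDE_THRESHOLD = 5e-8
beta, se, p = fixed_effect_meta([0.08, 0.12], [0.02, 0.03])
print(f"pooled beta={beta:.4f}, se={se:.4f}, p={p:.2e}, "
      f"genome-wide significant: {p < GENOME_WIDE_THRESHOLD}")
```

Pooling in this way can push a locus that is only suggestive in each cohort past the genome-wide threshold, which is the general pattern the Results describe for the loci shared between community and stroke populations.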

Background: Characterization of population structure and genetic diversity of germplasm is essential for the efficient organization and utilization of breeding material. The objectives of this study were to (i) explore the patterns of population structure in the pollen parent heterotic pool using different methods, (ii) investigate the genome-wide distribution of genetic diversity, and (iii) assess the extent and genome-wide distribution of linkage disequilibrium (LD) in elite sugar beet germplasm. Results: A total of 264 and 238 inbred lines from the yield type and sugar type inbreds of the pollen parent heterotic gene pools, respectively, which had been genotyped with 328 SNP markers, were used in this study. Two distinct subgroups were detected within the elite sugar beet germplasm set by different statistical methods, which was in accordance with its breeding history. MCLUST based on principal components, principal coordinates, or lapvectors had high correspondence with the germplasm type information as well as the assignment by STRUCTURE, which indicated that these methods might be alternatives to STRUCTURE for population structure analysis. Gene diversity and modified Rogers' distance between the examined germplasm types varied considerably across the genome, which might be due to artificial selection. This observation indicates that population genetic approaches could be used to identify candidate genes for the traits under selection. Because r² > 0.8 is required to detect marker-phenotype associations explaining less than 1% of the phenotypic variance, our observation of a low proportion of SNP locus pairs showing such levels of LD suggests that the number of markers has to be dramatically increased for powerful genome-wide association mapping. Conclusions: We provided a genome-wide distribution map of genetic diversity and linkage disequilibrium for the elite sugar beet germplasm, which is useful for the application of genome-wide association
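To make the LD measure concrete, here is a minimal sketch of the r² statistic between two biallelic SNP loci. It assumes fully homozygous inbred lines coded 0/1 per locus, so that each line contributes one observable haplotype; the function name `ld_r2` and the genotype vectors are invented for illustration and are not data from the study.

```python
def ld_r2(locus_a, locus_b):
    """Squared allelic correlation (r^2) between two biallelic loci.

    locus_a, locus_b: equal-length sequences of 0/1 allele codes, one entry
    per inbred line (assumed fully homozygous, so each line is one haplotype).
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    p_a = sum(locus_a) / n                                     # frequency of allele 1 at A
    p_b = sum(locus_b) / n                                     # frequency of allele 1 at B
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n    # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b                                       # disequilibrium coefficient D
    return d * d / denom


# Invented genotypes for eight inbred lines at two SNPs.
snp1 = [0, 0, 0, 1, 1, 1, 1, 0]
snp2 = [0, 0, 1, 1, 1, 1, 0, 0]
r2 = ld_r2(snp1, snp2)
print(f"r^2 = {r2:.3f}, usable for association mapping (r^2 > 0.8): {r2 > 0.8}")
```

Applying such a function to every pair of markers and recording the share of pairs with r² > 0.8 gives the kind of proportion the abstract refers to when arguing that more markers are needed.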

Background: Eukaryotic DNA replication is regulated at the level of large chromosomal domains (0.5–5 megabases in mammals) within which replicons are activated relatively synchronously. These domains replicate in a specific temporal order during S-phase and our genome-wide analyses of replication timing have demonstrated that this temporal order of domain replication is a stable property of specific cell types. Results: We have developed ReplicationDomain as a web-based database for analysis of genome-wide replication timing maps (replication profiles) from various cell lines and species. This database also provides comparative information on transcriptional expression and is configured to display any genome-wide property (for instance, ChIP-Chip or ChIP-Seq data) via an interactive web interface. Our published microarray data sets are publicly available. Users may graphically display these data sets for a selected genomic region and download the data displayed as text files, or alternatively, download complete genome-wide data sets. Furthermore, we have implemented a user registration system that allows registered users to upload their own data sets. Upon uploading, registered users may choose to: (1) view their data sets privately without sharing; (2) share with other registered users; or (3) make their published or "in press" data sets publicly available, which can fulfill journal and funding agencies' requirements for data sharing. Conclusion: ReplicationDomain is a novel and powerful tool to facilitate the comparative visualization of replication timing in various cell types as well as other genome-wide chromatin features and is considerably faster and more convenient than existing browsers when viewing multi-megabase segments of chromosomes. Furthermore, the data upload function with the option of private viewing or sharing of data sets among registered users should be a valuable resource for the scientific community. PMID:19077204

We present GStream, a method that combines genome-wide SNP and CNV genotyping in the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves on the results obtained by previous state-of-the-art methods and yields an accuracy that is close to that obtained by purely CNV-oriented technologies like Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole genome CNV characterization based either on CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants by up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanism underlying the detected disease risk association. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but will also take advantage of the computational efficiency of the method. PMID:23844243

Brassica napus (oilseed rape, canola) is one of the world’s most important sources of vegetable oil for human nutrition and biofuel, and also a model species for studies investigating the evolutionary consequences of polyploidisation. Strong bottlenecks during its recent origin from interspecific hybridisation, and subsequently through intensive artificial selection, have severely depleted the genetic diversity available for breeding. On the other hand, high-throughput genome profiling technologies today provide unprecedented scope to identify, characterise and utilise genetic diversity in primary and secondary crop gene pools. Such methods also enable implementation of genomic selection strategies to accelerate breeding progress. The key prerequisite is the availability of high-quality sequence data and the identification of high-quality, genome-wide sequence polymorphisms representing relevant gene pools. We present comprehensive genome resequencing data from a panel of 52 highly diverse natural and synthetic B. napus accessions, along with a stringently selected panel of 4.3 million high-confidence, genome-wide SNPs. The data are of great interest for genomics-assisted breeding and for evolutionary studies on the origins and consequences of allopolyploidisation in plants. PMID:26647166

Brassica napus (oilseed rape, canola) is one of the world's most important sources of vegetable oil for human nutrition and biofuel, and also a model species for studies investigating the evolutionary consequences of polyploidisation. Strong bottlenecks during its recent origin from interspecific hybridisation, and subsequently through intensive artificial selection, have severely depleted the genetic diversity available for breeding. On the other hand, high-throughput genome