Bottom Line:
The number of candidate genes was reduced more than ten-fold on average. Comparison with various types of experimental datasets (including QTL fine-mapping and genome-wide association study results) indicated both statistical significance and biological relevance of the obtained connections between genes and traits. In this way, the approach capitalizes on QTL data to uncover how individual genes influence trait variation.

Background: Elucidation of genotype-to-phenotype relationships is a major challenge in biology. In plants, it is the basis for molecular breeding. Quantitative Trait Locus (QTL) mapping makes it possible to link variation at the trait level to variation at the genomic level. However, QTL regions typically contain tens to hundreds of genes. To prioritize such candidate genes, we show that potentially causal genes for a trait can be identified based on overrepresentation of biological processes (gene functions) among the candidate genes in the QTL regions of that trait.

Results: The prioritization method was applied to rice QTL data, using gene functions predicted on the basis of sequence and expression information. The number of candidate genes was reduced more than ten-fold on average. Comparison with various types of experimental datasets (including QTL fine-mapping and genome-wide association study results) indicated both statistical significance and biological relevance of the obtained connections between genes and traits. A detailed analysis of flowering time QTLs illustrates that genes with completely unknown function are likely to play a role in this important trait.

Conclusions: Our approach can guide further experimentation and validation of causal genes for quantitative traits. In this way, it capitalizes on QTL data to uncover how individual genes influence trait variation.

Fig1: Prioritizing QTL candidate genes via associating traits to biological processes. (A) Principle of method used: Biological processes (indicated as different colored boxes) are annotated for genes in QTL regions for a trait-of-interest. Using these gene functions, trait-biological process associations are obtained based on enrichment of biological processes among the genes linked to a particular trait, integrating information from multiple QTL regions. Genes annotated with overrepresented biological processes are prioritized. (B) Number of QTL regions connected to traits in the rice QTL compendium used for this analysis. The scale of the horizontal axis in the histogram is clipped at 50, so traits with more than 50 QTL regions associated (~2% of the total) are not included. (C) Number of genes connected to traits in the rice QTL compendium. The scale of the horizontal axis in the histogram is clipped at 5000, so traits with more than 5000 genes associated (~5% of the total) are not included.

Mentions:
Here we present a novel computational method for plant QTL candidate gene prioritization. In our approach (Figure 1A), for each gene contained in every QTL region for a trait-of-interest, we first predict which biological processes it is involved in. This is done using our previously developed gene function prediction method BMRF, which uses sequence data and co-expression information as input [34]. Enrichment (overrepresentation) of biological process (BP) terms, preferably based on multiple QTL regions for a given trait, allows association of the trait-of-interest with specific biological processes. Overrepresented BP terms are then used to prioritize, from the QTL gene lists, the candidate genes that are most likely to be the underlying causal genes responsible for the variation in the trait-of-interest.
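The enrichment-then-prioritization idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-sided hypergeometric test with a Bonferroni correction (the excerpt does not state which test or correction was used), and the function names, the `annotations` dictionary, and the toy gene identifiers are all hypothetical. BMRF itself is not reproduced here; its predicted gene functions are stood in for by a hand-written mapping.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M genes total, n annotated, N drawn."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def prioritize(qtl_genes, annotations, background, alpha=0.05):
    """Return (enriched BP terms with p-values, prioritized QTL genes).

    Assumption: enrichment is tested per term with a hypergeometric test
    and Bonferroni-corrected over all terms seen in the background.
    """
    background = set(background)
    qtl_genes = set(qtl_genes) & background
    M, N = len(background), len(qtl_genes)

    # Invert the gene -> terms mapping into term -> genes.
    term_genes = {}
    for gene in background:
        for term in annotations.get(gene, ()):
            term_genes.setdefault(term, set()).add(gene)

    enriched = {}
    for term, genes in term_genes.items():
        k = len(genes & qtl_genes)
        if k == 0:
            continue
        p = hypergeom_sf(k, M, len(genes), N)
        if p * len(term_genes) <= alpha:  # Bonferroni correction
            enriched[term] = p

    # Keep only QTL genes annotated with at least one enriched term.
    prioritized = {
        g for g in qtl_genes if enriched.keys() & set(annotations.get(g, ()))
    }
    return enriched, prioritized


# Toy data (hypothetical): 100 genes, two BP terms.
background = [f"g{i}" for i in range(100)]
annotations = {f"g{i}": ["flowering"] for i in range(4)}
annotations.update({f"g{i}": ["transport"] for i in range(4, 50)})

# Genes pooled from all QTL regions of one trait.
qtl_genes = ["g0", "g1", "g2", "g4", "g50"]

enriched, prioritized = prioritize(qtl_genes, annotations, background)
print(sorted(enriched), sorted(prioritized))
```

In this toy case only the "flowering" term passes the threshold, so the five pooled QTL genes are narrowed to the three annotated with it. Pooling genes across several QTL regions of the same trait before testing mirrors the text's preference for enrichment based on multiple regions.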
