Relationship between phylogenetic distribution and genomic features in Neurospora crassa.

Kasuga T, Mannhaupt G, Glass NL - PLoS ONE (2009)

Bottom Line:
We found that 11% of N. crassa-orphans have paralogous N. crassa-orphan genes. Of the paralogous N. crassa-orphan gene pairs, 33% were tandemly located in the genome, implying a duplication origin of N. crassa-orphan PCGs in the past. LS grouping is thus a useful tool to explore and understand genome organization, evolution and gene function in fungi.

Affiliation: Department of Plant and Microbial Biology, University of California, Berkeley, California, USA.

Abstract:
In the post-genome era, insufficient functional annotation of predicted genes greatly restricts the potential of mining genome data. We demonstrate that an evolutionary approach, which is independent of functional annotation, has great potential as a tool for genome analysis. We chose the genome of the model filamentous fungus Neurospora crassa as an example. The phylogenetic distribution of each predicted protein coding gene (PCG) in the N. crassa genome was used to classify genes into six mutually exclusive lineage specificity (LS) groups, i.e. Eukaryote/Prokaryote-core, Dikarya-core, Ascomycota-core, Pezizomycotina-specific, N. crassa-orphans and Others. Functional category analysis revealed that only approximately 23% of PCGs in the two most lineage-specific groups, Pezizomycotina-specific and N. crassa-orphans, have functional annotation. In contrast, approximately 76% of PCGs in the remaining four LS groups have functional annotation. Analysis of the chromosomal localization of N. crassa-orphan PCGs and of genes encoding secreted proteins showed enrichment in subtelomeric regions. The origin of N. crassa-orphans is not known. We found that 11% of N. crassa-orphans have paralogous N. crassa-orphan genes. Of the paralogous N. crassa-orphan gene pairs, 33% were tandemly located in the genome, implying a duplication origin of N. crassa-orphan PCGs in the past. LS grouping is thus a useful tool to explore and understand genome organization, evolution and gene function in fungi.

pone-0005286-g001: Lineage specificity classification of the predicted N. crassa protein coding gene (PCG) set based on phylogenetic distribution. A black circle indicates that the gene homolog is present in the corresponding lineage; a white circle means it is absent. The number of PCGs in each LS group is shown at the bottom. Note that N. crassa is a member of the class Sordariomycetes, which is within the Pezizomycotina.

Mentions:
Mutually exclusive LS groups of N. crassa PCGs were delimited based on the absence/presence of homologous genes in defined taxonomic units (see also [3]) (Fig. 1; a complete list of genes can be found in Table S1). The membership of each LS group depends on the threshold value for percent protein identity. As anticipated, the higher the threshold value, the more genes are assigned to the more lineage-specific LS groups, such as the N. crassa-orphan and Pezizomycotina-specific groups (Fig. 2). We therefore chose 30% for the threshold value of length-adjusted protein identity, at which the majority of genes are predicted to encode structurally homologous proteins [11]. Among the phylogenetic groups, 2,358 N. crassa PCGs were highly conserved and had at least 30% length-adjusted protein identity with PCGs in non-fungal eukaryotes (e.g. plants and animals) and/or prokaryotes, in addition to Ascomycota and Basidiomycota species. This group of PCGs was referred to as Euk/Prok-core. Homologs of 1,026 N. crassa PCGs were found in Basidiomycota fungi, in addition to Ascomycota fungi; these were defined as Dikarya-core genes. Homologs of 145 N. crassa PCGs were found in species in the Saccharomycotina (e.g. Saccharomyces cerevisiae) and/or Taphrinomycotina (e.g. Schizosaccharomyces pombe), but homologs were not identified in non-Ascomycota fungi. This group of PCGs was defined as Ascomycota-core genes. All the Ascomycota-core genes also had homologs in the genomes of Pezizomycotina fungi. Homologs of 3,194 N. crassa PCGs were identified in members of the Pezizomycotina, but not in members of the Saccharomycotina or Taphrinomycotina. This group of genes was defined as Pezizomycotina-specific (Pezizo-specific) genes. For 2,219 of the 9,127 PCGs predicted in the N. crassa genome, homologous genes were not identified in any other genome; these were defined as N. crassa-orphans. For the remaining 185 N. crassa genes, at least one homolog was identified in non-fungal eukaryotes or bacteria in addition to Pezizomycotina fungi, but homologous sequences were not identified in the genomes of Basidiomycota, Saccharomycotina or Taphrinomycotina fungi. Since the lineage specificity and origin of this group of genes were unclear, they were gathered together in a group termed “Others”. The Others group includes genes that are conserved in the Pezizomycotina clade, but which may have been lost or diverged in the genomes of other members of the Ascomycota and Basidiomycota. Others also includes genes that are candidates for horizontally transferred genes (Charles Hall, personal communication). The coverage of sequenced taxa in the database and the quality of annotation influence the membership of LS groups. For instance, the number of genes in the N. crassa-orphan group is likely to be reduced upon the release of genomes from closely related species, such as N. tetrasperma and N. discreta (currently being sequenced by the Joint Genome Institute).
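
The grouping rules described above can be written as a small decision function. The following is a minimal sketch, not the authors' pipeline: the function name, the four boolean inputs and the "unassigned" label are hypothetical, and each boolean is assumed to mean "at least one homolog passed the 30% length-adjusted identity threshold in that taxon set". Where the text leaves a presence/absence pattern undescribed, the sketch returns "unassigned" rather than guessing. It also assumes that "in addition to Ascomycota" is satisfied by a homolog in any Ascomycota lineage other than N. crassa itself. The group sizes are the ones reported in the text.

```python
# Illustrative sketch only: names and the "unassigned" fallback are assumptions.

def classify_ls_group(pezizo: bool, sacch_taphrino: bool,
                      basidio: bool, non_fungal: bool) -> str:
    """Assign an LS group from presence/absence of homologs in four taxon sets.

    pezizo         -- homolog in other Pezizomycotina
    sacch_taphrino -- homolog in Saccharomycotina and/or Taphrinomycotina
    basidio        -- homolog in Basidiomycota
    non_fungal     -- homolog in non-fungal eukaryotes and/or prokaryotes
    """
    other_asco = pezizo or sacch_taphrino  # assumption: any other Ascomycota hit
    if not (other_asco or basidio or non_fungal):
        return "N. crassa-orphan"
    if non_fungal and basidio and other_asco:
        return "Euk/Prok-core"
    if basidio and other_asco:  # non_fungal is necessarily False here
        return "Dikarya-core"
    if sacch_taphrino and not basidio and not non_fungal:
        return "Ascomycota-core"
    if pezizo and not (sacch_taphrino or basidio or non_fungal):
        return "Pezizo-specific"
    if non_fungal and pezizo and not (basidio or sacch_taphrino):
        return "Others"
    return "unassigned"  # pattern not described in the text


# Group sizes as reported in the text.
REPORTED_COUNTS = {
    "Euk/Prok-core": 2358,
    "Dikarya-core": 1026,
    "Ascomycota-core": 145,
    "Pezizo-specific": 3194,
    "N. crassa-orphan": 2219,
    "Others": 185,
}
TOTAL_PCGS = 9127

# The six groups are mutually exclusive, so their sizes should add up
# to the full predicted gene set.
assert sum(REPORTED_COUNTS.values()) == TOTAL_PCGS

for group, n in REPORTED_COUNTS.items():
    print(f"{group:18s} {n:5d} {100 * n / TOTAL_PCGS:5.1f}%")
```

The order of the checks matters: the orphan case is tested first, and the Dikarya-core branch relies on the Euk/Prok-core branch having already consumed every pattern with a non-fungal hit. Raising the identity threshold would flip more of the boolean inputs to False, which is why higher thresholds push genes toward the Pezizo-specific and N. crassa-orphan groups.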
