Bottom Line:
Because many recently accumulated genomic data are draft genome sequences for which complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated incrementally into an existing ortholog table created only from the complete genome data, which prevents low-quality draft data from affecting the clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program that improves domain-level clustering using multiple sequence alignment information.
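The incremental idea can be illustrated with a minimal sketch. It is not MBGD's actual procedure: the table layout, the `best_hit` callback, and the `draft_only:` naming are all hypothetical, introduced only to show how draft genes can be attached to clusters built from complete genomes without reclustering them.

```python
from typing import Callable, Dict, List, Optional


def add_draft_genes(
    table: Dict[str, List[str]],
    draft_genes: List[str],
    best_hit: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Attach draft genes to existing clusters; return a new table.

    `table` maps a cluster ID to the complete-genome genes it contains.
    `best_hit` (hypothetical) returns the most similar complete-genome
    gene for a draft gene, or None when there is no acceptable hit.
    """
    # The reverse index is built from complete-genome genes only, so draft
    # genes can never pull two existing clusters together or split one.
    gene_to_cluster = {g: c for c, genes in table.items() for g in genes}
    updated = {c: list(genes) for c, genes in table.items()}

    for gene in draft_genes:
        hit = best_hit(gene)
        cluster = gene_to_cluster.get(hit) if hit is not None else None
        if cluster is None:
            # No usable hit: keep the draft gene as its own singleton group.
            updated[f"draft_only:{gene}"] = [gene]
        else:
            updated[cluster].append(gene)
    return updated
```

In this sketch the membership of complete-genome genes is identical before and after the update, which is the property the incremental approach is meant to preserve.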

Figure 2: Modification of the domain-level classification in the standard ortholog table by DomRefine. (A) An example of modification by DomRefine. Here, two clusters A and B are merged into a new cluster AB. In this case, the number of clusters is reduced from two to one (cluster-level modification) and the numbers of domain-reorganized genes and of classification-changed genes are two and four, respectively (gene-level modification). (B) The effect of cluster-level modification by DomRefine. (C) The effect of gene-level modification by DomRefine.

Mentions:
Although MBGD can provide various specialized ortholog tables, the standard (default) ortholog table covering the entire taxonomic diversity is the most important for general use. However, preserving the quality of ortholog classification generally becomes more difficult as a more diverse set of genomes is incorporated. Thus, improving the domain-level classification in the standard ortholog table is a critical issue. In this release, the DomRefine pipeline was used to improve the domain-level classification generated by DomClust and so create a refined version of the standard ortholog table. DomRefine takes DomClust output as input. For each pair of domain-level ortholog groups that are adjacent in at least one common protein, it constructs a multiple sequence alignment containing both groups and tries to modify the domain organization so as to maximize the sum of the domain-level alignment scores (domain-specific sum-of-pairs or DSP score) of the multiple sequence alignment (15) (Figure 2A). During this optimization procedure, DomRefine also tries to split a cluster into smaller groups according to the phylogenetic gene tree constructed from the multiple sequence alignment (15).
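As a rough illustration of how a domain-specific sum-of-pairs score can favor one domain organization over another, the following toy sketch scores each domain separately over its own member sequences and columns. The scoring scheme (+1 match, -1 mismatch or residue against gap, 0 for gap against gap) and the `Domain` representation are assumptions made for this example; they are not the scoring actually used by DomRefine, which is described in reference (15).

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical representation of a domain within an alignment:
# (row indices of member sequences, start column, end column exclusive)
Domain = Tuple[Sequence[int], int, int]


def pair_score(a: str, b: str) -> int:
    """Toy pairwise score over two aligned segments of equal length."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both rows contributes nothing
        score += 1 if x == y else -1
    return score


def dsp_score(alignment: List[str], domains: List[Domain]) -> int:
    """Sum pairwise scores within each domain, over its members only."""
    total = 0
    for members, start, end in domains:
        for i, j in combinations(members, 2):
            total += pair_score(alignment[i][start:end], alignment[j][start:end])
    return total


alignment = ["ACDEFG", "ACDEFG", "ACD---"]
one_domain = [([0, 1, 2], 0, 6)]
two_domains = [([0, 1, 2], 0, 3), ([0, 1], 3, 6)]
print(dsp_score(alignment, one_domain), dsp_score(alignment, two_domains))
```

Here the third sequence lacks the second half of the alignment. Treating everything as one domain penalizes it for gaps, whereas the two-domain organization scores higher under this toy scheme, so a maximization step would prefer it.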
