A basic problem of the metagenomic approach in microbial ecology is the assignment of
genomic fragments to a certain species or taxonomic group when suitable marker genes
are absent. Currently, (G+C)-content, together with phylogenetic information and
codon adaptation of functional genes, is most often used to assess the relationship between different
fragments. These methods, however, can produce ambiguous results. In order to evaluate
sequence-based methods for fragment identification, we extensively compared (G+C)-contents
and tetranucleotide usage patterns of 9,054 fosmid-sized genomic fragments
generated in silico from 118 completely sequenced bacterial genomes (40,982,931 fragment
pairs were compared in total). The results of this systematic study show that the
discriminatory power of correlations of tetranucleotide-derived z-scores is by far superior
to that of differences in (G+C)-content, and that these correlations provide reasonable assignment
probabilities when applied to metagenome libraries of low diversity. Using six fully sequenced fosmid
inserts from a metagenomic analysis of microbial consortia mediating the anaerobic
oxidation of methane (AOM), we demonstrate that discrimination based on
tetranucleotide-derived z-score correlations was consistent with corresponding data from
16S ribosomal RNA sequence analysis and allowed us to discriminate between fosmid
inserts that were indistinguishable with respect to their (G+C)-contents.
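
The comparison described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the abstract does not state how the z-scores are derived, so the code assumes expected tetranucleotide counts from a maximal-order Markov model (built from di- and trinucleotide counts) and a Pearson coefficient as the correlation measure. It also counts words on the given strand only. All function names (gc_content, kmer_counts, tetra_zscores, pearson) are hypothetical.

```python
import itertools
import math
from collections import Counter

BASES = "ACGT"
# All 256 tetranucleotides in a fixed order, so z-score vectors are comparable.
TETRAMERS = ["".join(p) for p in itertools.product(BASES, repeat=4)]


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / total if total else 0.0


def kmer_counts(seq, k):
    """Count overlapping k-mers on the given strand only."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tetra_zscores(seq):
    """Z-score per tetranucleotide (ASSUMPTION: maximal-order Markov model)."""
    c2, c3, c4 = (kmer_counts(seq, k) for k in (2, 3, 4))
    zscores = []
    for w in TETRAMERS:
        n_mid = c2[w[1:3]]    # count of the central dinucleotide w2w3
        n_left = c3[w[:3]]    # count of w1w2w3
        n_right = c3[w[1:]]   # count of w2w3w4
        if n_mid == 0:
            zscores.append(0.0)  # word cannot occur; treat as uninformative
            continue
        expected = n_left * n_right / n_mid
        variance = expected * (n_mid - n_left) * (n_mid - n_right) / n_mid ** 2
        if variance > 0:
            zscores.append((c4[w] - expected) / math.sqrt(variance))
        else:
            zscores.append(0.0)  # degenerate case; no deviation measurable
    return zscores


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0
```

Under these assumptions, two fragments a and b would be compared with abs(gc_content(a) - gc_content(b)) on the one hand and pearson(tetra_zscores(a), tetra_zscores(b)) on the other. The sketch is only meaningful for fragments long enough to give stable word counts, such as the fosmid-sized fragments considered here.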