Non-redundant Genome Datasets

The files here are Perl modules that list genome datasets filtered at different Genomic Similarity Score (GSS) thresholds. They are the files used in the study.

These Perl modules also list eukaryotic genomes, although the study included only prokaryotic genomes.

The files named NRDOMAINS contain the lists of non-redundant genomes. The files named REPRESENTS record which genome represents each redundant genome in the non-redundant genome dataset. For instance, the line:

"E_coli_O157H7_EDL933" => "E_coli_K12" means that the E_coli_O157H7_EDL933 genome is represented by E_coli_K12; E_coli_O157H7_EDL933 is redundant.

Suffixes of the form 0_XX are GSS thresholds. For instance, NRDOMAINS_0_70 is the non-redundant genome dataset obtained with a GSS threshold of 0.70.
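
The two conventions described above (the 0_XX suffix and the "redundant" => "representative" entries) can be read mechanically. The following Python sketch is only an illustration, not part of the distributed files: the function names gss_threshold and parse_represents are hypothetical, and it assumes that names follow the PREFIX_0_XX pattern and that REPRESENTS entries are written with double-quoted names exactly as in the example line. The actual layout of the Perl modules may differ.

```python
import re

# Assumed name pattern: an alphabetic prefix, then the GSS threshold as 0_XX.
NAME_RE = re.compile(r"([A-Za-z]+)_(\d+)_(\d+)")

# Assumed entry pattern: "redundant_genome" => "representative_genome".
PAIR_RE = re.compile(r'"([^"]+)"\s*=>\s*"([^"]+)"')


def gss_threshold(name):
    """Return the GSS threshold encoded in a name such as NRDOMAINS_0_70."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"no GSS threshold found in {name!r}")
    # Join the two numeric parts with a decimal point: "0" and "70" -> 0.70.
    return float(f"{match.group(2)}.{match.group(3)}")


def parse_represents(text):
    """Map each redundant genome to the genome that represents it."""
    return dict(PAIR_RE.findall(text))


print(gss_threshold("NRDOMAINS_0_70"))
sample = '"E_coli_O157H7_EDL933" => "E_coli_K12",'
print(parse_represents(sample))
```

Names without a 0_XX suffix, such as DOMAINS, raise a ValueError in this sketch rather than returning a default threshold.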

The REDUNDANCY table contains the list of non-redundant genomes obtained at a GSS threshold of 0.70.

Perl Modules

SCORES-GNMS: This file contains the GSS values used to build the non-redundant genome datasets.

DOMAINS: The complete set of genomes available at the time of the study.

Operon Predictions

These files contain operon predictions at a confidence value of 0.90 for all the genomes available at the time we started the work. The predictions were obtained with phylogenetic profiles generated from the non-redundant genome dataset filtered at a GSS threshold of 0.70.