A powerful non-homology method for the prediction of operons in prokaryotes

MOTIVATION: The prediction of the transcription unit organization of genomes is an important clue in the inference of functional relationships of genes, the interpretation and evaluation of transcriptome experiments, and the overall inference of the regulatory networks governing the expression of genes in response to the environment. Though several methods have been devised to predict operons, most need a high characterization of the genome analysed. Log-likelihoods derived from inter-genic distance distributions work surprisingly well to predict operons in Escherichia coli and are available for any genome as soon as the gene sets are predicted. RESULTS: Here we provide evidence that the very same method is applicable to any prokaryotic genome. First, the method has the same efficiency when evaluated using a collection of experimentally known operons of Bacillus subtilis. Second, operons among most if not all prokaryotes seem to have the same tendencies to keep short distances between their genes, the most frequent distances being the overlaps of four and one base pairs. The universality of this structural feature allows us to predict the organization of transcription units in all prokaryotes. Third, predicted operons contain a higher proportion of genes with related phylogenetic profiles and conservation of adjacency than predicted borders of transcription units.