Our work aims at finding universal rules of genome organisation. We extended the concept of gene essentiality to that of gene persistence across many genomes. We uncovered new universal motifs: flexible patterns densely covering genomes. We also found universal rules in the distribution of amino acids in proteins. In an international collaboration with the Genoscope, we sequenced the genome of a bacterium from Antarctica, Pseudoalteromonas haloplanktis TAC125, in an attempt to understand resistance to cold and ageing.

The aim of genomics is to understand the functional organisation of genes within chromosomes, and to explain how this organisation produces life. Bacteria are ideal for such studies, as they have existed for more than three billion years and are highly diverse. Understanding how genes interact makes it possible to evaluate the adaptive potential of bacteria, both in the environment and in and on our bodies (bacteria are everywhere, and by classical estimates our bodies contain at least ten times more bacterial cells than human cells). Despite the negative connotations associated with bacteria, the current vogue for nutraceuticals ("medical" foods) rests on the implicit idea that bacteria are most often beneficial, even if, on occasion, they can become highly pathogenic. Remarkably, there are few differences between commensal bacteria and bacteria responsible for diseases. One of the aims of comparative genomics is to understand how differences in genome organisation can determine whether a bacterium is innocuous (or beneficial to the host) or virulent. We started by trying to uncover universal rules of genome organisation.

Considering the genome as a whole, we discovered that the enigmatic periodic bias, with a period of 10 to 11.5 nucleotides, found in the nucleotide distribution of almost all genomes is caused by ubiquitous patterns that we termed "class A flexible patterns". Each pattern is composed of up to ten conserved nucleotides or dinucleotides distributed within a discontinuous motif. These patterns had escaped identification until then because, being flexible, they do not yield the rigid consensus sequences that investigators analysing genome sequences universally look for. We further analysed the overall amino acid composition of the proteomes (the total protein complement) of model prokaryotes thought to have evolved separately for more than one billion years. Using multivariate analyses, we showed that the first bias, which sets the electric charge of amino acids against their hydrophobicity, separates a homogeneous cluster made exclusively of proteins that are core components of the cytoplasmic membrane. A second bias is imposed by the G+C content of the genome, acting at the first codon position; this indicates that protein functions are so robust to amino acid changes that they can accommodate a large shift in the nucleotide content of the genome. We also uncovered a remarkable role for aromatic amino acids: expressed orphan proteins are enriched in these residues, suggesting that they might participate in a process of gain of function during evolution. We propose that many of these proteins are "gluons" that stabilise multicomponent complexes, thus labelling the "self" of a species. All these studies substantiate the relevance of our programme: genomes are not random bags of genes but organised entities. Among the many possible factors that might play a selective role, we retained two for further experimental study: temperature and chemical reactivity.
Sulfur metabolism was chosen because sulfur is an extremely reactive atom, yet an obligatory component of cells, involved in the synthesis of every protein through the initiator methionine.
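The 10 to 11.5 nucleotide periodicity mentioned above can be made visible with a very simple spacing count: locate every occurrence of a dinucleotide and tally how often two occurrences lie a given distance apart. The sketch below is only a generic illustration of that idea, not the method used to define class A flexible patterns; the choice of "AA", the distance window of 2 to 20 and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def dinucleotide_positions(seq: str, dinuc: str) -> List[int]:
    """Start positions of every (possibly overlapping) occurrence of dinuc."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def spacing_profile(seq: str, dinuc: str = "AA",
                    min_d: int = 2, max_d: int = 20) -> Dict[int, int]:
    """Count pairs of dinuc occurrences separated by each distance in [min_d, max_d].

    A periodic bias shows up as an excess of counts at the period
    (and its multiples) relative to neighbouring distances.
    """
    pos = dinucleotide_positions(seq, dinuc)
    counts = Counter()
    for i, p in enumerate(pos):
        for q in pos[i + 1:]:
            d = q - p
            if d > max_d:
                break  # positions are sorted, so later ones are even farther away
            if d >= min_d:
                counts[d] += 1
    return {d: counts.get(d, 0) for d in range(min_d, max_d + 1)}


if __name__ == "__main__":
    # Toy sequence with an "AA" every 11 nucleotides.
    toy = ("AA" + "C" * 9) * 20
    profile = spacing_profile(toy)
    print(max(profile, key=profile.get))  # 11
```

On real genomes the signal is weak and must be compared with a shuffled or Markov-model background; the toy sequence only checks that the counting behaves as intended.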

All this requires reference models: organisms about which we know as much as possible. Two major classes of bacteria are distinguished by a staining method developed by the Danish scientist Hans Christian Gram. Gram-positive bacteria are common in foods (lactobacilli and streptococci are present in yoghurts and cold meats, for example). Some, such as Staphylococcus aureus, can be pathogenic. The model for these bacteria is Bacillus subtilis, for which the Unit has been a driving force behind genomic studies. Our current research aims to determine how the genes of this organism are organised, both by computer-based (in silico) studies of gene sequences and their products (mRNA and protein), and by the study of sulfur metabolism, which plays a strongly structuring role. We began by establishing a number of selection rules that lead genes to prefer one DNA strand over the other. These rules result from a selection pressure that favours progression of the replication fork in the same direction as transcription, preventing conflicts between replication and transcription that would produce truncated mRNA molecules and, in turn, truncated proteins. We then extended the concept of essentiality by reversing the analysis: we considered genes that are preferentially located on the leading strand but are not essential under laboratory conditions. We showed that these persistent non-essential genes share the characteristics of persistence, conservation, expression and location of experimentally essential genes. Persistent non-essential genes are mostly involved in cell maintenance and stress responses. This outlined the limits of current experimental techniques for defining gene essentiality, and highlighted the essential role of maintenance genes which, although dispensable for growth, are not dispensable from an evolutionary point of view. Sulfur metabolism genes are grouped in functional islands.
Within these islands, we recently characterised mainly genes encoding transport proteins, together with some of the proteins regulating their expression, including a master control gene. We are now extending our studies to pathogenic organisms of the same class. We have also characterised the recycling of the initiator methionine present in all proteins as they are synthesised in vivo.
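The strand-preference rules described above require deciding, for each gene, whether it lies on the leading or the lagging strand. A minimal sketch follows, assuming a circular chromosome, a gene represented by a single coordinate and a strand of +1 (clockwise) or -1 (counterclockwise). The origin and terminus are estimated here with the cumulative G-C skew heuristic (minimum near the origin, maximum near the terminus), which is a common approach but not necessarily the one used in our analyses; all function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running sum of +1 for each G and -1 for each C (other bases count 0)."""
    steps = [1 if b == "G" else -1 if b == "C" else 0 for b in seq.upper()]
    return list(accumulate(steps))


def estimate_ori_ter(seq: str) -> Tuple[int, int]:
    """Heuristic: minimum of the cumulative skew ~ origin, maximum ~ terminus."""
    cum = cumulative_gc_skew(seq)
    ori = min(range(len(cum)), key=cum.__getitem__)
    ter = max(range(len(cum)), key=cum.__getitem__)
    return ori, ter


def in_right_replichore(pos: int, ori: int, ter: int, length: int) -> bool:
    """True if pos lies on the arc running clockwise from ori to ter."""
    return (pos - ori) % length < (ter - ori) % length


def on_leading_strand(pos: int, strand: int, ori: int, ter: int, length: int) -> bool:
    """strand is +1 for clockwise genes and -1 for counterclockwise genes.

    In the right replichore the fork moves clockwise, so clockwise genes are
    co-oriented with replication; in the left replichore the reverse holds.
    """
    return (strand == 1) == in_right_replichore(pos, ori, ter, length)


if __name__ == "__main__":
    genome = "C" * 30 + "G" * 60 + "C" * 30  # toy circular sequence
    ori, ter = estimate_ori_ter(genome)
    print(ori, ter)                                         # 29 89
    print(on_leading_strand(50, +1, ori, ter, len(genome)))  # True
    print(on_leading_strand(100, +1, ori, ter, len(genome)))  # False
```

Using modular arithmetic for the replichore test keeps the code correct when the origin is not at coordinate 0, which is the usual situation in annotated genome files.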

Escherichia coli is the model Gram-negative bacterium and is today the best understood organism. As part of a transversal research programme, we analysed families of Gram-negative bacteria in an attempt to determine what makes some bacteria beneficial and others not (for example, most strains of E. coli are harmless, but certain strains cause colibacillosis, a well-known disease). We studied the determinants of pathogenesis in a related bacterium, Photorhabdus luminescens. This bacterium is extremely pathogenic to insects and would be highly dangerous to humans if it were able to grow at our body temperature, which is fortunately not the case. Using DNA arrays covering most of its genes, we carried out large-scale expression profiling of the organism. We characterised a series of genetic control systems (PhoP-PhoQ, AstR-AstS and H-NS) in order to identify the keys to its remarkable pathogenicity. This work is being continued with the silkworm as host organism. One of the main advantages of this approach is that it enables us to study bacterial virulence without using mammals, while generating results that can be extrapolated to these animals.

To explore adaptation to cold in a global context, we undertook, in collaboration with the Genoscope and several European universities, the sequencing, annotation and physiological analysis of the fast-growing Antarctic bacterium Pseudoalteromonas haloplanktis TAC125. We discovered that this bacterium has developed a remarkable strategy for avoiding the generation of reactive oxygen species, involving the concerted elimination of the otherwise ubiquitous, sulfur-related, molybdopterin-dependent metabolism. This substantiates our emphasis on sulfur metabolism as a preferred target of integrative processes in bacteria. The P. haloplanktis proteome revealed an amino acid usage bias specific to psychrophiles, which consistently appears apt to accommodate asparagine, a residue prone to ageing in proteins through cyclisation and deamidation. It is worth noting that sequencing and annotating TAC125 required a workforce one hundred times smaller than sequencing B. subtilis, which gives a sense of where genome programmes may stand within the next four years.
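Amino acid usage biases such as the asparagine and aromatic-residue trends discussed above start from a simple composition table. The sketch below pools residue counts over a set of protein sequences and returns relative frequencies; it assumes sequences are already loaded as strings (FASTA parsing is not shown), and the function names and the pooled-frequency measure are illustrative choices, not the multivariate procedure used in our studies.

```python
from collections import Counter
from typing import Dict, Iterable

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
AROMATIC_AA = "FWY"                   # phenylalanine, tryptophan, tyrosine


def aa_composition(proteins: Iterable[str]) -> Dict[str, float]:
    """Pooled relative frequency of each standard amino acid over all proteins.

    Stop symbols ('*') and ambiguous residues (e.g. 'X') are ignored.
    """
    counts: Counter = Counter()
    for seq in proteins:
        counts.update(c for c in seq.upper() if c in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in STANDARD_AA}


def aromatic_fraction(composition: Dict[str, float]) -> float:
    """Combined frequency of the aromatic residues F, W and Y."""
    return sum(composition[aa] for aa in AROMATIC_AA)


if __name__ == "__main__":
    comp = aa_composition(["MNNK", "MAN*"])  # toy "proteome"
    print(round(comp["N"], 3))               # 0.429 (3 N out of 7 residues)
    print(aromatic_fraction(comp))           # 0.0
```

Comparing, say, the asparagine frequency of a psychrophile with that of a mesophilic relative would then amount to computing `aa_composition` for both proteomes and contrasting the `"N"` entries.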

Beyond sulfur metabolism, our analyses over several years have suggested that biochemical structures forming planes (often hexagonal) or tubes play a role in the structural organisation of the genome, and we noticed that uridylate kinase has a hexagonal structure. Furthermore, the sources of nucleotides in the cell, in particular those needed for DNA synthesis, are likely to be highly organised. The very fact that nucleoside diphosphates, not triphosphates, are the precursors of deoxyribonucleotides creates a series of paradoxes in pyrimidine metabolism: de novo pyrimidine biosynthesis produces UDP but not CDP, whereas DNA must exclude U and incorporate C. We were therefore very interested in the structuring role of nucleotide kinases. We determined the structure of hexameric uridylate kinase, revealing a large number of unexpected properties. Uridylate kinases form an original class in most bacteria, and it was interesting to compare them with other nucleotide kinases: the structure of a GMP kinase was also solved. Remarkably, uridylate kinase is encoded by a gene (pyrH) which, in bacteria as distant as Firmicutes and Proteobacteria, belongs to an operon involved in translation, although uridine-containing nucleotides have not, so far, been implicated in the translation process.

Photos:

The two chromosomes of Pseudoalteromonas haloplanktis TAC125, a fast-growing bacterium isolated at the Dumont d'Urville station, Antarctica. Circles display (from the outside): (1) predicted coding genes transcribed in the clockwise direction; (2) predicted coding genes transcribed in the counterclockwise direction (in circles 1 and 2, genes are color-coded according to functional category); (3) tRNAs (green) and rRNAs (pink) on chrI, and genes coding for proteins similar to phage proteins (red) on chrII; (4) tonB and tonB-like genes (grey). On chromosome II, genes similar to those of the (unidirectional) replication apparatus of plasmid R1 are labelled in green.