Insights from Genomic Data

The complete assembly of the entire human genome sequence by Venter et al. confirms recent estimates that the total number of human protein-coding genes might be less than 30,000 -- only one-third more than in the nematode Caenorhabditis elegans. Claverie points out that such a low gene number could drastically modify our understanding of organism complexity and evolution as well as our current interpretation of transcriptome analyses. He suggests that there may be severe consequences for the long-term sustainability of the biomedical industry in the postgenomic era.

Courseaux and Nahon analyze the structural organization, pattern of expression, and origin of two genes that have emerged during primate evolution by a combination of retrotransposition of an RNA sequence, sequence mutations, and de novo creation of splice sites in adjoining sequences. These findings shed light on the first steps in the origins of new genes, and offer clues to the process by which humans and their close primate relatives diverged genetically from other mammals.

Cells are continually exposed to environmental and endogenous insults that damage DNA. Left unrepaired, this damage would eventually lead to genome instability, with devastating consequences for both the cell and the organism. Wood et al. have surveyed the human genome sequence and compiled a comprehensive list of genes that help the cell recognize and repair DNA damage. Ongoing studies of how the products of these repair genes interact with one another promise to shed new light on fundamental cellular control mechanisms that go awry in cancer, as well as in normal aging.

Comparison of the proteins coded in the human genome with those from the fruit fly and worms (nematodes) confirms that, in the course of evolution, the process of programmed cell death, or apoptosis, has become more complex. Aravind et al. found that nematode cells function with just one protein in the NACHT family of nucleoside triphosphatases in their apoptotic arsenal. The human genome, however, shows no fewer than 18 proteins that belong in this family and that are related to NAIP (neuronal apoptosis inhibitory protein), a protein defective in spinal muscular atrophy. Oddly, homologs of proteins in the human apoptotic machinery are found in some bacteria, suggesting that there has been relatively recent transfer of such genes between these organisms.

Once gene sequences are determined, the next question is often how these data relate to expression. Caron et al. describe the integration of existing serial analysis of gene expression (SAGE) data, which shows the level of messenger RNA expression, with the human gene map to show the pattern of genome-wide expression. This Human Transcriptome Map (HTM), created from both normal and diseased tissue types, indicates that highly expressed genes tend to be clustered in specific chromosomal regions, or RIDGEs. This finding is contrary to analyses in yeast and suggests that the human genome is organized in a higher-order structure.