Bacterium Tells All, Human Tells a Lot

The idea behind sequencing an organism’s genome--decoding, letter by letter, the message contained in every last one of its genes--is that it would tell us a lot about how the organism works. We would know the basic structure of every last one of its proteins: every enzyme in its metabolism, every signal-bearing hormone coursing through its body, every receptor poised to receive those signals on the surface of its cells. We would own that organism--or at least we’d have the owner’s manual. Geneticists have dreamed of such knowledge for decades and have talked about it as a real possibility for seven years or so. This past year they got it for the first time, albeit only for a bacterium. And they made huge strides toward getting it for a far more interesting organism, Homo sapiens.

Both milestones were achieved by an upstart organization called the Institute for Genomic Research in Maryland. The institute’s president, Craig Venter, had left the National Institutes of Health a few years back because it wouldn’t fund his novel approach to genome sequencing; it was too radical and unlikely to succeed, the reviewers thought. But Venter laughed last. In July he and his team announced that they had sequenced all 1.8 million base pairs--the rungs of the DNA double helix, and the letters of the genetic alphabet--that make up the single circular chromosome of Haemophilus influenzae, a bacterium that in its wild form causes ear infections and meningitis. Then, just two months later, Venter and his collaborators contributed partial sequences of nearly 30,000 human genes to a 379-page atlas of the human genome published by the journal Nature.

The other part of the atlas was a physical map of the genome created by a group of researchers from the Center for the Study of Human Polymorphism in Paris. That map was an attempt to arrange large fragments of mostly unsequenced DNA--constituting three-quarters of our genome--in their proper slots on the 23 pairs of human chromosomes. Until now it’s been assumed that you would always need such a map before you could sequence a genome so that you would know where to put the gene sequences when you found them.

But Venter’s approach to the Haemophilus genome--whole genome shotgun sequencing, he calls it--was to scrap the mapping process altogether. “We took millions of copies of the whole chromosome from Haemophilus,” he says, “and we used sound waves to break them apart into tiny pieces. And instead of trying to map those, we just sequenced them.”

Putting the pieces together was the tricky part. But Venter and his colleagues knew that the many copies of the bacterial chromosome wouldn’t all break apart in exactly the same way. That guaranteed there would be fragments from different copies that had overlapping ends. In principle, once all the fragments had been sequenced, the researchers could piece together the whole chromosome the way you might piece together a single panoramic photograph from a series of overlapping frames. There would be no need for a physical map.
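The panorama idea can be made concrete with a toy example. The sketch below is a textbook-style greedy merge, not the institute’s actual software: it assumes error-free fragments read from a single strand, and the function names (overlap, drop_contained, greedy_assemble) and the short test sequence are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def drop_contained(frags: list[str]) -> list[str]:
    """Remove duplicates and fragments that lie wholly inside another one."""
    uniq = sorted(set(frags))
    return [f for f in uniq if not any(f != g and f in g for g in uniq)]


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the longest overlap.

    Returns the remaining contigs; a single contig means every fragment
    was stitched into one sequence.
    """
    frags = drop_contained(fragments)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return frags  # nothing left that overlaps
        merged = frags[best_i] + frags[best_j][best_len:]
        rest = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags = drop_contained(rest + [merged])


if __name__ == "__main__":
    # Three overlapping "frames" of a made-up 21-letter sequence.
    pieces = ["TTAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
    print(greedy_assemble(pieces))
```

Real genomes contain repeated stretches and sequencing errors, so a greedy merge like this one can join the wrong pieces; it is meant only to show why overlapping ends make a physical map unnecessary in principle.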

The problem was there were an awful lot of pieces. “For Haemophilus, we had about 26,000 pieces, all 500 base pairs long,” says Venter. “You can imagine it’s not a trivial exercise to compare the exact order of something 500 letters long with the exact order of hundreds of thousands of others until you find the right matches, and then keep doing that until you have all of them compared.” The solution was an ingenious bit of software, designed at Venter’s institute, that drastically cuts down on the amount of computer time by screening the pieces and comparing only the most likely matches. “With this new approach, it took us a year to do Haemophilus, and that’s just because we were working out all the methods,” Venter says. In October his team published the complete genome of a second bacterium, Mycoplasma genitalium, and by the end of the year they hoped to have done a third.
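To see why screening matters, note that checking each of about 26,000 pieces against every other would mean roughly 338 million pairwise comparisons (26,000 × 25,999 / 2). The article does not say how the institute’s software picks its likely matches; the sketch below assumes one common trick, indexing every fragment by its short substrings (k-mers) and comparing only fragments that share at least one. The names kmers and candidate_pairs and the sample fragments are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(fragments: list[str], k: int = 5) -> set[tuple[int, int]]:
    """Index pairs (i, j), i < j, of fragments sharing at least one k-mer.

    Only these pairs would be passed on to a full, expensive
    overlap comparison; all other pairs are skipped.
    """
    index: dict[str, set[int]] = defaultdict(set)
    for idx, frag in enumerate(fragments):
        for km in kmers(frag, k):
            index[km].add(idx)

    pairs: set[tuple[int, int]] = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    frags = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCTAGGA", "CCCCCCCCCC"]
    total = len(frags) * (len(frags) - 1) // 2
    likely = candidate_pairs(frags, k=5)
    print(f"{len(likely)} of {total} possible pairs need a full comparison")
```

The choice of k is a trade-off in this sketch: a short k-mer flags many chance matches, while a long one can miss fragments whose true overlap is brief.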

The human genome is another matter entirely, of course: at 3 billion base pairs instead of a couple million, it is too large to be sequenced all at once by Venter’s shotgun technique. Fortunately only about 3 percent of that is actual genetic information that gets translated into proteins; the rest is regulatory sequences, old, nonfunctioning genes, or outright nonsense. When a cell needs a particular protein, it first transcribes the appropriate gene into a tightly edited molecule of pure information called messenger RNA. Venter’s team took advantage of that natural process. Instead of blasting away at whole chromosomes, they extracted mRNA from 37 different human organs and tissues, copied it back into DNA, and sequenced that.
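The step of copying mRNA back into DNA follows the ordinary base-pairing rules, which a few lines of code can illustrate. This is a minimal sketch of the pairing logic only, not of the laboratory procedure; the function name reverse_transcribe and the six-letter example are made up for the purpose.

```python
# Base-pairing rules for building a DNA strand on an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return the complementary DNA strand for an mRNA sequence.

    Both the input and the result are written in the conventional
    5'-to-3' direction, so the complement is also reversed.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


if __name__ == "__main__":
    print(reverse_transcribe("AUGGCC"))
```

Because mRNA carries only the edited message of a gene, the DNA copy made from it leaves out most of the 97 percent of the genome that is never translated into protein.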

The laboratory copying process isn’t perfect, though--it tends to abort before a strand of mRNA, representing one whole gene, has been completely copied into DNA. So Venter’s team mostly got pieces of genes and then used their fancy software to assemble the pieces as best they could. In the end they were able to put together at least some part of 29,599 genes. (We’re estimated to have 80,000 or so in all.) Moreover, because they had an idea of what cells the genes came from and what kinds of gene sequences produce what kinds of proteins, they were able to draw some conclusions about what all those genes are good for. Venter estimates that about 16 percent of them play a role in our metabolism, 12 percent are used for communication from cell to cell, and 4 percent are devoted to helping us and our cells reproduce. Fully a third of our genes, he suspects, will turn out to be active only in the brain; he’s found 3,000 of them already.

The beauty of genome sequencing, Venter says, is that one day we will understand better not only how a particular species of organism works but also, by comparing genomes, how one species evolves into another. It’s also true, though, that Venter and his crew stand to profit handsomely from their research. Already scientists at Human Genome Sciences in Rockville, Maryland--a biotech company affiliated with Venter’s research institute--have begun working on a more effective vaccine against Haemophilus. (The existing one works for only one particular meningitis-causing strain.) The company has filed for patents on more than a hundred of the human genes that Venter’s team has uncovered. And if Venter keeps up the pace he set in 1995, there will be many more patents to come.