The Beginnings of Life on Earth

Christian de Duve

This article originally appeared in the September-October 1995 issue of American Scientist.

Advanced forms of life existed on earth at least 3.55 billion years ago. In rocks of that age, fossilized imprints have been found of microorganisms that look uncannily like cyanobacteria, the most highly evolved photosynthetic bacteria present in the world today. Carbon deposits enriched in the lighter carbon-12 isotope over the heavier carbon-13 isotope, a sign of biological carbon assimilation, attest to an even older age. On the other hand, it is believed that our young planet, still in the throes of volcanic eruptions and battered by falling comets and asteroids, remained inhospitable to life for about half a billion years after it formed, together with the rest of the solar system, some 4.55 billion years ago. This leaves a window of perhaps 200 to 300 million years for the appearance of life on earth.

This duration was once considered too short for the emergence of something as complex as a living cell. Hence it was suggested that germs of life may have come to earth from outer space with cometary dust or even, as proposed by Francis Crick of DNA double-helix fame, on a spaceship sent out by some distant civilization. No evidence in support of these proposals has yet been obtained. Meanwhile, the reason for making them has largely disappeared. It is now generally agreed that if life arose spontaneously by natural processes (a necessary assumption if we wish to remain within the realm of science), it must have arisen fairly quickly, in a matter of millennia or centuries, perhaps even less, rather than in millions of years. Even if life came from elsewhere, we would still have to account for its first development. Thus we might as well assume that life started on earth.

How this momentous event happened is still highly conjectural, though no longer purely speculative. The clues come from the earth, from outer space, from laboratory experiments and, especially, from life itself. The history of life on earth is written in the cells and molecules of existing organisms. Thanks to the advances of cell biology, biochemistry and molecular biology, scientists are becoming increasingly adept at reading the text.

An important rule in this exercise is to reconstruct the earliest events in life's history without assuming they proceeded with the benefit of foresight. Every step must be accounted for in terms of antecedent and concomitant events. Each must stand on its own and cannot be viewed as a preparation for things to come. Any hint of teleology must be avoided.

Building Blocks

The early chemists coined the term "organic" chemistry to designate the part of chemistry that deals with compounds made by living organisms. The synthesis of urea by Friedrich Wöhler in 1828 is usually hailed as the first proof that a special "vital force" is not needed for organic syntheses. Nevertheless, lingering traces of a vitalistic mystique long remained associated with organic chemistry, which was seen as a special kind of life-dependent chemistry that only human ingenuity could match. The final demystification of organic chemistry has come from the exploration of outer space.

Spectroscopic analysis of incoming radiation has revealed that interstellar space is permeated by an extremely tenuous cloud of microscopic particles, called interstellar dust, containing a variety of combinations of carbon, hydrogen, oxygen, nitrogen and, sometimes, sulfur or silicon. These are mostly highly reactive free radicals and small molecules that would hardly remain intact under conditions on earth, but would interact to form more stable, typical organic compounds, many of them similar to substances found in living organisms. That such processes indeed take place is demonstrated by the presence of amino acids and other biologically significant compounds on celestial bodies: for example, the meteorite that fell in 1969 in Murchison, Australia; Comet Halley, which was analyzed during its 1986 passage by instruments carried on spacecraft; and Saturn's satellite Titan, the seas of which are believed to be made of hydrocarbons.

It is widely agreed that these compounds are not products of life, but form spontaneously by banal chemical reactions. Organic chemistry is nothing but carbon chemistry. It just happens to be enormously richer than the chemistry of other elements, and thus able to support life, because of the unique associative properties of the carbon atom. In all likelihood the first building blocks of life arose as do all natural chemical compounds: spontaneously, according to the rules of thermodynamics.

The first hints that this might be so came from the laboratory, before evidence for it was found in space, through the historic experiments of Stanley Miller, now recalled in science textbooks. In the early 1950s, Miller was a graduate student in the University of Chicago laboratory of Harold Urey, the discoverer of heavy hydrogen and an authority on planet formation. He undertook experiments designed to find out how lightning—reproduced by repeated electric discharges—might have affected the primitive earth atmosphere, which Urey believed to be a mixture of hydrogen, methane, ammonia and water vapor. The result exceeded Miller's wildest hopes and propelled him instantly into the firmament of celebrities. In just a few days, more than 15 percent of the methane carbon subjected to electrical discharges in the laboratory had been converted to a variety of amino acids, the building blocks of proteins, and other potential biological constituents. Although the primitive atmosphere is no longer believed to be as rich in hydrogen as once thought by Urey, the discovery that the Murchison meteorite contains the same amino acids obtained by Miller, and even in the same relative proportions, suggests strongly that his results are relevant.

Miller's discovery has sparked the birth of a new chemical discipline, abiotic chemistry, which aims to reproduce in the laboratory the chemical events that initiated the emergence of life on earth some four billion years ago.
Besides amino acids and other organic acids, experiments in abiotic chemistry have yielded sugars, as well as purine and pyrimidine bases, some of which are components of the nucleic acids DNA and RNA, and other biologically significant substances, although often under more contrived conditions and in lower yields than one would expect for a prebiotic process. How far in the direction of biochemical complexity the rough processes studied by abiotic chemistry may lead is not yet clear. But it seems very likely that the first building blocks of nascent life were provided by amino acids and other small organic molecules such as are known to form readily in the laboratory and on celestial bodies. To what extent these substances arose on earth or were brought in by the falling comets and asteroids that contributed to the final accretion of our planet is still being debated.

The RNA World

Whatever the earliest events on the road to the first living cell, it is clear that at some point some of the large biological molecules found in modern cells must have emerged. Considerable debate in origin-of-life studies has revolved around which of the fundamental macromolecules came first: the original chicken-or-egg question.

The modern cell employs four major classes of biological molecules: nucleic acids, proteins, carbohydrates and fats. The debate over the earliest biological molecules, however, has centered mainly on the nucleic acids, DNA and RNA, and on the proteins. At one time or another, each of these molecular classes has seemed a likely starting point. Which one actually came first? To answer that question, we must look at the functions each of them performs in existing organisms.

The proteins are the main structural and functional agents in the cell. Structural proteins serve to build all sorts of components inside the cell and around it. Catalytic proteins, or enzymes, carry out the thousands of chemical reactions that take place in any given cell, among them the synthesis of all other biological constituents (including DNA and RNA), the breakdown of foodstuffs and the retrieval and consumption of energy. Regulatory proteins control the numerous interactions that govern the expression and replication of genes, the performance of enzymes, the interplay between cells and their environment, and many other processes. Through the action of proteins, cells and the organisms they form arise, develop, function and evolve in a manner prescribed by their genes, as modulated by their surroundings.

The one thing proteins cannot do is replicate themselves. To be sure, they can, and do, facilitate the formation of bonds between their constituent amino acids. But they cannot determine the order in which those amino acids are joined without the information contained within the nucleic acids, DNA and RNA. In all modern organisms, DNA serves as the storage site of genetic information. The DNA contains, in encrypted form, the instructions for the manufacture of proteins. More specifically, encoded within DNA is the exact order in which amino acids, selected at each step from 20 distinct varieties, should be strung together to form all of the organism's proteins. In general, each gene contains the instructions for one protein.

DNA itself is formed by the linear assembly of a large number of units called nucleotides. There are four different kinds of nucleotides, designated by the initials of their constituent bases: A (adenine), G (guanine), C (cytosine) and T (thymine). The sequence of nucleotides determines the information content of the molecule, just as the sequence of letters determines the meaning of a word.
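The letter analogy can be made quantitative. Below is a minimal Python sketch, added as an illustration rather than taken from the original article: because each position in a chain can hold any of four nucleotides, a chain n units long can be written in 4^n different ways, which amounts to at most 2 bits of information per nucleotide. The function names are hypothetical.

```python
import math

# The four DNA nucleotides, named by the initials of their bases.
DNA_ALPHABET = "AGCT"


def possible_sequences(length: int) -> int:
    """Number of distinct DNA sequences of a given length (4 choices per position)."""
    return len(DNA_ALPHABET) ** length


def bits_of_information(length: int) -> float:
    """Maximum information a sequence of this length can carry, in bits."""
    return length * math.log2(len(DNA_ALPHABET))


for n in (1, 3, 10):
    print(n, "nucleotides:", possible_sequences(n), "sequences,",
          bits_of_information(n), "bits")
```

Even a stretch of only 10 nucleotides can be spelled in more than a million (4^10 = 1,048,576) different ways.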

In all cells, DNA molecules consist of two strands that spiral around each other in a formation called a double helix. The two strands are held together by hydrogen bonds between the bases of opposite strands. Pairing is quite specific: A always bonds with T, and G is always partnered with C on the opposite DNA strand. This complementarity is crucial for faithful replication of the DNA strands prior to cell division.

During DNA replication, the DNA strands are separated, and each strand serves as a template for the assembly of a new complementary strand. Wherever A appears on the template, a T is added to the nascent strand; wherever T is on the template, an A is added. The same is true for G and C pairs. In the characteristic double-helical structure of DNA, the two strands carry the same information in complementary versions, as do the positive and negative of the same photograph. Upon replication, the positive strand serves as template for the assembly of a new negative and the negative strand for that of a new positive, yielding two identical duplexes.
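The pairing rules and the positive/negative logic of replication can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: strands are plain strings, and the fact that the two strands of a real double helix run in opposite directions is deliberately ignored. The names complement and replicate are hypothetical.

```python
# Pairing rules for DNA: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the complementary strand, base by base (strand orientation ignored)."""
    return "".join(DNA_PAIRS[base] for base in strand)


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Separate the two strands and use each one as a template for a new partner."""
    positive, negative = duplex
    return [
        (positive, complement(positive)),  # old positive + newly made negative
        (complement(negative), negative),  # newly made positive + old negative
    ]


parent = ("ATGCGTTA", complement("ATGCGTTA"))
daughters = replicate(parent)
print(parent)
print(daughters)
```

Each daughter duplex keeps one old strand and gains one new one, and both come out identical to the parent.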

In order for DNA to fulfill its primary role of directing the construction of proteins, an intermediate molecule must be made. DNA does not directly participate in protein synthesis. That is the function of its very close chemical relative RNA.

Expression of DNA begins when an RNA molecule is constructed bearing the information for a gene contained on the DNA molecule. RNA, like DNA, is made up of nucleotides, but U (uracil) takes the place of T. Construction of the RNA molecule follows the same rules as DNA replication. The RNA copy, called a transcript, is a complementary copy of the DNA, with U (instead of T) inserted wherever A appears on the DNA template.
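Transcription follows the same complementary logic, with U standing in for T. A minimal Python sketch, assuming the DNA template strand is written as a string; the template sequence and the function name transcribe are invented for illustration.

```python
# Pairing rules used when an RNA copy is made from a DNA template:
# the same as DNA pairing, except that U (uracil) is inserted opposite A.
TRANSCRIPTION_PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the RNA transcript complementary to a DNA template strand."""
    return "".join(TRANSCRIPTION_PAIRS[base] for base in template)


template = "TACAAACCGATT"
rna = transcribe(template)
print(rna)
```

Here the template TACAAACCGATT yields the transcript AUGUUUGGCUAA, which contains no T at all.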

Most RNA transcripts, often after some modification, provide the information for the assembly of proteins. The sequence of nucleotides along the coding RNA, aptly called messenger RNA, specifies the sequence of amino acids in the corresponding protein molecule: three successive nucleotides (called a codon) in the RNA specify one amino acid to be used in the protein. The process is known as translation, and the correspondences between codons and amino acids define the genetic code.
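Reading a messenger RNA three nucleotides at a time can be sketched as follows. The codon assignments listed are a small excerpt of the standard genetic code; everything else (the function name, the short example message, treating an unlisted codon as an error) is an assumption made for a compact illustration, not a model of how a ribosome actually works. The assertion in the middle spells out one reason a triplet code is needed: pairs of nucleotides give only 16 combinations, too few for 20 amino acids.

```python
# A small excerpt of the standard genetic code (codon -> amino acid).
# Only a handful of the 64 codons are listed, purely for illustration.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": None,   # stop
    "UAG": None,   # stop
    "UGA": None,   # stop
}

# Doublets give 4**2 = 16 combinations, too few for 20 amino acids;
# triplets give 4**3 = 64, more than enough.
assert 4 ** 2 < 20 <= 4 ** 3


def translate(mrna: str) -> list[str]:
    """Read the messenger RNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in excerpt
        if amino_acid is None:  # stop codon: the chain is finished
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGUUUGGCUAA"))
```

The message AUGUUUGGCUAA translates to Met-Phe-Gly, the final UAA acting as a stop signal. A codon missing from this abbreviated table raises a KeyError.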

Not all RNA molecules are messengers, however. Some of the RNAs participate in protein synthesis in other ways. Some actually make up the cellular machinery that constructs proteins. These are called ribosomal RNAs, and they may include the actual catalyst that joins amino acids by peptide bonds, according to the work of Harry Noller at the University of California at Santa Cruz. Other RNAs, called transfer RNAs, ferry the appropriate amino acids to the ribosome. As cell biology has progressed, even more functions for RNA have been discovered. For example, some RNA molecules participate in DNA replication, while others help process messenger RNAs.

Scientists considering the origins of biological molecules confronted a profound difficulty. In the modern cell, each of the three key molecules, DNA, RNA and protein, depends on the other two for either its manufacture or its function. DNA, for example, is merely a blueprint; it cannot perform a single catalytic function, nor can it replicate on its own. Proteins, on the other hand, perform most of the catalytic functions but cannot be manufactured without the specifications encoded in DNA. One possible scenario for life's origins would have two kinds of molecules, one informational and one catalytic, evolving together. But this scenario is extremely complicated and highly unlikely.

The other possibility is that one of these molecules could itself perform multiple functions. Theorists considering this possibility started to look seriously at RNA. For one thing, the molecule's ubiquity in modern cells suggests that it is a very ancient molecule. It also appears to be highly adaptable, participating in all of the processes relating to information processing within the cell. For a while, the only thing RNA did not seem capable of doing was catalyzing chemical reactions.

That view changed in the early 1980s, when Sidney Altman at Yale University and Thomas Cech at the University of Colorado at Boulder independently discovered RNA molecules that could catalytically excise portions of themselves or of other RNA molecules. The chicken-or-egg conundrum of the origin of life seemed to fall away. It now appeared theoretically possible that an RNA molecule could have existed that naturally contained the sequence information for its reproduction through reciprocal base pairing and could also catalyze the synthesis of more such RNA strands.

In 1986, Harvard chemist Walter Gilbert coined the term "RNA world" to designate a hypothetical stage in the development of life in which "RNA molecules and cofactors [were] a sufficient set of enzymes to carry out all the chemical reactions necessary for the first cellular structures." Today it is almost a matter of dogma that the evolution of life did include a phase where RNA was the predominant biological macromolecule.

Origin and Evolution of the RNA World

However certain many people may be that the RNA world was a crucial phase in life's evolution, it cannot have been the first. Some form of abiotic chemistry must have existed before RNA came on the scene. For the purpose of this discussion, I shall call that earlier phase "protometabolism," to designate the set of unknown chemical reactions that generated the RNA world and sustained it throughout its existence (as opposed to metabolism, the set of reactions, catalyzed by protein enzymes, that support all living organisms today). By definition, protometabolism (which could have developed with time) was in charge until metabolism took over. Several stages may be distinguished in this transition.

In the first stage, a pathway had to develop that took raw organic material and turned it into RNA. The first building blocks of life had to be converted into the constituents of nucleotides, from which the nucleotides themselves had to be formed. From there, the nucleotides had to be strung together to produce the first RNA molecules. Efforts to reproduce these events in the laboratory have been only partly successful so far, which is understandable in view of the complexity of the chemistry involved. On the other hand, it is also surprising, since these must have been sturdy reactions to sustain the RNA world for a long time. Contrary to what is sometimes intimated, the idea of a few RNA molecules coming together by some chance combination of circumstances and henceforth being reproduced and amplified by replication simply is not tenable. There could be no replication without a robust chemical underpinning continuing to provide the necessary materials and energy.

The development of RNA replication must have been the second stage in the evolution of the RNA world. The problem is not as simple as might appear at first glance. Attempts at engineering an RNA molecule capable of catalyzing RNA replication, with considerably more foresight and technical support than the prebiotic world could have enjoyed, have failed so far.

With the advent of RNA replication, Darwinian evolution was possible for the first time. Because of the inevitable copying mistakes, a number of variants of the original template molecules were formed. Some of these variants were replicated faster than others or proved more stable, thereby progressively crowding out less advantaged molecules. Eventually, a single molecular species, combining replicatability and stability in optimal fashion under prevailing conditions, became dominant. This, at the molecular level, is exactly the mechanism postulated by Darwin for the evolution of organisms: fortuitous variation, competition, selection and amplification of the fittest entity. The scenario is not just a theoretical construct. It has been reenacted many times in the laboratory with the help of a viral replicating enzyme, first in 1967 by the late American biochemist Sol Spiegelman of Columbia University.
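The logic of variation, competition and selection among replicating molecules can be illustrated with a toy model. The Python sketch below is not a reconstruction of Spiegelman's experiments; it is a deterministic calculation under invented assumptions: each hypothetical variant has a replication factor and a survival (stability) factor, a fixed small fraction of copies are mistakes that turn into the other variants, and the total population is held constant to mimic limited raw materials.

```python
# Toy, deterministic model of selection among replicating RNA variants.
# All names and numbers are invented for illustration.


def select(variants: dict[str, tuple[float, float]],
           generations: int = 200,
           error_rate: float = 0.01) -> dict[str, float]:
    """Return each variant's share of the population after many generations."""
    names = list(variants)
    share = {name: 1.0 / len(names) for name in names}  # start with equal shares
    for _ in range(generations):
        new = {name: 0.0 for name in names}
        for name in names:
            replication, survival = variants[name]
            offspring = share[name] * replication * survival
            # Most copies are faithful...
            new[name] += offspring * (1 - error_rate)
            # ...but copying mistakes spread a little output to the other variants.
            for other in names:
                if other != name:
                    new[other] += offspring * error_rate / (len(names) - 1)
        # Limited raw materials: rescale so the shares always sum to 1.
        total = sum(new.values())
        share = {name: value / total for name, value in new.items()}
    return share


variants = {
    "fast but fragile": (2.0, 0.5),  # net growth factor 1.00 per generation
    "slow but sturdy": (1.2, 0.9),   # net growth factor 1.08
    "balanced": (1.6, 0.8),          # net growth factor 1.28
}
final = select(variants)
for name, fraction in sorted(final.items(), key=lambda item: -item[1]):
    print(f"{name}: {fraction:.3f}")
```

Starting from equal shares, the variant with the best combination of replication and stability ends up dominating, while copying mistakes keep a small residue of the others alive.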

An intriguing possibility is that replication was itself a product of molecular selection. It seems very unlikely that protometabolism produced just the four bases found in RNA, A, U, G and C, ready by some remarkable coincidence to engage in pairing and allow replication. Chemistry does not have this kind of foresight. In all likelihood, the four bases arose together with a number of other substances similarly constructed of one or more rings containing carbon and nitrogen. According to the present inventory, such substances could have included other members of the purine family (which includes A and G), pyrimidines (which include U, T and C), nicotinamide and flavin, both of which actually engage in nucleotide-like combinations, and pterines, among other compounds. The first nucleic acid-like molecules probably contained an assortment of these compounds. Molecules rich in A, U, G and C then were progressively selected and amplified, once some rudimentary template-dependent synthetic mechanism allowing base pairing arose. RNA, as it exists today, may thus have been the first product of molecular selection.

A third stage in the evolution of the RNA world was the development of RNA-dependent protein synthesis. Most likely, the chemical machinery appeared first, as yet uninformed by genetic messages, as a result of interactions between amino acids and certain RNA molecules, the precursors of future transfer, ribosomal and messenger RNAs. Selection of the RNA molecules involved could conceivably be explained on the basis of molecular advantages, as just outlined. But for further evolution to take place, something more was needed. RNA molecules now had to be selected not solely on the basis of what they were, but also of what they did, that is, of their exerting some catalytic activity, most prominently making proteins. This implies that RNA molecules capable of participating in protein synthesis enjoyed a selective advantage, not because they were themselves easier to replicate or more stable, but because the proteins they were making favored their replication by some kind of indirect feedback loop.

This stage signals the limit of what could have happened in an unstructured soup. To evolve further, the system had to be partitioned into a large number of competing primitive cells, or protocells, capable of growing and of multiplying by division. This partitioning could have happened earlier. Nobody knows. But it could not have happened later. This condition implies that protometabolism also produced the materials needed for the assembly of the membranes surrounding the protocells. In today's world, these materials are complex proteins and fatty lipid molecules. They were probably simpler in the RNA world, though more elaborate than the undifferentiated "goo" or "scum" that is sometimes suggested.

Once the chemical machinery for protein synthesis was installed, information could enter the system via interactions between certain RNA components of the machinery (the future messenger RNAs) and other, amino acid-carrying RNA molecules (the future transfer RNAs). Translation and the genetic code developed progressively and concurrently during this stage, which presumably was driven by Darwinian competition among protocells endowed with different variants of the RNA molecules involved. Any RNA mutation that made the structures of useful proteins more closely dependent on the structures of replicatable RNAs, thereby increasing the replicatability of the useful proteins themselves, conferred some evolutionary advantage on the protocell concerned, which was then able to compete more effectively for available resources and to grow and multiply faster than the others.

The RNA world entered the last stage in its evolution when translation had become sufficiently accurate to unambiguously link the sequences of individual proteins with the sequences of individual RNA genes. This is the situation that exists today (with DNA carrying the primary genetic information), except that present-day systems are enormously more accurate and elaborate than the first systems must have been. Most likely, the first RNA genes were very short, no longer than 70 to 100 nucleotides (the modern gene runs several thousand nucleotides), with the corresponding proteins (more like protein fragments, called peptides) containing no more than 20 to 30 amino acids.
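The lengths quoted follow from the three-nucleotide codon: a gene of n nucleotides can specify at most about n/3 amino acids. A one-line Python check (the function name is hypothetical):

```python
CODON_LENGTH = 3  # nucleotides per codon


def max_peptide_length(gene_nucleotides: int) -> int:
    """Largest number of amino acids a gene of this length could encode (one per codon)."""
    return gene_nucleotides // CODON_LENGTH


for gene in (70, 100):
    print(gene, "nucleotides ->", max_peptide_length(gene), "amino acids at most")
```

Genes of 70 and 100 nucleotides give at most 23 and 33 amino acids respectively, the same order of magnitude as the 20 to 30 amino acids quoted in the text.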

It is during this stage that protein enzymes must have made their first appearance, emerging one by one as a result of some RNA gene mutation and endowing the mutant protocell with the ability to carry out a new chemical reaction or to improve an existing one. The improvements would enable the protocell to grow and multiply more efficiently than other protocells in which the mutations had not appeared. This type of Darwinian selection must have taken place a great many times in succession to allow enzyme-dependent metabolism to progressively replace protometabolism.

The appearance of DNA signaled a further refinement in the cell's information-processing system, although the date of this development cannot be fixed precisely. It is not even clear whether DNA appeared during the RNA world or later. Certainly, as genetic systems became more complex, there were greater advantages to storing the genetic information in a separate molecule. The chemical modifications required to derive DNA from RNA are fairly minor: removal of one oxygen atom from the ribose sugar, and replacement of uracil by thymine, which is simply uracil carrying an extra methyl group. And it is conceivable that an RNA-replicating enzyme could have been co-opted to transfer information from RNA to DNA. If this happened during the RNA world, it probably did so near the end, after most of the RNA-dependent machineries had been installed.

What can we conclude from this scenario, which, though purely hypothetical, depicts in logical succession the events that must have taken place if we accept the RNA-world hypothesis? And what, if anything, can we infer about the protometabolism that must have preceded it? I can see three properties.

First, protometabolism involved a stable set of reactions capable not only of generating the RNA world, but also of sustaining it for the obviously long time it took for the development of RNA replication, protein synthesis and translation, as well as the inauguration of enzymes and metabolism.

Second, protometabolism involved a complex set of reactions capable of building RNA molecules and their constituents, proteins, membrane components and possibly a variety of coenzymes, often mentioned as parts of the catalytic armamentarium of the RNA world.

Finally, protometabolism must have been congruent with present-day metabolism; that is, it must have followed pathways similar to those of present-day metabolism, even if it did not use exactly the same materials or reactions. Many abiotic-chemistry experts disagree with this view, which, however, I see as enforced by the sequential manner in which the enzyme catalysts of metabolism must have arisen and been adopted. In order to be useful and confer a selective advantage to the mutant protocell involved, each new enzyme must have found one or more substances on which to act and an outlet for its product or products. In other words, the reaction it catalyzed must have fitted into the protometabolic network. To be sure, as more enzymes were added and started to build their own network, new pathways could have developed, but only as extensions of what was initially a congruent network.

The Thioester World

It may well be, then, that clues to the nature of that early protometabolism exist within modern metabolism. Several proposals of this kind have been made. Mine centers around the bond between sulfur and a carbon-containing entity called an acyl group, which yields a compound called a thioester. I view the thioester bond as primeval in the development of life. Let me first briefly state my reasons.

A thioester forms when a thiol (whose general form is written as an organic group, R, bonded with sulfur and hydrogen, hence R-SH) joins with a carboxylic acid (R'-COOH). A molecule of water (H2O) is released in the process, and what remains is a thioester: R-S-CO-R'. The appeal in this bond is that, first, its ingredients are likely components of the prebiotic soup. Amino acids and other carboxylic acids are the most conspicuous substances found both in Miller's flasks and in meteorites. On the other hand, thiols may be expected to arise readily in the kind of volcanic setting, rich in hydrogen sulfide (H2S), likely to have been found on the prebiotic earth. Joining these constituents into thioesters would have required energy. There are several possible mechanisms for this, which I shall address later. For the time being, let us assume thioesters were present. What could they have done?
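The reaction can be checked atom by atom for a concrete case. In this minimal Python sketch the choice of methanethiol (CH3-SH) as the thiol and acetic acid (CH3-COOH) as the carboxylic acid is an assumption for illustration; the product is the thioester CH3-S-CO-CH3 (S-methyl thioacetate) plus one molecule of water.

```python
from collections import Counter

# Molecular formulas written as atom counts (illustrative example compounds).
methanethiol = Counter({"C": 1, "H": 4, "S": 1})                # CH3-SH        (R-SH)
acetic_acid = Counter({"C": 2, "H": 4, "O": 2})                 # CH3-COOH      (R'-COOH)
methyl_thioacetate = Counter({"C": 3, "H": 6, "O": 1, "S": 1})  # CH3-S-CO-CH3  (R-S-CO-R')
water = Counter({"H": 2, "O": 1})                               # H2O

# R-SH + R'-COOH -> R-S-CO-R' + H2O must conserve every kind of atom.
left = methanethiol + acetic_acid
right = methyl_thioacetate + water
print("left:", dict(left))
print("right:", dict(right))
print("balanced:", left == right)
```

Both sides contain three carbons, eight hydrogens, two oxygens and one sulfur, so the condensation releases exactly one molecule of water, as the text states.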

The thioester bond is what biochemists call a high-energy bond, equivalent to the phosphate bonds in adenosine triphosphate (ATP), the main supplier of energy in all living organisms. ATP consists of adenosine monophosphate (AMP), actually one of the four nucleotides of which RNA is made, to which two further phosphate groups are attached. Splitting either of these two phosphate bonds in ATP releases energy, which fuels the vast majority of biological energy-requiring phenomena. In turn, ATP must be regenerated for work to continue.
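What "equivalent" means can be made concrete with standard free energies of hydrolysis. The Python sketch below uses approximate textbook values (about -30.5 kJ/mol for ATP to ADP and about -31.4 kJ/mol for the thioester acetyl-coenzyme A, at pH 7 and 25 °C); these figures are supplied here for illustration and do not come from the original article. It converts each into an equilibrium constant with K = exp(-ΔG/RT).

```python
import math

R = 8.314   # gas constant, J/(mol*K)
T = 298.15  # 25 degrees Celsius, in kelvin

# Approximate textbook standard free energies of hydrolysis at pH 7, in kJ/mol.
HYDROLYSIS_DG = {
    "ATP -> ADP + phosphate": -30.5,
    "acetyl-CoA (a thioester) -> acetate + CoA": -31.4,
}


def equilibrium_constant(delta_g_kj: float) -> float:
    """K = exp(-dG / RT): how far a hydrolysis lies toward its products."""
    return math.exp(-delta_g_kj * 1000 / (R * T))


for reaction, dg in HYDROLYSIS_DG.items():
    print(f"{reaction}: dG = {dg} kJ/mol, K = {equilibrium_constant(dg):.1e}")
```

Both hydrolyses lie overwhelmingly toward their products, with equilibrium constants of the same order (a few hundred thousand), which is the sense in which the two kinds of bond are energetically comparable.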

It is revealing that thioesters are obligatory intermediates in several key processes in which ATP is either used or regenerated. Thioesters are involved in the synthesis of all esters, including those found in complex lipids. They also participate in the synthesis of a number of other cellular components, including peptides, fatty acids, sterols, terpenes, porphyrins and others. In addition, thioesters are formed as key intermediates in several particularly ancient processes that result in the assembly of ATP. In both these instances, the thioester is closer than ATP to the process that uses or yields energy. In other words, thioesters could have actually played the role of ATP in a thioester world initially devoid of ATP. Eventually, thioesters could have served to usher in ATP through their ability to support the formation of bonds between phosphate groups.

Among the substances that form from thioesters in present-day organisms are a number of bacterial peptides made of 10 or more amino acids. This was discovered toward the end of the 1960s by the late German-American biochemist Fritz Lipmann, the "father of bioenergetics." But even before that, Theodor Wieland of Germany had found, in 1951, that peptides form spontaneously from the thioesters of amino acids in aqueous solution.

The same reaction could be expected to happen in a thioester world, where amino acids were present in the form of thioesters. Among the resulting peptides and analogous multi-unit macromolecules, which I like to call multimers to emphasize their chemical heterogeneity, a number of molecules could have been structurally and functionally similar to the small catalytic proteins that inaugurated metabolism. I therefore suggest that multimers derived from thioesters provided the first enzyme-like catalysts for protometabolism.

The thioester world thus represents a hypothetical early stage in the development of life that could have provided the energetic and catalytic framework of the protometabolic set of primitive chemical reactions that led from the first building blocks of life to the RNA world and subsequently sustained the RNA world until metabolism took over.

This hypothesis implies that thioesters could form spontaneously on the prebiotic earth. Assembly from thiols and acids could have occurred, although in very low yield, in a hot, acidic medium. They could also have formed in the absence of water, for example, in the atmosphere. Perhaps a more likely possibility is that thioesters formed, as they do in the present world, by reactions coupled to some energy-yielding process. The American chemist Arthur Weber, formerly of the Salk Institute, now at the NASA Ames Research Center in California, has described several simple mechanisms of this sort that could have operated under primitive-earth conditions.

So far, these ideas are highly speculative, being supported largely by the need for congruence between protometabolism and metabolism, by the key--and probably ancient--roles played by thioesters in present-day metabolism, and by the likely presence of thioesters on the prebiotic earth. But some experimental evidence has been obtained that supports the thioester-world model.

I have already mentioned the work of Wieland, Lipmann and Weber. Recently, highly suggestive evidence has come from the laboratory of Miller, where researchers have obtained, under plausible prebiotic conditions, the three molecules that make up a natural substance known as pantetheine: cysteamine, β-alanine and pantoic acid. They have also observed the ready formation of this compound from its three building blocks under prebiotic conditions. It so happens that pantetheine is the most important biological thiol, a catalytic participant in a vast majority of the reactions involving thioester bonds.

A Cosmic Imperative

I have tried here to review some of the facts and ideas that are being considered to account for the early stages in the spontaneous emergence of life on earth. How many of the hypothetical mechanisms considered will stand the test of time is not known. But one affirmation can safely be made, regardless of the actual nature of the processes that generated life: these processes must have been highly deterministic. In other words, they were inevitable under the conditions that existed on the prebiotic earth, and they are bound to occur similarly wherever and whenever similar conditions obtain. This must be so because the processes are chemical and are therefore ruled by the deterministic laws that govern chemical reactions and make them reproducible.

The conclusion that life would arise wherever similar conditions are found is reinforced by the fact that many successive steps are involved. A single, freak, highly improbable event can conceivably happen. Many highly improbable events, such as drawing a winning lottery number or the distribution of playing cards in a hand of bridge, happen all the time. But a string of improbable events, such as the same lottery number being drawn twice or the same bridge hand being dealt twice in a row, does not happen naturally.
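The arithmetic behind the lottery and bridge examples is simple: the probabilities of independent events multiply, so a string of improbable events quickly becomes astronomically improbable. In this Python sketch the 6-out-of-49 lottery format is an assumption, the odds refer to an outcome named in advance, and the helper name odds_of_repeats is hypothetical.

```python
from math import comb


def odds_of_repeats(outcomes: int, times: int) -> int:
    """N such that a prespecified outcome occurring `times` times in a row has odds 1 in N."""
    # Independent events: probabilities multiply, so the odds are raised to a power.
    return outcomes ** times


bridge_hands = comb(52, 13)    # distinct 13-card hands from a 52-card deck
lottery_draws = comb(49, 6)    # distinct draws in a 6-out-of-49 lottery (assumed format)

print(f"a given bridge hand: 1 in {bridge_hands:,}")
print(f"that hand twice in a row: 1 in {odds_of_repeats(bridge_hands, 2):,}")
print(f"a given lottery draw: 1 in {lottery_draws:,}")
print(f"that draw twice in a row: 1 in {odds_of_repeats(lottery_draws, 2):,}")
```

A single given bridge hand is already a 1 in 635,013,559,600 event; requiring the same prespecified hand twice squares that figure, and each further step in a chain multiplies the improbability again.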

All of which leads me to conclude that life is an obligatory manifestation of matter, bound to arise where conditions are appropriate. Unfortunately, available technology does not allow us to find out how many sites offer appropriate conditions in our galaxy, let alone in the universe. According to most experts who have considered the problem, notably in connection with the Search for Extraterrestrial Intelligence (SETI) project, there should be plenty of such sites, perhaps as many as one million per galaxy. If these experts are right, and if I am correct, there must be about as many foci of life in the universe: up to a million in each galaxy. Life is a cosmic imperative. The universe is awash with life.
