The path to sequencing nucleic acids

New research frontiers

Although Sanger did not progress very far in understanding the function of insulin through sequencing, his sequencing methods generated great excitement among x-ray crystallographers investigating the three-dimensional structure of deoxyribonucleic acid (DNA), a nucleic acid. One of the first people inspired by Sanger's work was the British physicist and crystallographer Francis Crick, who in 1953 together with James Watson, an American biologist, put forward what was to become a famous model for the molecular structure of DNA based on x-ray diffraction images produced by Rosalind Franklin and Maurice Wilkins. Crick was located a short distance from Sanger, at the Medical Research Council (MRC) Unit for the Study of the Molecular Structure of Biological Systems based in the Cavendish Laboratory of Cambridge University’s Physics Department.

DNA had first been discovered in the late nineteenth century, yet it remained little studied for many decades. In part this was because proteins, rather than DNA, were considered to hold the genetic blueprint for organisms. As Sanger admitted in 1997, when he had started working in the Department of Biochemistry in the 1940s he had 'thought of DNA as an inert substance'. Indeed, he continued, 'the notion' that DNA contained 'all the information for making a complete organism would have been thought of as science fiction.' (Garcia-Sancho, 2006).

Attitudes to DNA began to change in the wake of some experiments on pneumococcal bacteria carried out by Oswald Avery, Colin MacLeod and Maclyn McCarty in 1944. Their findings established that DNA could transform the properties of cells. As a result a number of researchers began investigating the structure of DNA hoping that this would reveal how the molecule worked.

The model put forward by Crick and Watson in 1953 showed that DNA had two strands made up of chemical sub-units known as nucleotides. These two strands coiled around each other, linked together by hydrogen bonds, in a spiral configuration called a double helix. Each strand was built from four kinds of nucleotide: adenine (A), cytosine (C), guanine (G) and thymine (T). The two strands ran in opposite directions and were complementary, so that adenine always paired with thymine (A-T) and cytosine with guanine (C-G). It was this structure, they argued, which enabled each strand to reconstruct the other and facilitated the passing on of hereditary information from parent to offspring (Watson, Crick, 1953).
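The pairing rule can be made concrete with a short sketch. The Python below is an illustrative aside, not anything drawn from Watson and Crick's paper; the function name `complement_strand` and the example sequence are hypothetical. It shows how the sequence of one strand fixes the sequence of its partner, read in the opposite direction.

```python
# Illustrative sketch only: the names and the example sequence are hypothetical.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the partner strand, read in the opposite direction."""
    return "".join(PAIRS[base] for base in reversed(strand))


partner = complement_strand("ATGCCG")
print(partner)
# Applying the rule twice gives back the original strand.
print(complement_strand(partner))
```

Because the rule is its own inverse, either strand is enough to rebuild the other, which is the property the model relied on to explain the copying of hereditary information.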

Pencil sketch of the DNA double helix by Crick. It shows a right-handed helix and the nucleotides of the two anti-parallel strands, 1953, Crick's notebook. Credit: Wellcome Library, file PP/CRI/H/1/16.

A more recent representation of the structure of DNA showing how its nucleotides are arranged. Credit: U.S. National Library of Medicine.

Following their elucidation of the structure of DNA, Crick and Watson began to investigate how DNA directed the formation of proteins within a cell. This they saw as fundamental to understanding how DNA dictated metabolism and other functional processes (Garcia-Sancho, 2012). On starting to deliberate the question, in October 1954 Crick attended a series of lectures given by Sanger about sequencing insulin. Inspired by Sanger's results with insulin, Crick began to develop a theory from the mid-1950s which argued that the arrangement of nucleotides in DNA determined the sequence of amino acids in proteins and that this in turn regulated how a protein folded into its final shape; it was this shape which determined the function of a protein. He further hypothesised that an intermediary molecule helped the DNA to specify the sequence of the amino acids in a protein (Crick, 1958).

Sequencing sickle-cell haemoglobin

Sanger's work on insulin not only helped shape Crick's sequence hypothesis, but also provided an important experimental approach to test it. Before proceeding any further, however, Crick needed to find a way to demonstrate how a single mutant gene could alter the sequence of amino acids in the protein it coded for. He soon latched on to the idea of doing this by examining how an inherited genetic defect affected the sequence of amino acids in a protein (de Chadarevian, 1996).

Vernon Ingram, pictured here, fled to England as a refugee from Nazi Germany. He joined the MRC Unit in 1952. Credit: Ingram Family; Weatherall, 2010

Crick quickly turned to looking at sickle-cell anaemia, a common genetic blood disorder. Just a few years before, in 1949, William Castle, an American haematologist, had spotted that haemoglobin, a protein molecule that exists in red blood cells and delivers oxygen to cells in the body, was shaped like a sickle in blood taken from patients with sickle-cell anaemia. This, he believed, might be due to the protein being deprived of oxygen. The haemoglobin also differed from normal adult haemoglobin when subjected to electrophoretic tests and displayed unusual properties when investigated under polarised light. Subsequent research by Jim Neel, an American geneticist, suggested the abnormal haemoglobin was linked to an inherited genetic defect (Watson, Crick, 1953).

In 1949 Linus Pauling and other chemists suggested that the difference between normal haemoglobin and that taken from sickle-cell patients could be down to a difference in their number of amino acids. It was unknown, however, how many amino acids were involved. Was it just one amino acid or more? Many were sceptical that the alteration of just one amino acid, out of approximately 300, could produce a molecule as lethal as sickle-cell haemoglobin. Techniques for sequencing the protein's amino acids, however, were not sufficiently sophisticated to settle the matter (Crick, 1958; Ingram, 2004).

Crick did not have to go very far to start work on the issue. Haemoglobin was already being intensively studied by others within the MRC Unit. This research was aided by the fact that haemoglobin was easy to obtain and was one of the easiest proteins to prepare in pure form. Fortuitously, the MRC Unit also had plentiful supplies of several specimens of sickle-cell haemoglobin, which had been left behind by a former visitor. This was the virologist, Tony Allison, who had spotted a correlation between the sickle-cell genetic trait and resistance to malaria while working in Kenya in the early 1950s. Among those looking at sickle-cell haemoglobin in the MRC Unit were its director, Max Perutz, who was studying its structure with x-ray crystallography, assisted by Vernon Ingram, a German American postdoctoral protein biochemist in the laboratory (de Chadarevian, 1996; Ingram, 2004; Allison, 2004).

Perutz and Crick soon assigned Ingram the task of determining the difference in the amino acid composition between sickle-cell and normal haemoglobin. With haemoglobin being ten times larger than either of the two insulin chains sequenced by Sanger, the project proposed by Perutz and Crick presented a significant challenge. Perutz and Crick recommended that Ingram deploy Sanger's latest fingerprinting techniques which he was already refining to characterise some other large protein fragments (Ingram, 2004).

Rather than sequencing the whole haemoglobin protein which, as Ingram recalled, was 'a Herculian task', he decided to cleave it into manageable peptide fragments, using a pancreatic enzyme called trypsin. With this he obtained 26 peptide fragments. Ingram, with some help from Sanger, then began separating the fragments, using paper electrophoresis and chromatography. His ultimate goal was to find a peptide fragment with a demonstrable electrophoretic difference. This involved 'characterizing each peptide by its position on a two-dimensional map, a sheet of “blotting paper”' (Ingram, 2004). After many hours of painstaking work he determined that the difference between normal and sickle-cell haemoglobin was down to the replacement of 'only one of nearly 300 amino acids' (Ingram, 1957). Ingram's finding was a significant breakthrough. Not only did it confirm Crick's sequence hypothesis, but it was also the first time that anyone had managed to break the genetic code, the process by which cells translate information stored in DNA into proteins (de Chadarevian, 1996).
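The logic of the fingerprinting comparison can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a reconstruction of Ingram's procedure: he compared the positions of peptides on a paper map, whereas the sketch compares the peptides as text. It assumes the well-known rule that trypsin cuts a protein chain after lysine (K) or arginine (R) and ignores the enzyme's exceptions. The function names and both toy sequences are invented for the example and are not haemoglobin.

```python
# Illustrative sketch only: names and sequences are hypothetical.
def trypsin_digest(protein: str) -> list[str]:
    """Cut a protein (one-letter amino acid codes) after every K or R."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:  # keep the final piece if the chain does not end in K or R
        peptides.append(current)
    return peptides


def differing_peptides(normal: str, variant: str) -> tuple[list[str], list[str]]:
    """Return the peptides unique to each digest."""
    a, b = set(trypsin_digest(normal)), set(trypsin_digest(variant))
    return sorted(a - b), sorted(b - a)


normal = "MAGKLTPEEKSAVR"   # invented sequence
variant = "MAGKLTPVEKSAVR"  # same sequence with one residue changed
print(trypsin_digest(normal))
print(differing_peptides(normal, variant))
```

A single substitution changes only the one peptide that contains it, so the search for a difference narrows from the whole protein to one short fragment.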

This shows Ingram's sequencing results from normal haemoglobin (labelled A on the left) and sickle-cell haemoglobin (labelled S on the right). At the bottom are tracings of the top chromatograms. Dotted lines indicate peptides that only became visible after heating the chromatogram. Credit: Ingram, 1958, figure 3.

The Laboratory of Molecular Biology (LMB)

Sanger's informal collaboration with Ingram took place at a time when Crick and others in the MRC Unit were strongly urging Sanger to consider moving from his base in the Dunn Institute Department of Biochemistry to their Unit. As early as 1955 John Kendrew, a budding protein crystallographer, invited Sanger to join the MRC Unit in the Cavendish Laboratory. He was eager to have Sanger on board to help in his project to unravel the amino acid sequence of myoglobin, a protein found in heart tissue and other muscles. He believed Sanger's sequencing method could provide an important tool to work out the protein's three-dimensional structure which he believed could not be determined with x-ray crystallography alone. In addition to Kendrew, Sydney Brenner, a South African biologist, and Crick were keen to have Sanger join them in their work on sequencing some proteins they had produced with some mutant viruses. They hoped to demonstrate that the order of changes seen in a sequence of a mutated gene corresponded with that of the amino acids in the protein it coded for (de Chadarevian, 1996).

Despite such overtures, Sanger was reluctant to move. In part this was because he felt that the group's strong orientation towards physics and x-ray crystallography was far removed from his own interests. The situation changed in 1957, however, when Cambridge University began negotiations to build a new centre designed to accommodate the Cavendish group alongside biologists. This plan was greatly welcomed by the group in the MRC Unit because up to then they had been working in a cramped and overcrowded prefabricated hut. The new centre was to be called the Laboratory of Molecular Biology (LMB). Its creation signalled a new alliance that had begun to emerge between protein crystallographers, molecular geneticists and protein chemists, all working towards the development of a new discipline: molecular biology. Given the institution's strong biological orientation Sanger soon dropped his reservations about joining the Cavendish team (de Chadarevian, 1996).

The creation of the LMB came at an opportune moment for Sanger. By his own admission he had been rather unproductive in the years immediately following his sequencing of insulin. While he had improved his sequencing techniques by specifically incorporating radioactive labelling, he was frustrated by his lack of progress in comparing the sequences of insulin from different species. Importantly, he was no closer to understanding how the protein's sequence related to its function. The LMB offered him far more space and facilities than he had in the Biochemistry Department as well as a chance to escape the department's pressure on its staff to teach (Sanger, 1992; Garcia-Sancho, 2010).

"The Hut" building, located in front of the Austin Wing of the Cavendish Laboratory of Physics, 1962. Ingram sequenced the structure of sickle-cell haemoglobin in this hut. Credit: LMB.

A new horizon: nucleic acids

Sanger moved to the LMB in 1962, soon after it opened. The move provided fresh avenues to explore the process of protein synthesis that he had been studying using insulin. Importantly, he could now call on the expertise of the former Cavendish researchers who were trying to unravel the genetic code. How DNA specified the structure of proteins lay at the heart of their research (Garcia-Sancho, 2010).

Prior to Sanger's move, Brenner and Crick organised two seminars designed to get Sanger and other members of his group up to speed on the latest findings on DNA and cell replication (Garcia-Sancho, 2010). By now scientists had begun to understand how it was that DNA, a nucleic acid that is enclosed in the nucleus of the cell, could direct the making of proteins in the cytoplasm, the fluid beyond the cell's nucleus. This process involves a number of components. The first is the ribosome, which exists outside a cell's nucleus and is responsible for protein synthesis in cells. The ribosome had first been discovered in 1955 by George E Palade, a Romanian-American cell biologist based at the Rockefeller Institute in New York. The second is ribonucleic acid (RNA), another nucleic acid. RNA is very similar to DNA in that it is built from the same number of nucleotide types, four, but it has only one strand. Two types of RNA had been discovered in 1956. The first was messenger RNA (mRNA), discovered by Elliot Volkin and Lazarus Astrachan at the Oak Ridge National Laboratory. Found inside a cell's nucleus, mRNA is responsible for carrying the genetic code to the ribosome to build a protein. The second was transfer RNA (tRNA), originally known as soluble RNA or S-RNA, found by Paul Zamecnik, Mahlon Hoagland and other colleagues at Massachusetts General Hospital attached to Harvard University. Located in the cell's cytoplasm, tRNA helps transfer specific amino acids from the cytoplasm to the ribosome where they are joined in a specific order to make a protein.

Diagram showing the relationship between DNA and mRNA in protein synthesis. Adapted from illustrations from R. Hesketh, The War on Cancer, New York (2012) p.65, and K. Spencer Joyce, Woods Hole Oceanographic Institution.

During his early years of sequencing, Sanger had relatively little interest in nucleic acids. Indeed, when attending conferences on proteins and nucleic acids he would eagerly await the ending of presentations on nucleic acids so that the discussion could turn to proteins. His attitude began to change, however, as a result of his interactions with the Cavendish researchers. As he put it, 'with people like Francis Crick around, it was difficult to ignore nucleic acids or fail to realise the importance of sequencing them.' (Sanger, 1988).

Sequencing nucleic acids initially seemed a much more formidable challenge to Sanger than proteins. One of the major obstacles arose because there was no suitable pure small nucleic acid available to experiment on. Another issue was the composition of nucleic acids. As nucleic acids were made up of just four sub-units, nucleotides, he was concerned that he might struggle to break them down into sufficiently large fragments with enough of an overlap with other fragments. Such overlaps had been crucial in his determining the sequence in insulin. The fact that nucleic acids had just four components, however, meant that he might find the final analysis easier than when he had analysed the 20 amino acids in insulin (Sanger, 1988).

Sequencing RNA

Sanger's notebooks on insulin, written during his final two years at the Dunn Institute, indicate that he had already begun to wrestle with nucleic acids well before his move to the LMB. He did this work while trying to refine his sequencing techniques, which included experimenting with ways to label proteins and enzymes with a radioactive phosphorus isotope known as 32P. Progress, however, was slow. He soon realised that one solution might be to focus on tRNA. Zamecnik and Hoagland's findings suggested it contained just 60 nucleotides, which seemed a manageable number to sequence (Finch, 2008).

Once he joined the LMB, Sanger put all his energy into sequencing RNA. He was aided in this by John Smith, another researcher who was also hard at work on the nucleic acid in the laboratory and was happy to teach Sanger some of his skills in fractionating nucleotides. Sanger also had the help of Leslie Smith, one of his insulin collaborators, who had just spent a sabbatical year in Zamecnik's laboratory. In 1963 Sanger's work on RNA was given an extra boost when a doctoral student, George Brownlee, arrived and opted to work on nucleic acids under his supervision (Sanger, 1988; Finch, 2008).

Much of Sanger's team's early effort was directed towards purifying tRNA from different species, including yeast and Escherichia coli. Most of this work was done by Smith, but progress was slow so Sanger soon focused on ways to develop a rapid and simple fractionation technique for sequencing the nucleotides in RNA. The question was which method to explore.

Many scientists then working on sequencing nucleic acids deployed methods similar to those used to sequence proteins. These used an enzyme to break up the nucleic acid into partial fragments and then to separate the nucleotides in the fragments. One of the most popular enzymes used for this work was ribonuclease T1, an enzyme discovered in 1957 by Kimiko Sato-Asano and Fujio Egami based at Nagoya University. The advantage of this enzyme was that it cut a nucleic acid at a very specific point on its nucleotide chain, wherever guanine was present (Sanger, Dowding, 1996).
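The specificity of ribonuclease T1 can be illustrated with a small Python sketch. It rests on the assumption that the enzyme cuts the chain immediately after each guanine and that digestion runs to completion, whereas the experiments described here often relied on partial digests. The function name `t1_digest` and the example RNA are hypothetical.

```python
# Illustrative sketch only: the name and the example sequence are hypothetical.
def t1_digest(rna: str) -> list[str]:
    """Cut an RNA chain after every guanine (G), as in a complete T1 digest."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base == "G":
            fragments.append(current)
            current = ""
    if current:  # the tail of the chain may not end in G
        fragments.append(current)
    return fragments


print(t1_digest("AUGCCAGUUGA"))
```

Every fragment except possibly the last ends in G, so the position of each cut is known in advance. That predictability is what made the enzyme useful for breaking a chain into pieces that could then be separated and analysed.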

Once the nucleic acid was partially digested, it was then commonly fractionated on ion exchange columns. The use of columns had largely replaced the two-dimensional paper fractionation technique Sanger had used for insulin. Sanger, however, saw the use of columns as somewhat laborious. He much preferred a paper-based method. As he put it, 'I still had a preference for paper techniques, especially for preliminary experiments, as they were quicker and, in general, gave more information – though of a qualitative rather than [a] quantitative nature'. Some scientists were already experimenting with a paper-based method for fractionating single nucleotides with the help of ultraviolet light. Sanger, however, did not find this method very sensitive and remarked that he 'found it impossible to see any distinct spots from partial digests of RNA' (Sanger, Dowding, 1996).

One way Sanger thought the process might be improved would be by attaching a radioactive label to RNA. This label would act as a probe to detect the nucleic acid or any fragments derived from it. He believed this would provide a more rapid approach than the paper-based methods which relied on the detection of nucleic acids based on their absorption of ultraviolet light. Sanger's reasoning was based on some experiments he had already conducted on enzymes and proteins with the radioactive phosphorus isotope 32P. He believed that the same label had the potential to incorporate well into RNA, because every one of the nucleotides in RNA contained phosphorus atoms. Furthermore, 32P could be easily detected in autoradiographs (Sanger et al, 1965; Sanger, Dowding, 1996; Sanger, 1988).

In 1965 Sanger was joined by another collaborator, Kjeld Marcker, a Danish postdoctoral researcher, and together they set about testing his radioactive sequencing approach. Their first efforts were directed towards tagging RNA with 32P attached to an amino acid, methionine. Results from this research proved confusing, however, because methionine kept appearing as an extra spot in the paper electrophoresis read-outs. Further investigation revealed the amino acid had been potentially modified during the experiment. Following this, Sanger worked out a way to synthesize radioactive RNA by adding 32P inorganic phosphate to some Escherichia coli and yeast that were growing in culture (Sanger, 1992; Finch, 2008).

In tandem with the radioactive labelling work, Sanger began exploring different separation techniques to facilitate sequencing. By early 1965, he and Brownlee, together with Bart Barrell, a laboratory technician who had joined Sanger the previous year, had successfully devised a two-dimensional partition method which used ionophoresis on cellulose acetate followed by ionophoresis on ion exchange paper (Sanger, Dowding, 1996).

Looking back on his working life, Sanger commented that he could not remember many moments of particular elation, but one that stuck out in his memory came from the time he had worked on the two-dimensional partition technique. He recalled 'one occasion when Bart Barrell, who usually developed the day's autoradiographs first thing in the morning, came into my lab brandishing a beautiful sheet of film with clear, round, well-separated spots on it. This was certainly exciting after the streaky, unresolved pictures we had been getting before'. The great advantage of the method was that it was quick. Furthermore, 'it avoided a good deal of final analysis' (Sanger, 1988). An outline of the technique was published in the Journal of Molecular Biology in September 1965 (Sanger et al, 1965).

While the new system appeared to be robust for sequencing, Sanger's team was unable to test it on any tRNA, his original target, because they had not yet managed to purify the nucleic acid. The best they could manage was a test run with a sample of ribosomal RNA which was easy to prepare in radioactive form. While too large a molecule to provide any useful sequencing information, the ribosomal RNA confirmed the utility of the system (Sanger, 1988).

In the end the first RNA to be sequenced was alanine tRNA purified from yeast, achieved by Robert Holley and colleagues at Cornell University. This was the first determination of the nucleotide sequence of a nucleic acid and the culmination of seven years' hard work. Three of those years had been spent in purifying the nucleic acid by using a countercurrent distribution system. The following four years had been devoted to sequencing, using a similar procedure to that adopted by Sanger for insulin, whereby the nucleic acid was first cut up into 16 small fragments with enzymes and the fragments were then assembled like a puzzle. The RNA was found to contain 77 nucleotides (Holley et al, 1965; Kresge, Simoni, Hill, 2005).

Sanger in his laboratory, 1969. Credit: LMB.

Two years after Holley's success, Sanger's team announced the successful sequencing of another short RNA, 5S ribosomal RNA from Escherichia coli. Much of the work had been carried out by Brownlee as part of his doctorate. The RNA contained 120 nucleotides. It was substantially larger than any tRNA sequenced so far, which ranged from 77 to 85 nucleotides (Brownlee et al, 1967). The sequencing of 5S ribosomal RNA was greatly aided by the 32P label. Importantly, it also showed up well-defined spots on the autoradiographs, making it possible to identify individual nucleotides on the basis of their position and to work out their sequence order directly. Thereafter, 32P became a standard tool for sequencing RNA, until it was displaced by fluorescent labelling (Sanger, 1992; Sanger, Dowding, 1996).

In addition to proving Sanger's radioactive sequencing approach effective, the results from 5S ribosomal RNA demonstrated the power of another fractionation system called homochromatography. Sanger devised this method to obtain longer fragments when separating products partially digested by T1 ribonuclease from one another. The method was a type of displacement chromatography. It rested on the displacement of oligonucleotides (small groups of nucleotides) fixed on some ion-exchange paper by some unlabelled oligonucleotides (Brownlee, 2015).

Breaking the genetic code

Soon after 5S ribosomal RNA was sequenced, a number of researchers in the LMB managed to sequence additional tRNAs using the rapid fractionation methods devised by Sanger and his team. Based on this work it seemed that the next logical step would be to compare the sequence of a messenger RNA with that of a protein it coded for. Such work would provide a further key to breaking the genetic code. RNA taken from bacteriophages, viruses that infect and replicate in bacteria, appeared a promising source for such an experiment. Sequencing RNA from a bacteriophage posed a significant challenge, however, because it was considerably larger than any other RNA sequenced so far. Indeed, as Sanger remembered, 'one didn't think it would be possible to simply partially hydrolyse it and get fragments out because of its great size. You would think you would get a hopeless mixture of large fragments if you did a partial T1 digest' (Sanger, 1992).

Jerry Adams, who joined Sanger's laboratory in 1967 as a postdoctoral researcher, decided to try his hand at partially digesting RNA sourced from an R17 bacteriophage with the T1 enzyme. To everyone's surprise he managed to get a number of fragments with up to 50 nucleotides which separated well on acrylamide gels. One of the fragments appeared quite pure, making it a suitable candidate for sequencing. This was done by Sanger in collaboration with Adams, Peter Jeppesen and Bart Barrell. With the RNA turning out to have 3300 nucleotides this work was a major undertaking. It was, however, greatly aided by Sanger's method of homochromatography (Sanger, 1992; Finch, 2008).

By chance, in the course of sequencing the RNA, the team found the fragment had a nucleotide sequence that corresponded with a sequence of amino acids already found in the coat protein of the R17 phage. This was very exciting, because as Sanger pointed out, it 'was the first time that a nucleotide sequence had been determined and shown to be related by the genetic code to a known amino acid sequence in a protein'. While the genetic code had already been broken by others, the work provided sound confirmation of the code (Sanger, 1992).
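The kind of correspondence the team noticed can be illustrated with a short Python sketch. It uses the standard genetic code table, read three nucleotides at a time, to translate an RNA fragment in each of its three possible reading frames and then checks whether any translation appears within a protein sequence. It is an illustration only and not the team's method, which was carried out by hand. The function names, the RNA fragment and the protein are all invented for the example and are not R17 data.

```python
# Illustrative sketch only: names and sequences are hypothetical.
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str, frame: int = 0) -> str:
    """Translate complete codons of an RNA string, starting at offset `frame`."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def matching_frames(rna: str, protein: str) -> list[int]:
    """Return the reading frames whose translation occurs within the protein."""
    matches = []
    for frame in range(3):
        peptide = translate(rna, frame)
        if peptide and peptide in protein:
            matches.append(frame)
    return matches


fragment = "AUGGCUAAAGGUUGG"  # invented RNA fragment
protein = "LLMAKGWSS"         # invented protein sequence
print(translate(fragment))
print(matching_frames(fragment, protein))
```

Only one reading frame of the toy fragment matches the toy protein, which shows in miniature why finding a known amino acid sequence encoded in a newly determined nucleotide sequence was such persuasive confirmation of the code.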

This comes from Sanger's RNA notebook. It records work to correlate the sequence of RNA from bacteriophage R17 with the corresponding amino acid sequence it codes for in the coat protein of the bacteriophage. Credit: Wellcome Library, file SA/BIO/P/30.

References

Allison, A C (2004) 'Two lessons from the interface of genetics and medicine', Genetics, 166: 1591-99.