Researchers make their own E. coli genome, compress its genetic code

2019-05-16

Ars Technica

The genetic code is the basis for all life, allowing the information present in DNA to be translated into the proteins that perform most of a cell's functions. And yet it's... kind of a mess. Life typically uses a suite of about 20 amino acids, while the genetic code has 64 possible combinations. That mismatch means that redundancy is rampant, and a lot of species have evolved variations on what would otherwise be a universal genetic code.
So is the code itself significant, or is it something of a historic accident, locked in place by events in the distant evolutionary past? Answering that question hasn't been an option until recently, since individual codes appear in hundreds of thousands of places in the genomes of even the simplest organisms. But as our ability to make DNA has scaled up, it has become possible to synthesize entire genomes from scratch, allowing a wholesale rewrite of the genetic code.
Now, researchers are announcing that they have redone the genome of the bacterium E. coli to get rid of some of the genetic code's redundancy. The resulting bacteria grow somewhat more slowly than a normal strain but are otherwise difficult to distinguish from their non-synthetic peers.
Codes and redundancy
The genetic code is spelled out in sets of three DNA bases. Each of the three positions can hold any of the four bases, meaning there are 4 x 4 x 4 possible combinations, or 64. By contrast, there are only 20 amino acids, while at least one of the remaining codes has to be used to tell the cell to stop translating the code. That leaves a mismatch of 43 codes that aren't strictly needed. Cells use those extra codes as redundancy; instead of one stop code, most genomes use three. Eighteen of the 20 amino acids are coded by more than one set of three bases; three have as many as six possible codes.
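The arithmetic above can be checked with a short Python sketch. It assumes the standard genetic code (the table most organisms, including E. coli, use); the variable names are ours for illustration and do not come from the research.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4 x 4 x 4 triplets to the amino acid it encodes.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = [aa for aa in codons_per_residue if aa != "*"]

total_codons = len(CODON_TABLE)                 # all possible triplets
sense_codons = total_codons - len(stop_codons)  # triplets that encode an amino acid
spare_codes = total_codons - len(amino_acids) - 1  # beyond 20 amino acids + 1 stop

# Amino acids with exactly one code, and those with six.
single_code = sorted(aa for aa in amino_acids if codons_per_residue[aa] == 1)
six_codes = sorted(aa for aa in amino_acids if codons_per_residue[aa] == 6)

print(total_codons, sense_codons, spare_codes, stop_codons)
print("one code:", single_code, "six codes:", six_codes)
```

Under that assumption, the table has 61 codes for amino acids plus three stop codes, and only methionine (M) and tryptophan (W) are limited to a single code.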
Is this redundancy useful? The answer is "sometimes." For example, many DNA sequences do double-duty, encoding both a protein and regulatory information that controls gene activity or allowing specific RNA structures to form. The flexibility of redundancy makes it easier for one sequence to serve two purposes. The redundancy can also allow fine-tuning of gene activity, as some codes are translated into proteins more efficiently than others. These factors suggest that the genetic code's redundancy could have evolved to be essential for an organism.
Testing whether that is the case, however, is a bit of a nightmare. Even the most compact genomes have hundreds of genes (E. coli strains have between 4,000 and 5,500), and all of the individual codes can occur multiple times within each. Editing each of these is possible but would be phenomenally time-consuming.
So the researchers simply recoded things on a computer. Focusing on one of the amino acids that has multiple redundant codes, they tweaked sequences so that more than 18,000 individual uses of two of the codes were replaced by a redundant option. With the synthetic genome designed, it was just a matter of splitting it up into pieces that could be ordered from a DNA synthesizer.
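As a rough illustration of what recoding on a computer involves, here is a minimal Python sketch. The article does not say which codes were swapped, so the mapping below is a hypothetical example: both pairs encode serine in the standard genetic code. The function name and the toy sequence are also ours and are not taken from the researchers' software.

```python
# Hypothetical synonymous swaps (assumption, not from the article):
# each pair encodes serine in the standard genetic code.
RECODING = {"TCG": "AGC", "TCA": "AGT"}


def recode(cds, mapping=RECODING):
    """Swap targeted codons in an in-frame coding sequence.

    Returns the recoded sequence and the number of codons replaced.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Walk the sequence codon by codon so that only in-frame matches change.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    swaps = sum(codon in mapping for codon in codons)
    recoded = "".join(mapping.get(codon, codon) for codon in codons)
    return recoded, swaps


toy_gene = "ATGTCGAAATCATCTTAA"  # toy sequence, not from E. coli
recoded, n_swaps = recode(toy_gene)
print(recoded, n_swaps)
```

Reading the sequence in steps of three matters: a plain text search-and-replace would also hit matches that straddle two codons (the "TCA" inside "ATCATC", for instance) and change the protein. A real design also has to handle overlapping genes and regulatory sequences, which this sketch ignores.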
This is harder than it sounds, according to one of the researchers involved (and regular Ars reader) Wolfgang Schmied. With a project like that, where you ask questions about the rules of the genetic code, "you have to at some point commit to ordering a genome worth of synthetic DNA," he told Ars, "which is a rather large financial commitment and not an easy button to press." Yet press it they did.
Some assembly required
Unfortunately, there's a big gap between what a DNA synthesis machine can output and the multimillion-base genome. The group had to do an entire assembly process, stitching together small pieces into a large segment in one cell and then bringing that into a different cell that had an overlapping large segment. "Personally, my biggest surprise was really how well the assembly process worked," Schmied said. "The success rate at each stage was very high, meaning that we could do the majority of the work with standard bench techniques."
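The logic of breaking a genome into overlapping pieces and stitching them back together can be sketched on plain text strings. This is only a toy model: the fragment size, the overlap length, and the function names are hypothetical, and the actual lab work relies on cells to join the DNA rather than on string matching.

```python
def split_with_overlap(seq, size, overlap):
    """Cut seq into fragments of up to `size` bases that share `overlap` bases."""
    if not 0 < overlap < size:
        raise ValueError("overlap must be positive and smaller than size")
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + size])
        if start + size >= len(seq):
            break
        start += size - overlap  # step back so neighbors share an overlap
    return fragments


def stitch(fragments, overlap):
    """Rejoin fragments, checking that each one overlaps the assembly so far."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        if assembled[-overlap:] != fragment[:overlap]:
            raise ValueError("fragments do not overlap as expected")
        assembled += fragment[overlap:]
    return assembled


toy_genome = "ATGAGCAAAAGTTCTTAAGGCCTA"  # toy sequence, not from E. coli
pieces = split_with_overlap(toy_genome, size=10, overlap=4)
rebuilt = stitch(pieces, overlap=4)
print(len(pieces), rebuilt == toy_genome)
```

The overlap check in `stitch` is the point of the sketch: shared ends are what let neighboring pieces be joined in the right order.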
During the process, there were a couple of spots where the synthetic genome ended up with problems; in at least one case, this was where two essential genes overlapped. But the researchers were able to tweak their version to get around the problems that they identified. The final genome also had a handful of errors that popped up during the assembly process, but none of these altered the three-base codes that were targeted.
In the end, it worked. Rather than using 61 of the 64 potential codes for amino acids, the new organism, dubbed Syn61, only used 59. The researchers were then able to delete the genes that normally allow E. coli to use the redirected codes. Normally, these genes are essential; in Syn61, they could be deleted without issue. That's not to say the Syn61 strain is fine; it grew more slowly than its normal peers. But this is probably the result of all the cases described earlier, where DNA sequences were performing more than one function. It's possible that, over time, the strain can evolve back to a normal growth rate.
Aside from answering questions about basic biology, the Syn61 strain may ultimately be useful. There are far more amino acids out there than the 20 life uses, and many of these have interesting chemical properties. To use them, however, we need spare genetic codes that can be redirected to the artificial amino acids, which is precisely what this new work has provided.
Nature, 2019. DOI: 10.1038/s41586-019-1192-5.