Genome engineering (GE), an emerging discipline in which a DNA sequence is altered at a single position, has a wide variety of potential uses, such as the correction of gene sequences in patients suffering from genetic diseases, the modification or insertion of genes in plants, and the generation of unique cell lines for treatment of diseases such as cancer. GE requires the development of molecular tools that can search out and bind to one unique site within a complex genome while avoiding 'off-target' interactions across the remaining billions of DNA bases present in a cell’s nucleus. Using ALS Beamline 5.0.2, researchers have solved the structure of one such tool called a TAL protein, learning its mechanism of action for DNA recognition and binding.

Editing Genomes

This year, Nature Methods named targeted gene modification technologies, or genome editing, as the method of the year. There are currently three classes of proteins that can be used to do this: zinc fingers, 'meganucleases,' and TALs. TAL proteins are the most recently discovered, and have the ability to find one specific site among the billions of DNA bases contained in each cell’s genome, binding a long DNA target site with exceptional specificity. When coupled to a DNA-cutting endonuclease, the protein makes a single cut at that one site in the genome, which then leads to a disruption or a sequence modification at that site.
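A rough back-of-the-envelope estimate helps show why a long target site matters for specificity. The sketch below is only illustrative: the genome size of 3 billion bases, the assumption that all four bases are equally likely and independent, and the function name `expected_random_matches` are assumptions made here, not figures or methods from the research.

```python
# Assumed genome length in bases (the article only says "billions").
GENOME_SIZE = 3_000_000_000


def expected_random_matches(site_length, genome_size=GENOME_SIZE):
    """Expected number of exact chance matches to one site of the given length.

    Simplifying assumptions: each base is independent and equally likely
    (probability 1/4), and both DNA strands are scanned (factor of 2).
    """
    return 2 * genome_size / 4 ** site_length


# Compare short and long recognition sites.
for n in (6, 12, 18, 24):
    print(n, expected_random_matches(n))
```

Under these assumptions, short sites are expected to occur by chance many times in a genome of this size, while the expected number of chance matches falls below one for sites around 18 bases or longer. Real genomes contain repeats and biased base composition, so this is a lower bound on the difficulty, not a prediction.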

The challenge with using any of these proteins for genome editing is that they need to be reprogrammed to recognize the DNA sequence of interest. TAL effectors appear to be easy to reprogram because they have a simple, repetitious protein sequence that corresponds to the DNA sequence of their target. The discovery of this simple code, which relates the TAL protein sequence to the target DNA sequence, clarifies how scientists can most effectively manipulate TAL proteins, adding domains such as nuclease enzymes that can act on specific DNA targets.

Site-specific DNA-cleaving enzymes (endonucleases) can induce gene modifications at individual target sites. So far, three types of gene-targeting endonucleases have been developed for GE: zinc finger nucleases (ZFNs), homing endonucleases (HEs, also called 'meganucleases'), and most recently, TAL effector nucleases (TALENs). These newest arrivals are derived from naturally occurring, highly specific DNA-targeting proteins called TAL effectors, which are produced by bacterial plant pathogens called Xanthomonads to turn various plant genes on during an infection. Coupling a TAL effector to an endonuclease makes a TALEN.

Side and top-down views of the structure of the PthXo1 DNA binding region in complex with its target site.

The structure of the TAL effector PthXo1 bound to its DNA target has been solved by researchers from the Fred Hutchinson Cancer Research Center and Iowa State University. Considered alongside the structure of a related TAL effector (dHax3), which was solved by a separate group, the two structures offer a solid basis for understanding the mechanism of action of TALENs and provide crucial details for the creation of the next generation of TALENs for GE.

TAL effectors possess an extraordinary organization in which a series of almost-perfect repeats of 33 to 34 amino acids (called 'TAL repeats') in the middle of the protein are each thought to bind to one of a series of consecutive DNA bases across the protein's target site. The TAL repeats are almost identical to one another, the exception being two amino acids called the repeat variable diresidues (RVDs), which have been found to dictate exactly which of the four DNA bases (A, G, C, or T) is recognized by each protein repeat. This simple code for DNA recognition by a TAL protein was reported in late 2009, but the structural basis for DNA recognition, as well as the manner by which TAL proteins bind their long DNA targets, was a complete mystery.
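The one-repeat-to-one-base code described above can be sketched as a simple lookup table. This is a minimal illustration, not the researchers' software: only the HD-to-cytosine pairing appears in this article, the other three entries are the commonly cited RVD associations and are included here as an assumption, and the function names `predict_target` and `design_rvds` are hypothetical.

```python
# Commonly cited RVD-to-base associations. Assumption: only HD -> C is
# stated in this article; the other entries come from the published code.
RVD_TO_BASE = {
    "HD": "C",
    "NI": "A",
    "NG": "T",
    "NN": "G",  # NN is also reported to tolerate A; simplified to G here
}


def predict_target(rvds):
    """Return the DNA sequence predicted for an ordered list of RVDs."""
    bases = []
    for rvd in rvds:
        # RVDs missing from the table are shown as 'N' (any base), not guessed.
        bases.append(RVD_TO_BASE.get(rvd.upper(), "N"))
    return "".join(bases)


def design_rvds(target):
    """Return one RVD per base for a desired DNA target (hypothetical helper)."""
    base_to_rvd = {base: rvd for rvd, base in RVD_TO_BASE.items()}
    return [base_to_rvd[base] for base in target.upper()]


print(predict_target(["HD", "NI", "NG", "NN", "HD"]))
print(design_rvds("CATG"))
```

Reading the table in one direction predicts the DNA site of a given TAL protein; reading it in the other direction suggests which repeats to assemble for a chosen site, which is why these proteins appear easy to reprogram. Natural proteins such as PthXo1 also contain less common RVDs and non-canonical pairings that a four-entry table like this does not capture.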

This mystery was solved using x-ray diffraction data collected at ALS Beamline 5.0.2 to determine the structure of PthXo1 bound to its target DNA sequence. The collection of x-ray data at the ALS was accompanied by a high-throughput computational structure prediction strategy to determine the structure, enabling a hypothesis for the mechanism of action by which TAL proteins recognize and bind long DNA targets while devoting only two amino acids to each DNA base.

The interaction of a TAL repeat that harbors an 'HD' RVD sequence with a cytosine base from its DNA target.

The structure demonstrates that the TAL repeats self-associate to form a right-handed superhelix that wraps around the turns of a largely unperturbed DNA double helix. The first amino acid in each RVD points away from the DNA to form a stabilizing contact with the preceding TAL protein backbone, while the second residue makes critical contacts to a single base on one DNA strand. The PthXo1 structure contains seven RVD types and two non-canonical protein-DNA base pairings, thereby illustrating the basis of TALEN DNA recognition. Finally, degenerate N-terminal TAL repeats interact with the beginning of the DNA target site and appear to serve as the 'nucleation site' for binding.

Research funding: This research was funded by grants from the National Institutes of Health and the National Science Foundation. Operation of the ALS is supported by the U.S. Department of Energy, Office of Basic Energy Sciences.