DNA loop formation, mediated by protein binding, plays a broad range of roles in cellular function, from gene regulation to genome compaction. While DNA flexibility has been well investigated, there has been controversy in assessing the flexibility of very small loops. We have engineered a pair of artificial coiled-coil DNA looping proteins (LZD73 and LZD87), with minimal inherent flexibility, to better understand the nature of DNA behavior in loops of less than 460 bp. Ring closure experiments (DNA cyclization) were used to observe induced topological changes in DNA upon binding to and looping around the engineered proteins. The length of DNA required to form a loop in our artificially rigid system was found to be substantially longer than loops formed with natural proteins in vivo. This suggests the inherent flexibility of natural looping proteins plays a substantial role in stabilizing small loop formation. Additionally, by incrementally varying the binding site separation between 435 bp and 458 bp, it was observed that the LZD proteins could predictably manipulate the DNA topology. At the lengths evaluated, the distribution of topological products correlates to the helical repeat of the double helix (10.5 bp). The dependence on binding site periodicity is an unequivocal demonstration of DNA looping and represents the first application of a rigid artificial protein in this capacity. By constructing these DNA looping proteins, we have created a platform for addressing DNA flexibility with regard to DNA looping. Future applications for this technology include a vigorous study of the lower limits of DNA length during loop formation and the use of these proteins in assembling protein:DNA nanostructures.

For welcoming me into his lab and providing the necessary wisdom to guide me through this journey, I will be forever grateful to Dr. Jason Kahn, my thesis advisor and mentor. Thank you for your vision and effort in making this project and my professional development the success that it was.

far from ideal. For starters, DNA is a very long molecule, narrow in width, and has a fairly short helical repeat (10.5 bp), meaning it is heavily twisted. In a closed circle, such as a genome, pulling apart the strands for replication or transcription places immediate strain, in the form of over-twisting, on the remaining double-stranded portion of the molecule.

The difficulties in separating the double helix over an entire genome were discussed by Watson and Crick almost immediately after their groundbreaking announcement of its structure (J. D. Watson & Crick, 1953a; 1953b). The double helix, a consequence of the conjunction of asymmetrical building blocks, demands a substantial amount of energy and protein regulation in maintaining the equipoise between being genetically accessible and structurally compact.


Indeed, while proteins possess a remarkable tendency to mutate their shape, function, and relative size, DNA has remained nearly static in all physical aspects except for length. As organisms have grown in size and complexity over the eons, they have adapted to their burgeoning genome not by improving its underlying structure but rather by increasing and diversifying the proteins that organize and maintain it. From histones or H-NS proteins that compact it to topoisomerases and gyrases that balance its strain, DNA is a highly regulated polymer that is ultimately under the control of proteins. Without a responsive and energetically demanding system to maintain this spatial organization, or topology, of DNA, life could never have developed into the complexity observed today.

The advent of modern sequencing technology is delivering a wealth of data on the content of genomes across scores of species. The explosion of available information has the potential to shower benefits on our civilization, from the identification and elimination of genetic disorders to a unified theory of evolution. But the path from the genetic code to living organism is, like the molecule itself, hardly linear. The networks of genes and intricate feedback systems required for development demand coordination that is only beginning to be understood. There is a marked disconnect between the two-dimensional nature of genetic sequence and the three-dimensional life form to which it gives rise. Like all DNA, the human genome measures 2 nm in width but has a length that is orders of magnitude greater (10^8 for Homo sapiens). That this molecule serves its function while compacted to fit inside a 6 µm nucleus attests to the complexity of its protein-regulated structure and underscores the need to comprehend the mechanisms behind its order. DNA structure (its topology, geometry, and geography) represents the foundation upon which genetic information is built, stored, and accessed. If we cannot observe, predict, and ultimately control the structure of DNA, the acquisition of its entire sequence will remain a feat of limited application.

1.2 DNA Topology: Maintaining Order Within a Cell

The helical repeat of DNA, a direct property of the twisting nature of the double helix, dictates that, when in an aqueous environment, the two strands will cross one another roughly once every 10.5 bp. A second type of crossover event occurs when two separate segments of double helix make a close approach at a node. This element of structure is referred to as writhe. As the molecule is compacted, the formation of these crossover nodes becomes increasingly common. Depending on the orientation of the crossover event, nodes may have either a positive or a negative sign.

As illustrated in Figure 1.2, the frequency and geometry of nodes result in the quantitative value of writhe. The amount of writhe reflects the degree of supercoiling, which is the underlying feature of DNA topology. This essential component of compaction was first described in the 1960s while studying the two structurally distinct forms of genetically identical polyoma virus DNA (Vinograd, Lebowitz, Radloff, Watson, & Laipis, 1965). But if these two identical sequences of DNA had different structural features, there must be a way to quantify the difference.

The means of quantifying the structural differences lies in the number of times the two strands cross each other through both helical repeat (the twist component) and through node formation (the writhe component). If two ends of a linear fragment of DNA are joined together in a closed circle, then the two strands of the double helix are linked together by the number of times the strands cross, as per the helical repeat. This quantity must be an integer (as there are no partial crossovers in a closed circle) and represents the linking number of circular DNA lying in a plane. But fixing DNA to two dimensions is not an element of the real world. In fact, genomic DNA crosses over itself constantly in its natural environment. These crossover nodes are also linked in a closed circle of DNA and, as such, can be added to the number of helical repeat crossing events to provide an absolute linking number (Lk) for any given closed circle of DNA. DNA nodes, however, can have either positive or negative values depending on the orientation of the crossover.

Figure 1.2 Plectonemic supercoiled DNA illustration. Each line represents double-stranded DNA. The contribution of writhe in supercoiling is quantified by the formation of both (+) and (-) nodes, leading to an increase or decrease in the linking number, respectively.

As illustrated in Figure 1.2, a positive node increases the overall Lk value, while a negative node produces a negative change in writhe and an overall decrease in the linking number. Because the absolute value of Lk cannot change without breaking one or both strands of DNA, the linking number is an excellent means of quantifying DNA topology. As seen in Figure 1.3, plasmid DNA with populations that differ in their linking numbers can be easily resolved using agarose gel electrophoresis in the presence of an intercalating agent such as chloroquine. That the linking number remains unchanged (ΔLk = 0) in a given closed circle of DNA, however, does not mean that the twist (Tw) and writhe (Wr) components remain static. The two elements can be readily interconverted according to the following formula:

Eq. 1    for ΔLk = 0, ΔTw = -ΔWr

This ability to relieve torsional stress by converting it to writhe is essential but clearly insufficient for dealing with the topological strain that arises during replication. To accommodate this systemic energetic barrier, the cell must employ a means of changing the linking number such that overtwisting caused by the strand separation during replication can be relieved. If the strands could break, then the change in either or both the twist and the writhe would result in a change in the linking number according to the following:

Eq. 2    ΔLk = ΔTw + ΔWr
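The bookkeeping in Eq. 1 and Eq. 2 can be illustrated with a minimal Python sketch. The function names and the 2100 bp circle are hypothetical choices made for illustration only; the 10.5 bp helical repeat is the value quoted in the text.

```python
HELICAL_REPEAT_BP = 10.5  # bp per helical turn, the value quoted in the text


def relaxed_twist(length_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Number of helical turns in a torsionally relaxed duplex of this length."""
    return length_bp / helical_repeat


def writhe_change(delta_lk, delta_tw):
    """Eq. 2 rearranged: dWr = dLk - dTw."""
    return delta_lk - delta_tw


# Hypothetical 2100 bp closed circle (illustrative length, not from the text).
turns = relaxed_twist(2100)

# Intact strands (dLk = 0): removing two turns of twist must be matched
# by two turns of positive writhe, which is the content of Eq. 1.
compensating_writhe = writhe_change(delta_lk=0, delta_tw=-2.0)

# If a strand-breaking event changes Lk by -3 and one turn of that change
# is absorbed as reduced twist, the remaining -2 appears as writhe.
writhe_after_break = writhe_change(delta_lk=-3, delta_tw=-1.0)

print(turns, compensating_writhe, writhe_after_break)
```

The sketch treats twist and writhe as simple numbers that partition a fixed linking number; it says nothing about how a real molecule distributes the strain between the two.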

It was suggested in 1954 that cells may use an approach where one or both strands of the helix are broken so that torsional strain may be relieved through untwisting (Delbrück, 1954). Nearly two decades would have to pass before this theory could be validated when, in 1971, an enzyme termed the ω-protein was isolated from E. coli (Wang, 1971). This enzyme, subsequently renamed DNA Topoisomerase I, possesses an ability to relax supercoiled DNA by nicking one strand and allowing it to rotate about the axis of the intact strand. Because this enzyme facilitated the breaking of one of the strands, the linking number could be changed. This was a monumental achievement for the nascent field of DNA topology and represented the first of a large and complex class of topoisomerase enzymes.

Figure 1.3 Supercoiled DNA depicting various degrees of supercoiling resolved on an agarose gel with chloroquine. To form a distribution of topoisomer products, plasmid DNA was incubated with Topoisomerase I for an increasing amount of time (lanes 5, 6, 7). This gel is meant to illustrate how individual topoisomer populations can be resolved.

1.3 Balancing Supercoiling with Topoisomerase

Though it is unsurprising that the topoisomerase class of enzymes exists, it is nonetheless fascinating to consider the many ways cells have evolved to maintain the topological balance throughout their genome. The immediate need for supercoiling is obvious: compaction. Nearly all cells maintain their genome as negatively supercoiled DNA (left-handed nodes). This topological state is maintained by the ATP-dependent enzyme DNA gyrase (Topoisomerase IIA) in bacteria and by histone wrapping in eukaryotes (Camerini-Otero & Felsenfeld, 1977; Gellert, Mizuuchi, O'Dea, & Nash, 1976). But chromosomal condensation is far from the only application of this structural phenomenon. For example, transcription factor binding has been shown, in some cases, to be dependent on the degree of negative supercoiling at the promoter site (Lamond, 1985). Furthermore, the opening of a transcription bubble by RNA polymerase II requires a degree of local untwisting and corresponding torsional strain that is compensated by the inherent negative writhe (Choder & Aloni, 1988). Though a preponderance of organisms maintain homeostasis with negatively supercoiled DNA, those living in extremely high temperatures, such as members of the Sulfolobus genus, have evolved a reverse gyrase, whose ATP-dependent activity introduces positive supercoiling, increasing the melting temperature to maintain genomic stability at very high temperatures.

The essential function and ubiquitous activity of topoisomerases have made them viable targets for cytotoxic drugs. Because DNA gyrase and the closely related Topoisomerase IV are both unique to the bacterial kingdom, inhibitors specific to their function, such as fluoroquinolones like Cipro, have been put to use as broad-spectrum antibiotics (Maxwell & Lawson, 2003). Work on inhibiting eukaryotic topoisomerases has led to clinical applications in anti-cancer trials, as Topoisomerase activity is essential for replication (Hande, 1998). It is also possible that protein engineering work with Topoisomerases may prove useful in the future of genetic manipulation. One could see value in a Topoisomerase that possessed binding specificity that would limit its function to a predetermined location on the genome. In gene therapy, a targeted sequence may be histone-bound and inaccessible. A reverse gyrase enzyme that could target the region and induce positive supercoiling could aid in displacing the histones and allowing access to the area of interest. If we are to attain the ability to access and control genetic material on a level that stretches across the entire genome, topoisomerases may well play a pivotal role. However, for all their influence on DNA topology, the topoisomerase enzymes lack sequence specificity and thus act globally. In an event where topology must be controlled at a local level, such as the regulation of a specific gene, nature has adapted a second method of topological control, the DNA looping proteins. Protein-mediated loop formation provides a means of locking DNA in position. Manipulating DNA through this approach offers specificity and reversibility and may serve as an alternate platform to affect DNA structure by design.

1.4 Looping Proteins and Their Influence on Topology

The phosphate backbone of the double helix presents the molecule with several advantages within a cell. The negative charge it carries contributes favorably to its solubility and makes its diffusion through cellular membranes unlikely. For proteins seeking to have some effect on DNA, this charge density serves as a beacon. It is not difficult to imagine how early peptides with dense regions of arginine and lysine could have first adapted to binding DNA. From transcription factors, to histones, to DNA repair enzymes, proteins have evolved to interact with DNA to perform a myriad of functions. As organisms evolved and their genomes expanded, proteins with DNA binding ability became increasingly valuable in the effort to maintain order.

Supercoiled DNA can be viewed as energetically primed. As discussed, it is easier to compact, transcribe, and replicate DNA that is negatively writhed. This energy is locked in position because the linking number of DNA cannot change unless one or both of the strands are broken. But DNA is not an infinitely stable molecule, and the threat of single-strand nicking or double-strand breaks places the genome in structural peril. Fortunately, proteins have adapted to protect against these common threats by forming loops to lock DNA in position. DNA looping proteins are therefore able to create isolated regions of topology, where the actions on one region are structurally separate from another. In E. coli, electron micrographs revealed such loops forming around a central core in the nucleoid (Kavenoff & Bowen, 1976). This work, and others like it, led to the formation of the rosette theory to describe bacterial DNA structure. While still not fully understood, loop formation throughout the prokaryote genome is a highly regulated phenomenon, managed by a number of key proteins such as H-NS and HU.

The compaction of DNA, a global event in principle, is managed, with few exceptions, by proteins that bind to DNA without regard for sequence recognition. Gene transcription, a process requiring access to a linear form of DNA, can be viewed as a local event and, in contrast, typically involves proteins that bind in a sequence-specific manner. Because both of these extremes must coexist for survival, the genome is in a constant state of balance between a need for compaction and a need for expansion. As discussed, the mechanisms employed to spatially manage DNA are impressive but relatively few in number. However, for the purpose of transcription, the required specificity implicit in regulating thousands of unique genes has led to an immense diversity of control mechanisms. Leaving aside the discussion of signaling pathways that may add layers of complexity to gene regulation, the essence of transcription can be distilled to the notion of a genetic circuit, capable of being turned on or off.

Early insight into this regulatory approach came in 1961 from Jacob and Monod and their work with E. coli. They noticed that the expression of three proteins, β-galactosidase, permease, and transacetylase, was enhanced in the presence of lactose (Jacob & Monod, 1961). They theorized that the expression of the three genes, now known as lacZ, lacY, and lacA from the lac operon, was activated by lactose and repressed by some unknown agent in the absence of lactose. This agent was later identified as the lac repressor protein (LacI), whose own expression was coded by the lacI gene at the upstream portion of the lac operon. Its repression activity was linked to its ability to bind specifically to a region of DNA within the lac operon, where it blocked RNA polymerase from binding (Gilbert & Maxam, 1973; Gilbert & Müller-Hill, 1966). Furthermore, the identification of two other local binding sites for LacI within the lac operon suggested possible DNA loop conformations in vivo and that these sites provided enhanced repression through cooperativity (Krämer et al., 1987; Oehler, Eismann, Krämer, & Müller-Hill, 1990). Looping was proven by a clever experiment that showed that repression levels of the regulated gene lacZ were dependent on the periodicity of the LacI binding sites (Bellomy, Mossing, & Record, 1988). This experiment was further refined and the limits of looping tested well below the 91 bp that separate the binding site in the wild

That looping existed and could occur at such small lengths led to an evolution of our understanding of the lac operon system. Its newly uncovered complexity confirmed DNA looping to be a means of enhancing the regulatory power of proteins involved in gene transcription.

While arguably the most characterized DNA looping protein, LacI is not alone in its mechanism. Another E. coli transcription pathway, the Gal repressosome, utilizes looping and wrapping of DNA around the gal repressor protein (GalR) in its regulatory role (Haber & Adhya, 1988). This model is distinct from the lac operon in that a secondary protein, HU, is involved in binding and kinking DNA within the loop

Figure 1.4 (From Müller et al., 1996) Repression levels of chromosomal lacZ expression with increasing spacing between the LacI operator sites. The repression is shown to be dependent on the phasing of the operator sites and correlates to the helical repeat of DNA, presenting a classic demonstration of loop formation.

The relatively recent technique, chromosome conformation capture (3C), in which chromosomal DNA is covalently cross-linked to bound proteins and then those interactions are mapped by digestion, ligation, and PCR, has provided a systematic approach to DNA looping in vivo and has begun to elucidate its frequency (Davison et al., 2012; Tolhuis, Palstra, Splinter, Grosveld, & de Laat, 2002; K. Yun, So, Jash, & Im, 2009). The prevalence of looping in eukaryotes, and its capacity to exist over surprisingly large distances of tens or hundreds of kilobases, further underscores the significance of DNA looping as a means of spatial control within a cell.

1.6 Implications of Looping Size and Synthetic Manipulation

DNA looping over very large lengths, such as those discovered using the 3C method, must overcome entropic hurdles to bring together these distant sites. The large lengths do mitigate the energetic cost of bending or twisting DNA, and it can be concluded that looping DNA many times longer than its persistence length of 50 nm (roughly 150 bp) is independent of the geometry of the bound DNA (Hagerman, 1981). In contrast, looping events of much smaller scale, such as the 91

(Oehler et al., 1990; Shore & Baldwin, 1983a). The existence of looping well under the persistence length, such as the aforementioned LacI-mediated loop, has been explained, in part, by attributing a fraction of the energetic cost to flexibility inherent in the looping protein

is the case, the ability of the protein to assume multiple conformations stabilized the small loop (Rutkauskas et al., 2009). The LacI protein, which is a tetrameric protein held together by a leucine-rich four-helix bundle (4HB), contains two regions of considerable flexibility: the hinge region separating the DNA binding domain from the N-terminal core domain and the proline-rich linker connecting the C-terminal core domain to the 4HB. Recent work involving DNA fragments with inherent topological strain induced by poly-adenine tracts (A-tracts) suggests that both an open and closed form of LacI may form depending on the contour of the DNA (Haeusler et al., 2012). In mutation studies involving the spacing of the LacI operator and its effect on repression rates, it was found that loops could form in vivo at lengths as short as 57 bp (Müller et al., 1996). Looping has been confirmed by the fact that repression levels depended on the periodic spacing of the operators and correlated to the helical repeat of DNA (Bellomy et al., 1988). This result is truly remarkable given that this represents distances slightly over one third the persistence length.
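The dependence of looping on operator phasing can be made concrete with a short Python sketch. It assumes the simplest possible picture, in which the only thing that matters is how far two sites are rotated from one another about the helix axis; the function names are hypothetical, and the real preferred phase also depends on the geometry of the bound protein.

```python
HELICAL_REPEAT_BP = 10.5  # bp per helical turn, the value quoted in the text


def phase_angle_deg(spacing_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Rotation about the helix axis (0 to <360 deg) between two sites spacing_bp apart."""
    return (spacing_bp * 360.0 / helical_repeat) % 360.0


def angular_offset_deg(spacing_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Smallest rotation (0 to 180 deg) that would place both sites on the same helical face."""
    angle = phase_angle_deg(spacing_bp, helical_repeat)
    return min(angle, 360.0 - angle)


# Scan the 435-458 bp range of binding site separations mentioned in the abstract.
for spacing in range(435, 459, 3):
    print(spacing, round(angular_offset_deg(spacing), 1))
```

Under this assumption the offset repeats every 10.5 bp (every 21 bp for integer spacings), which is the kind of periodicity that the repression and cyclization measurements are read against.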

A competing theory of enhanced DNA flexibility at short lengths has been put forth to alternately explain the existence of very small loops. In this model, the formation of spontaneous kinks in DNA results in enhanced bending effects at short lengths. The theory was supported using DNA cyclization experiments of very short lengths (85-105 bp) where unimolecular, or cyclized, products formed with far higher frequency than predicted by common models used to describe DNA behavior such as the Worm-like Chain (WLC) model (Cloutier & Widom, 2004; 2005; Wiggins et al., 2006). The ratio of the formation of unimolecular products to bimolecular products is expressed by the j-factor and has been used to determine the torsional rigidity of DNA and calculate its persistence length (Shore & Baldwin, 1983b; 1983a). The spontaneous kink theory is currently a source of contention and the approach used to demonstrate it has been openly challenged (Du, Smith, Shiffeldrim, Vologodskaia, & Vologodskii, 2005). A DNA looping protein could be used to investigate this short-sequence enhanced flexibility, but only if the protein served as a rigid link between the bound DNA. Naturally occurring looping proteins rely on inherent flexibility and/or additional DNA binding proteins to alter the loop topology and increase stability, as seen in the lac operon and Gal repressosome (Becker, Kahn, & Maher, 2005; Roy et al., 2005). These natural adaptations result in such proteins being inapplicable for studying DNA flexibility in isolation. Lacking availability of a preexisting rigid DNA looping protein, our lab set out to engineer an artificial alternative.

1.7 Incorporating Rigidity into a DNA Looping Protein

De novo protein design will, by definition, begin at the level of its building blocks. Because this protein must meet certain structural specifications, namely uniform rigidity, forethought must go into how the amino acid sequence will ultimately fold. Of the limited secondary structures observed in peptide folding, it seemed logical to commence with a comparison of their relative flexibility. While no organic polymer with cellular origins can be considered truly rigid, as compared to macroscopic things such as lumber and steel, the relative stiffness of microscopic polymers can be rated using metrics such as persistence length. The persistence length can be thought of as a way of expressing the energy required to bend a polymer. As seen in equation 3, the free energy of bending is directly correlated to the persistence length, a, over the contour length, L, with a total bend angle, ΔΘ (Kahn & Crothers, 1998):

Eq. 3    $\Delta G = \frac{aRT}{2L}(\Delta\Theta)^{2}$

Molecular-dynamics simulations performed on peptides that consisted of a continuous α-helix concluded the structure to have a persistence length of 100 nm, or twice that of DNA (Choe & Sun, 2005). Furthermore, similar analysis on the structure of a coiled-coil of α-helices, like that in the leucine zipper motif, increased the persistence length to nearly 150

This region has been frequently used to study coiled-coil structure and played a major role in deciphering the amino acid trigger-sequence that dictates the oligomerization state in coiled-coil structures of two or more helices (Ciani et al., 2010). Figure 1.5 was generated using the crystal structure solved by Burkhard and colleagues and illustrates the large coiled-coil feature of cortexillin (Burkhard, Kammerer, Steinmetz, Bourenkov, & Aebi, 2000). Like nearly all coiled-coil dimers, cortexillin associates in a parallel orientation and displays a left-handed geometry along the helical axis. The crystal structure has been used to calculate a rotational period of roughly 49 aa (or 7 heptad repeats) for every 180° of twist. This rotational feature was taken into consideration when designing the length of our looping proteins and its effect on binding site orientation.
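How the coiled-coil length might set the relative orientation of two binding sites can be sketched in Python from the rotational period quoted above (roughly 49 aa per 180° of twist). The function name and the spacer lengths are hypothetical, the rotation is treated as perfectly uniform, and the sign of the left-handed rotation is ignored, so this is only an order-of-magnitude illustration.

```python
RESIDUES_PER_HALF_TURN = 49.0  # aa per 180 deg of coiled-coil twist (from the text)
HEPTAD = 7                     # residues per heptad repeat


def coiled_coil_rotation_deg(n_residues):
    """Rotation (0 to <360 deg) accumulated about the coiled-coil axis over n_residues."""
    return (n_residues * 180.0 / RESIDUES_PER_HALF_TURN) % 360.0


# Hypothetical coiled-coil spacer lengths, expressed in whole heptads.
for heptads in (5, 7, 10, 14):
    residues = heptads * HEPTAD
    print(heptads, residues, round(coiled_coil_rotation_deg(residues), 1))
```

In this picture, adding or removing a single heptad rotates one end of the coiled-coil relative to the other by a little under 26°, so protein length and binding site orientation cannot be chosen independently.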

1.8 DNA Binding with Basic Leucine Zipper Proteins (bZip)

The bZip structural motif is a DNA binding domain used in a class of transcription factors whose origins have been traced back one billion years (Amoutzias et al., 2007). Because the leucine zipper is a coiled-coil structure, use of a bZip DNA binding domain is appealing in the design of a rigid DNA looping protein.

In an effort to minimize the potential for flexibility, the peptide structure should be continuous in nature, meaning that the coiled-coil motif is to be maintained for all, or nearly all, of the structure. c-Myc is a DNA binding protein found in humans that was first identified by way of its sequence similarity with the oncogene v-Myc from the avian myelocytomatosis virus (Dalla-Favera et al., 1982). Structurally this protein is significant because its similarity to CCAAT-enhancer binding protein (C/EBP), specifically the placement of leucine residues at the d position of the heptad repeat (abcdefg) over the span of four helical repeats, led to the discovery of the leucine zipper motif and its recurrent association with DNA binding regions

(Landschulz, Johnson, & McKnight, 1988). Further characterization of the structure uncovered the importance of the hydrophobic interactions between the a and d residues of one α-helix with the a’ and d’ residues of its pairing α-helix. Additionally, electrostatic interactions of the e and g residues of one helix with the e’ and g’ residues of the other helix lead to greater stability. To the N-terminal of the leucine zipper, the DNA binding region of this motif makes frequent use of the basic amino acids lysine and arginine as contact points with the DNA phosphate backbone. It is the combination of a basic binding site and the leucine zipper that has led this to this

Figure 1.6 A graphical representation of the residue interactions of the GCN4 leucine zipper. Left, an α-helix diagram depicting the hydrophobic burying of the a and d residues in the coiled-coil. Right, a space-filling illustration showing both the hydrophobic burying of the a and d residues (red and blue spheres) as well as the interaction between the g and e’ residues between alpha helices (green and yellow spheres).

There exists a great deal of variety among the bZip members. All are capable of dimerization but many, such as the human fos/jun pair as heterodimers

(Ellenberger, Brandl, Struhl, & Harrison, 1992; O'Shea, Rutkowski, Stafford, & Kim, 1989). Among the DNA binding regions there also exists a degree of structural variance. Previous work with c-Myc suggested that it was capable of forming a tetramer that could bind DNA at two points to form a loop

Loop-Helix motif for the transition from the helical binding region to the helical zipper domain (Fisher, Parent, & Sharp, 1993). This structural feature was then demonstrated in the solved crystal structure as a heterodimer with its protein counterpart Max (Nair & Burley, 2003). Here, the loop region junction likely plays a role in stabilizing the interaction and enhances the binding but may afford the protein flexibility and as such should be avoided in our design. Moreover, recent work with c-Myc and its sometime dimerization partner Max demonstrated that while the proteins could fold in a manner that allowed for binding two strands of DNA, in a structure termed a “sandwich complex”, the binding was found to be too weak to support the formation of a DNA loop (Lebel, McDuff, Lavigne, & Grandbois, 2007). This prior work would exclude c-Myc from further consideration in the design process, but it was illuminating in suggesting a route to combine two DNA binding sites along a coiled-coil motif.

The yeast transcription factor GCN4 was identified by its association with the His3 gene and its role in regulating amino acid biosynthesis during periods of starvation (Hope & Struhl, 1985). Further analysis indicated that it bound to DNA in dimeric form (Hope & Struhl, 1987). The following year, the c-Myc and C/EBP correlation led to the announcement of the bZip family motif and it was quickly noted that the DNA binding region of GCN4 aligned with this proposed structure. The structure of the leucine zipper region of the protein was then solved in 1991, which solidified its status in the bZip family (O'Shea, Klemm, Kim, & Alber, 1991). The complete bZip domain bound to the pseudo-palindromic AP1 DNA (5’-ATGACTCAT-3’) was solved the following year by Ellenberger et al. and revealed a

Figure 1.7 The crystal structure of GCN4 bZip domain illustrates a continuous α-helical structure between the coiled-coil and the DNA binding site. The continuous α-helix is intended to confer rigidity to the proteins. Image created using PyMOL with PDB: 1DGC

points (Keller, König, & Richmond, 1995). This work was able to provide a contact map fully elucidating the interaction between one of the α-helices and half of the palindromic binding site. This is depicted in Figure 1.8, taken from Keller et al., 1995. The continuous extension of α-helical structure between the coiled-coil region and the basic DNA binding site is of particular interest because this structure confers the greatest chance of maintaining rigidity if applied to a DNA looping protein. GCN4 was, therefore, selected as the starting template for our artificial DNA looping protein. For a means of combining two DNA binding sites, our design turned elsewhere.

of size possibilities and the incorporation of highly specific sequence recognition, such a system offers tremendous potential for eliciting control over DNA. It will undoubtedly take a great deal of bioengineering to convert a looping concept into a clinical reality, but it can begin with a simple statement of purpose: design an artificial DNA looping protein and investigate how it can manipulate DNA structure. This thesis describes the design, purification, and expression of a series of artificial proteins (Chapter 2), the binding characterization of the various peptides (Chapter 3), evidence of transient DNA loop formation (Chapter 4), and subsequent analysis of the topological manipulation induced by loop formation with our proteins (Chapter 5). By creating an artificial DNA looping protein, we have created a platform for affecting DNA topology by design. Additionally, the binding-site specificity and ability of the protein to alter the DNA binding site orientation through design modifications make this work potentially well suited to developing self-assembling protein:DNA nanostructures.

The argument for using the coiled-coil structure in designing a rigid DNA looping protein is presented in sections 1.7 and 1.8. The application of this concept resulted in two major design approaches: a tetrameric design and a dimeric design. Both of these structures would be assembled using homodimers with GCN4 DNA binding domains. Future work with this project may find the use of hetero-multimeric assembly appealing, as this would provide greater variety to the DNA binding sequence, which in our design is limited to palindromic sequences. Such a design was not considered in our application here. This chapter will describe the design and synthesis of the tetrameric and dimeric DNA looping protein designs used in this project.

2.2 Design of a Tetrameric DNA Looping Protein

The LacI DNA looping protein folds into a stable tetramer as a dimer of dimers, in which dimeric core domains are held together by a leucine-rich four-helix bundle

GCN4 coiled-coil (green) was addressed by incorporating a heptad repeat linker (magenta) to allow the coiled-coil helices to partially separate as they transitioned to the 4HB. Figure 2.2 is a schematic representing the assembly of the tetrameric DNA looping proteins.

The dense packing of hydrophobic residues in an extended leucine zipper may present solubility issues for our peptides. To account for the possibility of an insoluble product and the unknown element of transitioning between a coiled-coil and 4HB domain, four mutants were designed, each incorporating a unique linker.

Genes expressing these four mutants were synthesized and cloned into plasmid pRSETA by

Figure 2.3 Modular assembly of the 4-helix bundle (4HB) proteins (A). Sequences given for the 4 constructs with the various domains underlined according to purpose: yellow – common N-terminal 6X his-tag and Enteropeptidase site (dashed underline), red – basic binding region, green – leucine zipper region, magenta – linker, blue – 4-helix bundle region.

Jason Kahn, expressed and purified as described in section 2.4. Figure 2.3 illustrates the modular design of the four constructs LZEE, LZAR, 4HEE, and 4HAR.


2.3 Design of the Dimeric Looping Protein

As indicated in Figure 2.3, three of the four tetrameric constructs expressed as insoluble peptides. This conclusion is taken from SDS-PAGE analysis of the soluble lysate and insoluble pellet done during purification (Figure 2.8). While purification of these peptides was achievable using 6 M guanidine, efforts to refold the proteins upon removal of the guanidine proved unsuccessful. Additionally, binding analysis of the soluble LZEE construct provided evidence that the protein was not folding into a tetrameric state capable of binding two DNA fragments (see section 3.2.1).

It was thereby necessary to develop a second approach to designing an artificial looping protein. This subsequent engineering effort was more an extension of the previous design than a complete restructuring. The arguments for the coiled-coil motif conferring rigidity were sound and the strong binding of the GCN4 basic binding site had no shortcomings. The problem resided with the tetrameric domain and the likely possibility that dimerization rather than tetramerization of LZEE resulted in a more stable structure. Instead of a tetrameric linking domain, we turned to a simpler assembly, a dimeric leucine zipper dual-binding (LZD) protein.

2.3.1 The reverseGCN4 DNA Binding Protein

The inspiration for the next step came from work on the GCN4 peptide by Martha Oakley. Her group’s investigation into the folding of bZip peptides led her to ask whether there was an inherent thermodynamic reason that all bZip DNA binding proteins position the basic region to the N-terminal side of the leucine zipper domain (Hollenbeck, Gurnon, Fazio, Carlson, & Oakley, 2001). In an experiment that can


only be described as essential to this project, her lab reconstructed the GCN4 peptide by inverting the order of the two domains and positioning the binding region at the C-terminal of the peptide, as illustrated in Figure 2.4.

Figure 2.4 Modular assembly of reverseGCN4 created by Hollenbeck and Oakley (2001). The reversal of positions of the basic binding region (yellow) and the leucine zipper region (green) was performed to assess whether there was a thermodynamic reason for the evolution of the N-terminal basic region arrangement among natural bZip DNA binding proteins.

The protein was simply named reverseGCN4 or rGCN4. To avoid confusion with recombinant nomenclature, it will only be referred to here as reverseGCN4. Empirical work with the α-helical phasing of the basic regions with respect to the leucine zipper using binding assays involving DNA with variants of an inverted CREB site produced a peptide that could bind DNA with near wild-type affinity (Kd

This design should not be confused with work that reverses the sequence of amino acids from C- to N-terminal. This structural change has previously been done with the leucine zipper sequence of GCN4 in creating a retroGCN4 peptide, which folds into a stable 4-helix bundle (Mittl et al., 2000).

The reverseGCN4 artificial protein presented a perfect opportunity to simplify our looping protein into a dimeric structure. By fusing the GCN4 bZip peptide with the reverseGCN4 peptide sequence, the folded dimer should contain two DNA binding domains. The amino acid sequence separating the two binding sites was determined by aligning the reverseGCN4 sequence with GCN4 bZip, resulting in a 73 amino acid sequence from the beginning of the N-terminal binding site to the end of


the C-terminal binding site. The protein design was termed LZD73. A gene expressing this peptide was cloned into pRSETA, incorporating an N-terminal 6X his-tag and Enteropeptidase cleavage site (-DDDKD-). The left-handed geometry of the coiled-coil motif presented a unique opportunity to adjust the angles between the two DNA strands. Because the coiled-coil wraps around itself and the binding site of the DNA is perpendicular to the coiled-coil axis, an extension of the coiled-coil should result in a change in the relative binding orientation. To investigate this possibility, a second looping protein mutant was designed to incorporate an additional 14 amino acids between the GCN4 leucine zipper and the reverseGCN4 linking sequence. Keeping with the nomenclature established with LZD73, the additional 14 amino acids are reflected in the name LZD87. An N-terminal overlay of models for LZD73 and LZD87 bound to CREB and Inv-2 DNA is depicted in Figure 2.6.

Figure 2.6 Overlay of renderings for LZD73 (green) and LZD87 (blue) DNA binding proteins bound to 20 bp DNA with either CREB or Inv-2 site sequence at the N-terminal and C-terminal, respectively. The PyMOL image illustrates the coiled-coil left-handed orientation and how the length change leads to a rotation of the relative binding sites.


The effects of the addition of 14 amino acids can be seen in the change in binding orientation of bound DNA segments.
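As a rough illustration of why a two-heptad extension should rotate the second binding site, the sketch below estimates how far one helix winds around the coiled-coil axis over 14 residues. The rise per residue and the supercoil pitch are assumed, textbook-style placeholder values, not measurements from this work, and the function name is hypothetical. Like the models in Figure 2.6, the result is illustrative only and does not reflect any knowledge of the actual binding site angle.

```python
# Rough geometric estimate of the rotation introduced by extending a
# left-handed coiled coil. All geometry values are ASSUMPTIONS chosen
# for illustration; they are not taken from the thesis data.
RISE_PER_RESIDUE_A = 1.5    # assumed axial rise per residue (angstroms)
SUPERCOIL_PITCH_A = 150.0   # assumed supercoil pitch (angstroms per full turn)


def supercoil_rotation_deg(n_residues, rise=RISE_PER_RESIDUE_A,
                           pitch=SUPERCOIL_PITCH_A):
    """Degrees that a helix winds about the coiled-coil axis over n_residues."""
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    return 360.0 * (n_residues * rise) / pitch


if __name__ == "__main__":
    extra_residues = 87 - 73  # LZD87 vs. LZD73: two heptad repeats
    angle = supercoil_rotation_deg(extra_residues)
    print(f"{extra_residues} extra residues: about {angle:.1f} degrees "
          f"(under the assumed rise and pitch)")
```

Changing the assumed pitch changes the estimate proportionally, so the output should be read only as an order-of-magnitude argument that a 14-residue insertion is long enough to reorient the C-terminal binding site appreciably.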

Figure 2.7A illustrates the modular assembly of these two genes and 2.7B lists the amino acid sequence for each. By extending the leucine zipper domain by two heptad repeats, the hydrophobic content of the peptide was increased.

(B) Sequences used in the design with the underlined regions corresponding to the modular illustration depicted in (A).

The solubility problems encountered in the 4HB mutant work raised concerns that this might lead to similar folding difficulties. In order to maximize the likelihood that this mutant would be soluble, the 14 aa sequence was taken directly in frame from LZEE, the soluble 4HB peptide. For visualization purposes, two models were generated using PyMOL (see Figure 2.6). This image is meant to be illustrative and does not reflect any knowledge of the actual binding site angle orientation.

In the figure above, the N-termini have been aligned to highlight the binding site orientation differences at the C-terminal domain.


2.4 Expression of 4HB and LZD Proteins

All reagents were purchased from Fisher Scientific with the exception of [γ-32P]-ATP, which was purchased from PerkinElmer. Polynucleotide kinase was purchased from New England Biolabs (NEB). Protein chromatography was performed on the AKTA FPLC using columns purchased from GE Healthcare. Centrifugal filters were purchased from Millipore. Bio-spin 6 columns were purchased from Bio-Rad.

2.4.1 4HB Mutant Expression

Each of the four 4HB sequences denoted previously was prepared by oligonucleotide synthesis and mutually primed extension to give the plasmids pLZEE, pLZAR, p4HEE, and p4HAR. The expressed sequence contained an N-terminal 6X histidine tag for metal chelate affinity purification as well as an Enteropeptidase binding/cleavage sequence (-DDDDKD-) between the his tag and the 4HB open reading frame. The plasmids were transformed into electrocompetent BL21 DE3 (pLysS) cells by electroporation. The ORF sequence for each of these proteins is found in Appendix A. After rescue with SOC (1 mL) and 1 hr at 37°C with shaking, the cells (15 µL) were streaked on LB agar containing ampicillin (100 mg/L) and chloramphenicol (40 mg/L). The plates were then incubated overnight at 37°C. A single colony was selected the following day and expanded overnight in a 5 mL LB culture (+Amp/+Cam) with agitation at 37°C. The culture was then used to inoculate a pre-warmed 1 L LB (+Amp/+Cam again) solution in a 4 L Erlenmeyer flask in the morning and allowed to grow for 4-6 hours until the optical density (OD600) reached 0.6. Expression was induced by the addition of IPTG (0.5 mM) and


the cells were allowed to express for 3 hours. The cells were harvested by centrifugation for 15 minutes at 12,000

Plasmids containing the sequences coding for LZD73, LZD87, and the single-binding C-terminal control reverseGCN4 were transformed into E. coli BL21 DE3 (pLysS) cells, selected for expansion, and then grown in 5 mL starter culture as described for the 4HB mutants. Because of slower growth relative to the previous mutants, the timescale for pre-induction growth and expression length was adjusted accordingly to maximize yield. This slowed growth for cells carrying the LZD protein genes is likely due to leaky expression of the high-copy pRSETA expression system. It can be inferred that the LZD proteins are toxic for the host cells. It is possible that use of pLysE in place of pLysS could increase the growth rate during the pre-induction stage. Relative to pLysS, pLysE has a higher expression of T7 lysozyme, which binds to and inhibits T7 RNA polymerase. The basal expression of T7 RNA polymerase during pre-induction growth leads to leaky expression of the target pRSETA-based gene, and the leaky expression of a toxic protein is the likely cause of the diminished growth rate. For pre-induction growth, the 5 mL starter culture was used to inoculate 1 L of 37°C LB that had been pre-warmed overnight. This step was performed early in the morning, because growth is very slow at this step. After 10 hours of shaking at 37°C, the cells had typically reached an OD600 between 0.4 and 0.6. Expression was induced at this point by the addition of IPTG (0.5 mM) and the protein was given an extended, 18 hr, expression time (overnight). The following morning, the cells were harvested as performed for the 4HB mutants. Yields of cell paste (by weight) were similar to those of the 4HB mutants despite the total growth time being more than doubled.
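For convenience, the final concentrations quoted for the 1 L cultures (0.5 mM IPTG, 100 mg/L ampicillin, 40 mg/L chloramphenicol) can be converted into stock volumes with the usual C1V1 = C2V2 relation. The sketch below is a minimal example; the stock concentrations (1 M IPTG, 100 mg/mL ampicillin, 34 mg/mL chloramphenicol) are assumed, commonly used values and are not stated in this chapter, and the function name is hypothetical.

```python
def stock_volume_ml(final_conc, final_volume_ml, stock_conc):
    """Solve C1*V1 = C2*V2 for V1 (mL). Both concentrations must share units."""
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * final_volume_ml / stock_conc


CULTURE_ML = 1000.0  # 1 L expression culture, as described in the text

# Final concentrations come from the text; stock concentrations are ASSUMED.
iptg_ml = stock_volume_ml(0.5, CULTURE_ML, 1000.0)       # mM; assumed 1 M stock
amp_ml = stock_volume_ml(100.0, CULTURE_ML, 100_000.0)   # mg/L; assumed 100 mg/mL
cam_ml = stock_volume_ml(40.0, CULTURE_ML, 34_000.0)     # mg/L; assumed 34 mg/mL

if __name__ == "__main__":
    print(f"IPTG stock:            {iptg_ml:.2f} mL")
    print(f"Ampicillin stock:      {amp_ml:.2f} mL")
    print(f"Chloramphenicol stock: {cam_ml:.2f} mL")
```

If different stock concentrations are used, only the third argument of each call needs to change.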

2.4.3 Extraction and Purification of 4HB Proteins

A typical purification scheme begins with 1.5 g cell paste. The cells were thawed and resuspended in 20 volumes (30 mL for 1.5 g cell paste) of lysis buffer (10 mM MES pH 6.0, 0.5 M NaCl, 20 mM imidazole) and ruptured by French Press (3 passes) under 15,000 PSI, with ice bath chilling. Care must be taken to ensure a slow, drop-wise use of the French Press, as haste leads to poor lysis quality. The lysate was then centrifuged for 30 minutes at 22,000 x g and the soluble supernatant decanted and filtered through a 0.2 µm membrane syringe-based disc filter (Whatman) prior to chromatography. Analysis of the lysis material (soluble supernatant and insoluble pellet) revealed that only LZEE was soluble upon lysis.
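The "20 volumes" rule and the lysis buffer recipe above can be scaled to any amount of cell paste with a short calculation. The sketch below is an illustration only: the molecular weights are standard values for anhydrous MES free acid, NaCl, and imidazole (an assumption, since the reagent forms used here are not stated), and the function names are hypothetical.

```python
# Lysis buffer composition from the text (mM); molecular weights (g/mol) are
# standard values and ASSUME anhydrous MES free acid, NaCl, and imidazole.
RECIPE_MM = {"MES": 10.0, "NaCl": 500.0, "imidazole": 20.0}
MW_G_PER_MOL = {"MES": 195.24, "NaCl": 58.44, "imidazole": 68.08}


def buffer_volume_ml(cell_paste_g, volumes_per_gram=20.0):
    """Lysis buffer volume (mL) at 20 volumes per gram of cell paste."""
    return cell_paste_g * volumes_per_gram


def component_masses_g(volume_ml):
    """Mass (g) of each solid needed to prepare volume_ml of lysis buffer."""
    liters = volume_ml / 1000.0
    return {
        name: (conc_mm / 1000.0) * liters * MW_G_PER_MOL[name]
        for name, conc_mm in RECIPE_MM.items()
    }


if __name__ == "__main__":
    volume = buffer_volume_ml(1.5)  # 1.5 g cell paste, as in the text
    print(f"Lysis buffer volume: {volume:.0f} mL")
    for name, grams in component_masses_g(volume).items():
        print(f"  {name}: {grams * 1000:.1f} mg")
```

The pH adjustment to 6.0 is not modeled here; in practice the buffer would be titrated after the solids are dissolved.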