Introduction, goals, and standard techniques

Scope of document

This is a description of several proposed strategies for producing molecular
building blocks (MBBs) consisting of DNA-protein conjugates with a different
specific DNA sequence attached to each of several specific attachment sites
on a protein, for use in the proposed process, DNA-Guided
Assembly of Proteins (DGAP). (That process is described in a separate
document, which should be read before this one.)

Although we (I and my partners) prefer some methods over others, I describe
a variety of methods of different kinds for making MBBs, in part because
some methods differ in which proteins can be used, or in the number of
DNA attachment sites per protein that they allow, and in part because we
may encounter unexpected difficulties in our preferred methods, and thus
would like to have fallback methods available. (We have other methods or
variations not included here for reasons of brevity.)

The methods described vary in difficulty and risk of development, in cost
to practice, in the generality of the MBBs they can produce, and in
implementation technique. Many applications of
DGAP will be made possible even if only one of these methods is implemented.
We expect that most of them could eventually be implemented, and we might
implement more than one if the new applications made possible by the newly
accessible kinds of MBBs justified it.

To develop any of these methods, we expect that we will need to work
in an existing lab with the help of researchers and lab technicians experienced
in the specific kinds of techniques we will use. Furthermore, many of the
specific protocols, described here in general terms, will need to be chosen
and developed with expert advice, including protocols for separation (discussed
here very generally) and other kinds of characterization and verification
(not discussed at all), as well as the synthesis of linker molecules and/or
the genetic engineering of core proteins for MBBs, which would be needed
for some of the methods.

(Other issues not discussed in this document include the details of
the P-sites and the covalent crosslinking between P-sites on different
assembled MBBs, the geometry of attachments between MBBs, and any of the
specific possible applications of these MBBs assembled using the DGAP process.)

Design criteria for MBBs

This document does not discuss the specific applications of DGAP in which
the MBBs described here are meant to be used. However, all of these applications
have the common feature that the specific 3-dimensional arrangement of
proteins is important, whereas the specific identities of the proteins
used are of much less importance (except possibly for a few MBBs per design),
since most MBBs are used purely as mechanical elements in the designs,
i.e. as scaffolding for other molecules. Similarly, the specific DNA sequences
attached to each MBB are important only for guiding the proper assembly
of several MBBs into a larger structure. Accordingly, the proteins used
in most MBBs can be chosen for ease of production of the MBBs, and so can
some portions of the attached DNA sequences, provided other portions
of the DNA sequences can differ for each attachment site
of each specific kind of MBB.

In order to uniquely orient each MBB using the DNA (attached to the
C-sites), at least three C-sites are required. Many structures will be
easier to build from MBBs with at least 4 C-sites, distributed (very roughly)
tetrahedrally, so that the protein can be pulled stably in any direction
using the closest three of the four sites. Four C-sites should be sufficient
for most structures, but up to eight or so sites could be useful in some
cases.
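The three-site minimum can be illustrated with a small geometric check. This is a minimal sketch, assuming only rigid-body geometry: holding a set of attachment sites fixed removes all rotational freedom exactly when at least three of them are non-collinear, whereas two sites (or any number of collinear sites) leave the protein free to rotate about the line through them. The function name, the tolerance, and the coordinates are hypothetical illustrations, not measured C-site positions.

```python
from itertools import combinations

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def fixes_orientation(sites, tol=1e-6):
    """True if the site positions include three non-collinear points, so that
    holding them fixed leaves no rotational freedom.

    sites: list of (x, y, z) positions (hypothetical coordinates, Angstroms).
    tol:   threshold on the squared norm of the cross product.
    """
    for a, b, c in combinations(sites, 3):
        n = _cross(_sub(b, a), _sub(c, a))
        if n[0] ** 2 + n[1] ** 2 + n[2] ** 2 > tol:
            return True
    return False

# Two sites, or several collinear sites, leave rotation about their axis free.
print(fixes_orientation([(0, 0, 0), (10, 0, 0)]))              # False
print(fixes_orientation([(0, 0, 0), (10, 0, 0), (20, 0, 0)]))  # False
# Four roughly tetrahedral sites: any three of them are non-collinear.
print(fixes_orientation([(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]))  # True
```

This checks only geometric determinacy; whether a given set of sites also lets the protein be pulled stably in every direction depends on their distribution, which is why roughly tetrahedral placement is preferred.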

Core proteins and attachment chemistries mentioned in this document

In the methods described in this document, the protein used as the core
of an MBB will be either a streptavidin tetramer, with its biotin-binding
sites used as DNA-attachment sites, or an unspecified asymmetric protein
with several cysteine residues on the outer surface, probably introduced
by genetic engineering at positions chosen for this application, used as
DNA-attachment sites. In the terminology of the document describing DGAP
in general, these DNA-attachment sites are the C-sites; the P-sites will
either be special groups included between the DNA and the group that attaches
it to the C-site, or will be other surface amino acids of the MBB protein.

(Attaching suitably modified DNA to the sulfhydryl groups of cysteines
or to the biotin-binding sites of streptavidin is a standard technique
[Hermanson 1996].)

It is also possible that specific functionalization of surface lysine
residues could be used to form an additional C-site on certain core proteins,
after genetic replacement of endogenous lysines and introduction of new
lysines at desired positions for C-sites. Similar genetic modifications
have been done for other reasons [Gaertner et al. 1992].
The discussion of lysine functionalization by anhydrides in
Hermanson [1996, p. 145] implies that specificity for this residue is possible,
though we have not yet investigated this sufficiently.

Genetic engineering of proteins to modify surface amino acids

There are standard techniques for generating substantial quantities of
proteins with several specifically-designed amino-acid replacements, involving
site-directed mutagenesis of cloned genes and insertion into bacterial
plasmids for expression. For example,
Saraswat et al. [1992] used this technique to replace the two endogenous cysteines
of a natural protein with alanine, and to add new cysteines at each of
5 different positions chosen for purposes related to their application
(one in each of 5 new protein species), obtaining yields of 70-80 mg of protein
per liter of bacterial culture. Other examples include
[Kanaya et al. 1992] and [Gaertner et al. 1992].

There is no inherent limit to the number of amino acids that can be
modified with this technique, since modified genes can be amplified between
sequential replacements, if necessary. Provided that the replacements are
isolated surface amino acids, it is likely that the modified protein will
fold in the same way as the native one [Handel 1995, personal communication].

General strategies for site-specific attachment of DNA sequences

For the asymmetric molecules, attachment sites can be chosen at geometrically
distinguishable positions on the protein, and the methods described must
produce MBBs in which the DNA sequence attached at each site is not only
different from the other sites, but has a specific correspondence to the
site, which we determine in advance.

For a streptavidin tetramer whose biotin-binding sites are used, the
molecule's symmetry (with three 2-fold rotation axes -- less symmetrical
than a regular tetrahedron) makes all four sites equivalent, but if
one site is chosen arbitrarily, the other three are distinguishable from
each other (they are all at different distances from the chosen site);
this means that to have only one species of MBB, it is still necessary
to produce only one geometrical arrangement of attached DNA sequences,
out of the 6 arrangements that would be possible given only that exactly
one copy of each DNA sequence is attached to each tetramer.
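The count of 6 can be checked by brute force: assign four distinct sequences (hypothetical labels A-D) to sites B1-B4 in all 24 ways, and treat two assignments as the same MBB species when one of the tetramer's symmetry operations carries one onto the other. The sketch assumes only what the text states about the symmetry, namely the identity plus three 2-fold rotations, each of which exchanges the four sites in two pairs; which rotation corresponds to which pairing does not affect the count.

```python
from itertools import permutations

# Sites B1..B4 are indexed 0..3. Each entry maps site i to the site it is
# carried to by one symmetry operation of the 222 tetramer: the identity and
# three 2-fold rotations, each of which exchanges the sites in two pairs.
SYMMETRIES = [
    (0, 1, 2, 3),  # identity
    (1, 0, 3, 2),  # B1<->B2, B3<->B4
    (2, 3, 0, 1),  # B1<->B3, B2<->B4
    (3, 2, 1, 0),  # B1<->B4, B2<->B3
]

def canonical(assignment):
    """Smallest image of an assignment (assignment[i] = sequence at site i)
    under the symmetries; equivalent arrangements share this value."""
    images = []
    for g in SYMMETRIES:
        moved = [None] * len(assignment)
        for site, seq in enumerate(assignment):
            moved[g[site]] = seq
        images.append(tuple(moved))
    return min(images)

def distinct_arrangements(sequences=("A", "B", "C", "D")):
    """Classes of assignments of distinct sequences to the four sites."""
    return {canonical(p) for p in permutations(sequences)}

print(len(distinct_arrangements()))  # 6  (24 assignments / 4 symmetries)
```

Because every non-identity operation moves every site, no assignment of four distinct sequences is mapped onto itself, so each class contains exactly 4 of the 24 assignments.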

Some of the methods described here achieve the necessary specificity
of DNA attachment by taking advantage of the different distances between
different pairs of attachment sites, either during the construction of
MBBs, or by separating the correct ones from the others after they have
been constructed nonspecifically. One method makes use of site-specific
blocking groups, one of site-specific attachment chemistries, and one of
the ability to assemble certain multi-protein complexes from distinct subunits.

Attachment Methods

The methods are given in order of ease of description. Method 3 is the
most complex to describe, but it is the one we will try first since it
is likely to be both the easiest method to develop, and the one for which
new MBBs differing only in attached DNA sequences can be made most quickly.
Accordingly, we have investigated it more thoroughly than the other possible
methods given here.

Method 1. DNA-conjugation to each monomer in a multi-subunit complex.

Core protein: some multi-subunit protein (to be chosen), in which several
subunits occur only once in the complex, and in which subunits are available
separately and can be mixed to reconstitute the complex in a unique arrangement.

Other requirements: ability to attach DNA to one site on each of several
subunit proteins (using any of the methods described in this document, e.g. to
an introduced surface cysteine, or to the amino or carboxyl terminus);
some knowledge of structure of subunits and complex.

Outline of method:

Attach a different DNA sequence to each subunit in isolation, then mix
the subunits so as to reconstitute the complex. It will probably be necessary
to covalently crosslink the subunits to stabilize the complex.

We have not yet searched for a suitable protein complex (for which there
is enough structural knowledge), so we cannot be sure one exists.

Unless DNA were attached only at chain termini, genetic engineering
of subunits would be necessary (e.g. to introduce cysteines).

If the complex has to be stabilized by crosslinking, whether it would be
acceptable for this crosslinking to be non-site-specific is application
dependent (and not yet known). If site-specific crosslinking is necessary,
this might require genetic engineering to remove or introduce crosslinking
sites. If these were cysteines, the C-sites could not be cysteines.

Possible variations:

This method might also be applied to proteolytic fragments of some monomeric
protein which can reassociate and be covalently linked into the native
form [e.g. Gaertner et al. 1992].

If some other method described here can be applied to each subunit individually,
so that each subunit carries several attached DNA strands, combining the
subunits as in this method might be used to generate MBBs with many more
attached DNA strands than would be possible with other methods used alone.

Method 2. Use different attachment chemistry at each of three sites.

Core protein: a monomer with an N-terminal serine or threonine, whose amino
and carboxyl termini are both exposed on its surface and sufficiently
separated (more precisely, if the last few amino acids of each terminus are
flexible, the points at which the chain termini become flexible must be
sufficiently separated), and which has a single surface cysteine
(sufficiently separated from both termini).

C-sites: the amino terminus, the carboxyl terminus, and the single surface
cysteine.

Outline of method:

Attach a different DNA sequence to each C-site by using an attachment
protocol which affects only that kind of site. (Sequences with no significant
complementarity should be used.)

Details:

For all attachments of DNA described in this method, the general strategy
will be to modify each C-site (in a way specific for that site) to introduce
a functional group not otherwise found on the protein, and then to conjugate
the resulting modified protein to DNA (with a suitable functional group
attached in a separate prior step).

Attachment of functional groups to cysteines (and the introduction of
cysteines by genetic engineering) was discussed in the Introduction.
Attachment of functional groups specifically to the amino terminus can
be done by mild oxidation of an N-terminal serine or threonine
[Fields & Dixon 1968; Geoghegan & Stroh 1992; Gaertner et al. 1992].
Attachment of functional groups specifically to the carboxyl terminus can
be done by reverse proteolysis followed by hydrazone bond formation, under
mild conditions [Rose et al. 1991; King et al. 1986].

The specific functional groups and final conjugation chemistries
have not yet been chosen, but several alternatives appear to be available
[Hermanson 1996]. If each attached DNA and its linker is stable under the procedures
for attachment of subsequent DNAs (as is reasonable to expect given the
mild conditions of the functionalization procedures referred to), it is
likely that the same final conjugation chemistry can be used in each case,
with each DNA added before subsequent C-sites are functionalized. If not,
it will be necessary to attach all the DNAs at the end and thus to use
three different final conjugation chemistries.

Since these methods have less than 100% yield, the correct MBBs should
be purified at the end, for example by affinity separation using each required
DNA sequence in turn, or gel-retardation (possibly using all complementary
sequences at once), or by overall charge or molecular weight (e.g. by DNA-denaturing
electrophoresis). (It will probably be desirable to purify the MBBs at
various intermediate stages as well, especially during development of the
protocols.)

The bases of the flexible portions of the termini (and also the sulfhydryl
group of the cysteine) need to have some minimum separation, which I estimate
to be 6 to 8 Angstroms, in order to permit use of the resulting MBBs in
the DGAP process.

Possible Problems:

Only three attachment sites can be functionalized. For a few applications
this will be sufficient (since 3 fixed points are enough to hold the protein
in a unique orientation, as mentioned in the Introduction),
but for most applications we would prefer to have at least four distinct
C-sites.

Possible Variations:

If two surface cysteines were present, this method could be used to produce
MBBs with four C-sites but with only three different kinds of DNA sequences
attached to them (with two copies of one sequence on the cysteine C-sites).
For some applications this would be preferable to having only 3 sites in
spite of the nonuniqueness of two of the sequences.

It may be possible to use a single lysine as a fourth C-site with its own
specific attachment chemistry.

This method might be combined with the hybridization detection used in
Method 3 to allow two (or possibly more) cysteines to receive different sequences
in a distinguishable way (assuming the appropriate C-site-pair distances
were sufficiently different, as described in Method 3), making possible
MBBs with 4 (or possibly more) distinct C-sites.

Method 3. Separation of various DNA-conjugated species of streptavidin,
based on distances between DNA attachment sites.

This is the method we will try first, since
it is likely to be the easiest one for making many MBBs differing only
in DNA sequences, which is desirable for constructing large assemblies
of MBBs. (Unfortunately it is not the simplest method to describe.)

Mix streptavidin tetramers in solution with two species of doubly-biotinylated
ssDNA (described below; Fig. 5). The two biotins on each ssDNA will be
designed to be close enough that they must bind to a pair of binding sites
on a single side of a streptavidin tetramer (or to sites on two different
tetramers) [Green et al. 1971]. The desired MBB
(Fig. 2) consists of a single streptavidin tetramer conjugated to one ssDNA
of each species, in a specific one of the two possible geometrical arrangements
given that each ssDNA has bound to two sites on one side. The desired arrangement
will allow hybridization between parts of the two ssDNAs on a single MBB,
which will be impossible in the other arrangement due to different distances
between different pairs of biotin-binding sites, or due to two copies of
the same species of ssDNA being conjugated to one tetramer. Thus under
non-denaturing conditions, only the desired end product particles will
consist of just one streptavidin tetramer, conjugated to the right amount
of DNA, and with the desired hybridization of some of that DNA; this will
allow the correct MBBs to be separated from the others.

This method takes advantage of the ease of obtaining biotinylated DNA and
attaching it to this protein. (For various reasons (mentioned below) we
may prefer to use either streptavidin, avidin, or deglycosylated avidin;
the following discussion applies in any of these cases.)

Figures 1a-d depict the structure of streptavidin in a schematic form.

Fig. 1a and 1b

Fig. 1a (corresponding top view shown in Fig. 1c) shows the
approximate locations of the bound-biotin carboxyl groups ("biotin
binding sites" B1-B4), which are on alternate vertices of an imaginary
rectangular solid, embedded in the streptavidin tetramer, with the
dimensions shown. Part of this solid, showing site B2, is also visible
in Figs. 1b and 1d. (Fig. 1d also shows site B1.) (These dimensions
were computed from measured inter-atomic distances in the PDB files
referred to in the main text, but the atoms used to represent the binding
sites were not parts of the carboxyl groups themselves, but were
sulfur atoms within sulfate ions bound in approximately the same
locations. The resulting error in binding site locations is estimated
to be less than 1 A (Angstrom) in any direction, based on comparisons
between PDB files containing either bound sulfate or bound biotin.)

Fig. 1b (corresponding top view shown in Fig. 1d) shows a highly
schematic view of a streptavidin tetramer, along with the estimated
length of each segment of the shortest paths (over the protein
surface) which connect various pairs of binding sites. All path
segments not shown are related by symmetry to one of the ones
shown. (The paths themselves can be best seen in Fig. 2, although
only the shortest path, from B1 to B3, is represented there.)

Fig. 1c and 1d: Top Views

The actual shape of the tetramer looks quite different from the shape shown,
but the locations and lengths of the shortest over-surface paths (as visually
inferred from the PDB file) are approximately correct. The figure has the
same 2-fold rotational symmetries as the tetramer, as well as, for simplicity
of presentation, additional mirror symmetries (of the overall shape only,
not of the binding site locations) which the tetramer does not have.

Sites B1 and B3 are connected by three segments in succession of lengths
13 A, 22 A, and 13 A (the last segment is on the bottom and thus not visible
in the figure). An alternative path from B1 to B3, going behind the protein
(not shown), is much longer, with segment lengths of 24 A, 22 A, and 24
A.

Sites B1 and B4 are connected by segments of lengths 13 A, 22 A, and
24 A (as well as by another path of the same length behind the protein,
not shown, with segment lengths 24 A, 22 A, 13 A).

All other pairs of sites are related by symmetry to one of these pairs
(or to the B1-B2 pair, with a much shorter single-segment path of 22 A,
not shown).

The procedure described in the main text depends on the difference between
the shortest-path length connecting sites B1 and B3, and that connecting
sites B1 and B4. This difference is estimated as 11 A (24 A minus 13 A).
The accuracy of this estimate depends only on the accuracy of the 24 A
and 13 A segment length estimates (since the 22 A segment and one of the
13 A segments are shared by both paths being compared).

The actual path taken by the ssDNA backbones and dsDNA helix (as shown
in Fig. 2) would of course be longer due to the DNA's necessary separation
from the protein surface, adding perhaps 4 A per corner turned, but this
effect is approximately the same for both paths. The effective path lengths
will have to be determined by experiment, as discussed in the main text, but
it is likely that the actual path-length difference will be almost as great
as that for the idealized paths shown in this figure.
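The arithmetic behind the 11 A estimate can be reproduced from the segment lengths given above for Fig. 1b. This sketch uses only the idealized segment values (no allowance for the DNA's separation from the protein surface) and takes the shortest listed route for each pair of sites; the dictionary layout and function name are our own.

```python
# Over-surface path segments (Angstroms) as given for Fig. 1b; each inner list
# is one route between the pair of sites.
PATHS = {
    ("B1", "B2"): [[22]],
    ("B1", "B3"): [[13, 22, 13], [24, 22, 24]],
    ("B1", "B4"): [[13, 22, 24], [24, 22, 13]],
}

def shortest_path(pair):
    """Length of the shortest listed route between a pair of sites."""
    return min(sum(route) for route in PATHS[pair])

for a, b in PATHS:
    print(f"{a}-{b}: {shortest_path((a, b))} A")
# prints 22 A, 48 A and 59 A for B1-B2, B1-B3 and B1-B4 respectively

difference = shortest_path(("B1", "B4")) - shortest_path(("B1", "B3"))
print(difference)  # 11
```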

We use the following facts about the structure of the
streptavidin tetramer (Fig. 1) (a modified PDB file showing the features
discussed is available on request):

Each monomer provides one biotin binding site, and all monomers (and thus
all binding sites) are equivalent to each other (i.e. are related by some
symmetry transformation of the tetramer as a whole).

In spite of the equivalence of individual sites, different pairs of sites
can occur in three different relative positions (i.e., given any one site,
each of the other three sites is distinguishable relative to the chosen
site). We use the different over-surface (geodesic) distances between different
pairs of sites to distinguish (by presence or absence of hybridization)
between the conjugates in which the biotinylated ssDNA has been attached
in the desired or undesired arrangement. (More precisely, we use differences
between the lengths of specific combinations of ssDNA/dsDNA which would
be sufficiently long (taking the lack of flexibility of dsDNA into account)
to join different pairs of sites.)

The tetramer has 222 point-group symmetry (D2), i.e. three mutually perpendicular
2-fold axes of rotational symmetry. It is therefore chiral, even ignoring
the protein itself and considering only the spatial locations of the biotin
binding sites. These sites (i.e. the biotin carboxyl groups protruding
from the protein surface) are located at 4 of the 8 corners of an imaginary
rectangular solid (embedded within the protein) of dimensions 9 by 20 by
28 Angstroms (Fig. 1a, and calculations from PDB coordinates). The pairs
of sites on the same face of the protein are at corners diagonally across
from each other on the 9 by 20 Angstrom faces of this imaginary solid.
(Thus the sites in each pair are about 22 Angstroms apart.)
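As a check on the stated geometry, the straight-line separations between sites at alternate corners of a 9 by 20 by 28 A box can be computed directly. These are through-space distances, not the over-surface path lengths the method relies on, but they show that the three kinds of site pair each have a distinct separation, with same-face pairs the closest at about 22 A. The box dimensions are taken from the text; everything else is illustrative.

```python
import itertools
import math

DIMS = (9.0, 20.0, 28.0)  # Angstroms, dimensions of the imaginary rectangular solid

# Alternate corners of the box: those with an even number of "far" coordinates.
corners = [c for c in itertools.product((0, 1), repeat=3) if sum(c) % 2 == 0]
sites = [tuple(d * x for d, x in zip(DIMS, c)) for c in corners]

# Distinct straight-line distances over all six pairs of sites.
distances = sorted({round(math.dist(p, q), 1)
                    for p, q in itertools.combinations(sites, 2)})
print(distances)  # [21.9, 29.4, 34.4]
```

Each value is a face diagonal of the box (sqrt(9^2 + 20^2), sqrt(9^2 + 28^2), sqrt(20^2 + 28^2)), and each occurs for two of the six pairs, matching the three symmetry-distinct pair types.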

Figures 2a-d depict the structure of the desired end-product MBB in schematic form.

Fig. 2a and 2b

Fig. 2c and 2d: Top Views

Fig. 2 uses the same views as Fig. 1, but shows the locations of ssDNA
and dsDNA in the desired product of the initial mixing step (with the hybridization
which will only be possible for this product). ssDNA12 (see Fig. 5 in text
for nomenclature) is shown in blue, and ssDNA34 in red. The ssDNA ends
are labelled with 3' and 5' and with e1-e4 (for end 1 through end 4) as
in Fig. 5. The backbones of the hybridized dsDNA region are shown as small
rectangles forming a double helix (with the major groove facing the protein
surface). The segments which remain unhybridized are shown as straight
or wavy lines depending on whether they will be stretched to almost their
maximal lengths (true for the ssDNA bases shown as dots in Fig. 5) or will
remain free to move (true for the ssDNA bases shown as X's in Fig. 5).
The wavy lines (ssDNA X bases) are the ones intended to be left free for
further hybridization when the product MBB is assembled with other MBBs
in the DGAP process.

The two species of doubly-biotinylated ssDNA to be attached have structures
as follows:

X (upper-case): any DNA base which will be left unhybridized in the desired
MBB structure (and is used only when the MBB is later assembled by the
DGAP process)

x (lower-case): any DNA base which will be hybridized in the desired MBB
structure (choice of specific sequences will be discussed later)

. (period): a DNA base used solely as a spacer (i.e. it may be replaced
by some other flexible linker molecule if necessary)

(The precise number of bases to be used in specific segments will have
to be determined by experiment, though the lengths shown above are meant
to be approximately correct.)

(Biotin is often attached to DNA with a long linker arm; in the structures
above I am assuming it can be attached directly into the ssDNA backbone
with no additional linker arm. This is a reasonable assumption given the
wide range of biotin attachment configurations in use. If this is not true
we will need to use linkers other than ssDNA bases in place of some of
the DNA bases shown by periods above.)

Figure 3 shows some of the species of conjugates that can be obtained
as products of the initial mixing step. (As mentioned previously, the separation
between the two biotins on each ssDNA molecule is kept short enough that,
if both biotins bind to one streptavidin tetramer, they must bind at either
sites 1 and 2, or at sites 3 and 4, since all other pairs of sites are
separated by a greater distance than the biotins are.)

Figs. 3 and 4 show various products of the initial mixing step (and
subsequent hybridization). The individual subfigures are discussed in the
main text.

The ssDNAs are colored and labelled as in Fig. 2. Hybridized dsDNA regions
are shown as colored ssDNA backbones on the surface of an imaginary cylinder
representing the shape of the helix. Biotin is shown as a small T-shape.

Streptavidin tetramers are shown even more schematically than in the
previous figures, as squares with four T-shaped holes representing biotin-binding
sites. These squares are oriented as if the rectangular solid in Fig. 1a
was seen from the left side (not from the front or back). The biotin sites
in the tetramers shown in Figs. 3a and 3b are labelled B1-B4 accordingly.
Note that four different labellings would be valid, due to the symmetry
of the tetramer; in particular, Fig. 3b is identical to Fig. 3a if rotated
180 degrees around a vertical axis. (The other axes of 2-fold rotational
symmetry are the horizontal axis and the axis perpendicular to the plane
of the figures.)

It will be desirable to do the mixing at sufficiently low concentrations
of all ingredients that each ssDNA molecule or streptavidin tetramer usually
encounters only one other molecule at a time, to maximize the chance that
both biotins of one ssDNA molecule bind to the same streptavidin tetramer.
(A low concentration is also necessary to avoid aggregation of streptavidin
due to low solubility.) The initial mixing should probably be done under
denaturing conditions for DNA, so that individual ssDNA molecules are usually
encountered separately, but we will determine by experiment whether this
is actually better, and which ingredients should be in excess, for maximizing
the yield of the desired product and for ease of the final separation steps.

For ease of discussion, I will describe the separation in two steps even
though one combined step may suffice. The first step will remove all structures
other than the ones with exactly one protein and two ssDNAs per particle.
(Two such undesired particles are shown in Figs. 3h and 3i.) (We have not
yet determined which separation technique to use.
Niemeyer et al. [1994] have demonstrated separation of streptavidin-DNA conjugates
carrying varying numbers of DNA molecules by both ion-exchange chromatography
and non-denaturing PAGE. Isoelectric focusing might also be expected to
be useful. Niemeyer et al. [1994] have also demonstrated
gel-retardation of streptavidin-DNA conjugates by complementary DNA, which
might be necessary for separation of streptavidin-PNA conjugates if we
use PNA for other reasons.)

The remaining particles include some with two copies of the same ssDNA
(Figs. 3d-3g), and some with one ssDNA of each kind (Figs. 3a-3c); of the
latter, some have the desired geometrical arrangement of ssDNA (Figs. 3a
and 3b) and some have the other arrangement (Fig. 3c). (All arrangements
not shown are equivalent by symmetry to some arrangement which is shown.)

In order to distinguish between these species, we will design the ssDNA
sequences so that sequences 1 and 3 (where sequence n means the DNA between
end n and the nearest biotin) can hybridize as shown in Figs. 3a and 3b
(and Fig. 2), but only if the biotins nearest to ends 1 and 3 are attached
to sites 1 and 3 (or the symmetrically equivalent pairs of sites, 3 and
1, 2 and 4, or 4 and 2). This is possible because these pairs of sites
are significantly closer (along a path over the protein surface) than the
other pairs to which these biotins could be attached (see Fig. 1). (The
sensitivity of using the presence or absence of hybridization to distinguish
these inter-site distances is discussed below.)
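The case analysis above can be made explicit by enumerating the labelled single-tetramer products that carry one ssDNA per side. This is a bookkeeping sketch under stated assumptions: each strand binds across one side (B1/B2 or B3/B4) in either orientation, only the short-path pairs B1-B3 and B2-B4 permit hybridization of sequences 1 and 3, and every labelled configuration is counted once, so the counts are not predicted yields. The labels and function name are hypothetical.

```python
from itertools import product

# Each doubly-biotinylated ssDNA spans one side of the tetramer.
SIDES = (("B1", "B2"), ("B3", "B4"))
# Pairs joined by the shorter over-surface path, per the text.
SHORT_PAIRS = {frozenset({"B1", "B3"}), frozenset({"B2", "B4"})}

def classify(species_a, site_a, species_b, site_b):
    """site_a / site_b: site holding the biotin nearest end 1 (ssDNA12) or
    end 3 (ssDNA34) of the strand on side B1/B2 and side B3/B4 respectively."""
    if species_a == species_b:
        return "same species"     # Figs. 3d-3g: no 1-3 pairing possible
    if frozenset({site_a, site_b}) in SHORT_PAIRS:
        return "desired"          # Figs. 3a, 3b: sequences 1 and 3 can hybridize
    return "undesired mixed"      # Fig. 3c: sites too far apart over the surface

counts = {}
for species_a, species_b in product(("ssDNA12", "ssDNA34"), repeat=2):
    for site_a, site_b in product(*SIDES):
        label = classify(species_a, site_a, species_b, site_b)
        counts[label] = counts.get(label, 0) + 1

print(counts)  # {'same species': 8, 'desired': 4, 'undesired mixed': 4}
```

Of the mixed configurations, half place the two hybridizing arms on a short-path pair; these are the ones the hybridization-based separation is meant to retain.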

Nondenaturing electrophoresis can be sensitive to differences in hybridization,
so we should be able to detect this difference directly, perhaps in the
same separation step in which we remove particles with the wrong amount
of DNA or protein. Also, particles whose 1 and 3 strands (or two 1 strands
or two 3 strands) are prevented from hybridizing to each other are likely
to form dimers (or larger polymeric aggregates) in which strands attached
to different proteins hybridize (Fig. 4); such particles would certainly
be separable in the initial step.

The end result will be the separation of the desired product from all
other products of the initial mixing step. In some applications the product
can be used directly (in spite of its attached double helix). For other
applications we may want to alter the ssDNA using further routine methods,
such as ligation to dsDNA with a long overhang (though whether the ligase
will be sterically hindered is unknown). Other possible modifications to
the final MBBs are discussed below.

Possible problems and variations

Undesired hybridization between otherwise-correct MBBs:
We are assuming that preparation conditions can be found such that hybridization
will usually occur between DNA sequences attached to a single particle
whenever that is possible, and occur between two particles, if at all,
only when neither single particle has the right arrangement of sequences
to permit hybridization. Although entropy of particle motion would disfavor
dimers, other factors might invalidate this assumption, which will thus
have to be tested. It may be possible to initiate rapid hybridization while
keeping the particles at sufficiently low concentration (or in a low-mobility
environment such as a gel) that most MBBs whose strands can hybridize intra-molecularly
do so, and then to "lock in" these hybridizations even after the particles
are more concentrated (either by maintaining strong enough hybridization
conditions, or by stabilizing the hybridized regions with intra-base-pair
disulfide bonds [Goodwin et al. 1994]).

Sensitivity of hybridization to inter-site distance:
The path between the desired pair of sites is about 11 Angstroms shorter
than the path between the undesired pair of sites. Since the length of
the desired ssDNA/dsDNA configuration (as in Fig. 2) can be adjusted in
steps of about 5.9 Angstroms [Smith 1996, Saenger 1984] by adding extra
bases into single arms of the ssDNA region, or
by even finer increments by incorporating non-DNA linkers into the backbone
of the ssDNA part, it will be possible to find lengths of DNA which can
fit, in hybridized form, in only one of these two cases. (The distance
estimates are not precise, but the difference between the two paths is
better known than the path lengths themselves because the paths are composed
of identical segments for most of their lengths (Figs. 1 and 2).)
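Why a 5.9 A step suffices can be shown with a simple window search. This is a minimal sketch, assuming the fully hybridized construct has a maximal end-to-end span of some fixed base length plus 5.9 A per added unpaired spacer base, and that it can bridge a path only if that span is at least the path length. The 30 A base span is a hypothetical value, and the 48 A and 59 A paths are the idealized estimates from Fig. 1b; the real thresholds must be measured.

```python
SSDNA_STEP = 5.9  # A added to the maximal span per extra unpaired spacer base

def discriminating_spacer_counts(base_span, short_path, long_path, max_extra=20):
    """Extra spacer-base counts n for which the fully hybridized construct's
    maximal span (base_span + n * SSDNA_STEP) reaches the short path but not
    the long one."""
    return [n for n in range(max_extra + 1)
            if short_path <= base_span + n * SSDNA_STEP < long_path]

# Hypothetical 30 A base span; idealized 48 A and 59 A paths from Fig. 1b.
print(discriminating_spacer_counts(30.0, 48.0, 59.0))  # [4]
```

Because the step (5.9 A) is smaller than the path difference (11 A), at least one spacer count falls inside the window for any base span up to the short path length, whatever the true path lengths turn out to be.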

The dehybridization of one base pair of the dsDNA region would allow
the total length of the ssDNA/dsDNA combination to increase by only about
2.5 A (the difference between the length of ssDNA per base, 5.9 A, and
the rise of one base pair in dsDNA, 3.4 A). Thus, to make up for the 11
A difference, more than 4 of the base pairs (11 A / 2.5 A = 4.4) would have
to separate, so the two ssDNA strands could not hybridize when their ends
were separated by the longer path length. (Both ssDNA and dsDNA can be stretched
to 7 A per base under sufficient tension [Smith 1996, Saenger 1984], but this
seems unlikely to be preferred over a lack of hybridization.)
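The base-pair count follows from the per-base lengths quoted above. This short calculation uses only the values in the text (5.9 A per stretched ssDNA base, 3.4 A rise per base pair, an 11 A path difference); the variable names are our own.

```python
import math

SSDNA_PER_BASE = 5.9    # A per base of stretched ssDNA [Smith 1996, Saenger 1984]
DSDNA_RISE = 3.4        # A rise per base pair of dsDNA
PATH_DIFFERENCE = 11.0  # A, estimated difference between the two path lengths

# Opening one base pair converts 3.4 A of duplex into 5.9 A of ssDNA in the
# load-bearing strand; round() removes floating-point noise from the subtraction.
gain_per_bp = round(SSDNA_PER_BASE - DSDNA_RISE, 6)   # 2.5 A

bp_to_open = math.ceil(PATH_DIFFERENCE / gain_per_bp)  # ceil(4.4) = 5
print(gain_per_bp, bp_to_open)           # 2.5 5
print(4 * gain_per_bp, 5 * gain_per_bp)  # 10.0 12.5
```

Opening 4 base pairs recovers only 10 A, short of the 11 A needed, so at least 5 would have to separate.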

Possible errors in estimates of inter-site distances:

I estimated over-surface distances on the protein by visual inspection
of 3D protein models based on PDB files (to guess the shortest paths between
attachment sites; Fig. 1) and calculation of line-of-sight distances between
specific atoms appearing to lie on those paths. (I added sufficient length
to account for the actual path of a chain of bonded atoms being separated
from the surface by 4 A due to steric hindrance.) (Some parts of the actual
paths for the ssDNA segments must be slightly bent outwards compared to
the calculated paths, but this effect appears to be about the same for
each path.)

There are numerous sources of possible error in this estimation. Furthermore,
I have neglected consideration of possible interactions between the DNA
and protein other than steric hindrance, notably electrostatic forces,
which might strongly favor some paths and oppose others. Thus the actual
threshold lengths for hybridization will have to be determined experimentally.
(I have developed outlines, not included here, of preliminary experimental
protocols to determine the path lengths necessary for hybridization, in
which each path's required length of DNA can be determined independently.
Therefore it should not be necessary to try all pairs of path lengths in
combination. Given the range of likely path lengths, a few trials in succession
should suffice to measure them, once the experimental technique itself
has been debugged.)
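Because each path can be measured independently, a sequential search per path is enough. The sketch below uses bisection, which is our assumption about how "a few trials in succession" might be scheduled, not the author's unpublished protocol; `hybridizes` is a hypothetical stand-in for one wet-lab assay, and the search assumes that a length which hybridizes implies every longer length also does.

```python
from typing import Callable, List, Optional, Tuple


def find_threshold(lengths: List[int],
                   hybridizes: Callable[[int], bool]) -> Tuple[Optional[int], int]:
    """Bisection for the shortest DNA length that hybridizes across one path.

    Assumes `lengths` is sorted ascending and hybridization is monotonic in length.
    Returns (threshold length or None, number of trials run).
    """
    lo, hi = 0, len(lengths)  # the answer lies in lengths[lo:hi], or nowhere
    trials = 0
    while lo < hi:
        mid = (lo + hi) // 2
        trials += 1  # each call stands for one experiment
        if hybridizes(lengths[mid]):
            hi = mid
        else:
            lo = mid + 1
    return (lengths[lo] if lo < len(lengths) else None), trials


# Hypothetical assays: pretend path 1 needs >= 17 nt and path 2 needs >= 23 nt.
candidates = list(range(10, 31))
print(find_threshold(candidates, lambda n: n >= 17))  # -> (17, 5)
print(find_threshold(candidates, lambda n: n >= 23))  # -> (23, 4)
```

With 21 candidate lengths this needs at most 5 trials per path, compared with 441 if every pair of lengths were tested in combination.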

Since the estimated path-length difference is more reliable than the
estimated path lengths themselves (as discussed above), and since this
difference is large enough to prevent the hybridization of at least 4 base
pairs, the margin of error suggests that this method is likely to be workable.

If electrostatic effects appear to be causing problems, we have the
option of using NeutrAvidin (a deglycosylated form of avidin available
from Pierce Chemical Co., with an isoelectric point much closer to pH 7
than that of streptavidin) and/or PNA (which is uncharged), as well as of
increasing the ionic strength of the solution.
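The effect of raising the ionic strength can be made concrete with the Debye screening length, which shrinks as 1/sqrt(I). This illustration is ours rather than the text's: the 0.304 nm prefactor is the usual approximation for aqueous solutions at 25 degrees C, and the buffer compositions are arbitrary examples.

```python
import math


def ionic_strength(ions):
    """Ionic strength I = 1/2 * sum(c_i * z_i**2); ions are (mol/L, charge) pairs."""
    return 0.5 * sum(c * z * z for c, z in ions)


def debye_length_nm(ionic_strength_m):
    """Approximate Debye length in water at 25 degrees C: 0.304 nm / sqrt(I in M)."""
    return 0.304 / math.sqrt(ionic_strength_m)


# Illustrative buffers only; the text does not specify salt concentrations.
for label, ions in [("10 mM NaCl", [(0.01, +1), (0.01, -1)]),
                    ("150 mM NaCl", [(0.15, +1), (0.15, -1)]),
                    ("100 mM MgCl2", [(0.10, +2), (0.20, -1)])]:
    i = ionic_strength(ions)
    print(f"{label}: I = {i:.3f} M, Debye length ~ {debye_length_nm(i):.2f} nm")
```

Shorter screening lengths mean that charged groups on the protein and the DNA backbone interact only over shorter distances, which is why higher ionic strength should weaken path preferences caused by electrostatics.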

Stability of biotin-streptavidin attachment:

The biotin-streptavidin interaction has a half-life for exchange of
biotin of only a few days at 25 degrees C (the precise value depends on pH)
[Green 1990, Jones & Kurzban 1995]. At 4 degrees C the half-life is much
longer (exchange was undetectable, according to Jones & Kurzban [1995]).
The biotin-avidin interaction is much more stable, with a half-life of
200 days at pH 7 and 25 degrees C [Green 1990]. It may be desirable to
stabilize streptavidin-based MBBs with additional covalent crosslinks between
the protein and the biotin-DNA conjugate, or to stabilize assemblies of
MBBs with covalent crosslinks between the proteins (as we assume will be
desirable in general when using the DGAP process). Details of possible
covalent crosslinks have not been developed. The genes for both avidin and
streptavidin are available for genetic engineering, if surface residue
replacements are needed [Green 1990, Chandra & Gray 1990]. (Having to
genetically engineer the protein would remove some of this method's
advantage in ease of development over some of the other methods discussed
here, which can, furthermore, be applied to a wider variety of proteins.
However, once the modified protein had been developed, this method would
still be easier to practice than the others when many MBBs differing only
in their DNA sequences were desired.)
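The practical difference between these half-lives can be seen with simple first-order decay. The 200-day value is from the text; the 3-day value is only a placeholder for streptavidin's "few days", and treating exchange as a single first-order process is our simplifying assumption.

```python
def fraction_unexchanged(days, half_life_days):
    """Fraction of biotin sites still holding their original conjugate,
    assuming simple first-order exchange with the given half-life."""
    return 0.5 ** (days / half_life_days)


# 200 days: biotin-avidin at pH 7, 25 degrees C (from the text).
# 3 days: placeholder for streptavidin's "few days"; not a measured value.
for name, t_half in [("avidin", 200.0), ("streptavidin (assumed)", 3.0)]:
    for days in (1, 7, 30):
        print(f"{name}: {fraction_unexchanged(days, t_half):.3f} left after {days} d")
```

Under these assumptions, about 98% of avidin sites but only about 20% of streptavidin sites would keep their original conjugate after one week at 25 degrees C, which is why crosslinking or cold storage matters more for streptavidin-based MBBs.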

Stability of tetramers:

We cannot exclude the possibility that the individual monomers in streptavidin
or avidin tetramers might rearrange [Jones & Kurzban 1995], or even exchange
between proteins, at some slow rate, rendering our "building blocks" unstable.
If so, we will have to stabilize the tetramers with covalent crosslinks of
some kind, such as disulfide bonds between genetically introduced cysteines.

Oligo-tetramers:

Aggregation of streptavidin tetramers into higher-order forms, perhaps
covalently crosslinked, has been reported [Bayer et al. 1989]. If necessary,
we can remove these from our starting materials by gel filtration
[Bayer et al. 1990]. However, since the final separation must in any case
remove tetramers linked intermolecularly by doubly-biotinylated ssDNAs, it
will probably not be necessary to remove the oligo-tetramers from the
starting material in a separate step.

Alternative separation methods:

The two ssDNA sequences shown are of the same length, but it would be
possible to use sequences of two different lengths, which would aid in
the separation of the particles with two copies of one sequence, and would
provide more information about the yield of each form (shown in Fig. 3)
after the initial mixing.
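Why unequal lengths help can be shown by listing the two-strand forms. The lengths (30 and 36 nt) are hypothetical, and the expected fractions come from a binomial model in which each particle picks up its two ssDNAs independently from an equimolar mixture; that model is our assumption, not data from Fig. 3.

```python
from itertools import combinations_with_replacement


def product_forms(len_a, len_b, frac_a=0.5):
    """Total ssDNA length and expected fraction for each two-strand form.

    Assumes each particle binds two ssDNAs independently, each being
    sequence A with probability frac_a (a binomial model, not measured data).
    """
    lengths = {"A": len_a, "B": len_b}
    prob = {"A": frac_a, "B": 1.0 - frac_a}
    forms = {}
    for x, y in combinations_with_replacement("AB", 2):
        weight = 1 if x == y else 2  # AB can arise as A-then-B or B-then-A
        forms[x + y] = (lengths[x] + lengths[y], weight * prob[x] * prob[y])
    return forms


print(product_forms(30, 36))
# -> {'AA': (60, 0.25), 'AB': (66, 0.5), 'BB': (72, 0.25)}
```

With equal lengths all three forms carry the same total amount of DNA; with unequal lengths each form differs in size, so a size-based separation could resolve them and the observed proportions could be compared with the expected ones.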

Application to other proteins:

An analogous method should be possible for other tetrameric proteins
with the same 222 point-group symmetry as streptavidin, using a DNA-attachment
method other than biotin binding. Low protein concentration will favor
the attachment of both conjugation groups on each ssDNA to the same protein
during the initial mixing/binding step, even if this conjugation is slow
and possibly reversible. Subsequent steps will be precisely analogous,
except that the required DNA lengths will be different.
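The concentration argument is the usual competition between intramolecular and intermolecular binding, which can be sketched with an effective molarity. The competition model (equal intrinsic rate constants) and the 1 mM effective molarity are assumptions for illustration; the real value would depend on the ssDNA length and the distance between attachment sites.

```python
def fraction_intramolecular(effective_molarity_m, free_site_conc_m):
    """Share of second attachments that land on the same protein.

    Assumes the tethered second group sees its partner site on the same
    protein at a fixed effective molarity and competing sites on other
    proteins at their bulk concentration, with equal intrinsic rate constants.
    """
    return effective_molarity_m / (effective_molarity_m + free_site_conc_m)


EM = 1e-3  # hypothetical effective molarity (1 mM); not from the text
for conc in (1e-3, 1e-5, 1e-7):  # bulk concentration of free sites, mol/L
    print(f"{conc:.0e} M sites -> {fraction_intramolecular(EM, conc):.4f} intramolecular")
```

Each tenfold dilution of the protein makes cross-linking between proteins roughly ten times less competitive, which is why low concentration helps even when the conjugation chemistry is slow.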

Stabilization of hybridized region of MBB:

It may be possible to stabilize the hybridized dsDNA region in the final
MBB with intra-base-pair disulfide bonds [Goodwin et al. 1994], if this
would help with the assembly of several MBBs.

Digestion of hybridized region with restriction enzymes:

Alternatively (or in addition) we may want to digest the hybridized
dsDNA region with a restriction enzyme, in which case that region will
have to be made at least 8 base pairs long [catalog, New England Biolabs,
1996/97, p. 238], and we will have to test for steric hindrance of the
enzyme by the core protein. (If steric hindrance occurs, this technique
could probably still be used if restriction itself were used as the test
for correctness of an MBB.) The hybridized regions left after restriction
would be sufficiently short (4 base pairs minus half of the sticky-end
length made by the enzyme) not to interfere with hybridization of the resulting
ssDNAs to ssDNAs introduced later, e.g. from other MBBs in a DGAP assembly.
Although the restriction-shortened ssDNAs would have the same sequences
for short lengths at their ends (due to the restriction site being palindromic),
they could still be different farther from the end, and thus be specific
for hybridization to different external sequences.
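Two points in this paragraph, palindromic sites and the length of duplex left after cutting, can be checked with a short sketch. GAATTC (the EcoRI site, which leaves a 4-nt overhang) is used only as a familiar example, since the text does not name an enzyme; the length rule encodes the text's "4 base pairs minus half of the sticky-end length" for a cut centered in an 8-bp hybridized region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site reads the same 5' to 3' on both strands."""
    return site == reverse_complement(site)


def remaining_duplex_bp(region_bp=8, overhang_nt=0):
    """Base pairs left double-stranded on each fragment after a staggered cut
    centered in the hybridized region (4 - overhang/2 for an 8-bp region)."""
    return region_bp / 2 - overhang_nt / 2


print(is_palindromic("GAATTC"), is_palindromic("GAATTA"))  # -> True False
print(remaining_duplex_bp(8, 4), remaining_duplex_bp(8, 0))  # -> 2.0 4.0
```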