Forward. One of the
more powerful approaches which has been developed over the last
twenty years is the ability to make, isolate and use DNAs that are
complementary (cDNAs) to messenger RNAs (mRNAs) that encode proteins of
interest. At the end of the page we will briefly summarize the reasons
that cDNAs can be extremely valuable in experimental design, although
many of these should already be obvious to most readers. It is also
true that there are dozens of different approaches to isolating cDNAs
of interest and these will be briefly described in the second part of
the section. We will begin by describing how a cDNA for a known
protein can be isolated using amino acid sequence information, which,
historically, was the first way that a cDNA encoding a known
protein was isolated. In this first section we will also consider how
cDNAs are made and how cDNA libraries are constructed.

Some key terms. Let
us begin with three definitions:

Clone. First, what does it mean to clone? Cloning
refers to the isolation of a genetically homogeneous strain of any
organism. Within a clone, all organisms are identical to all other
organisms at a genetic level. It is possible to clone bacteria or
phage or even higher plants by isolating a single cell and
allowing that single cell to produce a colony, or a plaque, or an
entire plant. Since most plants are derived from a single cell
with a unique genotype, the act of rooting leaves to produce a
collection of identical African violets is cloning.

cDNA cloning is isolating and
amplifying a single, self-replicating organism that includes
within its DNA, a cDNA that is of interest to the experimenter.

In some cases cDNA cloning may simply
refer to the isolation of any single cDNA, since, in some
circumstances, an experimentalist may be interested in any cDNA
produced by a particular tissue. More frequently, the challenge
of cDNA cloning is not the isolation of any cDNA but the
selection of a single cDNA that is of interest to the
experimentalist for a particular reason. In the same way it is
possible to isolate clones that are not cDNA clones but rather
are genomic clones.

Genomic clones are simply DNA derived
directly from a genome. Genomic DNA would incorporate some
sequences such as introns or regulatory sequences that would
not be found in cDNAs.

Likewise, the isolation of a
monoclonal antibody refers to the isolation of a single cell
that expresses an mRNA for a unique antibody. Thus, making
monoclonal antibodies is an exercise in cloning.

Library. The second concept that is important in
understanding the strategy needed to isolate a cDNA clone, a
genomic clone, or even a monoclonal antibody is the
idea of a library.* A library is defined simply as a collection of
different DNA sequences that have been incorporated into a
vector.

Vector. A vector* is simply a self-replicating genetic element which is
usually designed for the convenience of its experimental purpose.
For experimental convenience, vectors are usually derivatives of
plasmids or viruses (bacteriophages, animal viruses, retroviruses).
Since the essence of being able to isolate clones is the ability
to replicate to make large amounts of biological material, the
essence of a vector is that it must incorporate some mechanism of
reproduction (i.e., one must expand the clone to make many copies
of the same organism). Thus, one would expect that vectors would
incorporate an origin of DNA replication. Since vectors are an
important experimental approach, a considerable amount of effort
has gone into designing vectors that are particularly easy to use
in an experimental sense. It would be impossible to provide even a
brief description of the tricks that have been incorporated into
various classes of vectors. The incorporation of selectable
markers is certainly a significant experimental advantage in many
cases.

Cloning strategy.
The underlying experimental approach to cloning can be divided into
four parts.

First, it is necessary to produce or
obtain a library including the sequence of interest.

Second, it is necessary to isolate
clones that may be of interest.

Third, it is essential to develop a
formal test to ensure that the clones that have been isolated are
indeed the correct clones.

Fourth, it is essential to put the cDNA
that has been isolated to some interesting biological use.

cDNA libraries.
Let's consider the important aspects of constructing a cDNA library.
A cDNA library simply contains sequences that are complementary to
mRNAs. There are a number of different criteria that might be used to
judge the quality of a cDNA library. A cDNA library is generally
better if the size of the inserts (that is the amount of continuous
cDNA in each clone) is large, ideally full-length. Ideally, no member
of the library should include cDNAs derived from different mRNAs
(this could be confusing). The library should be sufficiently large
that it contains the cDNA of interest (or, more precisely, it should
have enough independently derived clones that it contains the cDNA of
interest). In general this means that it should be representative of
all the mRNAs present in a particular tissue. Of course, choosing a
tissue that has a relatively large amount of the mRNA of interest is
an important experimental choice. In general it is easier to isolate
a cDNA from a library where it is represented many times than from a
library where it is present rarely. Some characteristics of a library
depend on the vector chosen. Vectors are frequently chosen because
they allow the screening of a large number of independent members of
the library with experimental ease. Some vectors are designed to
express only the cDNAs, while others have been modified to express
not only the cDNA but also to express it in a context so the cDNA is
made into a protein or a fusion protein. (Fusion proteins will be
discussed below.) Before using a cDNA library it is wise to determine
if it is a good quality library. More than one student has wasted
months of time screening a library that had no inserts or inserts so
short that they were of little value.
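
The requirement that a library be "sufficiently large" can be made rough and quantitative. The sketch below uses the standard sampling estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the mRNA population made up by the message of interest and P is the desired probability of finding at least one clone. It assumes every mRNA is converted to cDNA and cloned with equal efficiency, which is rarely true in practice; the function name and the example abundances are ours, chosen for illustration only.

```python
import math

def clones_needed(abundance, probability=0.99):
    """Independent clones needed so that a cDNA making up `abundance`
    (a fraction between 0 and 1) of the library is present at least once
    with the given probability: N = ln(1 - P) / ln(1 - f)."""
    if not (0 < abundance < 1 and 0 < probability < 1):
        raise ValueError("abundance and probability must lie between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - abundance))

# Hypothetical example: an mRNA that is 1 in 100,000 of the message population.
print(clones_needed(1e-5))        # roughly 4.6 x 10^5 clones for a 99% chance
print(clones_needed(1e-5, 0.5))   # far fewer clones give only even odds
```

The same estimate explains why choosing a tissue rich in the mRNA of interest matters: raising f tenfold lowers the number of clones that must be screened about tenfold.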

Making cDNA.
Generation of cDNAs can also be done by a wide variety of processes,
but, in virtually all cases, cDNA is generated by the enzyme
reverse transcriptase* (RT), which has the
ability to use the information in an RNA to generate a complementary
DNA. Thus, reverse transcriptase is an RNA-dependent DNA polymerase.
Like all DNA polymerases it cannot initiate synthesis de novo but
depends on the presence of a primer. Since many mRNAs have a poly-A
tail at the 3' end (see polyadenylation*), oligo-dT is frequently used to prime DNA synthesis
(it is also possible, and frequently essential, to generate cDNAs by
using either random primers or primers designed to amplify a specific
mRNA). Once the initial cDNA has been generated it is generally
necessary to produce a second strand of DNA. Again, there are many
strategies for doing this, but a convenient mechanism involves
exposure of the DNA/RNA hybrid to a combination of RNase H and DNA
polymerase. RNase H has the ability to introduce nicks
in the RNA strand of the hybrid, and DNA polymerase can then use these
nicks to initiate "second strand" DNA synthesis. This two-step
procedure has been optimized to maximize fidelity and length of
cDNAs.
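
The relationship between an mRNA, its first-strand cDNA and the second strand can be illustrated with a few lines of code. This is only a sequence-level sketch: it ignores priming, enzyme processivity and incomplete synthesis, and the short mRNA used is invented for the example.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """First-strand cDNA (written 5'->3') copied from an mRNA written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

def second_strand(cdna):
    """Second strand (written 5'->3') copied from the first-strand cDNA."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna.upper()))

mrna = "AUGGCCAAAAAA"              # hypothetical message ending in a short poly-A tail
first = first_strand_cdna(mrna)    # begins with the T's laid down from the oligo-dT primer
second = second_strand(first)      # same sequence as the mRNA, with T in place of U
print(first, second)
```

Note that the second strand reads like the mRNA itself, which is why the coding sequence can be read directly from a sequenced cDNA.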

Incorporating cDNA into the
vector. The
next challenge is to incorporate this collection of cDNAs into a
vector* so that it can be manipulated. One of the most
convenient ways of doing this is to manipulate the cDNAs
so that each one has a defined restriction site at its ends. To do
this, the cDNAs are frequently methylated with a specific methyl
transferase that adds a methyl group to particular internal
restriction sites to protect them from the restriction enzyme that
will be used later. Any 3' or 5' extensions must be then either
eliminated by nuclease treatment or filled in with polymerase. This
produces a "blunt ended" molecule in which the 3' and 5' bases are in
"register". It is then possible to ligate a synthetic oligonucleotide
to the ends of this cDNA. Blunt end ligation is generally a low
efficiency process; but, by using a high concentration of these
synthetic oligonucleotides, it is possible to drive the reaction to
near completion. These synthetic oligonucleotides can either be
'linkers' (blunt-ended double-stranded DNA molecules that contain a
restriction site and are cut with the restriction enzyme after
ligation to produce the appropriate overhang) or they can be
'adapters' (which are synthesized to have one blunt end and one end
that has an 'overhang', i.e., a region of single-stranded DNA, that is
complementary to that produced by a restriction enzyme).
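
The reason for the methylation step becomes clear if one looks for internal copies of the restriction site within a cDNA: any unprotected internal site would be cut along with the sites at the ends. The sketch below assumes EcoRI (recognition sequence GAATTC) as the example enzyme, and the cDNA sequence is invented for illustration.

```python
def find_sites(sequence, site="GAATTC"):
    """Return the 0-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

cdna = "AAGAATTCGGGAATTCTT"   # hypothetical cDNA with two internal EcoRI sites
internal = find_sites(cdna)
if internal:
    print("protect by methylation before cutting; internal sites at", internal)
```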

The value of producing an overhang is that
it will facilitate the introduction of the cDNA into a vector. The
vector can also be prepared by treating it with the same restriction enzyme, or
an enzyme that produces a compatible overhang, to produce a
single-stranded region that is complementary to the single-stranded
region in the cDNA. Mixing the cDNA of interest with the vector in
the presence of ligase allows incorporation of the cDNA into the
vector. One of the experimental difficulties in doing this is that
the vector itself will have a high tendency to re-ligate to form a
vector without any cDNA insert. This is frequently minimized by
treating the vector with a phosphatase to remove the 5' terminal
phosphates. These phosphates are required for ligase to act, so this
strategy prevents this unwanted side reaction.

The choice of the vector used also has an
important impact on experimental outcome. Initially, plasmids were
chosen as vectors and were modified to include markers that could be
used to determine whether a plasmid had been introduced into a
bacterial cell or whether there was a cDNA insert in the cloning
site. More recently, derivatives of bacteriophage lambda have been
made that can be effective vectors for cDNA cloning. The advantage of
bacteriophage lambda is that it is possible to isolate more
independent clones from a given amount of mRNA/cDNA and to screen a
higher number of clones using hybridization techniques. The extent of
understanding of lambda and lambda genetics has made it possible to
isolate lambda derivatives where some non-essential genes have been
removed making it possible to carry inserts of up to 11 kb of cDNA,
which is a convenient size and sufficient for the isolation of most
cDNAs. The lambda genome is a linear molecule when it is packaged
into the bacteriophage and the cDNA can be incorporated into the
central region of the DNA. The lambda "arms" (the more distal parts
of the DNA) encode all the essential information for replication of
lambda in an infectious cycle. The cloning site in lambda gt10 was
chosen to interrupt genes that are essential for lambda to undergo
lysogeny*. If the lambda arms re-ligate in the absence of an
insert, and an appropriate host is chosen (hfl, for high frequency of
lysogeny), then these particles will not form plaques. Thus only
particles carrying an insert will form plaques. The remarkable power
of bacteriophage lambda as a vector is that once the cDNA has been
ligated into the lambda arms, the DNA can then be incorporated into a
phage particle in vitro. Extracts prepared from cells that have all
the necessary proteins for the assembly of lambda can then be mixed
with the library DNA and ATP and particles will be assembled! These
particles can then be used to infect E. coli and each individual
plaque is an independent clonal population which represents a single
cDNA species. This ability can be used both to amplify the cDNA
library (which is somewhat dangerous because repeated amplification
can lead to a loss of some cDNA sequences) and for the screening of
the cDNA library to isolate the cDNA of interest.

Screening. A lambda
gt10 library can be conveniently screened by plating it at
relatively high concentrations on a bacterial lawn of E. coli. High
density screening allows the experimentalist to screen between
100,000 and 1,000,000 independent plaques on a single plate and makes
it theoretically possible to screen for a cDNA that is present only
at one copy per cell in a particular tissue. Screening is done by a
"replica plating" procedure. After the phage infect E. coli and form
individual plaques, a perfect spatial representation of the
plaques can be produced by placing a piece of nitrocellulose on top of
the lawn of E. coli. Nitrocellulose binds DNA with great avidity and
so some of the DNA of each plaque can be transferred to
nitrocellulose paper or even several different nitrocellulose papers.
Each nitrocellulose sheet should have a representation of the
original pattern of infected cells on a petri dish. The DNA from the
library can then be cross-linked to the filter and extraneous protein
can be washed off. The plaques of interest can then be screened using
a hybridization assay.

This takes us to the question of how a
library can be screened to isolate candidates for the cDNA of
interest. One of the most straightforward ways to do this is to take
advantage of DNA hybridization. If one can design an oligonucleotide
that is complementary to the mRNA of interest this can be used to
screen the library. Such an oligonucleotide probe can be designed using
sequence information from the amino acid sequence of a known protein.
In the 1950s and 1960s biochemical methods were developed to determine
the amino acid sequences of overlapping fragments of purified
proteins. Our task is much simpler. It is now necessary only to know
the amino acid sequence of a couple of regions of the protein. To do
this, a purified protein is generally digested either with proteases
or by chemical methods to produce a series of peptides. Unlike
proteins, which must be treated with care to ensure that they retain
their native conformation, peptides can be treated as bio-organic
molecules. They can be fractionated by fairly standard procedures
using HPLC (high-performance liquid chromatography), which is capable of
resolving individual peptides. If a series of individual peptides can
be resolved, the sequence of those peptides can be determined, or at
least partially determined, by Edman degradation. This series of
reactions cleaves individual amino acids one at a time from a peptide
and the resultant amino acid derivatives can be identified. This
procedure can produce sequence information on a series of peptides.
To do this intelligently it is essential that each of the peptides is
derived from a single protein species, and the criteria for
ensuring that this is likely to be the case were discussed in the
section on protein purification. Edman degradation works by
removing single amino acids from the N-terminal end and can in some
cases be applied to an intact protein; however, the
N-terminal amino group of an intact protein is frequently chemically
modified, in which case this approach fails.

Designing a probe. A
probe is an oligonucleotide that is designed to be complementary to
the mRNA of interest so that it can be used to screen a library. Of
course, any mRNA produces a unique polypeptide when it is translated;
but the reverse is not true. Because the triplet code is degenerate,
there are many mRNA sequences that might produce the same amino acid
sequences. Because of this the design of an oligonucleotide probe is
not straightforward, but a clever experimentalist can make good
choices in designing a probe. There are basically two strategies that
can be used. Either the experimentalist can choose to design a
relatively short oligonucleotide that hopefully will have a high
degree of complementarity to the mRNA of interest or the experimentalist can
choose to design a longer probe that is more likely to have some
regions that are not complementary to the mRNA of interest but
hopefully will have at least some sequences that can form a stable
duplex. In many cases it makes sense to make a mixture of different
probes, which are closely related, but have different bases in positions
where it is not possible to make a good prediction of which one
should be present. Such a mixture is called a degenerate probe. A probe can frequently
be 64 or 128 fold degenerate; but too much degeneracy reduces the
specific activity of a probe and increases the chance of
hybridization with the 'wrong' cDNA. The choice of which strategy
depends on the amino acid sequences that are available.
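
The degeneracy of a candidate probe follows directly from the number of codons for each amino acid in the standard genetic code, so it is easy to scan a peptide sequence for the stretch that would give the least degenerate probe. The sketch below does this; the peptide and the window length of six residues are hypothetical, and codon usage preferences are not taken into account.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code.
CODON_COUNTS = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "I": 3,
    "V": 4, "P": 4, "T": 4, "A": 4, "G": 4,
    "L": 6, "S": 6, "R": 6,
}

def degeneracy(peptide):
    """Number of different nucleotide sequences that could encode `peptide`."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())

def best_window(peptide, length=6):
    """Return (start, window, degeneracy) for the least degenerate stretch."""
    candidates = [
        (degeneracy(peptide[i:i + length]), i, peptide[i:i + length])
        for i in range(len(peptide) - length + 1)
    ]
    fold, start, window = min(candidates)
    return start, window, fold

peptide = "LSRMWKFYHQ"            # hypothetical sequenced peptide
print(best_window(peptide))       # the Met/Trp-rich stretch wins
```

Stretches rich in methionine and tryptophan, and poor in leucine, serine and arginine, are the ones worth looking for.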

There are a number of other factors that
should also be taken into consideration. In many organisms, there is
a preference for the use of particular triplets over the use of other
triplets (codon usage). Designing a probe that closely matches
another known mRNA is generally not recommended since this may lead to the
cloning of the wrong cDNA. Testing of any probe for its
correspondence to known sequences in the database is, thus,
essential. Using amino acids or amino acid combinations that have
fewer potential triplet coding sequences, and thus a lower degree of
degeneracy (i.e., fewer potential sequences), is of great importance. If
multiple related probes are possible, it is often sensible to screen
with a degenerate oligonucleotide. Once a probe or a series of probes
is designed, it can be synthesized chemically and labeled to high
specific activity with 32P. The
oligonucleotide probes can then be incubated with the nitrocellulose
filters to allow hybridization. Conditions are chosen to try to
maximize the specificity of the hybridization, but allow for some
potential mismatch. Most importantly, conditions should be chosen so
that hybridization which is non-specific or occurs only with high
degree of mismatch is not allowed. Thus, filters are washed to remove
unbound or non-specifically bound oligonucleotides. The filters can
then be autoradiographed to identify the regions of the filter
corresponding to a, hopefully, specific signal. Since the filter is a
replicate of the original plate, the experimentalist can then return
to this plate and isolate the original plaque or group of plaques
responsible for the signal. The plaques can then be re-plated on
fresh E. coli (remember each plaque contains phage that can infect
and replicate in E. coli), and the process is repeated to eventually
isolate a single plaque that is responsible for the signal (i.e.,
plaque purify* it).
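
When choosing hybridization and wash conditions for a short oligonucleotide, a common first estimate of duplex stability is the rule of thumb Tm = 2 x (A + T) + 4 x (G + C) in degrees C. It is only an approximation, usually quoted for oligonucleotides of roughly 14 to 20 bases in high salt, and it is used here as an assumption rather than as the method of this section. For a degenerate pool, the least stable member sets the limit. The two 17-mers below are invented members of such a pool.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (degrees C) of a short oligonucleotide."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc

# Two hypothetical members of a degenerate pool encoding Met-Trp-Lys-Phe-Tyr-His.
pool = ["ATGTGGAAATTTTATCA", "ATGTGGAAGTTCTACCA"]
lowest = min(wallace_tm(oligo) for oligo in pool)
print(lowest)   # washes should stay below the Tm of the least stable member
```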

Test for specificity. While the isolation of a plaque that gives a strong
signal is clearly an exciting step, it is only the first step. The
next question must be asked: is the isolated cDNA really the one of
interest? It could certainly be a cDNA for a related protein or a
completely unrelated protein that just happened to have a sequence
that would hybridize to the probe that was chosen. Thus, it is
essential to develop some criteria to establish whether the right cDNA
has been isolated or should be eliminated from contention. There are a number of
criteria that will fulfill this need. The simplest takes advantage of
sequence information that can be obtained from the isolated cDNA. In
contrast to proteins where getting sequence information is
experimentally difficult, it is relatively straightforward to get
sequence information from DNA. The cDNA can be subcloned into a
convenient vector and sequence information can be obtained. If the
sequence of the cDNA that has been isolated also encodes the sequence
of some of the peptides that had been sequenced but not used to
design a probe, this is certainly persuasive evidence that the
correct cDNA has been isolated. It would be hard to argue that the
wrong cDNA had been isolated if sequence of several independent
peptides were all predicted by a cDNA.
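
This sequence test is easy to express in code: translate the cDNA in each reading frame and ask whether the independently sequenced peptides appear in any of the predicted proteins. The sketch below uses the standard genetic code and checks only the three forward frames; the cDNA and peptides are invented for the example.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(sequence, frame=0):
    """Translate a DNA sequence in the given forward frame ('*' marks a stop)."""
    sequence = sequence.upper()
    return "".join(
        CODON_TABLE.get(sequence[i:i + 3], "X")
        for i in range(frame, len(sequence) - 2, 3)
    )

def peptides_encoded(cdna, peptides):
    """Map each peptide to True if any forward frame of the cDNA encodes it."""
    frames = [translate(cdna, frame) for frame in range(3)]
    return {pep: any(pep in protein for protein in frames) for pep in peptides}

cdna = "GGATGTGGAAATTTTATCATTAA"   # hypothetical clone
print(peptides_encoded(cdna, ["WKF", "FYH", "LSR"]))
```

A clone that accounts for several peptides that were not used to design the probe is a strong candidate; one that accounts for none should be viewed with suspicion.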

Frequently, there are also elements of the
structure of the predicted protein that can be used to help confirm
the correctness of the cloning procedure. During the characterization
of the protein, it is frequently learned that the protein, for example,
is a membrane protein, in which case one might predict the
existence of transmembrane sequences. Some proteins are known to be
phosphoproteins, which suggests the presence of either serines or
threonines in a particular context that will allow a kinase to
phosphorylate them. Likewise, some proteins are glycosylated and the
presence of amino acid sequences that are associated with
glycosylation will also support the correctness of the cloning
approach. Again, it must be emphasized that all of these are simply
criteria that the correct cDNA has been isolated. These must be used
by the experimentalist to develop a convincing case, but none are
absolutely fool-proof. In some cases, the pattern of expression of a
protein (a tissue-specific manner) or a change in mRNA in organisms
that carry a particular mutation that is known to influence the
activity of the protein of interest can be a powerful criterion that
will allow the experimentalist to make a persuasive case that the
correct cDNA has been isolated. One of the most convincing approaches
is to determine if the protein encoded by the isolated cDNA has the
biological activity of interest, but it sometimes takes time to do
this experiment.

Getting full length cDNAs. In some cases, indeed in most cases, the cDNA that
is isolated will not be full-length, i.e. it will correspond only to
parts of mRNA but not the entire sequence. In this case it is
necessary to re-screen the library, generally using the cDNA that
already has been isolated to identify either a full-length cDNA or a
series of partial cDNAs that would encompass the entire cDNA of
interest. This brings up the interesting question of how an
experimentalist knows whether a full-length cDNA has indeed been
isolated. Consideration of basic molecular biology can provide a
number of clues in this question. The size of an mRNA can
be estimated by northern analysis* and this can be compared
to the size of the cDNA that has been isolated. Of course, it is possible
that several mRNAs may be generated from a single gene by alternative
splicing and this should be remembered. An mRNA should include both a
coding region which has a long open reading frame as well as
non-coding sequences (frequently called UTRs*, for untranslated regions) at both the
3' and 5' ends. An open reading frame* is simply an
uninterrupted series of triplets that does not contain stop codons.
Such a coding sequence should predict a protein of an appropriate
molecular weight which can often be compared to the molecular weight
of the known protein. Upstream of the translation start site are
frequently, but not always, found stop codons. The 3' end of the
message frequently has a poly-A tail. There is almost always special
interest in clearly identifying the 5' end of the mRNA. This sequence
is often most difficult to obtain from a cDNA library since it
requires reverse transcription to proceed to the extreme end of the
mRNA. Often it is necessary to return to a cDNA library repeatedly or
use specialized approaches to isolate an authentic 5' end. Often, the
best way to identify sequences at the 5' end of a cDNA is to use
RACE (rapid amplification of cDNA ends), which is a PCR-based technique
to amplify sequences near either the 5' or the 3' end of an mRNA. The authenticity
of a particular 5' end can be confirmed by doing
'primer extension*' experiments. In this
technique, reverse transcriptase is used to extend an oligonucleotide
primer which has been designed to hybridize near the predicted 5' end
of an mRNA. The extension of such a primer should produce a product of
a specific and predicted size.
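
Several of these clues can be checked by a simple scan of the cDNA sequence for its longest open reading frame. The sketch below looks only at the sense strand, requires an ATG start, and estimates protein mass with the common rule of thumb of about 110 daltons per amino acid; that figure, the function names and the example sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(sequence):
    """Return (start, end, frame) of the longest ATG-to-stop reading frame.

    `end` is exclusive and includes the stop codon; None if no ORF is found."""
    sequence = sequence.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3, frame)
                start = None                    # look for the next ORF in this frame
    return best

def approx_mass_kda(orf):
    """Rough protein mass, assuming about 110 Da per residue (stop excluded)."""
    residues = (orf[1] - orf[0]) // 3 - 1
    return residues * 110 / 1000

cdna = "CCTAGCATGGCTAAATTTTGACC"   # hypothetical clone: 5' UTR, short ORF, 3' end
orf = longest_orf(cdna)
print(orf, approx_mass_kda(orf))
```

If the longest open reading frame runs to the very 5' end of the clone with no upstream stop codon, or predicts a protein much smaller than the known protein, the clone is probably not full-length.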

Other methods of isolating
cDNAs.

The choice of how to isolate cDNAs depends
on the interest of the investigator and the tools that are available.
Design of an oligonucleotide probe has been used effectively in many
cases but there are many other additional approaches that can be
used. A few of them will be listed and described in this
section.

Cloning from expression libraries. In many cases a vector can be designed so that the
cDNA will be expressed, frequently as a fusion protein. In this case
the cDNA has been incorporated into a vector in a position where it
is within a coding sequence of another protein. The vector also
incorporates promoter sequences that allow the protein to be
expressed (both transcribed and translated). When such a vector is
used to make a library it is called an expression
library*. Expression libraries have
the advantage and disadvantage that the protein is present. In some
cases this may mean that there may be selective pressure against the
expression of a cDNA of interest, but in many cases this expression
allows for novel screening approaches. The most straightforward of
these is the use of antibodies to screen a library.

Screening with an antibody is quite similar
to screening with an oligonucleotide probe, but in this case an
antibody to the protein is the reagent that is available. This
antibody can be generated experimentally, but it can also be
available because of an interesting autoimmune response in an animal
model or in a human population. For example, some cancer patients
develop an autoimmune disease that leads to neuronal degeneration.
The antisera from these patients can then be used to isolate a gene
which produces a protein that is recognized by this antibody. The
antibody can be added to nitrocellulose filters under conditions
where it binds specifically and the antibody can be then detected by
a secondary antibody that is either labeled with an isotope or
covalently attached to an enzyme like horseradish peroxidase that
can be detected using standard enzymatic reactions.

If a cDNA is thought to encode a soluble
factor that has a known biological effect, and if that effect can be
easily assayed, then the assay could be a way to screen the library,
although it may be difficult to screen a large number of independent
isolates.

Complementation.
Some genes can be isolated by a classic genetic complementation
approach. If there is a method to select for the expression of a
particular gene then this selection can be used to isolate a cDNA
that encodes that gene product. For example it is relatively
straightforward to select either for or against the presence of the enzyme
HGPRTase* (hypoxanthine-guanine phosphoribosyltransferase)
in E. coli or in eukaryotic cells. If HGPRTase-deficient E. coli can
be isolated and then transformed with an expression vector, those
cells expressing the appropriate activity would become HGPRTase+.
Since it is possible to select for such colonies, this would be an
easy way to isolate a cDNA for HGPRTase from any organism.
Complementation has been used to isolate many types of cDNAs
including some that regulate complex phenomena like the cell cycle
or membrane trafficking. The power of this approach is that it
provides such strong evidence for specific in vivo function. Of
course, it is essential to independently establish that the correct
clone has been isolated.

Expression on the cell surface with antibody
screening. Cell surface receptors
are of special interest in biology and they can sometimes be isolated
using an expression strategy. Cell surface molecules on lymphocytes
for example have been identified by the isolation of specific
monoclonal antibodies. Likewise, the ligands for many receptors have
been isolated before the nature of the receptor was established. In
both cases a cDNA for the cell surface molecule, when expressed, will
lead to the presence of a binding site on the cell surface. This
binding site can be used to screen a library either by a method
analogous to the antibody screening mentioned above or by using the
ligand or antibody as an affinity reagent to "pan" for cells that
express the binding site.

Functional cloning of receptors. One of the more interesting classes of cell surface
molecules is the ion channels. Because of the
tremendous power and sensitivity of electrophysiology
(electrophysiologists can even measure the function of a single
molecule!), the presence of one or a few mRNA molecules in a single
cell can produce enough ion channels to be detected relatively easily
using an electrophysiological approach. Injection of mRNAs into frog
oocytes can lead to the appearance of particular ion channels that
can be detected either because of their responsiveness to electrical
signals or to the presence of extracellular ligands. This approach
provided a straightforward assay for the cell-surface receptor for
glutamic acid (glutamate), which is the most common neurotransmitter
receptor in the central nervous system. This type of approach can
either rely on expression vectors that can produce the mRNA of
interest or it can rely on a negative criterion. Co-injection of
cDNAs can squelch a signal by hybridizing specifically with the mRNA.
Of course the difficulty with any of these approaches is that it
becomes more difficult to screen a large number of mRNAs. This
problem has been successfully conquered by using strategies involving
'sib (for sibling) selection'. In this strategy, thousands of
independent clones are divided into pools and each pool is screened
as a whole; once a signal is
identified in any one pool, the pool itself can then be subdivided
and re-screened until an individual clone can be isolated.
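
The logic of sib selection is that of a repeated subdivision search, and it shows why a few rounds of assays are enough to find one clone among thousands. In the sketch below the assay is a stand-in function that reports whether a pool gives a signal; the pool number, clone identifiers and function names are hypothetical.

```python
def sib_select(clones, assay, n_pools=10):
    """Subdivide a positive pool until a single positive clone remains.

    `assay(pool)` must return True if the pool gives a signal.
    Returns (clone or None, number of rounds of subdivision)."""
    if n_pools < 2:
        raise ValueError("need at least two pools per round")
    pool = list(clones)
    rounds = 0
    while len(pool) > 1:
        size = -(-len(pool) // n_pools)              # ceiling division
        subpools = [pool[i:i + size] for i in range(0, len(pool), size)]
        positives = [sub for sub in subpools if assay(sub)]
        if not positives:
            return None, rounds                      # signal lost on subdivision
        pool = positives[0]                          # follow one positive pool
        rounds += 1
    clone = pool[0] if pool and assay(pool) else None
    return clone, rounds

# Hypothetical library of 10,000 clones in which clone 4242 encodes the channel.
library = range(10000)
found, rounds = sib_select(library, lambda pool: 4242 in pool)
print(found, rounds)
```

With ten pools per round, a library of 10,000 clones is resolved in four rounds, that is, about forty assays rather than ten thousand.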

Homology screening.
One of the most productive, although perhaps less creative approaches
to isolating cDNAs is homology
screening*. Once an interesting gene
has been isolated from one species, it is relatively straightforward
to use a low stringency hybridization strategy to isolate cDNAs from
another species. Likewise, additional family members from the same
species can frequently be identified. The power of this approach
should not be underestimated. Interesting mutants are frequently
obtained in Drosophila by using genetic screens and identifying the
existence of corresponding genes in humans can be tremendously
important. Likewise, because of the large population of humans and the
careful monitoring of their medical care, human genetic diseases are
proving an abundant source of interesting genes and eventually
interesting cDNAs. Determining the existence of such cDNAs in model
systems can then be extremely valuable. Good examples of this come
from the field of apoptosis. Some of the original genes like the ICE
protease were originally identified in studies of C. elegans and
subsequently human homologs of these genes were isolated. Likewise,
the human oncogene, bcl-2, was initially isolated by genetic studies
which led to the isolation of the cDNA and subsequently homologs were
identified in model systems.

PCR-based screens.
PCR-based screening is also a method to isolate novel cDNAs. After
two or more members of a family have been isolated, regions of
homology can be identified. If these regions of homology are conserved
within the family, PCR primers can be designed and used to amplify
reverse transcriptase products of mRNAs in an appropriate tissue. The
sizes of the products from known members of the family can be predicted and
novel mRNAs may give rise to novel amplification products. See the
section on proteins for a good example of this. These amplification
products in turn can be used to screen cDNA libraries. In some cases
even a single region of conserved structure may be sufficient to
isolate novel genes using the following strategy. Reverse
transcriptase can be used to extend a primer which has been made to a
conserved sequence. Such products of course could be heterogeneous
because different reverse transcriptase molecules would extend to
different degrees. However, some restriction enzymes are capable of
cleaving single stranded DNA and treatment of such a product with an
enzyme of this type would produce a fragment of a unique size. Such a
fragment can then be homo-polymer tailed (i.e. a sequence of Cs can
be added to the end of the molecule) using terminal transferase. This
sequence of Cs can then be used as a site to anchor an
oligonucleotide primer containing a stretch of Gs. If this primer is
extended the resulting product will be suitable for PCR amplification
between the two primers that were used in its creation.
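
The first step of a PCR-based screen, finding regions that are conserved among known family members, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the sequences are taken to be already aligned and of equal length, and the function name conserved_blocks and the example sequences are invented for this sketch.

```python
def conserved_blocks(aligned_seqs, min_len=6):
    """Return (start, block) pairs for runs of columns identical in every sequence.

    Assumes the sequences are pre-aligned and equal in length (gaps as '-').
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    blocks = []
    start = None
    # Iterate one position past the end so a run reaching the end is closed.
    for i in range(length + 1):
        # A column is conserved if every sequence has the same non-gap residue.
        conserved = (
            i < length
            and aligned_seqs[0][i] != "-"
            and all(s[i] == aligned_seqs[0][i] for s in aligned_seqs)
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                blocks.append((start, aligned_seqs[0][start:i]))
            start = None
    return blocks


# Hypothetical aligned family members (invented sequences).
family = [
    "ATGGCCAAGCTTGGATCCTTTGACGTCAAATAG",
    "ATGGCCAAGCTAGGTTCGTTTGACGTCAAGTAG",
    "ATGGCCAAGCTCGGCTCATTTGACGTCAAATAG",
]
for start, block in conserved_blocks(family):
    print(start, block)
```

Blocks found this way are only candidates for primer sites. Primer length, melting temperature and degeneracy would still need to be considered.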

Plus/minus screening and differential
display*. Another useful
approach to isolating cDNAs of interest relies not on knowledge of
their primary structure, but rather on assumptions about their
expression. Both plus/minus screening and differential display rely
on strategies that seek to isolate cDNAs that are expressed in one
situation but not another. For example, growth factors like NGF or
PDGF and hormones like estrogen are known to induce the expression of
novel genes. Thus, a population of cells that are cultured or grown
in the presence and absence of such an experimental manipulation
(e.g., +/-NGF, +/-estrogen, +/-retinoic acid) should express some
genes in common, but have some distinct mRNAs. Likewise, tissues at
different developmental stages may have expression patterns that are
of special interest. Tissues that are related but distinct may also
express interesting subsets of genes. There are presumably interesting
genes that are expressed in cerebellum, but not basal ganglia; or in
T cells, but not B cells. Isolation of those genes may give a clue
to the function of those tissues or the way gene expression is
regulated.

In plus/minus screening, mRNA is
isolated from two populations of cells and each is reverse transcribed
to produce a population of cDNAs. Each population of cDNAs can then be
converted to probe by random hexamer priming and used to screen
duplicate lifts from a library (i.e., two nitrocellulose filters
produced from the same plate of plaques or cells). Any plaque or
colony that hybridizes to one probe
but not the other is a potential candidate for interest, and
differential expression can be tested by northern analysis or a
related approach.
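
The comparison of duplicate lifts amounts to a simple classification, sketched below in Python. The plaque IDs, signal values and threshold are invented for illustration, and the function name differential_plaques is hypothetical. Real filters would need background subtraction and replicate screening.

```python
def differential_plaques(plus_signal, minus_signal, threshold=1.0):
    """Classify plaques by hybridization signal on duplicate lifts.

    plus_signal / minus_signal map a plaque ID to a signal intensity
    (arbitrary units) on the filter probed with each cDNA population.
    """
    plus_only, minus_only = [], []
    for plaque in sorted(set(plus_signal) | set(minus_signal)):
        hit_plus = plus_signal.get(plaque, 0.0) >= threshold
        hit_minus = minus_signal.get(plaque, 0.0) >= threshold
        if hit_plus and not hit_minus:
            plus_only.append(plaque)
        elif hit_minus and not hit_plus:
            minus_only.append(plaque)
    return plus_only, minus_only


# Invented signals for four plaques on the two duplicate filters.
plus_filter = {"A1": 5.2, "A2": 0.1, "B7": 3.3, "C4": 2.0}
minus_filter = {"A1": 4.9, "A2": 2.4, "B7": 0.2, "C4": 0.0}
print(differential_plaques(plus_filter, minus_filter))
```

Plaques that light up on both filters (A1 here) are discarded, and the remainder are candidates to confirm by northern analysis.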

Differential display is a simple
modification of PCR amplification. In this approach, mRNA is
reverse transcribed using a series of primers. Frequently primers
are chosen to have a short arbitrary sequence of nucleotides and an
oligo-dT section that would hybridize to a poly-A tail. mRNAs that are
homologous to the randomly chosen sequence should be reverse
transcribed, producing a single-stranded cDNA. The addition of
another primer, again, randomly chosen will allow amplification of
a subset of the reverse transcribed cDNAs. Depending on the
distance between the two primers, fragments of varying molecular
weights will be obtained. By doing this procedure with mRNA that
has been isolated from two different cell populations, the pattern
of expression between the two cell types or cell states can be
determined. Again, an amplified product that is thought to be
unique to a particular cell type can then be used as a probe to
screen a library or test expression by northern analysis. Both of
these methods have been used to isolate a large number of
interesting genes using only their expression pattern.
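
The logic of comparing band patterns can be sketched in Python. This is a deliberately simplified model, not a description of any published protocol: each mRNA is a sense-strand string ending in a poly(A) tail, the oligo-dT primer is assumed to anchor where the tail begins, and the arbitrary primer must match exactly. The sequences and the function name display_bands are invented.

```python
def display_bands(mrnas, arbitrary_primer):
    """Return the set of product lengths expected from one primer pair."""
    bands = set()
    for seq in mrnas:
        # Strip the poly(A) tail; this also trims any A's that end the
        # message body, which is acceptable for a sketch.
        body = seq.rstrip("A")
        site = body.find(arbitrary_primer)
        if site != -1:
            # Product spans from the arbitrary primer site to the tail.
            bands.add(len(body) - site)
    return bands


# Invented mRNA populations from two cell states.
control = ["GGCTAGCTTACGGATCCAAAAAAAA", "TTGACCGTAGGCTAAAAAAAA"]
treated = control + ["CCTACGGTTTGCAGCAAAAAAA"]

control_bands = display_bands(control, "TACGG")
treated_bands = display_bands(treated, "TACGG")
print("bands unique to treated cells:", treated_bands - control_bands)
```

A band present in only one lane is the candidate that would be excised, reamplified and used as a probe.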

Two hybrid screening. One of the more active approaches to isolating cDNAs
is the two-hybrid
screen*. These screens are so named
because they take advantage of a specific protein-protein
interaction that occurs between two proteins, each of which is itself
a hybrid protein. The entire assay relies on the ability of one part
of each of the hybrid proteins to form a specific interaction that is
reasonably stable under physiological conditions with the other. A
number of variations of this approach have been developed, but they
all rely on the same feature.

In the most straightforward version a
test cell which expresses an easily assayed gene, like
beta-galactosidase, under the control of a well characterized
promoter is produced. The promoter is chosen so that it has a low
basal activity in the absence of stimulation from a specific
regulatory element.

The same cell is then transfected with
an expression vector for a hybrid protein. One part of the hybrid
protein is derived from a transcription factor which is designed
to recognize the DNA regulatory element. Binding to the site,
however, is not sufficient to induce gene expression; rather, a
specific mechanism to activate transcription is required. A second
part of this hybrid protein includes a sequence isolated from a
particular gene of interest. This protein can be derived from
another transcription factor, from a structural protein, or from
an intracellular signaling protein. The only requirement is that
the hybrid itself is not sufficient to activate expression of the
reporter gene.

This cell system is then transfected
with an expression library that also expresses a collection of
fusion proteins. In this case each fusion protein consists of two
parts. One part of the fusion protein is encoded by a cDNA from the
library, which the experimentalist hopes may encode a
protein that will interact with the target in the hybrid protein
already expressed in the cell. The second part of this fusion
protein is an activator of transcription, frequently the
activating region of VP16, a potent transcriptional activator. If a
cell is transfected with a fusion protein that does not recognize
the hybrid protein already present in the cell (the bait), nothing
should happen.

In the rare case where VP16 is expressed
as a hybrid with the protein that interacts with the hybrid
already present in the cell, this should result in activation of
beta-galactosidase. Thus, the screen serves as an initial assay
for protein-protein interaction and is structured in such a way
that a large number of members of a cDNA library can be quickly
screened and selected for testing for specific interaction. Of
course, there is always the possibility that activation can occur
by a non-specific mechanism, but this possibility can be tested
for without too much difficulty.
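
The logic of the screen can be summarized as a small truth function, sketched here in Python. The protein names are placeholders and the function reporter_on is hypothetical. The model assumes the reporter is expressed only when the activation domain is brought to the promoter, either through a bait-prey interaction or because the bait activates on its own (one kind of non-specific activation).

```python
def reporter_on(bait, prey, interactions, autoactivators=frozenset()):
    """Predict reporter expression in a simplified two-hybrid cell.

    bait: protein fused to the DNA-binding domain.
    prey: library protein fused to the activation domain, or None.
    interactions: set of frozensets naming pairs that bind each other.
    autoactivators: baits that activate the reporter on their own.
    """
    if bait in autoactivators:
        return True
    if prey is None:
        return False
    return frozenset((bait, prey)) in interactions


# Placeholder names; only BaitX and PreyB are assumed to interact.
known_pairs = {frozenset(("BaitX", "PreyB"))}
library = ["PreyA", "PreyB", "PreyC"]
positives = [p for p in library if reporter_on("BaitX", p, known_pairs)]
print(positives)
```

The autoactivators argument mirrors the control described above: a bait that turns on the reporter without any library plasmid cannot be used in the screen.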

Screening
by databases. The rapid accumulation
of sequence information and genetic data often allows scientists to
bypass the steps required to isolate cDNAs. For example, if partial
protein sequence or partial cDNA sequence is available, searching
databases may result in identifying candidate clones that can be
ordered and tested to determine if they are the 'right' clone.
Databases include the sequence of entire genomes as well as short
sequences from cDNAs that serve to tag individual clones
(ESTs*, or expressed sequence tags).
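
As a rough sketch of what such a search does, the Python below looks for a partial cDNA sequence in a small collection of EST-like entries, checking both strands. The entry IDs and sequences are invented, and exact substring matching stands in for the mismatch-tolerant alignment that real search tools perform.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_clones(query, est_db):
    """Return IDs of entries containing the query on either strand (exact match)."""
    query = query.upper()
    rc = reverse_complement(query)
    return sorted(
        est_id
        for est_id, seq in est_db.items()
        if query in seq.upper() or rc in seq.upper()
    )


# Invented entries; EST002 carries the query on the opposite strand.
est_db = {
    "EST001": "GGATCCATGGCTAAGCTTGA",
    "EST002": "TTTTCAAGCTTAGCCATGGA",
    "EST003": "CCCCGGGGAAAATTTT",
}
hits = find_candidate_clones("ATGGCTAAGCTTGA", est_db)
print(hits)
```

Each hit is only a candidate. As noted above, the clone still has to be obtained and tested.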

Summary.
Each of these methods of screening a cDNA library provides a specific
screen or assay for cDNAs that may be of interest. Just as it is true
that when purifying a protein, one is likely to get what one assays
for, in screening a cDNA library one is likely to get what one
screens for. Determining whether the cDNA that has been isolated is
indeed the one that is of most interest to the experimentalist
requires additional tests. In the absence of understanding of what
those tests should be, it makes little sense to do initial
screenings. Likewise, careful consideration of what are the best
screens for a specific purpose is likely to result in a more fruitful
search with a higher percentage of successes.

Why Isolate cDNAs?
The last topic to consider in this section is the question of why
isolation of cDNAs is such a powerful approach. A number of answers
quickly spring to mind.

1. Isolating cDNAs allows the
experimentalist to use the cDNA to develop expression vectors so
proteins of interest can be produced in high quantities, greatly
simplifying the task of protein purification (e.g., see baculovirus
expression*).

2. Knowing the sequence of a cDNA
immediately gives access to the sequence of the protein. By
appreciating protein structure and studying the common motifs present
in known proteins, a great deal of information can be deduced about
the possible structure and/or function of the protein encoded by a
known cDNA. The presence of particular sequences can suggest that the
protein product may be phosphorylated or may bind a particular small
biochemical molecule, like GTP.
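
A minimal Python sketch of this kind of analysis follows. It translates a cDNA from its first ATG using the standard genetic code and scans the result for two widely cited consensus patterns: the P-loop (Walker A) motif G-x(4)-G-K-[ST] found in many nucleotide-binding proteins, and the R-R-x-[ST] consensus for cAMP-dependent protein kinase. The cDNA is invented, the function names are hypothetical, and a match is only a hint that must be tested experimentally.

```python
import re
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

P_LOOP = re.compile(r"G.{4}GK[ST]")   # nucleotide-binding consensus
PKA_SITE = re.compile(r"RR.[ST]")     # phosphorylation site consensus


def translate(cdna):
    """Translate from the first ATG to the first stop codon."""
    cdna = cdna.upper()
    start = cdna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def scan(protein):
    """Return start positions (0-based) of each motif in the protein."""
    return {
        "p_loop": [m.start() for m in P_LOOP.finditer(protein)],
        "pka_site": [m.start() for m in PKA_SITE.finditer(protein)],
    }


# Invented cDNA: two untranslated bases followed by a short open reading frame.
cdna = "CC" + "ATG GGT GCT GGT GGC GTT GGT AAA TCT GCT CGT CGT GCT TCT CTG TAA".replace(" ", "")
protein = translate(cdna)
print(protein, scan(protein))
```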

3. The
availability of a cDNA allows one to quickly design assays for
studying the expression of the mRNA; labeled cDNA can be used to
determine the expression of mRNA using both northern analysis and
RNase protection, and the cellular distribution of the mRNA can be
determined by in situ hybridization. Each of these approaches
provides a specific value.

--Northern
analysis* can quickly determine
whether the level of expression changes with drug treatment,
hormones, or developmental stage. It reveals whether there are
several different mRNAs that are expressed. Northern analysis relies
on the fractionation of isolated mRNAs on agarose (or occasionally,
acrylamide) gels which are then transferred to nitrocellulose. The
transferred mRNAs are then detected by hybridization to a labeled
cDNA probe.

--RNase
protection* is a more sensitive
method of determining mRNA abundance. In contrast to northern
analysis, RNase protection relies on the resistance of a hybrid
molecule to digestion. Whereas mRNA is normally extremely sensitive
to ribonuclease, if RNA is isolated and then hybridized to a labeled
probe (either RNA or DNA) then the hybrid molecule or a portion of a
hybrid molecule can be protected from ribonuclease activity. The
amount of protected probe as well as its size can be easily
measured.

--DNA
arrays allow the determination of
the expression of huge numbers of mRNAs in a single experiment. They are
based on hybridization to nucleic acids that are attached to a
solid phase support.

--RT-PCR*. Even in the absence of a labeled cDNA probe,
knowledge of the sequence of a cDNA can allow quantitation of
expression. Once a cDNA's sequence is known, PCR primers can be designed
and used to amplify a reverse transcriptase product. Although there
are certainly problems in doing this quantitatively, it can be a
useful and powerful technique. This approach can also be adapted to
in situ approaches.

--in
situ hybridization*. The cDNA can
also be used to determine which cells, tissues, or developmental
stages produce a particular mRNA using in situ hybridization. Labeled
nucleic acid is incubated with fixed tissue or cells under conditions
where only specifically bound hybrid is stable. Autoradiography
reveals the position of endogenous RNA. Controls with RNase help
prove that hybridization is to RNA, not to genomic DNA.

3. As we have already noted above, knowledge
of mRNA sequence can allow for the cloning of homologous sequences
either from different species or additional members of the gene
families within a species.

4. Availability of a cDNA makes production
of both polyclonal and monoclonal antibodies much easier. Knowing the
sequence of a protein allows one to design and synthesize a peptide
that can be used as an antigen (anti-peptide
antigen). Thus, in some cases, an
antibody that recognizes a specific protein can be produced without
ever purifying that protein. It also allows the expression and
purification of a protein to be used as an antigen.

5. While expression vectors are
extraordinarily useful in allowing production of large quantities of
a protein, they are perhaps even more useful in that they allow
production of not only a wild type protein but also production of a
mutant protein. Coupled with site-directed mutagenesis it is possible
to modify proteins to almost any end that the experimenter desires.
This allows tests of specific structure-function relationships. For
example, the importance of a particular phosphorylation site in the
activity of a protein or a specific residue in the binding of DNA can
be studied by expressing mutant proteins. Of course, all such studies
should be cognizant of the possibility that mutant proteins may be
poorly expressed or unstable. More mundane uses of expressed proteins
incorporating specific mutations include the production of specific
proteins that can be used for biochemical reagents or biochemical
products. Would introduction of additional disulfide bonds increase
the thermostability of a particular protein so it would be better for
use in PCR or even better as a protease-based stain remover in
laundry detergent?

6. Isolation of cDNAs means that the in vivo
function of a protein can be tested using a wide variety of
approaches. A protein can either be overexpressed or its expression
can be reduced or the function of a protein can be modified in a
number of different ways.

Perhaps the simplest approach is to
design an anti-sense sequence to a particular cDNA using either
normal DNA or phosphorothioate-modified oligonucleotides, which are
relatively more stable to hydrolysis. Addition of antisense can in some
cases reduce the expression of a protein, allowing a test of protein
function in a particular system.
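
Designing the antisense sequence itself is a reverse-complement operation, sketched below in Python. The mRNA fragment is invented, the function name antisense_oligo is hypothetical, and targeting a window around the start codon is just one common choice assumed here for illustration. Oligo length and chemistry would be chosen by the experimenter.

```python
# Map each mRNA (or cDNA) base to its complementary DNA base.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_oligo(mrna, start, length=18):
    """Return a DNA oligo (5'->3') complementary to mrna[start:start+length]."""
    target = mrna.upper()[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the sequence")
    return target.translate(COMPLEMENT)[::-1]


# Invented mRNA fragment; the window begins three bases before the AUG.
mrna = "GGCACCAUGGCUAAGCUUGACGUC"
window_start = mrna.find("AUG") - 3
print(antisense_oligo(mrna, window_start))
```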

Another powerful approach is to use
short sequences of RNA (RNAi or siRNA) to promote the degradation of the RNA encoding a
protein of interest.

Likewise, cDNAs can provide an avenue to
modifying the gene that produces the mRNA. Homologous
recombination provides an avenue
in which any gene of interest can be disrupted or even
conditionally disrupted (discussed elsewhere). Study of the
properties of such a mutant organism is a powerful way to determine
the function of a particular gene. See knock-out and knock-in.

A gene can be overexpressed either in a
cell-line or in an organism. Introduction of a gene under a
particular promoter into the germ line allows propagation of an
organism that will mis-express or even conditionally mis-express a
particular gene. Such transgenes are again an extremely powerful way of
determining real biological function.

If there is sufficient knowledge of the
structure-function relationships of a protein, it is sometimes
possible to disrupt biological function without interfering with
the expression of the endogenous genes. Frequently it is possible
to design a modified protein that, when expressed, interferes with
the function of the endogenous gene in a dominant fashion. For
example, expression of mutant forms of the regulatory subunit of
the cAMP-dependent protein kinase interferes with the function of
the wild type regulatory subunits in a cell. This interference is
due to the fact that the kinase is normally composed of two
regulatory and two catalytic subunits. If there is
overexpression of a mutant regulatory form that does not bind
cAMP, then this is sufficient to completely disrupt the activity
of any kinase molecule that incorporates even a single mutant
regulatory subunit. Likewise, overexpression of mutant forms of signaling
molecules like ras that are modified so that they cannot transmit
a signal but retain sufficient native structure so that they can
receive signaling information can help determine the role of ras
or related molecules in a signaling pathway. This is frequently
called a dominant
negative approach*.
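
The reason overexpression makes a dominant negative so effective can be shown with a short calculation, sketched in Python. It assumes, purely for illustration, that mutant and wild type subunits assemble into complexes at random. The function name fraction_poisoned is hypothetical.

```python
def fraction_poisoned(mutant_fraction, subunits_per_complex=2):
    """Fraction of complexes containing at least one mutant subunit,
    assuming subunits assemble at random from a mixed pool."""
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    return 1.0 - (1.0 - mutant_fraction) ** subunits_per_complex


# Two regulatory subunits per kinase, as in the example above.
for f in (0.5, 0.9):
    print(f, round(fraction_poisoned(f), 4))
```

Under this assumption, a mutant that makes up half of the regulatory subunit pool is present in three quarters of the holoenzymes, and the fraction rises quickly as the mutant is further overexpressed.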

Thus, the availability of cDNA clones brings
many of the logical approaches of classical genetics to the molecular
biologist and allows critical tests of in vivo function that would
not otherwise be possible.

cDNAs and Experimental Design

The effort invested in isolating and
characterizing a cDNA is well rewarded by the large number of uses
that can be made of such a reagent.

1. The most obvious use of a cDNA is to
study expression of mRNA. This can
be done by northern analysis, by RNase protection assay, by PCR-based
detection, or by in situ hybridization. To detect mRNAs by
northern
analysis*, mRNA must be prepared and
fractionated by gel electrophoresis to separate mRNAs of different
molecular weight. The RNA on the gel can then be transferred to
nitrocellulose and detected by hybridization with a labeled cDNA.
Label is generally incorporated into cDNAs by primer extension using
a random selection of oligonucleotide hexamers. This technique has
the ability to distinguish mRNAs of different molecular weights and
so may reveal alternatively spliced products. RNase protection* is
frequently chosen because it is a more sensitive
method of identifying mRNAs. In this technique, a probe is generated
by using a vector that incorporates a promoter for an RNA polymerase.
This promoter can then be used to transcribe in vitro a high specific
activity RNA, part of which can be designed to be complementary to the
mRNA of interest. This
synthesized RNA is of course extremely sensitive to ribonuclease
treatment; however, if it is hybridized to a preparation of mRNA that
includes a complementary sequence, a hybrid will be formed and this
will render the RNA resistant to RNase digestion. In some cases the
amount of protected RNA can be measured directly but it can also be
fractionated on gels to determine the molecular weight of the
protected species. Another useful approach is to take advantage of
sequence information and use RT-PCR* (reverse transcription-polymerase chain reaction).
In this approach, mRNA is isolated, reverse transcribed to generate a
complementary DNA, and this complementary DNA is then amplified using
PCR primers. Again this is a sensitive method of detecting mRNA, but
care is required to make quantitative claims about the amount of mRNA
present in various samples. Finally, hybridization can be carried out
in fixed tissues to determine what cell types express mRNA. Again,
mRNA can be detected by virtue of its hybridization with a labeled
probe. Alternatively, a modified RT-PCR protocol can also be done in
situ. Thus all of these methods have the ability to detect mRNA
abundance and changes in mRNA among various cell types in response to
development, and in response to hormones or other signaling
molecules.

2. Sequencing a cDNA is the quickest
and most reliable way to identify the sequence of the encoded
protein. The sequence of a cloned DNA can be determined relatively
quickly by either Maxam-Gilbert
Sequencing* or Sanger
Sequencing*. With the rapidly
expanding DNA database and the appreciation of how specific amino
acid sequences can be used to define particular domains in proteins,
the sequence information can be used extremely profitably. For
example, the sequence of a protein can be used to determine the
likelihood that particular regions of a protein will adopt an
alpha-helix configuration. Likewise particular sequences are
associated with particular functions or particular structures. The
zinc finger motif is a particular protein structure that can bind
zinc atoms with high affinity and this structure is frequently found
in DNA-binding proteins. Likewise, the helix-loop-helix structure
which includes two alpha-helices connected by a loop, is frequently
found in transcription factors. The catalytic triad is a set of
3 amino acids, brought together by folding rather than adjacent in the
sequence, that is found in many proteases. Protein sequence will
also reveal the presence of particular sites for post-translational
modification. The sequences for addition of carbohydrates, fatty
acids, or phosphate groups are reasonably well conserved and the
presence of these sequences is a strong indicator of the
post-translational modification of a protein. If alpha-helices are
predicted and show a high concentration of hydrophobic groups on
their surface, this is a strong indication that the protein may have a
transmembrane segment. A repeated pattern of such a motif is found in
many signaling molecules. For example, the classic seven
transmembrane pattern that was originally found in
bacteriorhodopsin is also present in many cell-surface receptors. Of course,
any prediction made on the basis of amino acid sequence must be
confirmed, but primary sequence is often a powerful indication of
what experiments should be done.
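
One common way to look for candidate transmembrane segments is a sliding-window hydropathy average. The Python sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, values often used with that scale. The protein sequence is invented and the function name hydrophobic_windows is hypothetical. Such a scan only suggests where to look; it does not demonstrate membrane insertion.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return start positions of windows whose mean hydropathy exceeds threshold."""
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > threshold:
            hits.append(i)
    return hits


# Invented protein: charged flanks around a 19-residue hydrophobic stretch.
protein = "MKDEERRK" + "LLIVALLGAVLLILAVLFL" + "KRRDEEKS"
hits = hydrophobic_windows(protein)
print("candidate segment starts:", hits)
```

Overlapping windows around a hydrophobic stretch all score above the cutoff, so consecutive hits would normally be merged into a single predicted segment.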

3. The availability of a cDNA clone allows
the protein to be expressed in a variety of contexts. A cDNA can be
inserted into a variety of expression vectors for different purposes.
Perhaps the most obvious use of such an approach is to drive
expression to extremely high levels. This produces a rich source of
protein that considerably eases the difficulty of protein
purification. This can make available abundant supplies of protein
for physiological testing or use as a reagent. A more striking use of
expression systems is the ability to express mutant proteins.
Since it is possible to mutate DNA sequences essentially at will, it
is possible to express not only the wild type proteins but also
related proteins that have particular mutations. These mutations, if
well designed, can be used to test particular structure-function
relationships within a protein. They can determine whether a
particular residue is important for catalytic activity or for
association with another protein. In a related and more practical
way, proteins can be modified for specific uses. One can incorporate
disulfide bonds to increase the thermal stability of proteins that
have industrial and commercial applications. Reagents that are used
in molecular applications can be modified so unwanted activities are
suppressed. For example, nuclease activities can be dissociated from
polymerase activities in DNA polymerases. One of the most
interesting examples of expressing mutant proteins can be found in
the design of dominant negative mutants of a protein that can
interfere with the activity of an endogenous protein. For example, if
it is possible to separate the DNA-binding domain and the RNA
polymerase activating domain from a transcription factor, expression
of the DNA-binding domain in the absence of the activating domain
might be expected to interfere with the activity of the endogenous
factor. Many proteins function as multimers, so expression of a mutant
protein can frequently interfere with the activity of an endogenous
protein by interfering with protein-protein dimerization. This
strategy has been extremely useful in study of specific transcription
factors. Likewise, intracellular signaling requires sequential
interactions of a series of proteins. Expression of a mutant protein
that can interact with one member of the cascade but not the
subsequent downstream members can interfere with the function of
endogenous protein. This strategy has been used very profitably by
making truncated mutants of receptors that express only the
extracellular but not the intracellular domain of a protein or by
expressing mutant versions of ras or other GTP-binding proteins that
transduce the signal within the cell.

4. The availability of mRNA sequence also
opens the possibility of taking a genetic approach to understanding
protein function. In many cases, expression of an antisense
oligonucleotide* or the presence of
a high concentration of synthetic anti-sense oligonucleotides can
suppress the translation of an endogenous mRNA, leading to a cell
that is depleted of a protein of interest. Analysis of such a cell or
tissue can help establish the function of a protein in vivo.
Likewise, information about the sequence of a cDNA or the gene
encoding it can be used to develop a strategy to disrupt or modify
the gene encoding the cDNA. Using homologous recombination it is
possible to either disrupt and eliminate expression of a gene or to
force the expression of an altered gene product.

6. It is also possible to study the effect
of a forced expression of any gene product in any tissue of interest.
By taking advantage of an understanding of the particular promoter elements
(discussed on another page) that are required for the expression of a
protein in a particular tissue at a particular time, it is possible
to make a gene or a hybrid gene that expresses any protein of
interest. Thus, it is possible to determine, for example, the effect
of overexpressing a neuropeptide gene on neural development or the
formation of a specific connection in the nervous system. It is
possible to determine whether the expression of a wild type or a
dominant negative form of a protein can interfere with any
developmental process or lead to the development of known diseases.
With the development of regulated expression vectors, it is possible
to control the expression of proteins by using small molecules like
tetracycline (see tTA*).

7. As we noted above, it is also possible to
take advantage of cDNA sequences to isolate homologous genes. Using
either low stringency hybridization or a PCR approach based on
knowledge of conserved regions of genes, it is frequently possible to
identify additional genes that are members of the family and may be
biologically important in the absence of any knowledge of their
function.

8. Knowing the cDNA sequence of a protein
will frequently facilitate the development of polyclonal and
monoclonal antibodies. Most simply, an overexpressed protein can be
purified and used as an antigen. Alternatively, careful consideration
of a cDNA sequence and the likely structure of the encoded protein
can be used to design peptides that can be used as antigens for the
production of either polyclonal or monoclonal antibodies, and this
will be discussed in the page devoted to antibodies*.

9. Lastly, a cDNA sequence can be used as a
probe to screen genomic libraries and isolate the gene encoding a
particular cDNA. This is an extremely valuable approach because it
provides a bridge between cDNA cloning and classic genetic analysis.
Once the gene for a cDNA has been mapped, it can be tested for its
association with particular developmental or disease phenotypes. It
is possible to use classic genetic approaches to determine whether
a particular phenotype co-segregates with alterations in the
gene or its cDNA.

In conclusion, the availability of a cDNA
opens such a wide variety of experimental approaches and cDNA cloning
is such a powerful technology, that isolating a cDNA should be
considered, regardless of the ultimate experimental goal.