Antibody fragments such as Fabs possess properties
that can enhance protein and RNA crystallization
and therefore can facilitate macromolecular structure
determination. In particular, Fab BL3-6 binds to
an AAACA RNA pentaloop closed by a GC pair with
~100 nM affinity. The Fab and hairpin have served
as a portable module for RNA crystallization. The potential
for general application makes it desirable to
adjust the properties of this crystallization module
in a manner that facilitates its use for RNA structure
determination, such as ease of purification, surface
entropy or binding affinity. In this work, we used both
in vitro RNA selection and phage display selection to
alter the epitope and paratope sides of the binding
interface, respectively, for improved binding affinity.
We identified a 5'-GNGACCC-3' consensus motif in
the RNA and an S97N mutation in complementarity-determining
region L3 of the Fab that independently impart
about an order of magnitude improvement in affinity,
resulting from new hydrogen bonding interactions.
With a model RNA, these modifications facilitated
crystallization under a wider range of conditions and
improved diffraction. The improved features of the
Fab-RNA module may facilitate its use as an affinity
tag for RNA purification and imaging and as a chaperone
for RNA crystallography.
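
As a rough illustration of what "about an order of magnitude improvement in affinity" means energetically, the sketch below converts a change in dissociation constant into a change in standard binding free energy using ΔΔG° = RT ln(Kd,new/Kd,old). The function name, the 25 °C temperature, and the 10 nM figure (one-tenth of the ~100 nM starting affinity) are assumptions made for illustration; they are not values reported in the abstract.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def ddg_binding(kd_old_nm, kd_new_nm, temp_c=25.0):
    """Change in standard binding free energy (kcal/mol) when Kd changes.

    Negative values mean tighter binding. temp_c is an assumed value;
    the abstract does not state a measurement temperature.
    """
    t_kelvin = temp_c + 273.15
    return R_KCAL * t_kelvin * math.log(kd_new_nm / kd_old_nm)

# Hypothetical example: ~100 nM improved tenfold to ~10 nM.
print(round(ddg_binding(100.0, 10.0), 2))
```

Under these assumptions a tenfold gain in affinity corresponds to roughly -1.4 kcal/mol, an energy of the same order as one or two added hydrogen bonds.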

Under the "RNA World" hypothesis, an early episode of
natural history on Earth used RNA as the only genetically encoded
molecule to catalyze steps in its metabolism. This, according
to the hypothesis, included RNA catalysts that used RNA cofactors.
However, the RNA World hypothesis places special demands on
prebiotic chemistry, which must now deliver not only four
ribonucleosides, but also must deliver the "functional" portion of these
RNA cofactors. While some (e.g. methionine) present no particular
challenges, nicotinamide ribose is special. Essential to its role in
biological oxidations and reductions, its glycosidic bond that holds a
positively charged heterocycle is especially unstable with respect to
cleavage. Nevertheless, we are able to report here a prebiotic
synthesis of phosphorylated nicotinamide ribose under conditions that
also conveniently lead to the adenosine phosphate components of
this and other RNA cofactors.

Directed evolution was first applied to diverse libraries of DNA and RNA molecules a
quarter century ago in the hope of gaining technology that would allow the creation of receptors,
ligands, and catalysts on demand. Despite isolated successes, the outputs of this technology have been
somewhat disappointing, perhaps because the four building blocks of standard DNA and RNA have
too little functionality to have versatile binding properties, and offer too little information density
to fold unambiguously. This review covers the recent literature that seeks to create an improved
platform to support laboratory Darwinism, one based on an artificially expanded genetic information
system (AEGIS) that adds independently replicating nucleotide "letters" to the evolving "alphabet".

A common criticism of "prebiotic chemistry research" is that it is done
with starting materials that are too pure, in experiments that are too directed, to get
results that are too scripted, under conditions that could never have existed on Earth.
Planetary scientists in particular remark that these experiments often arise simply
because a chemist has a "cool idea" and then pursues it without considering external
factors, especially geological and planetary context. A growing literature addresses
this criticism and is reviewed here. We assume a model where RNA emerged
spontaneously from a prebiotic environment on early Earth, giving the planet its
first access to Darwinism. This "RNA First Hypothesis" is not driven by the intrinsic
prebiotic accessibility; quite the contrary, RNA is a "prebiotic chemist's nightmare."
However, by assuming models for the accretion of the Earth, the formation of the
Moon, and the acquisition of Earth's "late veneer," a reasonable geological model
can be envisioned to deliver the organic precursors needed to form the nucleobases
and ribose of RNA. A geological model having an environment with dry arid land
under a carbon dioxide atmosphere receiving effluent from serpentinizing igneous
rocks allows their conversion to nucleosides and nucleoside phosphates. Mineral
elements including boron and molybdenum prevent organic material from devolving
to form "tars" along the way. And dehydration and activation allow the formation of
oligomeric RNA that can be stabilized by adsorption on available minerals.

Synthetic nucleobases presenting non-Watson-Crick arrangements of hydrogen bond donor and acceptor groups can form additional nucleotide pairs that stabilize duplex DNA independent of the standard A:T and G:C pairs. The pair between 2-amino-3-nitropyridin-6-one 2'-deoxyriboside (presenting a {donor-donor-acceptor} hydrogen bonding pattern on the Watson-Crick face of the small component, trivially designated Z) and imidazo[1,2-a]-1,3,5-triazin-4(8H)one 2'-deoxyriboside (presenting an {acceptor-acceptor-donor} hydrogen bonding pattern on the large component, trivially designated P) is one of these extra pairs for which a substantial amount of molecular biology has been developed. Here, we report the results of UV absorbance melting measurements and determine the energetics of binding of DNA strands containing Z and P to give short duplexes containing Z:P pairs as well as various mismatches comprising Z and P. All measurements were done at 1 M NaCl in buffer (10 mM Na cacodylate, 0.5 mM EDTA, pH 7.0). Thermodynamic parameters (ΔH°, ΔS°, and ΔG°37) for oligonucleotide hybridization were extracted. Consistent with the Watson-Crick model that considers both geometric and hydrogen bonding complementarity, the Z:P pair was found to contribute more to duplex stability than any mismatches involving either nonstandard nucleotide. Further, the Z:P pair is more stable than a C:G pair. The Z:G pair was found to be the most stable mismatch, forming either a deprotonated mismatched pair or a wobble base pair analogous to the stable T:G mismatch. The C:P pair is less stable, perhaps analogous to the wobble pair observed for C:O6-methyl-G, in which the pyrimidine is displaced into the minor groove. The Z:A and T:P mismatches are much less stable. Parameters for predicting the thermodynamics of oligonucleotides containing Z and P bases are provided. This represents the first case where this has been done for a synthetic genetic system.
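
The relationship among the three extracted parameters can be sketched in a few lines. The code below assumes a two-state model for a non-self-complementary duplex with equal strand concentrations, for which ΔG°37 = ΔH° − TΔS° and Tm = ΔH° / (ΔS° + R ln(Ct/4)). The function names and the numerical inputs are hypothetical placeholders; they are not the parameters reported for Z:P pairs.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g(dh_kcal, ds_cal, temp_c=37.0):
    """Free energy (kcal/mol) from dH (kcal/mol) and dS (cal/(mol*K))."""
    t_kelvin = temp_c + 273.15
    return dh_kcal - t_kelvin * ds_cal / 1000.0

def melting_temp_c(dh_kcal, ds_cal, total_strand_conc_m):
    """Two-state Tm (deg C) for a non-self-complementary duplex.

    Assumes both strands are present at equal concentration and that
    total_strand_conc_m is their summed molar concentration (Ct).
    """
    ds_kcal = ds_cal / 1000.0
    tm_kelvin = dh_kcal / (ds_kcal + R_KCAL * math.log(total_strand_conc_m / 4.0))
    return tm_kelvin - 273.15

# Hypothetical duplex parameters, chosen only to exercise the formulas.
dh, ds = -60.0, -165.0
print(delta_g(dh, ds))                # free energy at 37 deg C
print(melting_temp_c(dh, ds, 1e-4))   # Tm at 100 uM total strand
```

In a nearest-neighbor treatment, ΔH° and ΔS° for a whole duplex would be summed from per-step increments before being passed to these functions.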

The prebiotic significance of laboratory experiments that study the interactions between oligomeric RNA and mineral species is difficult to know. Natural exemplars of specific minerals can differ widely depending on their provenance. While laboratory-generated samples of synthetic minerals can have controlled compositions, they are often viewed as "unnatural". Here, we show how trends in the interaction of RNA with natural mineral specimens, synthetic mineral specimens, and co-precipitated pairs of synthetic minerals, can make a persuasive case that the observed interactions reflect the composition of the minerals themselves, rather than their being simply examples of large molecules associating nonspecifically with large surfaces. Using this approach, we have discovered Periodic Table trends in the binding of oligomeric RNA to alkaline earth carbonate minerals and alkaline earth sulfate minerals, where those trends are the same when measured in natural and synthetic minerals. They are also validated by comparison of co-precipitated synthetic minerals. We also show differential binding of RNA to polymorphic forms of calcium carbonate, and the stabilization of bound RNA on aragonite. These have relevance to the prebiotic stabilization of RNA, where such carbonate minerals are expected to have been abundant, as they appear to be today on Mars.

Nucleobase pairs in DNA match hydrogen-bond donor and
acceptor groups on the nucleobases. However, these can adopt
more than one tautomeric form, and can consequently pair with
nucleobases other than their canonical complements, possibly
a source of natural mutation. These issues are now being revisited
by synthetic biologists increasing the number of replicable
pairs in DNA by exploiting unnatural hydrogen bonding patterns,
where tautomerism can also create mutation. Here, we combine
spectroscopic measurements on methylated analogs of isoguanine
tautomers and tautomeric mixtures with statistical analyses
to a set of isoguanine analogs, the complement of isocytosine, the
5th and 6th "letters" in DNA.

According to a current "RNA first" model for the origin of life, RNA
emerged in some form on early Earth to become the first biopolymer
to support Darwinism here. Threose nucleic acid (TNA) and
other polyelectrolytes are also considered as the possible first Darwinian
biopolymer(s). This model is being developed by research
pursuing a "Discontinuous Synthesis Model" (DSM) for the formation
of RNA and/or TNA from precursor molecules that might have
been available on early Earth from prebiotic reactions, with the goal
of making the model less discontinuous. In general, this is done by
examining the reactivity of isolated products from proposed steps
that generate those products, with increasing complexity of the reaction
mixtures in the proposed mineralogical environments. Here,
we report that adenine, diaminopurine, and hypoxanthine nucleoside
phosphates and a noncanonical pyrimidine nucleoside (zebularine)
phosphate can be formed from the direct coupling reaction of
cyclic carbohydrate phosphates with the free nucleobases. The reaction
is stereoselective, giving only the β-anomer of the nucleotides
within detectable limits. For purines, the coupling is also
regioselective, giving the N-9 nucleotide for adenine as a major
product. In the DSM, phosphorylated carbohydrates are presumed
to have been available via reactions explored previously [Krishnamurthy
R, Guntha S, Eschenmoser A (2000) Angew Chem Int Ed
39:2281-2285], while nucleobases are presumed to have been available
from hydrogen cyanide and other nitrogenous species formed
in Earth's primitive atmosphere.

To the astrobiologist, Enceladus offers easy access to a potential subsurface biosphere via the intermediacy of a
plume of water emerging directly into space. A direct question follows: If we were to collect a sample of this
plume, what in that sample, through its presence or its absence, would suggest the presence and/or absence of
life in this exotic locale? This question is, of course, relevant for life detection in any aqueous lagoon that we
might be able to sample. This manuscript reviews physical chemical constraints that must be met by a genetic
polymer for it to support Darwinism, a process believed to be required for a chemical system to generate
properties that we value in biology. We propose that the most important of these is a repeating backbone charge;
a Darwinian genetic biopolymer must be a "polyelectrolyte". Relevant to mission design, such biopolymers are
especially easy to recover and concentrate from aqueous mixtures for detection, simply by washing the aqueous
mixtures across a polycharged support. Several device architectures are described to ensure that, once captured,
the biopolymer meets two other requirements for Darwinism, homochirality and a small building block "alphabet."
This approach is compared and contrasted with alternative biomolecule detection approaches that seek
homochirality and constrained alphabets in non-encoded biopolymers. This discussion is set within a model
for the history of the terran biosphere, identifying points in that natural history where these alternative approaches
would have failed to detect terran life. Key Words: Enceladus; Life detection; Europa; Icy moon;
Biosignatures; Polyelectrolyte theory of the gene. Astrobiology 17, 840-851.

One frontier in synthetic biology seeks to move artificially
expanded genetic information systems (AEGIS) into natural living cells and to
arrange the metabolism of those cells to allow them to replicate plasmids built
from these unnatural genetic systems. In addition to requiring polymerases that
replicate AEGIS oligonucleotides, such cells require metabolic pathways that
biosynthesize the triphosphates of AEGIS nucleosides, the substrates for those
polymerases. Such pathways generally require nucleoside and nucleotide kinases
to phosphorylate AEGIS nucleosides and nucleotides on the path to these
triphosphates. Thus, constructing such pathways focuses on engineering natural
nucleoside and nucleotide kinases, which often do not accept the unnatural
AEGIS biosynthetic intermediates. This, in turn, requires assays that allow the
enzyme engineer to follow the kinase reaction, assays that are easily confused by
ATPase and other spurious activities that might arise through "site-directed
damage" of the natural kinases being engineered. This article introduces three assays that can detect the formation of both natural
and unnatural deoxyribonucleoside triphosphates, assessing their value as polymerase substrates at the same time as monitoring
the progress of kinase engineering. Here, we focus on two complementary AEGIS nucleoside diphosphates, 6-amino-5-nitro-3-
(1'-β-D-2'-deoxyribofuranosyl)-2(1H)-pyridone and 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-
4(8H)-one. These assays provide new ways to detect the formation of unnatural deoxyribonucleoside triphosphates in vitro
and to confirm their incorporation into DNA. Thus, these assays can be used with other unnatural nucleotides.

As with natural nucleic acids, pairing between artificial nucleotides can be influenced by tautomerism, with different placements of protons on the heterocyclic nucleobase changing patterns of hydrogen bonding that determine replication fidelity. For example, the major tautomer of isoguanine presents a hydrogen bonding donor-donor-acceptor pattern complementary to the acceptor-acceptor-donor pattern of 5-methylisocytosine. However, in its minor tautomer, isoguanine presents a hydrogen bond donor-acceptor-donor pattern complementary to thymine. Calculations, crystallography, and physical organic experiments suggest that this tautomeric ambiguity might be "fixed" by replacing the N-7 nitrogen of isoguanine by a CH unit. To test this hypothesis, we prepared the triphosphate of 2'-deoxy-7-deazaisoguanosine and used it in PCR to estimate an effective tautomeric ratio "seen" by Taq DNA polymerase. With 7-deazaisoguanine, fidelity-per-round was ~92%. The analogous PCR with isoguanine gave a lower fidelity-per-round of ~86%. These results confirm the hypothesis with polymerases, and deepen our understanding of the role of minor groove hydrogen bonding and proton tautomerism in both natural and expanded genetic "alphabets", major targets in synthetic biology.
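
To see why a difference between ~92% and ~86% fidelity-per-round matters, the sketch below uses a simple geometric model in which the fraction of product retaining the unnatural pair after n rounds is f^n. This model, the function names, and the choice of 10 rounds are illustrative assumptions; the abstract does not state how the per-round values were derived.

```python
def retained_fraction(fidelity_per_round, rounds):
    """Fraction of product still carrying the unnatural pair after `rounds`
    rounds, assuming a constant, independent loss at every round."""
    return fidelity_per_round ** rounds

def fidelity_from_retention(retained, rounds):
    """Invert the model: per-round fidelity implied by an observed retention."""
    return retained ** (1.0 / rounds)

# Per-round fidelities quoted in the abstract; the round count is hypothetical.
for label, f in [("7-deazaisoguanine", 0.92), ("isoguanine", 0.86)]:
    print(label, round(retained_fraction(f, 10), 3))
```

Under this model, ten rounds would leave about 43% of the product with the pair intact at 92% fidelity, against about 22% at 86%, so a six-point difference per round roughly doubles retention over that span.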

In its "grand challenge" format in chemistry, "synthesis" as an activity sets out a goal that is
substantially beyond current theoretical and technological capabilities. In pursuit of this
goal, scientists are forced across uncharted territory, where they must answer unscripted
questions and solve unscripted problems, creating new theories and new technologies in
ways that would not be created by hypothesis-directed research. Thus, synthesis drives discovery
and paradigm changes in ways that analysis cannot. Described here are the products
that have arisen so far through the pursuit of one grand challenge in synthetic biology:
Recreate the genetics, catalysis, evolution, and adaptation that we value in life, but using
genetic and catalytic biopolymers different from those that have been delivered to us by
natural history on Earth. The outcomes in technology include new diagnostic tools that have
helped personalize the care of hundreds of thousands of patients worldwide. In science, the
effort has generated a fundamentally different view of DNA, RNA, and how they work.

Laboratory in vitro evolution (LIVE) might deliver
DNA aptamers that bind proteins expressed on the surface of
cells. In this work, we used cell engineering to place glypican 3
(GPC3), a possible marker for liver cancer theranostics, on the
surface of a liver cell line. Libraries were then built from a six-letter
genetic alphabet containing the standard nucleobases and
two added nucleobases (2-amino-8H-imidazo[1,2-a]-
[1,3,5]triazin-4-one and 6-amino-5-nitropyridin-2-one),
Watson-Crick complements from an artificially expanded
genetic information system (AEGIS). With counterselection
against non-engineered cells, eight AEGIS-containing aptamers
were recovered. Five bound selectively to GPC3-overexpressing
cells. This selection–counterselection scheme had
acceptable statistics, notwithstanding the possibility that cells
engineered to overexpress GPC3 might also express different
off-target proteins. This is the first example of such a combination.

To detect MERS-CoV, the causative agent of Middle East Respiratory Syndrome, this paper combines two advances that have emerged over the past few years from the new field of "synthetic biology". Both build on an older concept, in which molecular beacons provide downstream detection of viral RNA in biological mixtures after amplification by reverse transcription PCR. The first advance exploits the artificially expanded genetic information systems (AEGIS). AEGIS adds nucleotides to the four found in standard DNA and RNA (xNA); AEGIS nucleotides pair orthogonally to the A:T and G:C pairs. Placing AEGIS components in the stems of molecular beacons is shown to lower noise by preventing unwanted stem invasion by adventitious natural xNA. This should improve the signal-to-noise ratio of molecular beacons operating in complex biological mixtures. The second advance introduces a nicking enzyme that allows a single target molecule to activate more than one beacon, allowing "signal amplification". Combining these technologies in primers with components of a self-avoiding molecular recognition system (SAMRS), we detect 50 copies of MERS-CoV RNA in a multiplexed respiratory virus panel by generating a fluorescence signal visible to the human eye and/or a camera.

Noroviruses are the major cause of global viral gastroenteritis, with short incubation times and small inoculums required for infection. This creates a need for a rapid molecular test for norovirus for early diagnosis, in the hope of preventing the spread of the disease. Non-chemists generally use off-the-shelf reagents and natural DNA to create such tests, which suffer from background noise that comes from adventitious DNA and RNA (collectively xNA) that is abundant in real biological samples, especially feces, a common location for norovirus. Here, we create an assay that combines artificially expanded genetic information systems (AEGIS, which adds nucleotides to the four in standard xNA, pairing orthogonally to A:T and G:C) with loop-mediated isothermal amplification (LAMP) to amplify norovirus RNA at constant temperatures, without the power or instrument requirements of PCR cycling. This assay was then validated using feces contaminated with murine norovirus (MNV). Treating stool samples with ammonia extracts the MNV RNA, which is then amplified in an AEGIS-RT-LAMP where AEGIS segments are incorporated both into an internal LAMP primer and into a molecular beacon stem, the second lowering background signaling noise. This is coupled with RNase H nicking during sample amplification, allowing detection of as few as 10 copies of noroviral RNA in a stool sample, generating a fluorescent signal visible to the human eye, all in a closed reaction vessel.

In addition to completing the Watson-Crick nucleobase matching "concept" (big pairs with small,
hydrogen bond donors pair with hydrogen bond acceptors),
artificially expanded genetic information systems
(AEGIS) also challenge DNA polymerases with a
complete set of mismatches, including wobble mismatches.
Here, we explore wobble mismatches with AEGIS with
DNA polymerase I from Escherichia coli. Remarkably, we
find that the polymerase tolerates an AEGIS:standard
wobble that has the same geometry as the G:T wobble that
polymerases have evolved to exclude but excludes a wobble
geometry that polymerases have never encountered in
natural history. These results suggest certain limits to "structural analogy" and "evolutionary guidance" as tools
to help synthetic biologists expand DNA alphabets.

Reported here is a laboratory in vitro evolution (LIVE)
experiment based on an artificially expanded genetic
information system (AEGIS). This experiment delivers
the first example of an AEGIS aptamer that binds
to an isolated protein target, the first whose structural
contact with its target has been outlined and
the first to inhibit biologically important activities of
its target, the protective antigen from Bacillus anthracis.
We show how rational design based on secondary
structure predictions can also direct the use
of AEGIS to improve the stability and binding of the
aptamer to its target. The final aptamer has a dissociation
constant of ~35 nM. These results illustrate
the value of AEGIS-LIVE for those seeking to
obtain receptors and ligands without the complexities
of medicinal chemistry, and also challenge the
biophysical community to develop new tools to analyze
the spectroscopic signatures of new DNA folds
that will emerge in synthetic genetic systems replacing
standard DNA and RNA as platforms for LIVE.
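
A dissociation constant such as ~35 nM can be put in context with a minimal sketch of the single-site binding isotherm, θ = [L] / (Kd + [L]). This assumes 1:1 binding with the aptamer in large excess over its target, which the abstract does not state; the function name and the concentrations used below are hypothetical.

```python
def fraction_bound(ligand_nm, kd_nm=35.0):
    """Fraction of target occupied at a given free ligand concentration,
    assuming simple 1:1 binding and ligand in excess over target."""
    return ligand_nm / (kd_nm + ligand_nm)

# Hypothetical aptamer concentrations (nM) spanning the reported Kd.
for conc in (3.5, 35.0, 350.0):
    print(conc, round(fraction_bound(conc), 3))
```

At a concentration equal to Kd, half of the target is occupied by definition; a tenfold excess over Kd gives about 91% occupancy in this model.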

Reported here is the crystal structure of a heterocycle that implements a donor–donor–acceptor hydrogen-bonding pattern, as found in the Z component [6-amino-5-nitropyridin-2(1H)-one] of an artificially expanded genetic information system (AEGIS). AEGIS is a new form of DNA from synthetic biology that has six replicable nucleotides, rather than the four found in natural DNA. Remarkably, Z crystallizes from water as a 1:1 complex of its neutral and deprotonated forms, and forms a ‘skinny’ pyrimidine–pyrimidine pair in this structure. The pair resembles the known intercalated cytosine pair. The formation of the same pair in two different salts, namely poly[[aqua(µ6-2-amino-6-oxo-3-nitro-1,6-dihydropyridin-1-ido)sodium]–6-amino-5-nitropyridin-2(1H)-one–water (1/1/1)], denoted Z-Sod, {[Na(C5H4N3O3)(H2O)]·C5H5N3O3·H2O}n, and ammonium 2-amino-6-oxo-3-nitro-1,6-dihydropyridin-1-ide–6-amino-5-nitropyridin-2(1H)-one–water (1/1/1), denoted Z-Am, NH4+·C5H4N3O3·C5H5-N3O3·H2O, under two different crystallization conditions suggests that the pair is especially stable. Implications of this structure for the use of this heterocycle in artificial DNA are discussed.

A widely held 'RNA first' model proposes that RNA gave organic
matter on Earth its first access to Darwinism. Such a proposal,
which requires a mechanism to generate RNA from a prebiotic 'soup',
must also manage the intrinsic instability of any RNA so formed. Here,
we show that silicon dioxide (silica, SiO2), in the form of synthetic opal,
adsorbs and stabilizes RNA from aqueous solution. The extent of adsorption
on fully amorphous silica is less, as is the extent of adsorption
on the surface of crystalline quartz. We show that the RNA adsorbed on
opal is considerably more stable than the same RNA molecule free in
aqueous solution at pH 9.5. This provides a mechanism by which any
RNA formed in a prebiotic environment could have been concentrated
and stabilized so that it could have later participated in the first Darwinian
biology.

Deoxynucleoside kinase from D. melanogaster (DmdNK) has broad specificity; although it catalyzes the phosphorylation of natural pyrimidine more efficiently than natural purine nucleosides, it accepts all four 2'-deoxynucleosides and many analogues, using ATP as a phosphate donor to give the corresponding deoxynucleoside monophosphates. Here, we show that replacing a single amino acid (glutamine 81 by glutamate) in DmdNK creates a variant that also catalyzes the phosphorylation of nucleosides that form part of an artificially expanded genetic information system (AEGIS). By shuffling hydrogen bonding groups on the nucleobases, AEGIS adds potentially as many as four additional nucleobase pairs to the genetic "alphabet". Specifically, we show that DmdNK Q81E creates the monophosphates from the AEGIS nucleosides dP, dZ, dX, and dK (respectively 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one, dP; 6-amino-3-(1'-β-D-2'-deoxyribofuranosyl)-5-nitro-1H-pyridin-2-one, dZ; 8-(1'-β-D-2'-deoxyribofuranosyl)imidazo[1,2-a]-1,3,5-triazine-2(8H)-4(3H)-dione, dX; and 2,4-diamino-5-(1'-β-D-2'-deoxyribofuranosyl)-pyrimidine, dK). Using a coupled enzyme assay, in vitro kinetic parameters were obtained for three of these nucleosides (dP, dX, and dK; the UV absorbance of dZ made it impossible to get its precise kinetic parameters). Thus, DmdNK Q81E appears to be a suitable enzyme to catalyze the first step in the biosynthesis of AEGIS 2'-deoxynucleoside triphosphates in vitro and, perhaps, in vivo, in a cell able to manage plasmids containing AEGIS DNA.

Axiomatically, the density of information
stored in DNA, with just four nucleotides (GACT), is
higher than in a binary code, but less than it might be if
synthetic biologists succeed in adding independently
replicating nucleotides to genetic systems. Such addition
could also provide additional functional groups, not found in
natural DNA but useful for molecular performance. Here,
we consider two new nucleotides (Z and P, 6-amino-5-
nitro-3-(1'-β-D-2'-deoxyribofuranosyl)-2(1H)-pyridone
and 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo-
[1,2-a]-1,3,5-triazin-4(8H)-one). These are designed to
pair via strict Watson-Crick geometry. These were added
to a laboratory in vitro evolution (LIVE) experiment; the
GACTZP library was challenged to deliver molecules that
bind selectively to liver cancer cells, but not to
untransformed liver cells. Unlike in classical in vitro
selection systems, low levels of mutation allow this system
to evolve to create binding molecules not necessarily
present in the original library. Over a dozen binding
species were recovered. The best had Z and/or P in their
sequences. Several had multiple, nearby, and adjacent Zs
and Ps. Only the weaker binders contained no Z or P at all.
This suggests that this system explored much of the
sequence space available to this genetic system and that
GACTZP libraries are richer reservoirs of functionality
than standard libraries.
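
The opening claim about information density can be made concrete with a short sketch: an alphabet of k independently replicating letters carries log2(k) bits per position, and a library of length-n sequences contains k^n possible members. The function names and the 20-nucleotide randomized region are illustrative assumptions, not details taken from the experiment.

```python
import math

def bits_per_position(alphabet_size):
    """Maximum information (bits) carried by one position of a sequence."""
    return math.log2(alphabet_size)

def sequence_space(alphabet_size, length):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length

# Binary code, standard GACT DNA, and six-letter GACTZP DNA.
for name, k in [("binary", 2), ("GACT", 4), ("GACTZP", 6)]:
    print(name, round(bits_per_position(k), 3), sequence_space(k, 20))
```

For a hypothetical 20-nucleotide randomized region, the six-letter sequence space is 1.5^20 times larger than the four-letter one, a factor of roughly 3,300.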

Expanding the synthetic biology of artificially expanded genetic information systems (AEGIS) requires tools to make and analyze RNA molecules having added nucleotide "letters". We report here the development of T7 RNA polymerase and reverse transcriptase to catalyze transcription and reverse transcription of xNA (DNA or RNA) having two complementary AEGIS nucleobases, 6-amino-5-nitropyridin-2-one (trivially, Z) and 2-aminoimidazo[1,2-a]-1,3,5-triazin-4(8H)-one (trivially, P). We also report MALDI mass spectrometry and HPLC-based analyses for oligomeric GACUZP six-letter RNA and the use of ribonuclease (RNase) A and T1 RNase as enzymatic tools for the sequence-specific degradation of GACUZP RNA. We then applied these tools to analyze the GACUZP and GACTZP products of polymerases and reverse transcriptases (respectively) made from DNA and RNA templates. In addition to advancing this six-letter AEGIS toward the biosynthesis of proteins containing additional amino acids, these experiments provided new insights into the biophysics of DNA.

Expanded genetic systems are most likely to work
with natural enzymes if the added nucleotides pair with
geometries that are similar to those displayed by standard duplex
DNA. Here, we present crystal structures of 16-mer duplexes
showing this to be the case with two nonstandard nucleobases (Z,
6-amino-5-nitro-2(1H)-pyridone and P, 2-amino-imidazo[1,2-a]-
1,3,5-triazin-4(8H)-one) that were designed to form a Z:P pair
with a standard "edge on" Watson-Crick geometry, but joined by
rearranged hydrogen bond donor and acceptor groups. One
duplex, with four Z:P pairs, was crystallized with a reverse
transcriptase host and adopts primarily a B-form. Another
contained six consecutive Z:P pairs; it crystallized without a
host in an A-form. In both structures, Z:P pairs fit canonical
nucleobase hydrogen-bonding parameters and known DNA helical forms. Unique features include stacking of the nitro group on
Z with the adjacent nucleobase ring in the A-form duplex. In both B- and A-duplexes, major groove widths for the Z:P pairs are
approximately 1 Angstrom wider than those of comparable G:C pairs, perhaps to accommodate the large nitro group on Z. Otherwise,
ZP-rich DNA had many of the same properties as CG-rich DNA, a conclusion supported by circular dichroism studies in
solution. The ability of standard duplexes to accommodate multiple and consecutive Z:P pairs is consistent with the ability of
natural polymerases to biosynthesize those pairs. This, in turn, implies that the GACTZP synthetic genetic system can explore
the entire expanded sequence space that additional nucleotides create, a major step forward in this area of synthetic biology.

Nucleic acid (NA)-targeted tests detect and quantify viral DNA and RNA (collectively xNA) to support
epidemiological surveillance and, in individual patients, to guide therapy. They commonly use polymerase
chain reaction (PCR) and reverse transcription PCR. Although these all have rapid turnaround,
they are expensive to run. Multiplexing would allow their cost to be spread over multiple targets, but
often only with lower sensitivity and accuracy, noise, false positives, and false negatives; these arise by
interactions between the multiple nucleic acid primers and probes in a multiplexed kit. Here we offer a
multiplexed assay for a panel of respiratory viruses that mitigates these problems by combining several
nucleic acid analogs from the emerging field of synthetic biology: (i) self-avoiding molecular recognition
systems (SAMRSs), which facilitate multiplexing, and (ii) artificially expanded genetic information systems
(AEGISs), which enable low-noise PCR. These are supplemented by "transliteration" technology,
which converts standard nucleotides in a target to AEGIS nucleotides in a product, improving hybridization. The combination supports a multiplexed Luminex-based respiratory panel that potentially differentiates influenza viruses A and B, respiratory syncytial virus, severe acute respiratory syndrome
coronavirus (SARS), and Middle East respiratory syndrome (MERS) coronavirus, detecting as few as 10
MERS virions in a 20-ml sample.

Mosquito-borne arboviruses are emerging worldwide as important human and animal pathogens. This
makes assays for their accurate and rapid identification essential for public health, epidemiological, and ecological
studies. Over the past decade, many mono- and multiplexed assays targeting arbovirus nucleic
acids have been reported. None has become established for the routine identification of multiple viruses
in a "single tube" setting. With increasing multiplexing, the detection of viral RNAs is complicated by
noise, false positives and negatives. In this study, an assay was developed that avoids these problems
by combining two new kinds of nucleic acids emerging from the field of synthetic biology. The first is a
"self-avoiding molecular recognition system" (SAMRS), which enables high levels of multiplexing. The
second is an "artificially expanded genetic information system" (AEGIS), which enables clean PCR amplification
in nested PCR formats. A conversion technology was used to place AEGIS component into amplicon,
improving their efficiency of hybridization on Luminex beads. When Luminex "liquid microarrays" are
exploited for downstream detection, this combination supports single-tube PCR amplification assays that
can identify 22 mosquito-borne RNA viruses from the genera Flavivirus, Alphavirus, and Orthobunyavirus. The
assay differentiates between closely related viruses, such as dengue, West Nile, Japanese encephalitis, and the
California serological group. The performance and the sensitivity of the assay were evaluated with dengue
viruses and infected mosquitoes; as few as 6-10 dengue virions can be detected in a single mosquito.

As one of its goals, synthetic biology seeks to
increase the number of building blocks in nucleic acids. While
efforts towards this goal are well advanced for DNA, they have
hardly begun for RNA. Herein, we present a crystal structure
for an RNA riboswitch where a stem C:G pair has been
replaced by a pair between two components of an artificially
expanded genetic-information system (AEGIS), Z and P
(6-amino-5-nitro-2(1H)-pyridone and
2-aminoimidazo[1,2-a]-1,3,5-triazin-4-(8H)-one). The structure
shows that the Z:P pair does not greatly change
the conformation of the RNA molecule or the details
of its interaction with a hypoxanthine ligand. This was
confirmed in solution by in-line probing, which also
measured a 3.7 nM affinity of the riboswitch for
guanine. These data show that the Z:P pair mimics the
natural Watson-Crick geometry in RNA in the first
example of a crystal structure of an RNA molecule
that contains an orthogonal added nucleobase pair.

Because diamonds have strongly bonded networks of carbon atoms, they offer the potential to support DNA-targeted analysis in architectures that require very stable DNA immobilization with very low DNA leakage. Further, their non-porous structures should allow diamond-immobilized DNA to easily gain access to enzymes in bulk solution. As part of our work to develop a molecular biology tool kit to transform immobilized DNA, we asked whether diamond-immobilized DNA could be cleaved by sequence-specific restriction endonucleases, despite the large sizes of those enzymes, the potential for "steric" obstruction from the diamond surface, and the possibility that the diamond surface might inactivate those enzymes. We report here that both standard and "nicking" restriction endonucleases cut diamond-immobilized single-stranded DNA, after it forms a duplex with a complementary strand of DNA delivered from solution. As a somewhat surprising result, we also discovered that restriction enzymes could cleave a fraction of the immobilized duplex DNA even if the complementary strand came not from solution, but rather from a separate diamond crystallite. This cleavage did not result from a failure of the attachment linkage that allowed the diffusion of leaked DNA through bulk solvent. Rather, the cleavage required physical proximity between crystallites, as confirmed by transmission electron microscopy. These results add to the tools that can use diamond-immobilized DNA, as well as define practical constraints on assay architectures where diamond-immobilized DNA is presumed to be isolated from other diamond-immobilized DNA particles.

Assays that target DNA or RNA (xNA) are highly sensitive, as small amounts of xNA can be amplified by PCR. Unfortunately, PCR is inconvenient in low-resource environments, where the equipment and power that it requires may not be available. Isothermal procedures avoid thermal cycling, but they are often confounded by primer dimers, off-target priming, and other artifacts. Here, we show how a "self-avoiding molecular recognition system" (SAMRS) eliminates these artifacts to give clean amplicons in a helicase-dependent isothermal amplification (SAMRS-HDA). We also show that incorporating SAMRS into the 3'-ends of primers facilitates the design and screening of primers for HDA assays. Finally, we show that SAMRS-HDA can be twofold multiplexed, something difficult to achieve with HDA using standard primers. This shows that SAMRS-HDA is a more versatile approach than standard HDA, with broader applicability for xNA-targeted diagnostics and research.

Ethers are proposed here as the repeating backbone linking units in linear genetic biopolymers that might support Darwinian evolution in hydrocarbon oceans. Hydrocarbon oceans are found in our own solar system as methane mixtures on Titan. They may be found as mixtures of higher alkanes (propane, for example) on warmer hydrocarbon-rich planets in exosolar systems ("warm Titans"). We report studies on the solubility of several short polyethers in propane over its liquid range (from 85 to 231 K, or -188 °C to -42 °C). These show that polyethers are reasonably soluble in propane at temperatures down to ca. 200 K. However, their solubilities drop dramatically at still lower temperatures and become immeasurably low below 170 K, still well above the ~95 K in Titan's oceans. Assuming that a liquid phase is essential for any living system, and that genetic biopolymers must dissolve in that biosolvent to support Darwinism, these data suggest that we must look elsewhere to identify linear biopolymers that might support genetics in Titan's surface oceans. However, genetic molecules with polyether backbones may be suitable to support life in hydrocarbon oceans on warm Titans, where abundant organics and environments lacking corrosive water might make it easier for life to originate.

Synthetic biologists wishing to self-assemble large DNA (L-DNA) constructs from small DNA fragments made by automated synthesis need fragments that hybridize predictably. Such predictability is difficult to obtain with oligonucleotides built from just the four standard nucleotides. Natural DNA's peculiar combination of strong and weak G:C and A:T pairs, the context-dependence of the strengths of those pairs, unimolecular strand folding that competes with desired interstrand hybridization, and non-Watson–Crick interactions available to standard DNA all contribute to this unpredictability. In principle, adding extra nucleotides to the genetic alphabet can improve the predictability and reliability of autonomous DNA self-assembly, simply by increasing the information density of oligonucleotide sequences. These extra nucleotides are now available as parts of artificially expanded genetic information systems (AEGIS), and tools are now available to generate entirely standard DNA from AEGIS DNA during PCR amplification. Here, we describe the OligArch (for "oligonucleotide architecting") software, an application that permits synthetic biologists to engineer optimally self-assembling DNA constructs from both six- and eight-letter AEGIS alphabets. This software has been used to design oligonucleotides that self-assemble to form complete genes from 20 or more single-stranded synthetic oligonucleotides. OligArch is therefore a key element of a scalable and integrated infrastructure for the rapid and designed engineering of biology.
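
The information-density argument can be illustrated with simple arithmetic. The following is a minimal sketch, not part of OligArch; the function names are hypothetical and chosen only for illustration. It computes the maximum information per sequence position, and the number of distinct sequences of a given length, for four-, six-, and eight-letter alphabets.

```python
import math


def bits_per_position(alphabet_size: int) -> float:
    """Maximum information (in bits) carried by one position of a sequence."""
    return math.log2(alphabet_size)


def distinct_sequences(alphabet_size: int, length: int) -> int:
    """Number of different sequences of the given length."""
    return alphabet_size ** length


if __name__ == "__main__":
    # 4 = standard DNA, 6 and 8 = the AEGIS alphabet sizes named in the text
    for size in (4, 6, 8):
        print(size, round(bits_per_position(size), 3), distinct_sequences(size, 20))
```

A larger alphabet gives more possible sequences at a fixed length, which is one way to read the claim that extra nucleotides leave more room to choose fragments that avoid unintended matches.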

Recombinase polymerase amplification (RPA) is an isothermal method to amplify nucleic acid sequences without the temperature cycling that classical PCR uses. Instead of using heat to denature the DNA duplex, RPA uses recombination enzymes to swap single-stranded primers into the duplex DNA product; these are then extended using a strand-displacing polymerase to complete the cycle. Because RPA runs at low temperatures, it never forces the system to recreate base-pairs following Watson–Crick rules, and therefore it produces undesired products that impede the amplification of the desired product, complicating downstream analysis. Herein, we show that most of these undesired side products can be avoided if the primers contain components of a self-avoiding molecular recognition system (SAMRS). Given the precision that is necessary in the recombination systems for them to function biologically, it is surprising that they accept SAMRS. SAMRS-RPA is expected to be a powerful tool within the range of amplification techniques available to scientists.

Paleogenetics is an emerging field that resurrects ancestral
proteins from now-extinct organisms to test, in the laboratory,
models of protein function based on natural history and Darwinian
evolution. Here, we resurrect digestive alcohol dehydrogenases
(ADH4) from our primate ancestors to explore the history of
primate–ethanol interactions. The evolving catalytic properties of
these resurrected enzymes show that our ape ancestors gained
a digestive dehydrogenase enzyme capable of metabolizing ethanol
near the time that they began using the forest floor, about 10
million years ago. The ADH4 enzyme in our more ancient and arboreal
ancestors did not efficiently oxidize ethanol. This change suggests
that exposure to dietary sources of ethanol increased in hominids
during the early stages of our adaptation to a terrestrial lifestyle.
Because fruit collected from the forest floor is expected to contain
higher concentrations of fermenting yeast and ethanol than similar
fruits hanging on trees, this transition may also be the first time our
ancestors were exposed to (and adapted to) substantial amounts
of dietary ethanol.

Artificial genetic systems have been developed
by synthetic biologists over the past two decades to include
additional nucleotides that form additional nucleobase pairs
independent of the standard T:A and C:G pairs. Their use in
various tools to detect and analyze DNA and RNA requires
polymerases that synthesize duplex DNA containing unnatural
base pairs. This is especially true for nested polymerase chain
reaction (PCR), which has been shown to give dramatically lower noise in multiplexed formats if nonstandard nucleotides are
used in the external primers. We report here the results of a directed evolution experiment seeking variants of Taq DNA
polymerase that can support the nested PCR amplification with external primers containing two particular nonstandard
nucleotides, 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (trivially called P) that pairs with
6-amino-5-nitro-3-(1'-β-D-2'-deoxyribofuranosyl)-2(1H)-pyridone (trivially called Z). Variants emerging from the directed
evolution experiments were shown to pause less when challenged in vitro to incorporate dZTP opposite P in a template.
Interestingly, several sites involved in the adaptation of Taq polymerases in the laboratory were also found to have displayed
"heterotachy" (different rates of change) in their natural history, suggesting that these sites were involved in an adaptive change
in natural polymerase evolution. Also remarkably, the polymerases evolved to be less able to incorporate dPTP opposite Z in the
template, something that was not selected. In addition to being useful in certain assay architectures, this result underscores the
general rule in directed evolution that "you get what you select for".

Methods to detect DNA and RNA (collectively
xNA) are easily plagued by noise, false positives, and false
negatives, especially with increasing levels of multiplexing in
complex assay mixtures. Here, we describe assay architectures
that mitigate these problems by converting standard xNA
analyte sequences into sequences that incorporate nonstandard
nucleotides (Z and P). Z and P are extra DNA building blocks
that form tight nonstandard base pairs without cross-binding
to natural oligonucleotides containing G, A, C, and T
(GACT). The resulting improvements are assessed in an
assay that inverts the standard Luminex xTAG architecture,
placing a biotin on a primer (rather than on a triphosphate).
This primer is extended on the target to create a standard
GACT extension product that is captured by a CTGA oligonucleotide attached to a Luminex bead. By using conversion, a
polymerase incorporates dZTP opposite template dG in the absence of dCTP. This creates a Z-containing extension product that
is captured by a bead-bound oligonucleotide containing P, which binds selectively to Z. The assay with conversion produces
higher signals than the assay without conversion, possibly because the Z/P pair is stronger than the C/G pair. These architectures
improve the ability of the Luminex instruments to detect xNA analytes, producing higher signals without the possibility of
competition from any natural oligonucleotides, even in complex biological samples.
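
The conversion step described above can be sketched in a few lines. This is an illustrative model only, assuming idealized pairing rules and ignoring the primer region; the names `extend` and `capture_probe` and the example template are hypothetical. It shows how leaving out dCTP and supplying dZTP turns each template G into a Z in the extension product, and how a P-containing capture probe would then be written.

```python
# Pairing rules used by the polymerase in each assay format (assumed, idealized).
STANDARD = {"A": "T", "T": "A", "G": "C", "C": "G"}
CONVERSION = {"A": "T", "T": "A", "G": "Z", "C": "G"}  # dZTP supplied, dCTP absent

# Pairing rules for hybridization on the bead, including the Z:P pair.
AEGIS_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "Z": "P", "P": "Z"}


def extend(template: str, rules: dict) -> str:
    """Return the extension product (5'->3') copied from a template (5'->3')."""
    return "".join(rules[base] for base in reversed(template))


def capture_probe(product: str) -> str:
    """Return the bead-bound probe (5'->3') complementary to the product."""
    return "".join(AEGIS_PAIRS[base] for base in reversed(product))


if __name__ == "__main__":
    template = "AGGCT"  # hypothetical target segment
    print(extend(template, STANDARD))
    converted = extend(template, CONVERSION)
    print(converted, capture_probe(converted))
```

Because the probe contains P, which in this model pairs only with Z, a natural GACT oligonucleotide has no full-length complement on the bead.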

This year marks the 50th anniversary of a proposal by Alex Rich that RNA, as a single biopolymer acting in two
capacities, might have supported both genetics and catalysis at the origin of life. We review here both published and
previously unreported experimental data that provide new perspectives on this old proposal. The new data include
evidence that, in the presence of borate, small amounts of carbohydrates can fix large amounts of formaldehyde that
are expected in an environment rich in carbon dioxide. Further, we consider other species, including arsenate,
arsenite, phosphite, and germanate, that might replace phosphate as linkers in genetic biopolymers. While linkages
involving these oxyanions are judged to be too unstable to support genetics on Earth, we consider the possibility
that they might do so in colder semi-aqueous environments more exotic than those found on Earth, where cosolvents
such as ammonia might prevent freezing at temperatures well below 273 K. These include the ammonia-water
environments that are possibly present at low temperatures beneath the surface of Titan, Saturn’s largest moon.

6-Aminopyridin-2-ones form Watson-Crick pairs with complementary purine analogues to add a third
nucleobase pair to DNA and RNA, if an electron-withdrawing group at position 5 slows oxidation and epimerization. In previous
work with a nucleoside analogue trivially named dZ, the electron-withdrawing unit was a nitro group. Here, we describe an
analogue of dZ (cyano-dZ) having a cyano group instead of a nitro group, including its synthesis, pKa, rates of acid-catalyzed
epimerization, and enzymatic incorporation.

To explore the possibility of using restriction enzymes in a synthetic biology based on artificially expanded genetic information systems (AEGIS), 24 type-II restriction endonucleases (REases) were challenged to digest DNA duplexes containing recognition sites where individual Cs and Gs were replaced by the AEGIS nucleotides Z and P [respectively, 6-amino-5-nitro-3-(1'-β-D-2'-deoxyribofuranosyl)-2(1H)-pyridone and 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one]. These AEGIS nucleotides implement complementary hydrogen bond donor-donor-acceptor and acceptor-acceptor-donor patterns. Results allowed us to classify type-II REases into five groups based on their performance, and to infer some specifics of their interactions with functional groups in the major and minor grooves of the target DNA. For three enzymes among these 24 where crystal structures are available (BcnI, EcoO109I and NotI), these interactions were modeled. Further, we applied a type-II REase to quantitate the fidelity of polymerases challenged to maintain C:G, T:A and Z:P pairs in a DNA duplex through repetitive PCR cycles. This work thus adds tools that are able to manipulate this expanded genetic alphabet in vitro, provides some structural insights into the working of restriction enzymes, and offers some preliminary data needed to take the next step in synthetic biology to use an artificial genetic system inside of living bacterial cells.

The next goals in the development of a synthetic biology that uses
artificial genetic systems will require chemistry-biology combinations that
allow the amplification of DNA containing any number of sequential and
nonsequential nonstandard nucleotides. This amplification must ensure that the
nonstandard nucleotides are not unidirectionally lost during PCR amplification
(unidirectional loss would cause the artificial system to revert to an all-natural
genetic system). Further, technology is needed to sequence artificial genetic
DNA molecules. The work reported here meets all three of these goals for a six-letter
artificially expanded genetic information system (AEGIS) that comprises
four standard nucleotides (G, A, C, and T) and two additional nonstandard
nucleotides (Z and P). We report polymerases and PCR conditions that amplify
a wide range of GACTZP DNA sequences having multiple consecutive
unnatural synthetic genetic components with low (0.2% per theoretical cycle)
levels of mutation. We demonstrate that residual mutation processes both introduce and remove unnatural nucleotides, allowing the
artificial genetic system to evolve as such, rather than revert to a wholly natural system. We then show that mechanisms for these
residual mutation processes can be exploited in a strategy to sequence "six-letter" GACTZP DNA. These are all not yet reported for
any other synthetic genetic system.
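
To give a feel for what a 0.2% per-cycle mutation level means over a whole amplification, the compounding can be sketched as follows. This is a back-of-the-envelope illustration, assuming the rate applies independently and uniformly in every theoretical cycle; that assumption, the function name, and the choice of 30 cycles are ours and are not taken from the work described.

```python
def fraction_unchanged(per_cycle_rate: float, cycles: int) -> float:
    """Fraction of sites expected to remain unmutated after the given number
    of cycles, assuming a constant, independent per-cycle mutation rate."""
    return (1.0 - per_cycle_rate) ** cycles


if __name__ == "__main__":
    # 0.002 is the 0.2% per theoretical cycle quoted in the text;
    # 30 cycles is an arbitrary illustrative PCR length.
    print(round(fraction_unchanged(0.002, 30), 4))
```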

Nucleoside triphosphates having a 3'-ONH2 blocking group have been prepared with and without fluorescent tags on their nucleobases. DNA polymerases were identified that accepted these, adding a single nucleotide to the 3'-end of a primer in a template-directed extension reaction that then stops. Nitrite chemistry was developed to cleave the 3'-ONH2 group under mild conditions to allow continued primer extension. Extension-cleavage-extension cycles in solution were demonstrated with untagged nucleotides and mixtures of tagged and untagged nucleotides. Multiple extension-cleavage-extension cycles were demonstrated on an Intelligent Bio-Systems Sequencer, showing the potential of the 3'-ONH2 blocking group in "next generation sequencing."

While "life" may universally be a self-sustaining chemical system capable of Darwinian evolution, alien life may be quite different in its chemistry from the terran life that we know here on Earth. In this case, it will be difficult to recognize, especially if it has not advanced beyond the single cell life forms that have dominated much of the terran biosphere. This review summarizes what we might infer from general physical and chemical law about how such "weird" life might be structured, what solvents other than water it might inhabit, what genetic molecules it might contain, and what metabolism it might exploit.

2'-Deoxy-5-methylisocytidine is widely used in assays to personalize the care of patients infected with HIV, hepatitis C, and other infectious agents. However, oligonucleotides that incorporate 2'-deoxy-5-methylisocytidine are expensive, because of its intrinsic chemical instability. We report here a C-glycoside analog that is more stable and, in oligonucleotides, pairs with 2'-deoxyisoguanosine, contributing to duplex stability about as much as a standard 2'-deoxycytidine and 2'-deoxyguanosine pair. (C) 2009 Elsevier Ltd. All rights reserved.

The syntheses of N,N'-dibenzyl-2,4-diaminopyrimidine-2'-deoxyribonucleoside and 1-methyl-2'-deoxypseudoisocytidine via Heck coupling are described. A survey of the attempts to use the Heck coupling to synthesize N,N'-dibenzyl-2,4-diaminopyrimidine-2'-deoxyribonucleoside is provided, indicating a remarkable diversity in outcome depending on the specific heterocyclic partner used.

Astrobiologists are aware that extraterrestrial life might differ from known life, and considerable thought has been given to possible signatures associated with weird forms of life on other planets. So far, however, very little attention has been paid to the possibility that our own planet might also host communities of weird life. If life arises readily in Earth-like conditions, as many astrobiologists contend, then it may well have formed many times on Earth itself, which raises the question whether one or more shadow biospheres have existed in the past or still exist today. In this paper, we discuss possible signatures of weird life and outline some simple strategies for seeking evidence of a shadow biosphere.

DNA sequencing-by-synthesis (SBS) technology, using a polymerase or ligase enzyme as its core biochemistry, has already been incorporated in several second-generation DNA sequencing systems with significant performance. Notwithstanding the substantial success of these SBS platforms, challenges continue to limit the ability to reduce the cost of sequencing a human genome to $100,000 or less. Achieving dramatically reduced cost with enhanced throughput and quality will require the seamless integration of scientific and technological effort across disciplines within biochemistry, chemistry, physics and engineering. The challenges include sample preparation, surface chemistry, fluorescent labels, optimizing the enzyme-substrate system, optics, instrumentation, understanding tradeoffs of throughput versus accuracy, and read-length/phasing limitations. By framing these challenges in a manner accessible to a broad community of scientists and engineers, we hope to solicit input from the broader research community on means of accelerating the advancement of genome sequencing technology.

A molecular beacon that incorporates components of an artificially expanded genetic information system (AEGIS) in its stem is shown not to be opened by unwanted stem invasion by adventitious standard DNA; this should improve the "darkness" of the beacon in real-world applications.

Humans have relatively low plasma ascorbate levels and high serum uric acid levels compared to most mammals due to the presence of genetic mutations in L-gulonolactone oxidase and uricase, respectively. We review the major hypotheses for why these mutations may have occurred. In particular, we suggest that both mutations may have provided a survival advantage to early primates by helping maintain blood pressure during periods of dietary change and environmental stress. We further propose that these mutations have the inadvertent disadvantage of increasing our risk for hypertension and cardiovascular disease in today's society characterized by Western diet and increasing physical inactivity. Finally, we suggest that a "planetary biology" approach in which genetic changes are analyzed in relation to their biological action and historical context may provide the ideal approach towards understanding the biology of the past, present and future. (c) 2008 Elsevier Ltd. All rights reserved.

In this article, we focus on the synthesis of aryl C-glycosides via Heck coupling. It is organized based on the type of structures used in the assembly of the C-glycosides (also called C-nucleosides) with the following subsections: pyrimidine C-nucleosides, purine C-nucleosides, and monocyclic, bicyclic, and tetracyclic C-nucleosides. The reagents and conditions used for conducting the Heck coupling reactions are discussed. The subsequent conversion of the Heck products to the corresponding target molecules and the application of the target molecules are also described.

Tools to re-sequence the genomes of individual patients having well-described medical histories are the first step required to connect genetic information to diagnosis, prognosis, and treatment. There is little doubt that in the future, genomics will influence the choice of therapies for individual patients based on their specific genetic inheritance, as well as the genetic defects that led to disease. Cost is the principal obstacle preventing the realization of this vision. Unless the interesting parts of a patient genome can be resequenced for less than $10,000 (as opposed to $100,000 or more), it will be difficult to start the discovery process that will enable this vision. While instrumentation and biology are important to reducing costs, the key element to cost-effective personalized genomic sequencing will be new chemical reagents that deliver capabilities that are not available from standard DNA. Scientists at the Foundation for Applied Molecular Evolution and the Westheimer Institute have developed several of these, which will be the topic of this talk.

Two approaches, one novel, are applied to analyze the divergent evolution of ruminant seminal ribonucleases (RNases), paralogs of the well-known pancreatic RNases of mammals. Here, the goal was to identify periods of divergence of seminal RNase under functional constraints, periods of divergence as a pseudogene, and periods of divergence driven by positive selection pressures. The classical approach involves the analysis of nonsynonymous to synonymous replacements ratios (omega) for the branches of the seminal RNase evolutionary tree. The novel approach coupled these analyses with the mapping of substitutions on the folded structure of the protein. These analyses suggest that seminal RNase diverged during much of its history after divergence from pancreatic RNase as a functioning protein, followed by homoplastic inactivations to create pseudogenes in multiple ruminant lineages. Further, they are consistent with adaptive evolution only in the most recent episode leading to the gene in modern oxen. These conclusions contrast sharply with the view, cited widely in the literature, that seminal RNase decayed after its formation by gene duplication into an inactive pseudogene, whose lesions were repaired in a reactivation event. Further, the 2 approaches, omega estimation and mapping of replacements on the protein structure, were compared by examining their utility for establishing the functional status of the seminal RNase genes in 2 deer species. Hog and roe deer share common lesions, which strongly suggests that the gene was inactive in their last common ancestor. In this specific example, the crystallographic approach made the correct implication more strongly than the omega approach. Studies of this type should contribute to an integrated framework of tools to assign functional and nonfunctional episodes to recently created gene duplicates and to understand more broadly how gene duplication leads to the emergence of proteins with novel functions.

The use of DNA polymerases to incorporate phosphorothioate linkages into DNA, and the use of exonuclease III to determine where those linkages have been incorporated, are re-examined in this work. The results presented here show that exonuclease III degrades single-stranded DNA as a substrate and digests through phosphorothioate linkages having one absolute stereochemistry, assigned (assuming inversion in the polymerase reaction) as S, but not the other absolute stereochemistry. This contrasts with a general view in the literature that exonuclease III favors double-stranded nucleic acid as a substrate and stops completely at phosphorothioate linkages. Furthermore, not all DNA polymerases appear to accept exclusively the (R) stereoisomer of nucleoside alpha-thiotriphosphates [and not the (S) diastereomer], a conclusion inferred two decades ago by examination of five Family-A polymerases and a reverse transcriptase. This suggests that caution is appropriate when extrapolating the detailed behavior of one polymerase from the behaviors of other polymerases. Furthermore, these results provide constraints on how exonuclease III-thiotriphosphate-polymerase combinations can be used to analyze the behavior of the components of a synthetic biology.

DNA polymerases are identified that copy a nonstandard nucleotide pair joined by a hydrogen bonding pattern different from the patterns joining the dA:T and dG:dC pairs. 6-Amino-5-nitro-3-(1'-β-D-2'-deoxyribofuranosyl)-2(1H)-pyridone (dZ) implements the nonstandard 'small' donor-donor-acceptor (pyDDA) hydrogen bonding pattern. 2-Amino-8-(1'-β-D-2'-deoxyribofuranosyl)imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (dP) implements the 'large' acceptor-acceptor-donor (puAAD) pattern. These nucleobases were designed to present electron density to the minor groove, density hypothesized to help determine specificity for polymerases. Consistent with this hypothesis, both dZTP and dPTP are accepted by many polymerases from both Families A and B. Further, the dZ:dP pair participates in PCR reactions catalyzed by Taq, Vent (exo-) and Deep Vent (exo-) polymerases, with 94.4%, 97.5% and 97.5% retention per round, respectively. The dZ:dP pair appears to be lost principally via transition to a dC:dG pair. This is consistent with a mechanistic hypothesis that deprotonated dZ (presenting a pyDAA pattern) complements dG (presenting a puADD pattern), while protonated dC (presenting a pyDDA pattern) complements dP (presenting a puAAD pattern). This hypothesis, grounded in the Watson-Crick model for nucleobase pairing, was confirmed by studies of the pH-dependence of mismatching. The dZ:dP pair and these polymerases should be useful in dynamic architectures for sequencing and for molecular, systems and synthetic biology.
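
The pattern labels used above follow a simple rule: a small (py) component pairs with a large (pu) one, and each hydrogen bond donor (D) sits opposite an acceptor (A). The sketch below encodes only that rule as a reading aid; the function name is hypothetical, and the code makes no claim about how the pairs were actually designed or analyzed.

```python
SWAP = {"D": "A", "A": "D"}  # a donor must face an acceptor, and vice versa


def complementary_pattern(pattern: str) -> str:
    """Return the pattern that pairs with a label such as 'pyDDA' or 'puAAD'."""
    size, groups = pattern[:2], pattern[2:]
    if size not in ("py", "pu") or len(groups) != 3 or any(g not in SWAP for g in groups):
        raise ValueError(f"not a recognized pattern label: {pattern!r}")
    partner_size = "pu" if size == "py" else "py"  # small pairs with large
    return partner_size + "".join(SWAP[g] for g in groups)


if __name__ == "__main__":
    # dZ with dP, and the mismatch partners named in the text
    for label in ("pyDDA", "pyDAA"):
        print(label, "pairs with", complementary_pattern(label))
```

Read this way, the two mismatch routes in the text are both ordinary complementary pairings: deprotonated dZ (pyDAA) matches dG (puADD), and protonated dC (pyDDA) matches dP (puAAD).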

Background: When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. Results: The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1) multiple sequence alignments, 2) mapping of alignment sites to crystal structure sites, 3) phylogenetic trees, 4) inferred ancestral sequences at internal tree nodes, and 5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. Conclusion: We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. 
Magnum provides evolutionary and structural bioinformatics resources that are useful for identifying experimentally testable hypotheses about the molecular basis of protein behaviors and functions, as illustrated with the examples from the cellular retinoid binding proteins.
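
The query for "amino acid replacements requiring three nucleotide substitutions" rests on a property that can be computed directly from the standard genetic code. The following minimal sketch is not Magnum code; the function names are hypothetical. It finds the smallest number of nucleotide changes needed to turn any codon for one amino acid into any codon for another.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid: str) -> list:
    """All codons that encode the given one-letter amino acid."""
    codons = [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return codons


def min_substitutions(aa_from: str, aa_to: str) -> int:
    """Fewest nucleotide substitutions that convert one amino acid to another."""
    pairs = product(codons_for(aa_from), codons_for(aa_to))
    return min(sum(x != y for x, y in zip(c1, c2)) for c1, c2 in pairs)


if __name__ == "__main__":
    print(min_substitutions("M", "Y"))  # Met (ATG) to Tyr (TAT or TAC)
```

A replacement whose minimum is three cannot arise from a single point mutation, which is why finding one on a short branch is a notable event worth inspecting.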

Background: The exchange of nucleotides at synonymous sites in a gene encoding a protein is believed to have little impact on the fitness of a host organism. This should be especially true for synonymous transitions, where a pyrimidine nucleotide is replaced by another pyrimidine, or a purine is replaced by another purine. This suggests that transition redundant exchange (TREx) processes at the third position of conserved two-fold codon systems might offer the best approximation for a neutral molecular clock, serving to examine, within coding regions, theories that require neutrality, determine whether transition rate constants differ within genes in a single lineage, and correlate dates of events recorded in genomes with dates in the geological and paleontological records. To date, TREx analysis of the yeast genome has recognized correlated duplications that established new metabolic strategies in fungi, and supported analyses of functional change in aromatases in pigs. TREx dating has limitations, however. Multiple transitions at synonymous sites may cause equilibration and loss of information. Further, to be useful to correlate events in the genomic record, different genes within a genome must suffer transitions at similar rates. Results: A formalism to analyze divergence at two-fold redundant codon systems is presented. This formalism exploits two-state approach-to-equilibrium kinetics from chemistry. It captures, in a single equation, the possibility of multiple substitutions at individual sites, avoiding any need to "correct" for these. The formalism also connects specific rate constants for transitions to specific approximations in an underlying evolutionary model, including assumptions that transition rate constants are invariant at different sites, in different genes, in different lineages, and at different times. Therefore, the formalism supports analyses that evaluate these approximations.
Transitions at synonymous sites within two-fold redundant coding systems were examined in the mouse, rat, and human genomes. The key metric (f(2)), the fraction of those sites that holds the same nucleotide, was measured for putative ortholog pairs. A transition redundant exchange (TREx) distance was calculated from f(2) for these pairs. Pyrimidine-pyrimidine transitions at these sites occur approximately 14% faster than purine-purine transitions in various lineages. Transition rate constants were similar in different genes within the same lineages; within a set of orthologs, the f(2) distribution is only modestly overdispersed. No correlation between disparity and overdispersion is observed. In rodents, however, evidence was found for greater conservation of TREx sites in genes on the X chromosome, accounting for a small part of the overdispersion. Conclusion: The TREx metric is useful to analyze the history of transition rate constants within these mammals over the past 100 million years. The TREx metric estimates the extent to which silent nucleotide substitutions accumulate in different genes, on different chromosomes, with different compositions, in different lineages, and at different times.
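
The relation between f(2) and a TREx distance can be illustrated with a minimal sketch. It assumes the generic two-state approach-to-equilibrium form f2(t) = f2_eq + (1 - f2_eq) * exp(-k*t), so that the distance is the dimensionless product k*t. The function names and the default equilibrium value f2_eq = 0.5 (an unbiased two-state system) are illustrative assumptions, not values taken from the abstract.

```python
import math


def trex_distance(f2, f2_eq=0.5):
    """Return k*t for a two-state approach to equilibrium.

    Assumed model (not quoted from the abstract):
        f2(t) = f2_eq + (1 - f2_eq) * exp(-k * t)
    f2    : observed fraction of two-fold redundant sites still identical
    f2_eq : hypothetical equilibrium value of f2 (0.5 if unbiased)
    """
    if not f2_eq < f2 <= 1.0:
        raise ValueError("f2 must lie in the interval (f2_eq, 1]")
    return -math.log((f2 - f2_eq) / (1.0 - f2_eq))


def f2_from_distance(kt, f2_eq=0.5):
    """Inverse relation: expected f2 after a TREx distance k*t."""
    return f2_eq + (1.0 - f2_eq) * math.exp(-kt)


if __name__ == "__main__":
    for f2 in (0.95, 0.85, 0.75, 0.60):
        print(f"f2 = {f2:.2f}  ->  k*t = {trex_distance(f2):.3f}")
```

Because the exponential form already describes repeated exchange at a single site, no separate multiple-hit correction is applied in this sketch. As f2 approaches f2_eq the distance diverges, which mirrors the loss of information on equilibration noted in the abstract.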

Background: The medical community requires computational tools that distinguish genetic differences having phenotypic impact within the vast number of mutations that do not. Tools that do this will become increasingly important for those seeking to use human genome sequence data to predict disease, make prognoses, and customize therapy to individual patients. Results: An approach, termed DETECTER, is proposed to identify sites in a protein sequence where amino acid replacements are likely to have a significant effect on phenotype, including causing genetic disease. This approach uses a model-dependent tool to estimate the normalized replacement rate at individual sites in a protein sequence, based on a history of those sites extracted from an evolutionary analysis of the corresponding protein family. This tool identifies sites that have higher-than-average, average, or lower-than-average rates of change in the lineage leading to the sequence in the population of interest. The rates are then combined with sequence data to determine the likelihoods that particular amino acids were present at individual sites in the evolutionary history of the gene family. These likelihoods are used to predict whether any specific amino acid replacements, if introduced at the site in a modern human population, would have a significant impact on fitness. The DETECTER tool is used to analyze the cystic fibrosis transmembrane conductance regulator (CFTR) gene family. Conclusions: In this system, DETECTER retrodicts amino acid replacements associated with the cystic fibrosis disease with greater accuracy than alternative approaches. While this result validates this approach for this particular family of proteins only, the approach may be applicable to the analysis of polymorphisms generally, including SNPs in a human population.

2-Hydroxymethylphenylboronate is described as a reagent that converts neutral 1,2-diols, as found in simple carbohydrates, into 1:1 anionic complexes that are easily detected by Fourier transform ion cyclotron resonance mass spectrometry. The value of this reagent was demonstrated through its application to analyze complex mixtures of carbohydrates formed in the formose process, often cited as a way that biologically significant carbohydrates might have been generated from formaldehyde under prebiotic conditions. Coupled with isotope studies, the reagent shows that the simplest autocatalytic cycle for the consumption of formaldehyde in this process cannot account for the bulk consumption of formaldehyde.

A strategy is presented that uses dynamic equilibria to assemble in situ composite DNA polymerase primers, having lengths of 14 or 16 nt, from DNA fragments that are 6 or 8 nt in length. In this implementation, the fragments are transiently joined under conditions of dynamic equilibrium by an imine linker, which has a dissociation constant of 1 µM. If a polymerase is able to extend the composite, but not the fragments, it is possible to prime the synthesis of a target DNA molecule under conditions where two useful specificities are combined: (i) single nucleotide discrimination that is characteristic of short oligonucleotide duplexes (four to six nucleobase pairs in length), which effectively excludes single mismatches, and (ii) an overall specificity of priming that is characteristic of long (14 to 16mers) oligonucleotides, potentially unique within a genome. We report here the screening of a series of polymerases that combine an ability not to accept short primer fragments with an ability to accept the long composite primer held together by an unnatural imine linkage. Several polymerases were found that achieve this combination, permitting the implementation of the dynamic combinatorial chemical strategy.

To support efforts to develop a 'synthetic biology' based on an artificially expanded genetic information system (AEGIS), we have developed a route to two components of a non-standard nucleobase pair, the pyrimidine analog 6-amino-5-nitro-3-(1'-beta-D-2'-deoxyribofuranosyl)-2(1H)-pyridone (dZ) and its Watson-Crick complement, the purine analog 2-amino-8-(1'-beta-D-2'-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (dP). These implement the pyDDA:puAAD hydrogen bonding pattern (where 'py' indicates a pyrimidine analog and 'pu' indicates a purine analog, while A and D indicate the hydrogen bonding patterns of acceptor and donor groups presented to the complementary nucleobases, from the major to the minor groove). Also described is the synthesis of the triphosphates and protected phosphoramidites of these two nucleosides. We also describe the use of the protected phosphoramidites to synthesize DNA oligonucleotides containing these AEGIS components, verify the absence of epimerization of dZ in those oligonucleotides, and report some hybridization properties of the dZ:dP nucleobase pair, which is rather strong, and the ability of each to effectively discriminate against mismatches in short duplex DNA.

In this article, we focus on the synthesis of aryl C-glycosides via Heck coupling. It is organized based on the type of structures used in the assembly of the C-glycosides (also called C-nucleosides) with the following subsections: pyrimidine C-nucleosides, purine C-nucleosides, and monocyclic, bicyclic, and tetracyclic C-nucleosides. The reagents and conditions used for conducting the Heck coupling reactions are discussed. The subsequent conversion of the Heck products to the corresponding target molecules and the application of the target molecules are also described.

Desorption/ionization on porous silicon mass spectrometry (DIOS-MS) was used to investigate the binding affinities between aldopentose isomers and boron. Boron has been recognized for its importance in pentose synthesis and stabilization in prebiotic conditions. Boron may also account for the fact that ribose, among other aldopentoses, is the favored building block in RNA synthesis. This research started with the detection of aldopentoses in the positive mode through cationization and the aldopentose-borate complexes in the negative mode. Then two competition schemes, one using a pentose structure analogue and the other using C-13-labeled ribose, were designed to compare the relative binding affinities of four aldopentoses (xylose, lyxose, arabinose, and ribose) to boron. Both approaches determined the binding preference to be ribose > lyxose > arabinose > xylose. This work illustrates the potential of DIOS-MS in the analyses of nonvolatile, small molecules in delicate chemical equilibria. Without externally introduced matrices, background signals are not a limiting factor. Furthermore, the possible dramatic change of pH associated with the matrix introduction, which may disturb the equilibria of interest, is avoided.

Background: Blocks of duplicated genomic DNA sequence longer than 1000 base pairs are known as low copy repeats (LCRs). Identified by their sequence similarity, LCRs are abundant in the human genome, and are interesting because they may represent recent adaptive events, or potential future adaptive opportunities within the human lineage. Sequence analysis tools are needed, however, to decide whether these interpretations are likely, whether a particular set of LCRs represents nearly neutral drift creating junk DNA, or whether the appearance of LCRs reflects assembly error. Here we investigate an LCR family containing the sulfotransferase (SULT) 1A genes involved in drug metabolism, cancer, hormone regulation, and neurotransmitter biology as a first step for defining the problems that those tools must manage. Results: Sequence analysis here identified a fourth sulfotransferase gene, which may be transcriptionally active, located on human chromosome 16. Four regions of genomic sequence containing the four human SULT1A paralogs defined a new LCR family. The stem hominoid SULT1A progenitor locus was identified by comparative genomics involving complete human and rodent genomes, and a draft chimpanzee genome. SULT1A expansion in hominoid genomes was followed by positive selection acting on specific protein sites. This episode of adaptive evolution appears to be responsible for the dopamine sulfonation function of some SULT enzymes. Each of the conclusions that this bioinformatic analysis generated using data that has uncertain reliability (such as that from the chimpanzee genome sequencing project) has been confirmed experimentally or by a "finished" chromosome 16 assembly, both of which were published after the submission of this manuscript. Conclusion: SULT1A genes expanded from one to four copies in hominoids during intra-chromosomal LCR duplications, including (apparently) one after the divergence of chimpanzees and humans.
Thus, LCRs may provide a means for amplifying genes (and other genetic elements) that are adaptively useful. Being located on and among LCRs, however, could make the human SULT1A genes susceptible to further duplications or deletions resulting in 'genomic diseases' for some individuals. Pharmacogenomic studies of SULT1A single nucleotide polymorphisms, therefore, should also consider examining SULT1A copy number variability when searching for genotype-phenotype associations. The latest duplication is, however, only a substantiated hypothesis; an alternative explanation, disfavored by the majority of evidence, is that the duplication is an artifact of incorrect genome assembly.

Chemistry is a broadly powerful discipline in contemporary science because it has the ability to create new forms of the matter that it studies. By doing so, chemistry can test models that connect molecular structure to behaviour without having to rely on what nature has provided. This creation, known as 'synthesis', began to be applied to living systems in the 1980s as recombinant DNA technologies allowed biologists to deliberately change the molecular structure of the microbes that they studied, and automated chemical synthesis of DNA became widely available to support these activities. The impact of the information that has emerged has made biologists aware of a truism that has long been known in chemistry: synthesis drives discovery and understanding in ways that analysis cannot. Synthetic biology is now setting an ambitious goal: to recreate in artificial systems the emergent properties found in natural biology. By doing so, it is advancing our understanding of the molecular basis of genetics in ways that analysis alone cannot. More practically, it has yielded artificial genetic systems that improve the healthcare of some 400,000 Americans annually. Synthetic biology is now set to take the next step, to create artificial Darwinian systems by direct construction. Supported by the National Science Foundation as part of its Chemical Bonding program, this work cannot help but generate clarity in our understanding of how biological systems work.

Synthetic biology based on a six-letter genetic alphabet that includes the two non-standard nucleobases isoguanine (isoG) and isocytosine (isoC), as well as the standard A, T, G and C, is known to suffer as a consequence of a minor tautomeric form of isoguanine that pairs with thymine, and therefore leads to infidelity during repeated cycles of the PCR. Reported here is a solution to this problem. The solution replaces thymidine triphosphate by 2-thiothymidine triphosphate (2-thioTTP). Because of the bulk and hydrogen bonding properties of the thione unit in 2-thioT, 2-thioT does not mispair effectively with the minor tautomer of isoG. To test whether this might allow PCR amplification of a six-letter artificially expanded genetic information system, we examined the relative rates of misincorporation of 2-thioTTP and TTP opposite isoG using affinity electrophoresis. The concentrations of isoCTP and 2-thioTTP were optimized to best support PCR amplification using thermostable polymerases of a six-letter alphabet that includes the isoC-isoG pair. The fidelity-per-round of amplification was found to be approximately 98% in trial PCRs with this six-letter DNA alphabet. The analogous PCR employing TTP had a fidelity-per-round of only approximately 93%. Thus, the A, 2-thioT, G, C, isoC, isoG alphabet is an artificial genetic system capable of Darwinian evolution.
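
To see why the difference between roughly 98% and 93% fidelity-per-round matters, the following sketch compounds a per-round fidelity over several PCR cycles. It assumes a deliberately simple model in which the non-standard pair is retained independently in each round with a probability equal to the per-round fidelity; the function name is hypothetical and the model is an illustration, not the analysis used in the study.

```python
def retained_fraction(fidelity_per_round, rounds):
    """Fraction of product still carrying the non-standard pair.

    Assumed model: each round retains the pair independently with
    probability `fidelity_per_round`, so retention is fidelity ** rounds.
    """
    if not 0.0 <= fidelity_per_round <= 1.0:
        raise ValueError("fidelity_per_round must be between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return fidelity_per_round ** rounds


if __name__ == "__main__":
    # Compare the two per-round fidelities quoted in the abstract.
    for n in (1, 10, 20, 30):
        with_2thio = retained_fraction(0.98, n)
        with_ttp = retained_fraction(0.93, n)
        print(f"{n:2d} rounds: 2-thioTTP {with_2thio:.3f}, TTP {with_ttp:.3f}")
```

Under this assumption a five-point difference per round grows into a severalfold difference in retention over a typical 20 to 30 cycle amplification.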

Modern yeast living in fleshy fruits rapidly convert sugars into bulk ethanol through pyruvate. Pyruvate loses carbon dioxide to become acetaldehyde, which is reduced by alcohol dehydrogenase 1 (Adh1) to ethanol, which accumulates. Yeast later consumes the accumulated ethanol, exploiting Adh2, an Adh1 homolog differing by 24 (of 348) amino acids. Because many microorganisms cannot grow in ethanol, accumulated ethanol may help yeast defend resources in the fruit. We report here the reconstruction of the last common ancestor of Adh1 and Adh2, called AdhA. The kinetic behavior of AdhA suggests that it was optimized to make (not consume) ethanol. This is consistent with the hypothesis that before the Adh1-Adh2 duplication, yeast did not accumulate ethanol for later consumption but rather used AdhA to recycle NADH generated in the glycolytic pathway. Silent nucleotide dating suggests that the Adh1-Adh2 duplication occurred near the time of duplication of several other proteins involved in the accumulation of ethanol, possibly in the Cretaceous age when fleshy fruits arose. These results help to connect the chemical behavior of these enzymes through systems analysis to a time of global ecosystem change, a small but useful step towards a planetary systems biology.

Synthetic biologists come in two broad classes. One uses unnatural molecules to reproduce emergent behaviours from natural biology, with the goal of creating artificial life. The other seeks interchangeable parts from natural biology to assemble into systems that function unnaturally. Either way, a synthetic goal forces scientists to cross uncharted ground to encounter and solve problems that are not easily encountered through analysis. This drives the emergence of new paradigms in ways that analysis cannot easily do. Synthetic biology has generated diagnostic tools that improve the care of patients with infectious diseases, as well as devices that oscillate, creep and play tic-tac-toe.

This Account describes work done in these laboratories that has used synthetic, physical organic, and biological chemistry to understand the roles played by the nucleobases, sugars, and phosphates of DNA in the molecular recognition processes central to genetics. The number of nucleobases has been increased from 4 to 12, generating an artificially expanded genetic information system. This system is used today in the clinic to monitor the levels of HIV and hepatitis C viruses in patients, helping to manage patient care. Work with uncharged phosphate replacements suggests that a repeating charge is a universal feature of genetic molecules operating in water and will be found in extraterrestrial life (if it is ever encountered). The use of ribose may reflect prebiotic processes in the presence of borate-containing minerals, which stabilize ribose formed from simple organic precursors. A new field, synthetic biology, is emerging on the basis of these experiments, where chemistry mimics biological processes as complicated as Darwinian evolution.

In vitro selections performed in the presence of Mg2+ generated DNA sequences capable of cleaving an internal ribonucleoside linkage. Several of these, surprisingly, displayed intermolecular catalysis and catalysis independent of Mg2+, features that the selection protocol was not explicitly designed to select. A detailed physical organic analysis was applied to one of these DNAzymes, termed 614. First, the progress curve for the reaction was dissected to identify factors that prevented the molecule from displaying clean first-order transformation kinetics and 100% conversion. Several factors were identified and quantitated, including (a) competitive intra- and intermolecular rate processes, (b) alternative reactive and unreactive conformations, and (c) mutations within the catalyst. Other factors were excluded, including "approach to equilibrium" kinetics and product inhibition. The possibility of complementary strand inhibition was demonstrated but was shown to not be a factor under the conditions of these experiments. The rates of the intra- and intermolecular processes were compared, and saturation models for the intermolecular process were built. The rate-limiting step for the intermolecular reaction was found to be the association/folding of the enzyme with the substrate and not the cleavage step. The DNAzyme 614 is more active in trans than in cis and more active at temperatures below the selection temperature than at the selection temperature. Many of these properties have not been reported in similar systems; these results therefore expand the phenomenology known for this class of DNA-based catalysts. A brief survey of other catalysts arising from this selection found other Mg2+-independent DNAzymes and provided a preliminary view of the ruggedness of the landscape, relating function to structure in sequence space. Hypotheses are suggested to account for the fact that a selection in the presence of Mg2+ did not exploit this Mg2+.
This study of a specific catalytically active DNAzyme is an example of studies that will be necessary generally to permit in vitro selection to help us understand the distribution of function in sequence space.

BACKGROUND: Joining a model for the molecular evolution of a protein family to the paleontological and geological records (geobiology), and then to the chemical structures of substrates, products, and protein folds, is emerging as a broad strategy for generating hypotheses concerning function in a post-genomic world. This strategy expands systems biology to a planetary context, necessary for a notion of fitness to underlie (as it must) any discussion of function within a biomolecular system. RESULTS: Here, we report an example of such an expansion, where tools from planetary biology were used to analyze three genes from the pig Sus scrofa that encode cytochrome P450 aromatases, enzymes that convert androgens into estrogens. The evolutionary history of the vertebrate aromatase gene family was reconstructed. Transition redundant exchange silent substitution metrics were used to interpolate dates for the divergence of family members, the paleontological record was consulted to identify changes in physiology that correlated in time with the change in molecular behavior, and new aromatase sequences from peccary were obtained. Metrics that detect changing function in proteins were then applied, including KA/KS values and those that exploit structural biology. These identified specific amino acid replacements that were associated with changing substrate and product specificity during the time of presumed adaptive change. The combined analysis suggests that aromatase paralogs arose in pigs as a result of selection for Suoidea with larger litters than their ancestors, and permitted the Suoidea to survive the global climatic trauma that began in the Eocene. CONCLUSIONS: This combination of bioinformatics analysis, molecular evolution, paleontology, cladistics, global climatology, structural biology, and organic chemistry serves as a paradigm in planetary biology.
As the geological, paleontological, and genomic records improve, this approach should become widely useful to make systems biology statements about high-level function for biomolecular systems.

Background: All states require some kind of testing for newborns, but the policies are far from standardized. In some states, newborn screening may include genetic tests for a wide range of targets, but the costs and complexities of the newer genetic tests inhibit expansion of newborn screening. We describe the development and technical evaluation of a multiplex platform that may foster increased newborn genetic screening. Methods: MultiCode® PLx involves three major steps: PCR, target-specific extension, and liquid chip decoding. Each step is performed in the same reaction vessel, and the test is completed in approximately 3 h. For site-specific labeling and room-temperature decoding, we use an additional base pair constructed from isoguanosine and isocytidine. We used the method to test for mutations within the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The developed test was performed manually and by automated liquid handling. Initially, 225 samples with a range of genotypes were tested retrospectively with the method. A prospective study used samples from >400 newborns. Results: In the retrospective study, 99.1% of samples were correctly genotyped with no incorrect calls made. In the prospective study, 95% of the samples were correctly genotyped for all targets, and there were no incorrect calls. Conclusions: The unique genetic multiplexing platform was successfully able to test for 31 targets within the CFTR gene and provides accurate genotype assignments in a clinical setting. © 2004 American Association for Clinical Chemistry.

A review of organic chemistry suggests that life, a chemical system capable of Darwinian evolution, may exist in a wide range of environments. These include non-aqueous solvent systems at low temperatures, or even supercritical dihydrogen-helium mixtures. The only absolute requirements may be a thermodynamic disequilibrium and temperatures consistent with chemical bonding. A solvent system, availability of elements such as carbon, hydrogen, oxygen and nitrogen, certain thermodynamic features of metabolic pathways, and the opportunity for isolation, may also define habitable environments. If we constrain life to water, more specific criteria can be proposed, including soluble metabolites, genetic materials with repeating charges, and a well defined temperature range.

The 6-aminopyrazin-2(1H)-one, when incorporated as a pyrimidine-base analog into an oligonucleotide chain, presents a H-bond donor-donor-acceptor pattern to a complementary DNA or RNA strand. When paired with the corresponding acceptor-acceptor-donor purine in oligonucleotides, the heterocycle selectively contributes to the stability of the duplex, presumably by forming a base pair of Watson-Crick geometry joined by a nonstandard H-bonding pattern, expanding the genetic alphabet. Reported here is a short, high yielding, beta-D-selective synthesis of a 6-aminopyrazin-2(1H)-one nucleoside via the glycine riboside derivative 28. The key steps include a Wittig-Horner reaction of an appropriately protected ribose derivative (Scheme 10, 19 --> 21) followed by a Michael-like ring closure (Scheme 12, 30 --> 1a and 32 --> 1b). Thus, a variety of pyrazine nucleosides (Scheme 73) including the target 6-aminopyrazin-2(1H)-one riboside 1a, and its 5-methyl derivative 1b, 6-amino-5-methylpyrazin-2(1H)-one riboside, are obtained.

To understand how protein segments are inserted and deleted during divergent evolution, a set of pairwise alignments containing exactly one gap, and therefore arising from the first insertion-deletion (indel) event in the time separating the homologs, was examined. The alignments showed that "structure breaking" amino acids (PGDNS) were preferred within and flanking gapped regions, as are two residues with hydrophilic side-chains (QE) that frequently occur at the surface of protein folds. Conversely, hydrophobic residues (FMILYVW) occur infrequently within and flanking the gapped region. These preferences are modestly different in protein pairs separated by an episode of adaptive evolution than in pairs diverging under strong functional constraints. Surprisingly, regions near an indel have not evolved more rapidly than the sequence pair overall, showing no evidence that an indel event must be compensated by local amino acid replacement. The gap lengths are best approximated by a Zipfian distribution, with the probability of a gap of length L decreasing as a function of L^(-1.8). These features are largely independent of the length of the gap and the extent of divergence (measured by both silent and non-silent sequence changes) separating the two proteins. Surprisingly, amino acid repeats were discovered in more than a third of the polypeptide segments in and around the gap. These correspond to repeats in the DNA sequence. This suggests that a signature of the mechanism by which indels occur in the DNA sequence remains in the encoded protein sequences. These data suggest specific tools to score gap placement in an alignment. They also suggest tools that distinguish true indels from gaps created by mistaken gene finding, including under-predicted and overpredicted introns.
By providing mechanisms to identify errors, the tools will enhance the value of genome sequence databases in support of integrated paleogenomics strategies used to extract functional information in a post-genomic environment.
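
The Zipfian gap-length relation reported above, with probability proportional to L raised to the power -1.8, can be turned into a normalized distribution with a short sketch. The truncation at a maximum gap length and the function names are assumptions made for illustration; the abstract gives only the exponent.

```python
import math


def gap_length_probabilities(max_length, exponent=1.8):
    """Normalized Zipfian probabilities for gap lengths 1..max_length.

    P(L) is proportional to L ** (-exponent). The cutoff `max_length`
    is a hypothetical truncation chosen for illustration.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    weights = [length ** (-exponent) for length in range(1, max_length + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_gap_score(length, max_length=50, exponent=1.8):
    """Log-probability of one gap length, usable as a gap-scoring term."""
    probs = gap_length_probabilities(max_length, exponent)
    return math.log(probs[length - 1])


if __name__ == "__main__":
    probs = gap_length_probabilities(50)
    for length in (1, 2, 5, 10):
        print(f"L = {length:2d}: P = {probs[length - 1]:.4f}")
```

A log-probability of this kind is one plausible way to build the gap-placement scores that the abstract suggests, since it penalizes long gaps less steeply than a linear (affine) gap cost would.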

The tautomerism of 2'-deoxy-7-deaza-isoguanosine (2) was studied and compared to that of 2'-deoxyisoguanosine (1). The fixed N-1-methyl (8) and O-methyl (4) derivatives were synthesized to represent the pure extremes of each tautomer. The replacement of the imidazole ring in 1 with a pyrrole ring in 2 makes the keto form in the latter more favored by 2 orders of magnitude (K_TAUT ≈ 10^3 for 2, as opposed to K_TAUT ≈ 10 for 1).
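
The practical meaning of the two equilibrium constants can be sketched by converting each into the fraction of molecules present as the minor tautomer. The sketch assumes that K_TAUT is defined as the ratio [keto]/[enol] for a simple two-state equilibrium; that definition and the function name are assumptions, since the abstract does not state them.

```python
def minor_tautomer_fraction(k_taut):
    """Fraction of molecules in the minor (enol) form.

    Assumes a two-state equilibrium with k_taut = [keto] / [enol],
    so the enol fraction is 1 / (1 + k_taut).
    """
    if k_taut <= 0:
        raise ValueError("k_taut must be positive")
    return 1.0 / (1.0 + k_taut)


if __name__ == "__main__":
    # Order-of-magnitude constants quoted in the abstract.
    for name, k in (("2'-deoxyisoguanosine (1)", 10.0),
                    ("2'-deoxy-7-deaza-isoguanosine (2)", 1000.0)):
        print(f"{name}: minor form = {minor_tautomer_fraction(k):.2%}")
```

Under this reading, a hundredfold increase in the constant reduces the minor form from roughly one molecule in eleven to roughly one in a thousand.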

Standard nucleobases all present electron density as an unshared pair of electrons to the minor groove of the double helix. Many heterocycles supporting artificial genetic systems lack this electron pair. To determine how different DNA polymerases use the pair as a substrate specificity determinant, three Family A polymerases, three Family B polymerases and three reverse transcriptases were examined for their ability to handle 3-deaza-2'-deoxyadenosine (c(3)dA), an analog of 2'-deoxyadenosine lacking the minor groove electron pair. Different polymerases differed widely in their interaction with c(3)dA. Most notably, Family A and Family B polymerases differed in their use of this interaction to exploit their exonuclease activities. Significant differences were also found within polymerase families. This plasticity in polymerase behavior is encouraging to those wishing to develop a synthetic biology based on artificial genetic systems. The differences also suggest either that Family A and Family B polymerases do not share a common ancestor, that minor groove contact was not used by that ancestor functionally or that this contact was not sufficiently critical to fitness to have been conserved as the polymerase families diverged. Each interpretation is significant for understanding the planetary biology of polymerases.

As the next step towards generating a synthetic biology from artificial genetic information systems, we have examined variants of HIV reverse transcriptase (RT) for their ability to synthesize duplex DNA incorporating the non-standard base pair between 2,4-diaminopyrimidine (pyDAD), a pyrimidine presenting a hydrogen bond 'donor-acceptor-donor' pattern to the complementary base, and xanthine (puADA), a purine presenting a hydrogen bond 'acceptor-donor-acceptor' pattern. This base pair fits the Watson-Crick geometry, but is joined by a pattern of hydrogen bond donor and acceptor groups different from those joining the GC and AT pairs. A variant of HIV-RT where Tyr 188 is replaced by Leu, has emerged from experiments where HIV was challenged to grow in the presence of drugs targeted against the RT, such as L-697639, TIBO and nevirapine. These drugs bind at a site near, but not in, the active site. This variant accepts the pyDAD-puADA base pair significantly better than wild type HIV-RT, and we used this as a starting point. A second mutation, E478Q, was introduced into the Y188L variant, in the event that the residual nuclease activity observed is due to the RT, and not a contaminant. The doubly mutated RT incorporated the non-standard pair with sufficient fidelity that the variant could be used to amplify oligonucleotides containing pyDAD and puADA through several rounds of a polymerase chain reaction (PCR) without losing the non-standard base pair. This is the first time where DNA containing non-standard base pairs with alternative hydrogen bonding patterns has been amplified by a full PCR. This work also illustrates a research strategy that combines in clinico pre-evolution of proteins followed by rational design to obtain an enzyme that meets a particular technological specification.

The synthesis of 2'-deoxycytidine nucleosides bearing amino and thiol groups appended to the 5-position of the nucleobase via a butynyl linker is described. The corresponding triphosphates were then synthesized from the nucleoside and incorporated into oligonucleotides by Vent (exo-) DNA polymerase. The ability of Vent (exo-) polymerase to amplify oligonucleotides containing these functionalized cytidine derivatives in a polymerase chain reaction (PCR) was demonstrated for the amino-functionalized derivative.

The NASA Astrobiology Roadmap provides guidance for research and technology development across the NASA enterprises that encompass the space, Earth, and biological sciences. The ongoing development of astrobiology roadmaps embodies the contributions of diverse scientists and technologists from government, universities, and private institutions. The Roadmap addresses three basic questions: How does life begin and evolve, does life exist elsewhere in the universe, and what is the future of life on Earth and beyond? Seven Science Goals outline the following key domains of investigation: understanding the nature and distribution of habitable environments in the universe, exploring for habitable environments and life in our own solar system, understanding the emergence of life, determining how early life on Earth interacted and evolved with its changing environment, understanding the evolutionary mechanisms and environmental limits of life, determining the principles that will shape life in the future, and recognizing signatures of life on other worlds and on early Earth. For each of these goals, Science Objectives outline more specific high-priority efforts for the next 3-5 years. These 18 objectives are being integrated with NASA strategic planning.

The Leptin protein is central to the regulation of energy metabolism in mammals. By integrating evolutionary, structural, and biochemical information, a surface segment, outside of its known receptor contacts, is predicted as a second interaction site that may help to further define its roles in energy balance and its functional differences between humans and other mammals.

A new route is presented to prepare analogs of nucleosides homologated at the 3'- and 5'-positions. This route, applicable to both the D- and L-enantiomeric forms, is suitable for the preparation of monomeric bis-homonucleosides needed for the synthesis of oligonucleotide analogs. It begins with the known monobenzyl ether 3 of pent-2-yne-1,5-diol, which is reduced to alkenol 4. Sharpless asymmetric epoxidation of 4, followed by opening of the epoxide 5 with allylmagnesium bromide, gives a mixture of diols 6 and 7. Protection of the primary alcohol as a silyl ether followed by treatment with OsO4, NaIO4, and mild acid in MeOH, followed by reduction, yields (2R,3R) {{[(tert-butyl)diphenylsilyl]oxy}methyl}tetrahydro-2-(2-hydroxyethyl)-5-methoxyfuran (=methyl 3-{{[(tert-butyl)diphenylsilyl]oxy}methyl}-2,3,5-trideoxy-alpha/beta-D-erythro-hexofuranoside: 10) (Scheme 1). Protected nucleobases are added to this skeleton with the aid of trimethylsilyl triflate (Scheme 2). The o-toluoyl (2-MeC6H4CO) and p-anisoyl (4-MeOC6H4CO) groups were used to protect the exocyclic amino group of cytosine. The bis-homonucleoside analogs 11 and 14a are then converted to monothiol derivatives suitable for coupling (Schemes 3 and 4) to oligonucleotide analogs with bridging S-atoms. This synthesis replaces a much longer synthesis for analogous nucleoside analogs that begins with diacetoneglucose (= 1,2:5,6-di-O-isopropylideneglucose), with the stereogenic centers in the final products derived from the Sharpless asymmetric epoxidation. The new route is useful for large-scale synthesis of these building blocks for the synthesis of oligonucleotide analogs.

A convergent, solution-phase synthesis was developed for the bis(methylene) sulfone-bridged oligodeoxynucleotide analogs (SNA) 5'-d(HOCH2-Tso(2)Tso(2)Tso(2)Cso(2)Tso(2)Tso(2)Tso(2)T-CH2SO3-)-3' (35b) and 5'-d(HOCH2-Tso(2)Tso(2)Tso(2)Tso(2)Tso(2)Tso(2)Tso(2)T-CH2SO3-)-3' (34c) (SO2 corresponds to CH2SO2CH2 instead of OP(=O)(O-)O). In these, the phosphodiester linkages are replaced by non-ionic bis(methylene) sulfone linkers. The general strategy involved convergent coupling of 3',5'-bishomo-beta-D-deoxyribonucleotide analogs functionalized at the 6'-end (=CH2-C(5')) as bromides or mesylates and at the CH2-C(3') position as thiols, with the resulting thioether being oxidized to the corresponding sulfone. A single charge was introduced at the terminal CH2-C(3') position of the octamers to increase their solubility in water. During the synthesis, it became apparent that the key intermediates generated secondary structures through either folding or aggregation in a variety of solvents. This generated unusual reactivity and was unique for very similar structures. For example, although the dimeric thiol d(BzOCH(2)-Tso(2)C-CH2SH) (14b) was a well-behaved synthetic intermediate, the tetrameric thiol d(TrOCH2-Tso(2)Tso(2)Tso(2)(10)C-CH2SH) derived from the corresponding thioacetate was rapidly converted to a disulfide by very small amounts of oxidant (28 --> 29, Scheme 6), while the analogous tetrameric thiol d(BzOCH(2)-Tso(2)TsTso(2)T-CH2SH) (26), differing only by a single heterocycle, was oxidized much more slowly (Bz = PhCO, Tr = Ph3C, to = 2-MeC6H4CO (at N-4 of dC)). The sequence-dependent reactivity, well known in many classes of natural products (including polypeptides), is not prominent in natural oligonucleotides. These results are discussed in light of the proposal that the repeating negative charge in nucleic acids is key to their ability to serve as genetic molecules, in particular, their capability to support Darwinian evolution.
The ability of 5'-d(HOCH2-Tso(2)Tso(2)Tso(2)Cso(2)Tso(2)Tso(2)Tso(2)T-CH2SO3-)-3' (35b) to bind as a third strand to duplex DNA was also examined. No triple-helix-forming propensity was detected in this molecule.

6-Amino-3-(2'-deoxy-beta-D-ribofuranosyl)-5-nitro-1H-pyridin-2-one (4), a C-glycoside exhibiting the nonstandard pyDDA hydrogen-bonding pattern, was synthesized via Heck coupling. The nitro group greatly enhances the stability of the nucleoside toward acid-catalyzed epimerization without leading to significant deprotonation of the heterocycle at physiological pH. These results make nucleoside 4 a promising candidate for an expanded genetic alphabet.

Over 15 years ago, the Benner group noticed that the DNA alphabet need not be limited to the four standard nucleotides known in natural DNA. Rather, twelve nucleobases forming six base pairs joined by mutually exclusive hydrogen bonding patterns are possible within the geometry of the Watson-Crick pair (Fig. 1). Synthesis and studies on these compounds have brought us to the threshold of a synthetic biology, an artificial chemical system that does basic processes needed for life (in particular, Darwinian evolution), but with unnatural chemical structures. At the same time, the artificial genetic information systems (AEGIS) that we have developed have been used in FDA-approved commercial tests for managing HIV and hepatitis C infections in individual patients, and in a tool that seeks the virus for severe acute respiratory syndrome (SARS). AEGIS also supports the next generation of robotic probes to search for genetic molecules on Mars, Europa, and elsewhere where NASA probes will travel.

Features of the physical environment surrounding an ancestral organism can be inferred by reconstructing sequences(1-9) of ancient proteins made by those organisms, resurrecting these proteins in the laboratory, and measuring their properties. Here, we resurrect candidate sequences for elongation factors of the Tu family (EF-Tu) found at ancient nodes in the bacterial evolutionary tree, and measure their activities as a function of temperature. The ancient EF-Tu proteins have temperature optima of 55-65 °C. This value seems to be robust with respect to uncertainties in the ancestral reconstruction. This suggests that the ancient bacteria that hosted these particular genes were thermophiles, and neither hyperthermophiles nor mesophiles. This conclusion can be compared and contrasted with inferences drawn from an analysis of the lengths of branches in trees joining proteins from contemporary bacteria(10), the distribution of thermophily in derived bacterial lineages(11), the inferred G+C content of ancient ribosomal RNA(12), and the geological record combined with assumptions concerning molecular clocks(13). The study illustrates the use of experimental palaeobiochemistry and assumptions about deep phylogenetic relationships between bacteria to explore the character of ancient life.

A route is presented to append, in a single step, alkynyl thioesters to the 5-position of a pyrimidine ring of a nucleoside that is unprotected. These products should be useful to support in vitro selection experiments with functionalized DNA.

To guide the design of alternative genetic systems, we measured melting temperatures of DNA duplexes containing matched and mismatched nucleobase pairs from natural and unnatural structures. The pairs were analyzed in terms of structural features, including nucleobase size, number of hydrogen bonds formed, the presence of uncompensated hydrogen bonding functional groups, the nature of the bond joining the nucleobase to the sugar, and nucleobase charge. The results suggest that stability of nucleobase pairs correlates with the number of H-bonds, size complementarity, the presence of uncompensated functional groups, and the presence of charge on a nucleobase. Each of these properties appears to be more significant than the nature of the glycosidic bond and sequence context. The results provide guidelines for constructing stable Watson-Crick-like nucleobase pairs with unnatural nucleobases. The experiments also demonstrate that expanded genetic systems can be constructed using size-complementary nucleobase pairs that contain three hydrogen bonds.

Phosphate groups are found and used widely in biological chemistry. We have asked whether phosphate groups are likely to be important to the functioning of genetic molecules, including DNA and RNA. From observations made on synthetic analogs of DNA and RNA where the phosphates are replaced by nonanionic linking groups, we infer a set of rules that highlight the importance of the phosphodiester backbone for the proper functioning of DNA as a genetic molecule. The polyanionic backbone appears to give DNA the capability of replication following simple rules, and evolving. The polyanionic nature of the backbone appears to be critical to prevent the single strands from folding, permitting them to act as templates, guiding the interaction between two strands to form a duplex in a way that permits simple rules to guide the molecular recognition event, and buffering the sensitivity of its physicochemical properties to changes in sequence. We argue that the feature of a polyelectrolyte (polyanion or polycation) may be required for a "self-sustaining chemical system capable of Darwinian evolution." The polyelectrolyte structure therefore may be a universal signature of life, regardless of its genesis, and unique to living forms as well. (C) 2002 Elsevier Science (USA).

Chimeric DNA molecules containing four different linking groups, the natural phosphate, 5'-methylenephosphonate, bis(methylene)phosphinate, and bis(methylene) sulfone (see Fig. 1), were directly compared for their ability to form duplexes with complementary DNA and DNA chimeras. From melting temperatures for analogous complementary sequences, general conclusions about the impact of geometric distortion of the internucleotide linkage around the two P-O-C bridges were drawn, as were conclusions about the impact on duplex stability that arises from the removal of the negative charge in the linking group. Each structural perturbation diminished the melting temperature, by ca. -2.5 °C per modification for the 5'-methylenephosphonate, -3.5 °C per modification for the bis(methylene)phosphinate, and -4.5 °C per modification for the bis(methylene) sulfone linker. These results have implications for DNA chemistry including the design of 'antisense' candidates and the proposal of alternative genetic materials in the search for non-terrean life.
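The per-modification decrements quoted in this abstract can be turned into a rough estimate for a chimera carrying several modified linkages. The sketch below is only illustrative: it assumes the decrements add linearly and independently of sequence context (the abstract does not claim this), and the function name `estimate_tm` and the 50-degree reference duplex are hypothetical.

```python
# Approximate change in melting temperature per modified linkage (degrees C),
# taken from the values quoted in the abstract.
DELTA_TM_PER_MODIFICATION = {
    "5'-methylenephosphonate": -2.5,
    "bis(methylene)phosphinate": -3.5,
    "bis(methylene) sulfone": -4.5,
}


def estimate_tm(reference_tm, modifications):
    """Estimate the Tm of a chimeric duplex.

    reference_tm  -- Tm of the all-phosphate duplex (hypothetical input)
    modifications -- dict mapping linker name to number of modified linkages

    ASSUMPTION: decrements are additive; real duplexes may deviate.
    """
    shift = sum(
        DELTA_TM_PER_MODIFICATION[name] * count
        for name, count in modifications.items()
    )
    return reference_tm + shift


if __name__ == "__main__":
    # Two sulfone linkages and one methylenephosphonate in a 50-degree duplex.
    print(estimate_tm(50.0, {"bis(methylene) sulfone": 2,
                             "5'-methylenephosphonate": 1}))
```

Under the additivity assumption, the estimate simply shifts the reference value by the summed decrements; it says nothing about position-dependent effects.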

The preferred ligands for the Hck Src homology 2 domain among a combinatorial library containing 324 different peptides were determined in a single experiment involving Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry (MS), electrospray ionization (ESI), stored-waveform inverse Fourier transformation (SWIFT), and infrared multiphoton laser dissociation (IRMPD). These were compared with the results obtained by conventional screening of the peptide library in solution using affinity chromatography. The results reported here show that by combining ESI, FT-ICR MS, SWIFT, and IRMPD, ligands likely to bind under physiological conditions are rapidly and efficiently identified, even from complex library mixtures. In the gas phase some discrimination against hydrophobic ligands could be observed. However, the illustrated feasibility of identifying high-affinity ligands via gas-phase screening of complex library mixtures should lead to broad applications in the development of ligands for proteins with interesting biological activity, the first step that must be taken to develop a therapeutic agent.

When protein sequences divergently evolve under functional constraints, some individual amino acid replacements that reverse the charge (e.g. Lys to Asp) may be compensated by a replacement at a second position that reverses the charge in the opposite direction (e.g. Glu to Arg). When these side-chains are near in space (proximal), such double replacements might be driven by natural selection, if either is selectively disadvantageous, but both together restore fully the ability of the protein to contribute to fitness (are together "neutral"). Accordingly, many have sought to identify pairs of positions in a protein sequence that suffer compensatory replacements, often as a way to identify positions near in space in the folded structure. A "charge compensatory signal" might manifest itself in two ways. First, proximal charge compensatory replacements may occur more frequently than predicted from the product of the probabilities of individual positions suffering charge reversing replacements independently. Conversely, charge compensatory pairs of changes may be observed to occur more frequently in proximal pairs of sites than in the average pair. Normally, charge compensatory covariation is detected by comparing the sequences of extant proteins at the "leaves" of phylogenetic trees. We show here that the charge compensatory signal is more evident when it is sought by examining individual branches in the tree between reconstructed ancestral sequences at nodes in the tree. Here, we find that the signal is especially strong when the position pairs are in a single secondary structural unit (e.g. α helix or β strand) that brings the side-chains suffering charge compensatory covariation near in space, and may be useful in secondary structure prediction.
Also, "node-node" and "node-leaf" compensatory covariation may be useful to identify the better of two equally parsimonious trees, in a way that is independent of the mathematical formalism used to construct the tree itself. Further, compensatory covariation may provide a signal that indicates whether an episode of sequence evolution contains more or less divergence in functional behavior. Compensatory covariation analysis on reconstructed evolutionary trees may become a valuable tool to analyze genome sequences, and use these analyses to extract biomedically useful information from proteome databases. (C) 2002 Elsevier Science Ltd. All rights reserved.
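The first form of the "charge compensatory signal" described in this abstract, joint reversals occurring more often than the product of the individual probabilities predicts, can be sketched as an observed/expected ratio over the branches of a tree. This is a minimal illustration, not the authors' procedure: the data layout (one dict per branch, mapping a position to +1 or -1 for the direction of a charge-reversing replacement) and the function name are assumptions made for the example.

```python
def charge_compensation_ratio(branches, pos_i, pos_j):
    """Observed / expected frequency of opposite charge reversals on one branch.

    branches -- list of dicts, one per tree branch (hypothetical layout);
                each maps a sequence position to +1 (site became positive,
                e.g. Asp -> Lys) or -1 (site became negative, e.g. Lys -> Asp).
    Returns None when the expectation is zero or no branches are given.
    """
    n = len(branches)
    if n == 0:
        return None

    def freq(pos, direction):
        # Fraction of branches on which `pos` reverses in `direction`.
        return sum(1 for b in branches if b.get(pos) == direction) / n

    # Branches on which both sites reverse charge in opposite directions.
    observed = sum(
        1 for b in branches
        if b.get(pos_i) in (1, -1) and b.get(pos_j) == -b.get(pos_i)
    ) / n

    # Expectation if the two sites changed independently of each other.
    expected = (freq(pos_i, 1) * freq(pos_j, -1)
                + freq(pos_i, -1) * freq(pos_j, 1))
    return observed / expected if expected > 0 else None


if __name__ == "__main__":
    toy_branches = [{12: 1, 16: -1}, {12: 1, 16: -1}, {}, {}]
    print(charge_compensation_ratio(toy_branches, 12, 16))
```

A ratio above 1 would indicate that the pair covaries more often than independence predicts; whether such an excess is statistically meaningful requires a significance test that this sketch omits.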

Short DNA analogues with bridging dimethylene sulfide, sulfoxide, and sulfone groups replacing the phosphate diesters (S-DNAs) were synthesized from building blocks prepared via two routes, both starting from D-glucose. Building blocks for RNA analogues were prepared by stereoselective introduction of nucleobase into a 2'-acylated ribose analogue. The ribose analogues were converted to deoxyribose analogues by replacement of a 3"-OH group by a thioacetyl unit, followed by photolytic deoxygenation or radical-based 2'-deoxygenation. DNA analogues joined via CH2-S-CH2 units were prepared by S(N)2 displacement of a 6'-mesyl group on one building block using a thiolate nucleophile of another. 4,4'-Dimethoxytrityl protection and deprotection schemes were established for both the thiol and hydroxyl groups. The corresponding sulfoxide DNA analogues were obtained by oxidation with hydrogen peroxide. Sulfone DNA analogues were obtained by oxidation of the sulfide DNA with persulfate or hydrogen peroxide in the presence of a titanium silicate catalyst. The physical properties of several representative oligonucleotide analogues were examined, and interpreted in light of a "second-generation" model for DNA strand-strand recognition, a model that emphasizes the role of the polyanionic backbone in diminishing unwanted tendencies of highly functionalized molecules to form "structure" in solution. Even short sulfide-linked DNA analogues displayed association properties different from those displayed by standard DNA molecules. Complex formation observed with sulfide-linked tetramers by HPLC study in different solvents suggested that the complex is formed using hydrogen bonding. Sulfone-linked dinucleotides display Watson-Crick behavior; the tetramer, however, displayed self-structure. Self-structure and self-aggregation become more prominent as the length of the oligonucleotide analogues increases.
The tendency to self-aggregate can be decreased by adding a charged sulfonate group to the 3"-end of the DNA analogue. Features of the second-generation model are important for many areas of nucleic acid chemistry, from the design of nucleic acid therapeutic agents to the search for life on other planets.

Eight different polymerases, chosen from evolutionary families A (Taq, Tfl, HotTub and Tth) and B (Pfu, Pwo, Vent and Deep Vent), were examined for their ability to incorporate 5-position modified 2'-deoxyuridine derivatives that carry a protected thiol group appended via different linkers containing either three or four carbon atoms. This represents the first attempt to incorporate the thiol functionality into DNA via enzymatic synthesis. Each polymerase-substrate combination was evaluated using a hierarchy of increasingly more difficult challenges, starting with incorporation of a single derivative, proceeding to incorporation of two derivatives at adjacent sites and non-adjacent sites, then examining the ability of the polymerase to accept the derivative within the template, and concluding with a challenge involving PCR. The evaluation of thiol-bearing 2'-deoxyuridine derivatives was then extended to consider their chemical stabilities. Stability was found to be less than satisfactory when the thiol functionality has a 'propargylic' relationship to the unsaturation in the linker. The best polymerase-appendage combination used the polymerase from Pyrococcus woesei (Pwo) and the 5'-tBu-SS-CH2-CH2-C≡C- linker. This pair supported PCR amplification and therefore should have value in artificial in vitro selection experiments. Indeed, we discovered that Pwo and Pfu preferred the derivative triphosphate over TTP, the natural substrate, in competition studies. These studies confirm an earlier suggestion that membership of an evolutionary family of polymerases is a partial predictor of the ability of the polymerase to accept 5-modified 2'-deoxyuridines. Considerable differences are displayed by different members within a polymerase family, however. This remains curious, as the ability of the polymerase to replicate natural DNA with high fidelity and its propensity to exclude unnatural analogs are presumed to be correlated.

The history of life on Earth is chronicled in the geological strata, the fossil record, and the genomes of contemporary organisms. When examined together, these records help identify metabolic and regulatory pathways, annotate protein sequences, and identify animal models to develop new drugs, among other features of scientific and biomedical interest. Together, planetary analysis of genome and proteome databases is providing an enhanced understanding of how life interacts with the biosphere and adapts to global change.

Most modern tools that analyze protein evolution allow individual sites to mutate at constant rates over the history of the protein family. However, Walter Fitch observed in the 1970s that, if a protein changes its function, the mutability of individual sites might also change. This observation is captured in the 'non-homogeneous gamma model', which extracts functional information from gene families by examining the different rates at which individual sites evolve. This model has recently been coupled with structural and molecular biology to identify sites that are likely to be involved in changing function within the gene family. Applying this to multiple gene families highlights the widespread divergence of functional behavior among proteins to generate paralogs and orthologs.

A concise route is described to prepare the 5-aza-7-deazapurine 2'-deoxyriboside (4), which presents the puADA hydrogen-bonding pattern, analogous to the hydrogen-bonding pattern presented by 2'-deoxyxanthosine (2). The route begins with the commercially available 1-alpha-chloro-2-deoxy-3,5-bistoluoyloxyribofuranose (10), which proves to be a versatile point of entry to beta-2'-deoxyribofuranosides. In the first step, 2-nitroimidazole (8) is coupled with 10 to yield intermediate 11. Reduction of the nitro group to an amino group yields 12, which is treated with phenyl isocyanatoformate to complete the nucleobase to yield 13. Removal of the toluoyloxy protecting groups of 13 yields the target nucleoside 4 in 40% overall yield in four steps. In an alternative strategy, convergent coupling of 14 with 10 under basic conditions was attempted but found to yield the heterocycle glycosylated at the undesired position. Compound 13 displays potentially useful fluorescence properties. After excitation at 250 nm, a solution of 13 in MeCN shows a fluorescence emission with a maximum at 410 nm. Furthermore, 13 is neutral at physiological pH, a property that it shares with natural nucleobases but not xanthosine itself, which is an acid with a pK(a) of ca. 5.6. Furthermore, as part of the design, 4 is made capable of presenting an unshared pair of electrons to the DNA minor groove.

The divergent evolution of protein sequences from genomic databases can be analyzed by the use of different mathematical models. The most common treat all sites in a protein sequence as equally variable. More sophisticated models acknowledge the fact that purifying selection generally tolerates variable amounts of amino acid replacement at different positions in a protein sequence. In their "stationary" versions, such models assume that the replacement rate at individual positions remains constant throughout evolutionary history. "Nonstationary" covarion versions, however, allow the replacement rate at a position to vary in different branches of the evolutionary tree. Recently, statistical methods have been developed that highlight this type of variation in replacement rates. Here, we show how positions that have variable rates of divergence in different regions of a tree ("covarion behavior"), coupled with analyses of experimental three-dimensional structures, can provide experimentally testable hypotheses that relate individual amino acid residues to specific functional differences in those branches. We illustrate this in the elongation factor family of proteins as a paradigm for applications of this type of analysis in functional genomics generally.

Almost a century ago, Wittgenstein pointed out that theory in science is intricately connected to language. This connection is not a frequent topic in the genomics literature. But a case can be made that functional genomics is today hindered by the paradoxes that Wittgenstein identified. If this is true, until these paradoxes are recognized and addressed, functional genomics will continue to be limited in its ability to extrapolate information from genomic sequences.

Oligodeoxynucleotides containing 2'-deoxyxanthosine (X-d) were synthesized in good yield from a O-2,O-6-bis[2-(4-nitrophenyl)ethyl](NPE)-protected phosphoramidite of X-d. Attempts to synthesize a O-6-monoNPE-protected phosphoramidite resulted in formation of a major by-product. The NPE protecting groups were removed by treatment with oximate ion after other protecting groups were removed with aqueous NH4OH solution. The composition of the synthetic oligonucleotides was verified by enzymatic degradation and MALDI-TOF mass spectrometry. The efficacy of this procedure allowed isolation of oligodeoxynucleotides containing multiple X-d residues.

Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of the structure, functionality and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection between an MSA and the associated evolutionary tree. The measure can be used as a tool for evaluating different methods for producing MSAs. A brief example of the last application is provided. Because it weights all evolutionary events on a tree identically, but does not require the reconstruction of a tree, the CS algorithm has advantages over the frequently used sum-of-pairs measures for scoring MSAs, which weight some evolutionary events more strongly than others. Compared to other weighted sum-of-pairs measures, it has the advantage that no evolutionary tree must be constructed, because we can find a circular tour without knowing the tree.
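The idea that a circular tour captures the tree without constructing it can be illustrated with pairwise distances. For distances that fit a tree exactly, the shortest circular tour through the leaves traverses every edge twice, so half the tour length equals the total tree length. The sketch below uses evolutionary distances rather than the alignment scores of the CS measure, which is a simplification made here; the brute-force search and the function name are likewise assumptions for illustration and are practical only for a handful of sequences.

```python
from itertools import permutations


def shortest_circular_tour(dist):
    """Brute-force the shortest circular tour through all sequences.

    dist -- symmetric matrix (list of lists) of pairwise distances.
    Returns (tour_length, tour) with the tour starting at sequence 0.
    Exhaustive search: factorial cost, for small illustrative inputs only.
    """
    n = len(dist)
    best_len, best_tour = None, None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        # Sum consecutive distances, wrapping around to close the circle.
        length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if best_len is None or length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


if __name__ == "__main__":
    # Distances generated from a hypothetical tree ((A,B),(C,D)) with leaf
    # edges 1, 2, 3, 4 and an internal edge of 5 (total tree length 15).
    D = [
        [0, 3, 9, 10],
        [3, 0, 10, 11],
        [9, 10, 0, 7],
        [10, 11, 7, 0],
    ]
    length, tour = shortest_circular_tour(D)
    print(length / 2, tour)
```

Half of the tour length recovers the total length of the toy tree even though the tree itself is never built, which is the property the abstract relies on when it says that a circular tour can be found without knowing the tree.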

A bioinformatics analysis was conducted on the four members of the uterine serpin (US) family of serpins. Evolutionary analysis of the protein sequences and 86 homologous serpins by maximum parsimony and distance methods indicated that the uterine serpin proteins form a clade distinct from other serpins. Ancestral sequences were reconstructed throughout the evolutionary tree by parsimony. These suggested that some branches suffered a high ratio of nonsynonymous to synonymous mutations, suggesting episodes of adaptive evolution within the serpin family. Analysis of the sequences by neutral evolutionary distance methods suggested that the uterine serpins diverged from other serpins prior to the divergence of the mammals from other vertebrates. The porcine uterine serpins are paralogs that diverged from a single common ancestor within the Sus genus after pigs separated from other artiodactyls. The uterine serpins contain several protein kinase C and tyrosine kinase phosphorylation sites. These sites may be important for the lymphocyte-inhibitory activity of OvUS if, like other basic proteins, OvUS can cross the cell membrane of an activated lymphocyte. Internalized OvUS could serve as an alternative target to protein kinases important for the mitogenic response to antigens. (C) 2000 Wiley-Liss, Inc.

GC-MS on the Viking 1976 Mars missions did not detect organic molecules on the Martian surface, even those expected from meteorite bombardment. This result suggested that the Martian regolith might hold a potent oxidant that converts all organic molecules to carbon dioxide rapidly relative to the rate at which they arrive. This conclusion is influencing the design of Mars missions. We reexamine this conclusion in light of what is known about the oxidation of organic compounds generally and the nature of organics likely to come to Mars via meteorite. We conclude that nonvolatile salts of benzenecarboxylic acids, and perhaps oxalic and acetic acid, should be metastable intermediates of meteoritic organics under oxidizing conditions. Salts of these organic acids would have been largely invisible to GC-MS. Experiments show that one of these, benzenehexacarboxylic acid (mellitic acid), is generated by oxidation of organic matter known to come to Mars, is rather stable to further oxidation, and would not have been easily detected by the Viking experiments. Approximately 2 kg of meteorite-derived mellitic acid may have been generated per m(2) of Martian surface over 3 billion years. How much remains depends on decomposition rates under Martian conditions. As available data do not require that the surface of Mars be very strongly oxidizing, some organic molecules might be found near the surface of Mars, perhaps in amounts sufficient to be a resource. Missions should seek these and recognize that these complicate the search for organics from entirely hypothetical Martian life.
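For a sense of scale, the cumulative figure in this abstract can be converted into an average generation rate. The short calculation below assumes, purely for illustration, a constant rate over the whole period and ignores decomposition; neither assumption comes from the abstract, and the function name is hypothetical.

```python
def mean_rate_ug_per_m2_per_year(total_kg_per_m2=2.0, years=3.0e9):
    """Average generation rate in micrograms per square metre per year.

    ASSUMPTION: constant generation rate and no loss by decomposition.
    """
    micrograms_per_kg = 1.0e9
    return total_kg_per_m2 * micrograms_per_kg / years


if __name__ == "__main__":
    print(mean_rate_ug_per_m2_per_year())
```

Under these assumptions the average works out to somewhat less than one microgram per square metre per year, which underlines that the 2 kg figure reflects very long accumulation rather than a rapid flux.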

If bioinformatics tools are constructed to reproduce the natural, evolutionary history of the biosphere, they offer powerful approaches to some of the most difficult tasks in genomics, including the organization and retrieval of sequence data, the updating of massive genomic databases, the detection of database error, the assignment of introns, the prediction of protein conformation from protein sequences, the detection of distant homologs, the assignment of function to open reading frames, the identification of biochemical pathways from genomic data, and the construction of a comprehensive model correlating the history of biomolecules with the history of planet Earth.

A prediction has been prepared ab initio for the secondary structure of the hydroxymethyldihydropterin pyrophosphokinase (HPPK) family of proteins starting from a set of aligned homologous protein sequences. Attempts to identify a fold by threading failed, judging by the inability to find a threading "hit" that had a secondary structure that was plausibly congruent to the predicted secondary structure for the HPPK family. Therefore, a set of tertiary structure models was assembled ab initio, where alternative models were built and used to select between alternative secondary structure models. This prediction report illustrates the importance of non-computational approaches to structure prediction at its present frontier, which is to obtain medium resolution models of tertiary structure. (C) 1999 Academic Press.

The deoxygenation and 2'-labeling of a C-ribonucleoside by reductive elimination with tri-n-butyltin hydride[H-3] in a one-pot reaction is described. The approach is a safe, simple, efficient, and general method for 2'-labeling of nucleosides. (C) 1999 Elsevier Science Ltd. All rights reserved.

Bovine seminal ribonuclease (BS RNase) displays immunosuppressive and antitumor activities on mammalian cells, whereas bovine pancreatic ribonuclease (RNase A) is not cytotoxic. To learn more about the mechanism of BS RNase cytotoxicity, various mutants and hybrid proteins were prepared. A series of RNase A variants substituted with amino acid residues from BS RNase were prepared. Concerning quaternary structure, a significant impact was achieved in the variant TM (Q28L K31C S32C), which forms a dimer joined covalently by two intersubunit disulfide bonds. This variant is more efficient than RNase A but less active than BS RNase. Introduction of cationic residues at positions 55, 62, and 64 or substitution at positions 111 and 113 enhanced the immunosuppressive activity of RNase A but did not confer its antitumor activity. The substitution at positions 28, 31, 32, 55, 62, 64, 111, and 113 in variant T13 exerted the best immunosuppressive and antitumor effect observed among the round of the RNase A variants. Replacement of the active-site histidine residues H12 and H119 with asparagine led to the loss of both catalytic and biological activities. Five previously prepared hybrid enzymes (SRA 1-5), synthesized by introducing 16 amino acid residues from RNase A into BS RNase, exerted the same immunosuppressive activities as did the wild-type BS RNase. However, the substitution at positions 111, 113, and 115 in variant SRA 5 caused a marked decrease in its antitumor effect, indicating that these residues play an important role in antitumor efficiency. A different mechanism of action of RNases on tumor cells and/or on blastogenic transformed lymphocytes has been assumed. (C) 1999 Elsevier Science Inc. All rights reserved.

A variant of bovine pancreatic ribonuclease A has been prepared with seven amino acid substitutions (Q55K, N62K, A64T, Y76K, S80R, E111G, N113K). These substitutions recreate in RNase A the basic surface found in bovine seminal RNase, a homologue of pancreatic RNase that diverged some 35 million years ago. Substitution of a portion of this basic surface (positions 55, 62, 64, 111 and 113) enhances the immunosuppressive activity of the RNase variant, activity found in native seminal RNase, while substitution of another portion (positions 76 and 80) attenuates the activity. Further, introduction of Gly at position 111 has been shown to increase the catalytic activity of RNase against double-stranded RNA. The variant and the wild-type (recombinant) protein were crystallized and their structures determined to a resolution of 2.0 Å. Each of the mutated amino acids is seen in the electron density map. The main change observed in the mutant structure compared with the wild-type is the region encompassing residues 16-22, where the structure is more disordered. This loop is the region where the polypeptide chain of RNase A is cleaved by subtilisin to form RNase S, and undergoes conformational change to allow residues 1-20 of the RNase to swap between subunits in the covalent seminal RNase dimer.

The syntheses of the 5'-triphosphates of 2'-deoxyisoguanosine (= p(3)isoG(d)) and 2'-deoxy-5-methylisocytidine (= p(3)me(5)isoC(d)), two new bases for the genetic alphabet, are described. The triphosphates were synthesized from the corresponding nucleosides using a transient-protection procedure. The introduction of a methyl group at the 5-position of 2'-deoxyisocytidine remarkably improved the stability of the triphosphate. Characterization of the triphosphates included enzymatic incorporation opposite the complementary base in a template oligonucleotide.

5-(3"-Aminopropynyl)-2'-deoxyuridine (dJ), a modified nucleoside with a side chain carrying a cationic functional group, was incorporated into an oligonucleotide library, which was amplified using the Vent DNA polymerase in a polymerase chain reaction (PCR). When coupled to an in vitro selection procedure, PCR amplification generated receptors that bind ATP. This is the first example of an in vitro selection generating oligonucleotide receptors where the oligonucleotide library has incorporated a cationic nucleotide functionality. The selection yielded functionalized receptors having sequences differing from a motif known to arise in a standard selection experiment using only natural nucleotides. Surprisingly, both the natural and the functionalized motifs convergently evolved to bind not one, but two ATP molecules cooperatively. Likewise, the affinity of the receptors for ATP had converged; in both cases, the receptors are half saturated at the 3 mM concentrations of ATP presented during the selection. The convergence of phenotype suggests that the outcome of this selection experiment was determined by features of the environment in which selection occurs, in particular, a highly loaded affinity resin used in the selection step. Further, the convergence of phenotype suggests that the optimal molecular phenotype has been achieved by both selections for the selection conditions. This interplay between environmental conditions demanding a function of a biopolymer and the ability of the biopolymer to deliver that function is strictly analogous to that observed during natural selection, illustrating the nature of life as a self-sustaining chemical system capable of Darwinian evolution.

A set of substituted bisguanidines have been prepared and examined for their ability to bind and catalyze the hydrolysis of uridylyl-3',5'-uridine (UpU), an unactivated RNA substrate in water. The unexpected result is that this set includes both catalysts (binding the transition state better than the ground state) and anticatalysts (binding the ground state better than the transition state), each with respectable rate enhancements and/or affinities, despite the fact that these molecules all have very similar structures. These results therefore show the level of sophistication that must be achieved in the conformational theory of small molecules if we hope to truly "design" supramolecular structures that bind preferentially to a transition state over the ground state.

DNA polymerases are desired that incorporate modified nucleotides into DNA with diminished pausing, premature termination and infidelity. Reported here is a simple in vitro assay to screen for DNA polymerases that accept modified nucleotides based on a set of primer extension reactions. In combination with the scintillation proximity assay (SPA(TM)), this allows rapid and simple screening of enzymes for their ability to elongate oligonucleotides in the presence of unnatural nucleotides. A proof of the concept is obtained using pseudo-thymidine (psi T), the C-nucleoside analog of thymidine, as the unnatural substrate. The conformational properties of psi T arising from the carbon-carbon bond between the sugar and the base make it an interesting probe for the importance of conformational restraints in the active site of polymerases during primer elongation. From a pool of commercially available thermostable polymerases, the assay identified Taq DNA polymerase as the most suitable enzyme for the PCR amplification of oligonucleotides containing psi T. Subsequent experiments analyzing PCR performance and fidelity of Taq DNA polymerase acting on psi T are presented. This is the first time that PCR has been performed with a C-nucleoside.

The preparation of a novel phosphoramidite monomer based on thyminyl acetic acid coupled to the secondary nitrogen of 2-(2-amino-ethylamino)ethanol is described. This monomer can be used to attach a deoxynucleotide to the carboxy terminus of a PNA oligomer by solid-phase synthesis. The resulting PNA primer is recognized as a substrate by various DNA polymerases.

How small can a microorganism be?
Benner, SA. Size Limits of Very Small Microorganisms: Proceedings of a Workshop, Steering Group on Astrobiology of the Space Studies Board, National Research Council, 126-135 (1999)

Two predictions have been prepared for the fold of initiation factor 5A (IF5A) starting from a set of homologous sequences. In the first, a secondary structural model was predicted for the protein in 1994, when only eleven homologs (and no eubacterial homologs) had been sequenced. The second was made recently, after genome projects had generated a total of 33 sequences for the protein family from species of all three kingdoms of life. With the second set of sequences, but not with the first, it was possible to predict that the N-terminal domain of the protein folds in a possibly open beta-barrel/sandwich core structure, with a short helix capping one side of the barrel. We place the pair of predictions in the public domain before an experimental structure is known. This example illustrates the impact of genome sequencing projects on structure prediction from sequence alignments. (C) 1998 Academic Press.

To enable application of postgenomic evolutionary approaches to understand the divergence of behavior and function in ribonucleases (RNases), the impact of divergent sequence on the divergence of tertiary and quaternary structure is analyzed in bovine pancreatic and seminal ribonucleases, which differ by 23 amino acids. In a crystal, seminal RNase is a homodimer joined by two "antiparallel" intersubunit disulfide bonds between Cys-31 from one subunit and Cys-32' from the other and having composite active sites arising from the "swap" of residues 1-20 from each subunit. Specialized Edman degradation techniques have completed the structural characterization of the dimer in solution, new crosslinking methods have been developed to assess the swap, and sequence determinants of quaternary structure have been explored by protein engineering using the reconstructed evolutionary history of the protein family as a guide. A single Cys at either position 32 (the first to be introduced during the divergent evolution of the family) or 31 converts monomeric RNase A into a dimer. Even with an additional Phe at position 31, another residue introduced early in the seminal lineage, swap is minimal. A hydrophobic contact formed by Leu-28, however, also introduced early in the seminal lineage, increases the amount of "antiparallel" connectivity of the two subunits and facilitates swapping of residues 1-20. Efficient swapping requires addition of a Pro at position 19, a residue also introduced early in the divergent evolution of the seminal RNase gene. Additional cysteines required for dimer formation are found to slow refolding of the protein through formation of incorrect disulfide bonds, suggesting a paradox in the biosynthesis of the protein. Further studies showed that the dimeric form of seminal RNase known in the crystal is not the only form in vivo, where a substantial amount of heterodimer is known. These data complete the acquisition of the background needed to understand the evolution of new structure, behavior, and function in the seminal RNase family of proteins.

Bovine seminal ribonuclease (RNase) binds, melts, and (in the case of RNA) catalyzes the hydrolysis of double-stranded nucleic acid 30-fold better under physiological conditions than its pancreatic homologue, the well-known RNase A. Reported here are site-directed mutagenesis experiments that identify the sequence determinants of this enhanced catalytic activity. These experiments have been guided in part by experimental reconstructions of ancestral RNases from extinct organisms that were intermediates in the evolution of the RNase superfamily. It is shown that the enhanced interactions between bovine seminal RNase and double-stranded nucleic acid do not arise from the increased number of basic residues carried by the seminal enzyme. Rather, a combination of a dimeric structure and the introduction of two glycine residues at positions 38 and 111 on the periphery of the active site confers the full catalytic activity of bovine seminal RNase against duplex RNA. A structural model is presented to explain these data, the use of evolutionary reconstructions to guide protein engineering experiments is discussed, and a new variant of RNase A, A(Q28L K31C S32C D38G E111G), which contains all of the elements identified in these experiments as being important for duplex activity, is prepared. This is the most powerful catalyst within this subfamily yet observed, some 46-fold more active against duplex RNA than RNase A.

Examination of several commercially available thermostable DNA polymerases identifies 9°N DNA polymerase as a single enzyme that could incorporate two components of an expanded genetic alphabet, 2,4-diaminopyrimidine and xanthosine, as deoxynucleoside triphosphates opposite their cognate bases in a DNA template. (C) 1998 Elsevier Science Ltd. All rights reserved.

HIV-1 reverse transcriptase (RT) incorporates 2'-deoxyisoguanosine triphosphate (d-isoGTP) opposite thymidine (T) in a DNA template and opposite uracil (U) in an RNA template about 10 times more efficiently than the eukaryotic DNA polymerase alpha, both in the absence and presence of dATP. (C) 1998 Elsevier Science Ltd. All rights reserved.

Background: Distance geometry methods allow protein structures to be constructed using a large number of distance constraints, which can be elucidated by experimental techniques such as NMR. New methods for gleaning tertiary structural information from multiple sequence alignments make it possible for distance constraints to be predicted from sequence information alone. The basic distance geometry method can thus be applied using these empirically derived distance constraints. Such an approach, which incorporates a novel combinatoric procedure, is reported here. Results: Given the correct sheet topology and disulfide formations, the fully automated procedure is generally able to construct native-like C alpha models for eight small beta-protein structures. When the sheet topology was unknown but disulfide connectivities were included, all sheet topologies were explored by the combinatorial procedure. Using a simple geometric evaluation scheme, models with the correct sheet topology were ranked first in four of the eight example cases, second in three examples and third in one example. If neither the sheet topology nor the disulfide connectivities were given a priori, all combinations of sheet topologies and disulfides were explored by the combinatorial procedure. The evaluation scheme ranked the correct topology within the top five folds for half the example cases. Conclusions: The combinatorial procedure is a useful technique for identifying a limited number of low-resolution candidate folds for small, disulfide-rich, beta-protein structures. Better results are obtained, however, if correct disulfide connectivities are known in advance. Combinatorial distance constraints can be applied whenever there are a sufficiently small number of finite connectivities. (C) Current Biology Ltd ISSN 1359-0278.
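
As a rough illustration of why exploring "all combinations" of disulfides is feasible only for small proteins, the sketch below enumerates every way of pairing a set of cysteines into disulfide bonds. It is not the procedure used in the paper; the function name `pairings` and the cysteine positions are hypothetical and chosen only for illustration.

```python
def pairings(items):
    """Yield every way to split `items` into unordered pairs (perfect matchings).

    Assumes an even number of items, i.e. every cysteine is bonded.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        # Pair the first cysteine with each possible partner in turn,
        # then pair up whatever is left over.
        remaining = rest[:i] + rest[i + 1:]
        for sub in pairings(remaining):
            yield [(first, partner)] + sub


# Hypothetical cysteine positions in a small, disulfide-rich protein.
cysteines = [3, 14, 20, 31, 38, 45]
candidates = list(pairings(cysteines))
print(len(candidates))  # number of candidate disulfide connectivities
```

For 2k cysteines the count is (2k-1)(2k-3)...(3)(1), so six cysteines give 15 candidate connectivities while ten give 945, which is consistent with the abstract's remark that the approach needs a sufficiently small number of finite connectivities.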

The synthesis of oligonucleotides containing 2'-deoxy-5-methylisocytidine and 2'-deoxyisoguanosine using phosphoramidite chemistry in solid-phase oligonucleotide synthesis is described. Supporting previous observations, the N,N-diisobutylformamidine moiety was found to be a far superior protecting group to N-benzoyl for 2'-deoxy-5-methylisocytidine. 2'-Deoxy-N-2-[(diisobutylamino)methylidene]-5'-O-(4,4'-dimethoxytrityl)-5-methylisocytidine 3'-(2-cyanoethyl diisopropylphosphoramidite) (Ic) incorporated multiple consecutive residues during a standard automated synthesis protocol with a coupling efficiency >99% according to dimethoxytrityl release. Extending coupling times of the standard protocol to ≥600 s using 2'-deoxy-N-6-[(diisobutylamino)methylidene]-5'-O-(dimethoxytrityl)-O-2-(diphenylcarbamoyl)isoguanosine, 3'-(2-cyanoethyl diisopropylphosphoramidite) (7e) led to successful incorporation of multiple consecutive 2'-deoxyisoguanosine bases with a coupling efficiency >97% according to dimethoxytrityl release.

A nonionic RNA analogue of the sequence r(U(SO2)G(SO2)A(SO2)C) has been synthesized where each bridging phosphate diester is replaced by a dimethylene sulfone unit (rSNA). The rSNA was synthesized in solution from 3',5'-bishomo-beta-ribonucleoside derivatives as building blocks. Full experimental procedures are provided, and the product and all synthetic intermediates are fully characterized. The tetramer is nonionic but highly dipolar due to multiple hydrogen bonding opportunities. It is freely soluble in water only at higher pH, permitting it to be radiolabeled by exchange of the acidic protons alpha to the sulfones with tritiated water. The tritiated molecule was administered intravenously into the tail vein (2.6 mg/kg) of mice, and its distribution was monitored over 48 h. The rSNA was widely distributed in the biological tissues, including the brain, and excreted in both the feces and the urine. The accumulation of radioactivity was significantly higher in liver and kidney than in other tissues. Radiolabel was recovered from the urine, analyzed by HPLC, and shown to be intact oligonucleotide sulfone. This is the first bioavailability study on a short nonionic oligonucleotide analogue, a class of molecules with potential biomedical applications.

A research program has applied the tools of synthetic organic chemistry to systematically modify the structure of DNA and RNA oligonucleotides to learn more about the chemical principles underlying their ability to store and transmit genetic information. Oligonucleotides (as opposed to nucleosides) have long been overlooked by synthetic organic chemists as targets for structural modification. Synthetic chemistry has now yielded oligonucleotides with 12 replicatable letters, modified backbones, and new insight into why Nature chose the oligonucleotide structures that she did.

A library of alkenes is generated using the olefin metathesis reaction, and converted to a set of diols suitable for a receptor assisted combinatorial synthesis (RACS) experiment with borate as a linker.

A coenzyme B-12-dependent ribonucleotide reductase was purified from the archaebacterium Thermoplasma acidophila and partially sequenced. Using probes derived from the sequence, the corresponding gene was cloned, completely sequenced, and expressed in Escherichia coli. The deduced amino acid sequence shows that the catalytic domain of the B-12-dependent enzyme from T. acidophila, some 400 amino acids, is related by common ancestry to the diferric tyrosine radical iron(III)-dependent ribonucleotide reductase from E. coli, yeast, mammalian viruses, and man. The critical cysteine residues in the catalytic domain that participate in the thiyl radical-dependent reaction have been conserved even though the cofactor that generates the radical is not. Evolutionary bridges created by the T. acidophila sequence and that of a B-12-dependent reductase from Mycobacterium tuberculosis establish homology between the Fe-dependent enzymes and the catalytic domain of the Lactobacillus leichmannii B-12-dependent enzyme as well. These bridges are confirmed by a predicted secondary structure for the Lactobacillus enzyme. Sequence similarities show that the N-terminal domain of the T. acidophila ribonucleotide reductase is also homologous to the anaerobic ribonucleotide reductase from E. coli, which uses neither B-12 nor Fe cofactors. A predicted secondary structure of the N-terminal domain suggests that it is predominantly helical, as is the domain in the aerobic E. coli enzyme depending on Fe, extending the homologous family of proteins to include anaerobic ribonucleotide reductases, B-12 ribonucleotide reductases, and Fe-dependent aerobic ribonucleotide reductases. A model for the evolution of the ribonucleotide reductase family is presented; in this model, the thiyl radical-based reaction mechanism is conserved, but the cofactor is chosen to best adapt the host organism to its environment. This analysis illustrates how secondary structure predictions can assist evolutionary analyses, each important in "post-genomic" biochemistry.

A secondary structure has been predicted for the C termini of the fibrinogen beta and gamma chains from an aligned set of homologous protein sequences using a transparent method that extracts conformational information from patterns of variation and conservation, parsing strings, and patterns of amphiphilicity. The structure is modeled to form two domains, the first having a core parallel sheet flanked on one side by at least two helices and on the other by an antiparallel amphiphilic sheet, with an additional helix connecting the two sheets. The second domain is built entirely from beta strands. (C) 1997 Wiley-Liss, Inc.

A secondary structure has been predicted for the heat shock protein HSP90 family from an aligned set of homologous protein sequences by using a transparent method in both manual and automated implementation that extracts conformational information from patterns of variation and conservation within the family. No statistically significant sequence similarity relates this family to any protein with known crystal structure. However, the secondary structure prediction, together with the assignment of active site positions and possible biochemical properties, suggests that the fold is similar to that seen in the N-terminal domain of DNA gyrase B (the ATPase fragment). (C) 1997 Wiley-Liss, Inc.

The simultaneous substitution of pairs of buried amino acid side chains during divergent evolution has been examined in a set of protein families with known crystal structures. A weak signal is found that shows that amino acid pairs near in space in the folded structure preferentially undergo substitution in a compensatory way. Three different physicochemical types of covariation 'signals' were then examined separately, with consideration given to the evolutionary distance at which different types of compensation occur. Where the compensatory covariation tends towards retaining the combined residue volumes, the signal is significant only at very low evolutionary distances. Where the covariation compensates for changes in the hydrogen bonding, the signal is strongest at intermediate evolutionary distances. Covariations that compensate for charge variations appeared with equal strength at all the evolutionary distances examined. A recipe is suggested for using the weak covariation signal to assemble the predicted secondary structural elements, where the evolutionary distance, covariation type and weighting are considered together with the tertiary structural context (interior or surface) of the residues being examined.

A model experiment for the 'on-line' screening of substrate libraries by enzymes using combinatorial libraries in combination with electrospray ionization-Fourier transform ion cyclotron resonance (ESI-FTICR) mass spectrometry has been performed. The reaction between the electrophilic substrate 1-chloro-2,4-dinitrobenzene and components of a H-gamma-Glu-Cys-Xxx-OH library, catalyzed by glutathione-S-transferase, has been monitored. It shows the feasibility of two-dimensional screening of substrate libraries by ESI-FTICR mass spectrometry. (C) 1997 John Wiley & Sons, Ltd.

3',5'-O-bis-tert-butyldimethylsilyl-2'-deoxyguanosine is converted in two steps to 3',5'-O-tert-butyldimethylsilyl-6-O-aryl-2'-deoxyxanthosine. This compound is used to make a 2'-deoxyisoguanosine analog with a functionalized side chain.

Paleomolecular biochemistry is a new field of science that seeks to understand how life emerged and developed in interaction with its geophysical surroundings. It is an experimental science, involving reconstruction of extinct biomolecules in the laboratory, studying their properties in the laboratory, and inferring details of their behavior and function in the context of geological data. An outline is provided of some tools of this field, together with its application to the study of two specific systems, ribonuclease and alcohol dehydrogenase.

Bovine seminal ribonuclease (RNase) diverged from pancreatic RNase after a gene duplication ca. 35 million years ago. Members of the seminal RNase gene family evidently remained unexpressed pseudogenes for much of their evolutionary history. Between 5 and 10 million years ago, however, after the divergence of kudu but before the divergence of ox, evidence suggests that the pseudogene was repaired and expressed. Intriguingly, detailed analysis of the sequences suggests that the repair may have involved gene conversion, a transfer of information from the pancreatic gene to the RNase pseudogene. Further, the ratio of non-silent to silent substitutions suggests that the pancreatic RNases are divergently evolving under functional constraints, the seminal RNase pseudogenes are diverging under no functional constraints, while the genes expressed in the seminal plasma are evolving extremely rapidly in their amino acid sequences, as if to fulfil a new physiological role.

6-Aminopyrazin-2(1H)-one, when incorporated as a pyrimidine-base analog into an oligonucleotide chain, presents a H-bond acceptor-donor-donor pattern to a complementary purine analog. When paired with the corresponding donor-acceptor-acceptor purine in oligonucleotides, the heterocycle selectively contributes to the stability of the duplex, presumably by forming a base pair of Watson-Crick geometry joined by a non-standard H-bonding pattern. Aspects of the nucleoside chemistry, including syntheses of the beta-furanosyl ribonucleoside 1, the ribonucleoside triphosphate 2 and the ribonucleoside bisphosphate 3 of 6-aminopyrazin-2(1H)-one are reported here. In aqueous solution, the ribonucleoside 1 was found to undergo acid- and base-catalyzed rearrangement with an apparent half-life of ca. 63 h at neutral pH and 30 degrees. The rearrangement appears to be specific acid- and base-catalyzed. The thermodynamically most stable compound formed during this rearrangement reaction was isolated by HPLC and shown to be the beta-pyranosyl form 4 of the 6-aminopyrazin-2(1H)-one nucleoside in its C-4(1) chair conformation. This reactivity of 1 under physiological conditions may explain why Nature does not use this particular heterocyclic system to implement an acceptor-donor-donor H-bonding pattern in the genetic alphabet.

A 6-aminopyrazin-2(1H)-one (pyADD), when incorporated as a pyrimidine-base analog into an oligonucleotide chain, presents a H-bond acceptor-donor-donor pattern to 5-aza-7-deazaisoguanine (puDAA), the complementary donor-acceptor-acceptor purine analog. Reported here are the syntheses of the phosphoramidite of the 2'-deoxyribonucleoside bearing the puDAA base, oligonucleotides containing this nucleoside unit, the enzyme-assisted synthesis of oligoribonucleotides containing the pyADD ribonucleoside, and the molecular-recognition properties of this non-standard base pair in an oligonucleotide context. A series of melting experiments suggests that the pyADD . puDAA base pair contributes to the relative stability of a duplex structure approximately the same as an A . T base pair, and significantly more than mismatches between these non-standard bases and certain standard nucleobases. The pyADD nucleoside bisphosphate is accepted by T4 RNA ligase, but the triphosphate of the pyADD nucleoside was not incorporated by T7 RNA polymerase opposite the puDAA nucleobase in a template. Oligonucleotides containing the pyADD base slowly undergo a reversible first-order reaction, presumably an epimerization process to give the alpha-D-anomer. These experiments provide the tools for laboratory-based use of the pyADD . puDAA base pair as a component of an oligonucleotide-like molecular-recognition system based on an expanded genetic alphabet.

Analogs of RNA have been synthesized where each of the phosphodiester linking groups is replaced by dimethylene sulfone units (sulfone-linked nucleic acid analogs of RNA, or "rSNAs"). These are the first fully nonionic analogs of RNA to be prepared as oligomers. Sequences leading to the octamer 5'-r(A(SO2)U(SO2)G(SO2)U(SO2)C(SO2)A(SO2)U)-3' have been prepared from 3',5'-bishomo-beta-ribonucleoside derivatives as building blocks prepared from diacetone D-glucose, and their chemistry has been explored. Coupling was performed in solution via S(N)2 reactions between a thiol from one fragment and a bromide from the other, oxidation of the resulting thioether to the sulfone, and deprotection of a terminal primary hydroxyl group and regioselective conversion of it (in the presence of secondary hydroxyl groups) to an active group (thiol or bromide) to yield another fragment for coupling. Base-labile protecting groups were used for the nucleobases, and one-step full deprotection was achieved using 1 M NaOH. The target octamer and each isolated intermediate were characterized by NMR, UV spectroscopy, and mass spectrometry. While chemical reactions involving longer rSNAs were in several cases retarded relative to analogous reactions with monomers, some rates were enhanced. In water, the rSNA octamer displayed a thermal transition in the UV spectrum above 65 degrees C with a large hyperchromicity. The behaviors of rSNAs suggest roles for the polyanionic backbone in DNA and RNA beyond its role in conferring aqueous solubility. The repeating anionic charges in natural oligonucleotides evidently also control the potent molecular recognition properties of these richly functionalized molecules, direct strand-strand interactions to the part of the biopolymer distant from the backbone (the Watson-Crick edge of the nucleobases), cause the polymer to favor an extended conformation, and ensure that the physical properties of the oligonucleotide are largely independent of its sequence. This suggests structural features that must be built into nonionic oligonucleotide analogs generally.

Mammalian DNA polymerases alpha and epsilon, the Klenow fragment of Escherichia coli DNA polymerase I and HIV-1 reverse transcriptase (RT) were examined for their ability to incorporate components of an expanded genetic alphabet in different forms. Experiments were performed with templates containing 2'-deoxyxanthosine (dX) or 2'-deoxy-7-deazaxanthosine (c(7)dX), both able to adopt a hydrogen bonding acceptor-donor-acceptor pattern on a purine nucleus (puADA). Thus these heterocycles are able to form a non-standard nucleobase pair with 2,4-diaminopyrimidine (pyDAD) that fits the Watson-Crick geometry, but is joined by a non-standard hydrogen bonding pattern. HIV-1 RT incorporated d(pyDAD)TP opposite dX with a high efficiency that was largely independent of pH. Specific incorporation opposite c(7)dX was significantly lower and also independent of pH. Mammalian DNA polymerases alpha and epsilon from calf thymus and the Klenow fragment from E.coli DNA polymerase I failed to incorporate d(pyDAD)TP opposite c(7)dX.

Electrospray ionization coupled with Fourier transform ion cyclotron resonance (FTICR) mass spectrometry has been used to provide information about complete combinatorial libraries of small peptides containing 10(3)-10(4) components. The fidelity of attempted synthesis steps can be ascertained rapidly, and, when the extremely high resolution FTICR mass spectra are combined with appropriate computer simulation, both diversity and degeneracy of the libraries as synthesized can be assessed.

The conversion of an alkylsulfonate to an iodide with triphenylphosphine/iodine in benzene has been performed for a nucleoside. Starting from thymidine, 3'-protected 5'-deoxy-5'-iodomethylthymidine was synthesized in 4 steps.

Angiogenin, a member of the pancreatic-like ribonuclease family with a special biological action (RISBAses), is a basic protein that induces blood vessel formation. Another member of these special ribonucleases, bovine seminal ribonuclease (BS RNase), displays biological properties, including aspermatogenic, embryotoxic, antitumor and immunosuppressive activities. The effects of two angiogenin preparations tested on the biological activities mentioned above are reported and compared with those of BS RNase and RNase A. In contrast to RNase A, which was ineffective in all biological activities tested, angiogenin significantly suppressed the proliferation of human lymphocytes stimulated by phytohemagglutinin or concanavalin A or by allogenic human lymphocytes (mixed lymphocyte culture). However, angiogenin did not affect the growth of human tumor cell lines, development of cow and mouse embryos and spermatogenicity in mice. On the basis of these results, angiogenin is the first monomeric ribonuclease described so far that displays immunosuppressive activity similar to that of the dimeric BS RNase. The immunosuppressive activity of angiogenin might synergize with the effect on neovascularization of tumor tissues and thus contribute to the development of tumor.

A crystal structure has been solved for an analog of the r(ApU) ribodinucleotide, r(Aso(2)U), where a bridging non-ionic dimethylene sulfone linker replaces the phosphodiester linking group found in natural RNA. Crystals of the single-stranded state of r(Aso(2)U) were obtained from water at 50 degrees C. In these crystals, one hydrogen bond is formed between bases from different strands and base stacking occurs in intermolecular 'homo-A' and 'homo-U' stacks. Similar to typical oligoribonucleotides, the ribose rings adopt N-type conformations and dihedral angles chi are in the anti range. The all-trans rotamer of the CH2-SO2-CH2-CH2 bridge was found, which leads to a large adenine-uracil distance. Qualitative analysis of a NOESY spectrum of the Aso(2)U part in r(Uso(2)Cso(2)Aso(2)U) dissolved in a dimethylsulfoxide-D2O mixture indicates that the conformation observed in the crystal is also populated in solution. Comparison with the structure of r(Gso(2)C), which has been crystallized in the Watson-Crick paired state, shows that a rotation around zeta by +112 degrees leads from the observed, single-stranded state to a conformation that is compatible with formation of a duplex. A concerted trans/gauche flip of alpha and gamma then yields the standard conformer of A-type RNA helices. From the observed structure of r(Gso(2)C) and other oligonucleotides it is anticipated that this flip will also revert the ribose pucker from C2'-exo to C3'-endo.

The sequences of proteins from ancient organisms can be reconstructed from the sequences of their descendants by a procedure that assumes that the descendant proteins arose from the extinct ancestor by the smallest number of independent evolutionary events ('parsimony')(1,2). The reconstructed sequences can then be prepared in the laboratory and studied(3,4). Thirteen ancient ribonucleases (RNases) have been reconstructed as intermediates in the evolution of the RNase protein family in artiodactyls (the mammal order that includes pig, camel, deer, sheep and ox)(5). The properties of the reconstructed proteins suggest that parsimony yields plausible ancient sequences. Going back in time, a significant change in behaviour, namely a fivefold increase in catalytic activity against double-stranded RNA, appears in the RNase reconstructed for the founding ancestor of the artiodactyl lineage, which lived about 40 million years ago(6). This corresponds to the period when ruminant digestion arose in the artiodactyls, suggests that contemporary artiodactyl digestive RNases arose from a non-digestive ancestor, and illustrates how evolutionary reconstructions can help in the understanding of physiological function within a protein family(7-9).
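
The parsimony idea can be illustrated with a minimal sketch of the bottom-up pass of Fitch's small-parsimony algorithm for a single alignment column. This is a textbook method swapped in for illustration, not necessarily the procedure used in the work above; the tree shape, the taxon labels A-D and the residues are hypothetical.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass for one alignment column.

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees.
    Returns (candidate ancestral residues, minimum number of substitutions).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one residue: no extra event needed.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-taxon tree and residues at one position.
tree = (("A", "B"), ("C", "D"))
column = {"A": "G", "B": "G", "C": "G", "D": "D"}
root_states, events = fitch(tree, column)
print(root_states, events)
```

In this toy column the root is assigned glycine with a single substitution on the branch leading to taxon D. A full reconstruction would repeat this for every column and add a top-down pass to fix the residues at internal nodes.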

The ability of DNA polymerases (pols) to catalyze the template-directed synthesis of duplex oligonucleotides containing a nonstandard Watson-Crick base pair between a nucleotide bearing a 5-(2,4-diaminopyrimidine) heterocycle (d kappa) and a nucleotide bearing either deoxyxanthosine (dX) or N-1-methyloxoformycin B (pi) has been investigated. The kappa-X and kappa-pi base pairs are joined by a hydrogen bonding pattern different from and exclusive of those joining the AT and GC base pairs. Reverse transcriptase from human immunodeficiency virus type 1 (HIV-1) incorporates dXTP into an oligonucleotide opposite d kappa in a template with good fidelity. With lower efficiency and fidelity, HIV-1 reverse transcriptase also incorporates d kappa TP opposite dX in the template. With d pi in the template, no incorporation of d kappa TP was observed with HIV-1 reverse transcriptase. The Klenow fragment of DNA pol I from Escherichia coli does not incorporate d kappa TP opposite dX in a template but does incorporate dXTP opposite d kappa. Bovine DNA pols alpha, beta, and epsilon accept neither dXTP opposite d kappa nor d kappa TP opposite d pi. DNA pols alpha and epsilon (but not beta) incorporate d kappa TP opposite dX in a template but discontinue elongation after incorporating a single additional base. These results are discussed in light of the crystal structure for pol beta and general considerations of how polymerases must interact with an incoming base pair to faithfully copy genetic information.

A bona fide consensus prediction for the secondary and supersecondary structure of the serine-threonine specific protein phosphatases is presented. The prediction includes assignments of active site segments, an internal helix, and a region of possible 3(10) helical structure. An experimental structure for a member of this family of proteins should appear shortly, allowing this prediction to be evaluated. (C) 1995 Wiley-Liss, Inc.

Two separate unrefined models for the secondary structure of two subfamilies of the 6-phospho-beta-D-galactosidase superfamily were independently constructed by examining patterns of variation and conservation within homologous protein sequences, assigning surface, interior, parsing, and active site residues to positions in the alignment, and identifying periodicities in these. A consensus model for the secondary structure of the entire superfamily was then built. The prediction tests the limits of an unrefined prediction made using this approach in a large protein with substantial functional and sequence divergence within the family. The protein belongs to the alpha-beta class, with the core beta strands aligned parallel. The supersecondary structural elements readily identified in this model are a parallel beta sheet built by strands C, D, and E, with helices 2 and 3 connecting strands (C + D) and (D + E), respectively, and an analogous beta-alpha unit (strand G and helix 7) toward the end of the sequence. The resemblance of the supersecondary model to the tertiary structure formed by 8-fold alpha-beta barrel proteins is almost certainly not coincidental. (C) 1995 Wiley-Liss, Inc.

A secondary structure has been predicted for the protein kinase C2 regulatory domain found in homologous form in synaptotagmin, some phospholipases, and some GTP activated proteins. The proposed structure is built from seven consecutive beta strands followed by a terminal alpha helix. Considerations of overall surface exposure of individual secondary structural elements suggest that these are packed into a 2-sheet beta sandwich structure, with one of only three of the many possible folds being preferred. (C) 1995 Wiley-Liss, Inc.

Two bona fide consensus predictions of secondary and tertiary structure in a protein family, made and announced before experimental structures were known, are evaluated in light of the subsequently determined experimental structures. The first, for phospho-beta-galactosidase, identified the core strands of an 8-fold alpha-beta barrel, and identified the 8-fold alpha-beta barrel itself, which was found in the subsequently determined experimental structure to be the core folding domain. The second, for synaptotagmin, identified seven out of eight beta-strands in the structure correctly, missing only a noncore strand. Three preferred ''topologies'' were selected from several hundred thousand possible topologies of these seven predicted strands using a rule-based analysis. The subsequently determined experimental structure showed that these seven strands in synaptotagmin adopt one of the three preferred topologies. We were unable, however, to identify the correct topology from among these three topologies. (C) 1995 Wiley-Liss, Inc.

Analysis of a crystal structure of alcohol dehydrogenase (Adh) from horse liver suggests that Trp54 in the homologous yeast alcohol dehydrogenase prevents the yeast enzyme from efficiently catalysing the oxidation of long-chain primary alcohols with branching at the 4 position (e.g. 4-methyl-1-pentanol, cinnamyl alcohol). This residue has been altered to Leu by site-directed mutagenesis. The alteration yields an enzyme that serves as an effective catalyst for both longer straight-chain primary alcohols and branched chain alcohols.

Most formal methods for analyzing the divergent evolution of protein sequences assume a Markov model where position i in a polypeptide chain undergoes amino acid substitution independently from position i+1. The large number of aligned homologous sequence pairs available from the exhaustive matching of the protein sequence database makes it possible to examine this assumption empirically. We have constructed a 400 by 400 matrix that reports empirical probabilities for the interconversion of all pairs of dipeptides in proteins undergoing divergent evolution. Comparison of these probabilities with those expected if substitution at adjacent positions in a protein sequence were independent reveals interesting patterns that arise through the breakdown of this assumption. Several of these are useful in extracting conformational information from patterns of conservation and variation in homologous protein sequences. (C) 1994 Academic Press, Inc.
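
The comparison described above, between observed dipeptide interconversion probabilities and those expected if adjacent positions substituted independently, can be sketched as a log-odds calculation. This is an illustrative sketch only, not the procedure used to build the 400 by 400 matrix: it assumes ungapped, equal-length aligned pairs, ignores direction-of-change and sampling corrections, and the function name dipeptide_log_odds is hypothetical.

```python
from collections import Counter
from math import log


def dipeptide_log_odds(aligned_pairs):
    """Log ratio of observed dipeptide substitution frequency to the
    frequency expected from independent single-residue substitutions.

    aligned_pairs -- iterable of (seq1, seq2) strings of equal length,
                     assumed here to contain no gap characters.
    Returns a dict keyed by (dipeptide_in_seq1, dipeptide_in_seq2).
    """
    single = Counter()   # counts of aligned residue pairs
    double = Counter()   # counts of aligned dipeptide pairs
    for s, t in aligned_pairs:
        if len(s) != len(t):
            raise ValueError("aligned sequences must have equal length")
        for i in range(len(s)):
            single[(s[i], t[i])] += 1
        for i in range(len(s) - 1):
            double[(s[i:i + 2], t[i:i + 2])] += 1

    n_single = sum(single.values())
    n_double = sum(double.values())
    scores = {}
    for (d1, d2), count in double.items():
        observed = count / n_double
        # Expected frequency if positions i and i+1 change independently.
        expected = (single[(d1[0], d2[0])] / n_single) * \
                   (single[(d1[1], d2[1])] / n_single)
        scores[(d1, d2)] = log(observed / expected)
    return scores


# Toy alignment (made-up sequences, for illustration only).
toy_scores = dipeptide_log_odds([("GPGA", "GPGS"), ("APGA", "SPGA")])
```

A score near zero means the dipeptide pair behaves as the independence assumption predicts; scores far from zero flag the kind of breakdown the abstract refers to.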

To learn how secondary structure assignments diverge during divergent evolution, pairs of proteins with solved crystal structures were aligned and their assignments compared as a function of evolutionary distance. Residues assigned in one structure to a helix or a strand are frequently paired with residues assigned in the other to a coil. However, residues assigned to a helix in one structure are almost never paired with residues assigned to a strand in the other. This suggests additional limitations to the ''three state residue-by-residue'' score commonly used to evaluate secondary structure predictions and suggests recommendations for how secondary structure predictions should be scored to assess accurately their value as starting points for modelling tertiary structure. (C) 1994 Academic Press, Inc.
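
The three-state residue-by-residue score mentioned above, and the distinction the abstract draws between helix/strand mismatches and mismatches involving coil, can be sketched as follows. This is a minimal illustration with assumptions: states are encoded as H (helix), E (strand) and C (coil), the example strings are invented, and the second function is only one possible way to count the mismatches the abstract calls rare between homologs, not the scoring scheme the authors recommend.

```python
def q3_score(predicted, observed):
    """Fraction of positions whose predicted state (H, E or C)
    matches the experimentally assigned state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def helix_strand_mismatches(predicted, observed):
    """Count positions where one string says helix and the other strand.

    Helix/coil and strand/coil disagreements are not counted, since the
    abstract reports that these occur frequently even between homologs.
    """
    return sum({p, o} == {"H", "E"} for p, o in zip(predicted, observed))


# Invented example strings (illustration only).
pred = "HHHCCEEE"
obs = "HHCCCEEC"
print(q3_score(pred, obs), helix_strand_mismatches(pred, obs))
```

Two predictions with the same three-state score can differ sharply in the second count, which is one reason the plain score can understate or overstate a prediction's value as a starting point for tertiary modelling.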

Bovine pancreatic ribonuclease A interacts with RNA along multiple binding subsites that essentially recognize the negatively charged phosphates of the substrate. This work gives additional strong support to the existence of the postulated phosphate-binding subsite p2 (Pares, X., Llorens, R., Arus, C., and Cuchillo, C. M. (1980) Eur. J. Biochem. 105, 571-579) and confirms the central role of Lys-7 and Arg-10 in establishing an electrostatic interaction with a phosphate group of the substrate. The effects of charge elimination by Lys-7 --> Gln (K7Q) and/or Arg-10 --> Gln (R10Q) substitutions on the catalytic and ligand-binding properties of ribonuclease A have been studied. The values of K(m) for cytidine 2',3'-cyclic phosphate and cytidylyl-3',5'-adenosine are not altered but are significantly increased for poly(C). In all cases, k(cat) values are lower. Synthetic activity, i.e. the reversion of the transphosphorylation reaction, is reduced for K7Q and R10Q mutants and is practically abolished in the double mutant. Finally, the extent of the reaction of the mutants with 6-chloropurine-9-beta-D-ribofuranosyl 5'-monophosphate indicates that the phosphate ionic interaction in p2 is weakened. Thus, p2 modification alters both the catalytic efficiency and the extent of the processes in which an interaction of the phosphate group of the substrate or ligand with the p2-binding subsite is involved.

A consensus prediction for the secondary structure of the pleckstrin homology (PH) domain is presented. The prediction is based on an analysis of patterns of conservation and variation of homologous protein sequences. The structure is predicted to be formed largely from beta strands with a single alpha helix. (C) 1994 Wiley-Liss, Inc.

In aligning homologous protein sequences, it is generally assumed that amino acid substitutions subsequent in time occur independently of amino acid substitutions previous in time, i.e. that patterns of mutation are similar at low and high sequence divergence. This assumption is examined here and shown to be incorrect in an interesting way. Separate mutation matrices were constructed for aligned protein sequence pairs at divergences ranging from 5 to 100 PAM units (point accepted mutations per 100 aligned positions). From these, the corresponding log-odds (Dayhoff) matrices, normalized to 250 PAM units, were constructed. The matrices show that the genetic code influences accepted point mutations strongly at early stages of divergence, while the chemical properties of the side chains dominate at more advanced stages.
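
The normalization step described above, in which a mutation matrix estimated at one divergence is extrapolated to 250 PAM units and converted to a log-odds (Dayhoff) matrix, can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the matrix is taken to be column-stochastic (entry [i][j] is the probability that residue j is replaced by residue i), extrapolation is by an integer matrix power, scores are 10 log10(M[i][j] / f[i]), and the two-letter alphabet and its numbers are invented.

```python
from math import log10


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power
    by repeated squaring."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    base = m
    while power:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def log_odds(m_unit, freqs, power):
    """Log-odds scores for m_unit raised to `power`.

    m_unit -- column-stochastic mutation matrix for one unit of divergence
    freqs  -- background frequency of each residue
    """
    m = mat_pow(m_unit, power)
    n = len(m)
    return [[10 * log10(m[i][j] / freqs[i]) for j in range(n)]
            for i in range(n)]


# Invented two-residue alphabet: 1% change per unit of divergence.
toy_m1 = [[0.99, 0.01],
          [0.01, 0.99]]
toy_freqs = [0.5, 0.5]
toy_scores_250 = log_odds(toy_m1, toy_freqs, 250)
```

Under the same assumptions, a matrix estimated from pairs at 50 PAM units would be raised to the 5th power to reach 250. If substitution patterns were truly independent of divergence, matrices normalized from different starting divergences would agree; the abstract reports that they do not.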

Synthesis of a 2'-deoxyguanosine analog tethered through the exocyclic nitrogen via a 3-carbon chain to the 4-position of an imidazole is described. The imidazole forms a hydrogen bond with the 2'-hydroxyl group of a complementary cytosine bound as a Watson-Crick base pair.

Only 20 amino acids are normally incorporated into proteins synthesized in living cells, and this has limited the structural range of proteins that can be prepared. New methods that allow the incorporation of amino acids that are not normally encoded by natural genes are being developed: these include reassigning functions within the existing genetic code, and expanding the genetic code by constructing additional, non-natural codons. Used in conjunction with recent major advances in understanding protein structure-function relationships, these approaches should extend the range of de novo protein designs that are possible.

The ability of various polymerases to catalyze the template-directed formation of a base pair between isoguanine (iso-G) and isocytosine (iso-C) in duplex oligonucleotides has been investigated. A new procedure was developed for preparing derivatives of deoxyisoguanosine suitable for incorporation into DNA using an automated DNA synthesizer. T7 RNA polymerase, AMV reverse transcriptase, and the Klenow fragment of DNA polymerase all incorporated iso-G opposite iso-C in a template. T4 DNA polymerase did not. Several polymerases also incorporated iso-G opposite T, presumably through pairing with a minor tautomeric form of iso-G complementary to T. In a template, iso-G directs the incorporation of both iso-C and T when Klenow fragment is the catalyst and only U when T7 RNA polymerase is the catalyst. Further, derivatives of iso-C were found to undergo significant amounts of deamination under alkaline conditions used for base deprotection after automated oligonucleotide synthesis. Both the deamination reaction of iso-C and the ambivalent tautomeric forms of iso-G make it unlikely that the (iso-C).(iso-G) base pair was a part of information storage molecules also containing the A.T and G.C base pairs found in primitive forms of life that emerged on planet earth several billion years ago. Nevertheless, the extra letters in the genetic alphabet can serve useful roles in a contemporary laboratory setting.

Surface residues, interior residues, and parsing residues, together with a secondary structure derived from these, are predicted for the MoFe nitrogenase protein in advance of a crystal structure of the protein, scheduled shortly to appear in Nature. By publishing this prediction, we test our method for predicting the conformation of proteins from patterns in the divergent evolution of homologous protein sequences in a way that places the method 'at risk'.

Two types of approaches for predicting the conformation of proteins from sequence data have lately received attention: 'black box' tools that generate fully automated predictions of secondary structure from a set of homologous protein sequences, and methods involving the expertise of a human biochemist who is assisted, but not replaced, by computer tools. A friendly controversy has emerged as to which approach offers a brighter future. In fact, both are necessary. Nevertheless, a snapshot of the controversy at this instant offers much insight into the structure prediction problem itself.

A new synthesis is reported for 4-aminoimidazo[1,2-a]-1,3,5-triazin-2(1H)-one (= 5-aza-7-deaza-iso-guanosine; 8), a purine analog that, when incorporated into an oligonucleotide chain, presents a H-bond donor-acceptor-acceptor pattern to a complementary pyrimidine analog. A protected ribose derivative was coupled to 8 to yield 4-amino-8-(beta-D-ribofuranosyl)imidazo[1,2-a]-1,3,5-triazin-2(8H)-one (= 5-aza-7-deaza-isoguanosine; 11) after deprotection. Alternatively, direct synthesis of both the ribo derivative 11 and the corresponding deoxyribo derivative 17 as the beta-D-anomers was achieved using the enzyme purine nucleoside phosphorylase in a one-pot reaction. This adapts a known synthetic approach to yield a new strategy for obtaining diastereoisomerically pure deoxyribonucleoside analogs on 1-gram scales.

A yeast-catalyzed reduction of dimethyl (2S,3S)-2-allyl-3-hydroxyglutarate is the key step in the preparation of bis-homo, branched-chain nucleoside analogues. To establish unambiguously the stereochemical course of the microbial reaction, the product has been converted to a derivative esterified with camphanoyl chloride, and a crystal structure of the derivative solved.

Cyclopentane derivatives bearing a 3-(hydroxymethyl) group, a 4-(2-hydroxyethyl) functionality, and a nucleoside base are carbocyclic variants of nucleoside analogs previously described as building blocks for the preparation of oligonucleotide analogs having dimethylene sulfone (= methanosulfonylmethano) linking groups replacing the phosphodiester linking units found in natural oligonucleotides. These carbocyclic nucleoside analogs (e.g. 17 and 20) are stable to both acid-catalyzed depurination and base-catalyzed hydrolysis, in contrast with most non-ionic analogs of oligonucleotides. Furthermore, they can be prepared with complete control over the stereochemistry at the 'anomeric' center. A procedure is given for preparing these purine-nucleoside analogs via the construction of an enantiomerically pure carbocyclic skeleton (Schemes 1-3), followed by a Mitsunobu-type reaction to introduce the purine-base derivatives (Scheme 4). Furthermore, preliminary results for the coupling of these analogs to yield nucleoside dimers (e.g. 26) are also reported (Scheme 5).

The reduction of 2-substituted 3-oxoglutarates by yeast yields a new class of chiral building blocks, 2-allyl- and 2-propargyl-3-hydroxyglutarates. These are useful as starting points for the synthesis of, inter alia, branched chain analogs of sugars and nucleosides. When allyl is the side chain, the principal product has the absolute configuration (2S,3S), proven by correlation with a compound whose absolute configuration was established by crystallography. Several features of this yeast-mediated reduction are noteworthy. First, its diastereoselectivity is higher than its enantioselectivity, especially with the propargyl side chain. Further, with all substrates, variation in enantioselectivity is not manifested by a variation in diastereoselectivity. This example therefore serves as a warning for those using yeast-mediated reactions that diastereoselectivity cannot be accepted as a substitute for direct measurements of enantioselectivity, even with analogous substrates and similar reaction conditions. Finally, an unexpected metabolism of impurities in the starting material by the yeast made the overall transformation preparatively useful.

6-Aminopyrazin-2-one, when incorporated as pyrimidine base analog into an oligonucleotide, might participate in a nonstandard base pair that retains a Watson-Crick geometry but is joined by a nonstandard hydrogen bonding pattern. Such base pairs can, at least in principle, be recognized independently in duplex nucleic acids. To explore the tautomeric properties that govern hydrogen bonding of this heterocycle, 6-amino-5-benzyl-3-methylpyrazin-2-one was synthesized. The equilibrium constant for the interconversion of the keto and hydroxyl tautomeric forms was estimated by comparing its ultraviolet spectrum with those of N- and O-methyl derivatives in water, methanol, ethanol, dioxane, and water-dioxane mixtures. A plot of the logarithm of the tautomeric equilibrium constant versus Dimroth's microscopic dielectric constant (ET(30)) was linear. On the basis of an extrapolation of this relationship to the microscopic dielectric of water, 6-amino-5-benzyl-3-methylpyrazin-2-one is expected to favor at equilibrium the keto form over the hydroxyl form by a factor of ca. 2000 under conditions where DNA and RNA polymerases operate. This is substantially better than the tautomeric ratio observed with isoguanosine, where the minor form has been observed to create tautomeric ambiguity with some polymerase systems.

BIOLOGICAL macromolecules with catalytic activity can be created artificially using two approaches. The first exploits a system that selects a few catalytically active biomolecules from a large pool of randomly generated (and largely inactive) molecules. Catalytic antibodies(1) and many catalytic RNA molecules(2) are obtained in this way. The second involves rational design of a biomolecule that folds in solution to present to the substrate an array of catalytic functional groups(3-8). Here we report the synthesis of rationally designed polypeptides that catalyse the decarboxylation of oxaloacetate via an imine intermediate. We determine the secondary structures of the polypeptides by two-dimensional NMR spectroscopy. We are able to trap and identify intermediates in the catalytic cycle, and to explore the kinetics in detail. The formation of the imine by our artificial oxaloacetate decarboxylases is three to four orders of magnitude faster than can be achieved with simple amine catalysts: this performance rivals that of typical catalytic antibodies.

A protecting group strategy has been developed that permits the convergent synthesis of oligonucleotide analogs containing dimethylene sulfone groups in place of phosphate diester groups. The strategy is based on experimental conditions that allow selective removal of dimethoxytrityl groups from oxygen and sulfur.

A set of derivatives of cyclopentaneacetic acid cis-substituted at position 3 by nucleoside bases (both purines and pyrimidines) was prepared and characterized (see 11, 14, and 23a, b; Schemes 2-4). These molecules are carbocyclic analogs of 2',3'-dideoxy-5'-homonucleosides. In this synthesis, the skeleton was constructed from norbornanone, and a novel method based on Mitsunobu chemistry was used for the introduction of nucleoside-base substituents. The scope of this method was further explored via the preparation of a cyclobutyl analog of dideoxyguanosine (see 17, Scheme 3).

A process for preparing fructose from starch has been designed to have a thermodynamic profile similar to those found in natural metabolic pathways and implemented in a reactor containing five enzymes acting together. The process runs at equilibrium, with a final exergonic step pulling intermediates to fructose, the desired product. Therefore, the yields of fructose are high and not dominated by the glucose-fructose equilibrium constant that constrains the commercial process, which uses xylose isomerase to catalyze its final step. Three different strategies were used to find enzymes suitable for catalyzing the final irreversible step, the hydrolysis of fructose-6-phosphate: (a) recruiting an enzyme to operate backwards with respect to its physiological function; (b) recruiting an enzyme to accept a non-natural substrate through the use of a cosubstrate; and (c) developing an indirect route for converting fructose-6-phosphate to fructose. As presently implemented, the process converts starch and inorganic phosphate to glucose-1-phosphate, glucose-6-phosphate, fructose-6-phosphate, and then fructose and inorganic phosphate; the last is recycled. The net hydrolysis of fructose-6-phosphate to yield fructose is obtained via a transaldolase-catalyzed reaction between fructose-6-phosphate and glyceraldehyde to yield fructose and glyceraldehyde-3-phosphate, which is then hydrolyzed to regenerate glyceraldehyde and inorganic phosphate using a 3-phosphoglycerate phosphatase recruited to act on an unnatural substrate. This work illustrates general ideas that may prove useful in designing other multistep biocatalytic transformations, in particular, the focus on the energetics of the pathway and on the evolution of enzymes as a guide to selecting enzymes useful in biocatalytic processes.

ONE serious limitation facing protein engineers is the availability of only 20 'proteinogenic' amino acids encoded by natural messenger RNA. The lack of structural diversity among these amino acids restricts the mechanistic and structural issues that can be addressed by site-directed mutagenesis. Here we describe a new technology for incorporating non-standard amino acids into polypeptides by ribosome-based translation. In this technology, the genetic code is expanded through the creation of a 65th codon-anticodon pair from unnatural nucleoside bases having non-standard hydrogen-bonding patterns(1,2). This new codon-anticodon pair efficiently supports translation in vitro to yield peptides containing a non-standard amino acid. The versatility of the ribosome as a synthetic tool offers new possibilities for protein engineering, and compares favourably with another recently described approach in which the genetic code is simply rearranged to recruit stop codons to play a coding role(3-9).

N2-Isobutyryl-O6-[2-(p-nitrophenyl)ethyl]guanine (4) allows the synthesis of different types of carbocyclic analogs of guanosine in high yield under Mitsunobu conditions. Only the desired N-9-substituted derivatives of guanine are formed.

The entire protein sequence database has been exhaustively matched. Definitive mutation matrices and models for scoring gaps were obtained from the matching and used to organize the sequence database as sets of evolutionarily connected components. The methods developed are general and can be used to manage sequence data generated by major genome sequencing projects. The alignments made possible by the exhaustive matching are the starting point for successful de novo prediction of the folded structures of proteins, for reconstructing sequences of ancient proteins and metabolisms in ancient organisms, and for obtaining new perspectives in structural biochemistry.

Reaction conditions are presented that allow regioselective introduction (N-9 versus N-7) of guanine into sugar analogs under Vorbruggen conditions. Using these conditions, a set of N2-protected guanosine analogs has been prepared with N2-isobutyryl-O6-[2-(p-nitrophenyl)ethyl]guanine (1) as nucleophile. This approach helps solve an important synthetic problem in the preparation of guanosine analogs.

With templates containing 2'-deoxy-1-methylpseudouridine (d(m)PSI), T7 RNA polymerase catalyzes the incorporation of either adenosine triphosphate (ATP) or formycin triphosphate (FTP) into a growing chain of RNA with the same efficiency as with templates containing thymidine (dT). In each case, the overall rate of synthesis of full-length products containing formycin is about one-tenth of the rate of synthesis of analogous products containing adenosine. Analysis of the products of abortive initiation shows that incorporation of FMP into the growing oligonucleotide by T7 RNA polymerase is more likely to lead to premature termination of transcription than is incorporation of AMP. Nevertheless, the results demonstrate that T7 RNA polymerase tolerates the formation of a C-nucleotide transcription complex in which the nucleoside bases on both the template and the incoming nucleotide are joined to the ribose by a carbon-carbon bond. This result increases the prospects for further expanding the genetic alphabet via incorporation of new base pairs with novel hydrogen-bonding schemes (Piccirilli et al., 1990).

Chemical modification studies suggest that two residues of bovine pancreatic ribonuclease A (RNase A), Lys-41 and Asp-121, are important for catalysis. Three mutants of RNase A have been prepared: two point mutants, with Lys-41 altered to Arg-41 and Asp-121 altered to Glu-121, and a double mutant where both residues are altered. The Lys-41 --> Arg mutant has ca. 2% of the catalytic activity (k(cat)/K(m)) of the native protein, while the Asp-121 --> Glu mutant has ca. 17% of the catalytic activity of the native protein. The double mutant has catalytic activity comparable to that of the Lys-41 --> Arg mutant.

A route for synthesizing C-nucleosides with 2,6-substituted pyridines as heterocyclic aglycones is described. Condensation of appropriately substituted lithiated pyridines with ribono-1,4-lactone derivatives yields hemiacetals 4a-g (Table 1), which can be reduced by Et3SiH and BF3.Et2O to the corresponding C-nucleosides (see Scheme 1 for 4d --> beta-D-5). Conditions are presented that optimize the amount of the 2,6-dichloropyridine-derived beta-D-anomer beta-D-5 formed (Table 3). Aminolysis of beta-D-5 yields the diaminonucleoside 14 (Scheme 3).

Two routes are presented for the synthesis of 3',5'-bishomodeoxyribonucleosides, building blocks needed to synthesize oligodeoxynucleotide analogues where the OPO2O groups are replaced by CH2SCH2, CH2SOCH2, and CH2SO2CH2 units. Two of these have been coupled to create an uncharged analogue of a dinucleotide. As isosteric, achiral, and nonionic analogues of natural oligonucleotides stable to both enzymatic and chemical hydrolysis, such molecules have potential application as probes in the laboratory, in studies of the role of individual genes in biological function, and as ''antisense'' oligonucleotide analogues for the treatment of diseases.

Two protected derivatives of the ribonucleoside inosine have been prepared to serve as building blocks for phosphoramidite-based synthesis of RNA. Two different synthetic routes address the unusual solubility characteristics of inosine and its derivatives. The final products of the different synthetic pathways, 5'-O-(dimethoxytrityl)-2'-O-(t-butyldimethylsilyl) inosine 3'-O-(beta-cyanoethyldiisopropylamino) phosphoramidite 5a, and O6-p-nitrophenylethyl-5'-O-(dimethoxytrityl)-2'-O-(t-butyldimethylsilyl) inosine 3'-O-(methyldiisopropylamino) phosphoramidite 5b, were chemically incorporated into short oligoribonucleotides which also contained the four standard ribonucleoside bases. The oligomers were chosen to study base-specific interactions between an RNA substrate and an RNA enzyme derived from the Group I Tetrahymena self-splicing intron. The oligomers were shown to be biochemically competent using a trans cleavage assay with the modified Tetrahymena intron. The results confirm the dependence of the catalytic activity on a wobble base pair, rather than a Watson-Crick base pair, in the helix at the 5'-splice site. Furthermore, comparison of guanosine and inosine in a wobble base pair allows one to assess the importance of the guanine 2-amino group for biological activity. The preparation of the inosine phosphoramidites adds to the repertoire of base analogues available for the study of RNA catalysis and RNA-protein interactions.

Replacing Leu-182 by Ala in yeast alcohol dehydrogenase (YADH; alcohol:NAD+ oxidoreductase, EC 1.1.1.1) yields a mutant that retains 34% of its k(cat) value and makes one stereochemical "mistake" every 850,000 turnovers (instead of approximately 1 error every 7,000,000,000 turnovers in native YADH) in its selection of the 4-Re hydrogen of NADH. Half of the decrease in stereochemical fidelity comes from an increase in the rate of transfer of the 4-Si hydrogen of NADH. The mutant also accepts 5-methylnicotinamide adenine dinucleotide, a co-factor analog not accepted by native YADH. The stereospecificity of the mutant is lower still with analogs of NADH where the carboxamide group of the nicotinamide ring is replaced by groups with weaker hydrogen bonding potential. For example, with thio-NADH, the mutant enzyme makes 1 stereochemical "mistake" every 450 turnovers. Finally, the double mutant T157S/L182A, in which Thr-157 is replaced by Ser and Leu-182 is replaced by Ala, also shows decreased stereochemical fidelity. These results suggest that Si transfer in the mutant enzymes arises from NADH bound in a syn conformation in the active site and that this binding is not obstructed in native YADH by side chains essential for catalysis.
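
The error frequencies quoted above can be put on an energy scale with a standard conversion. The sketch below is not taken from the paper: it assumes that one error per N turnovers corresponds to a Re:Si rate ratio of roughly N, that the ratio reflects a difference in activation free energy of RT ln(N), and that the temperature is 298.15 K. The function name is hypothetical.

```python
from math import log

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def fidelity_to_ddg(turnovers_per_error, temperature_k=298.15):
    """Free-energy difference (kcal/mol) implied by a rate ratio of
    `turnovers_per_error` between the favoured and disfavoured
    hydrogen transfers, using ddG = R * T * ln(ratio)."""
    if turnovers_per_error <= 0:
        raise ValueError("turnovers_per_error must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * log(turnovers_per_error)


# Error frequencies reported in the abstract.
for label, n in [("native YADH", 7_000_000_000),
                 ("L182A mutant", 850_000),
                 ("L182A with thio-NADH", 450)]:
    print(f"{label}: {fidelity_to_ddg(n):.1f} kcal/mol")
```

On this assumed scale, each tenfold loss of fidelity corresponds to about 1.4 kcal/mol at room temperature, which gives a rough sense of how much discrimination the Leu-182 side chain contributes.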

A comparison of the sequences of three homologous ribonucleases (RNase A, angiogenin and bovine seminal RNase) identifies three surface loops that are highly variable between the three proteins. Two hypotheses were contrasted: (i) that this variation might be responsible for the different catalytic activities of the three proteins; and (ii) that this variation is simply an example of surface loops undergoing rapid neutral divergence in sequence. Three hybrids of angiogenin and bovine pancreatic ribonuclease (RNase) A were prepared where regions in these loops taken from angiogenin were inserted into RNase A. Two of the three hybrids had unremarkable catalytic properties. However, the RNase A mutant containing residues 63-74 of angiogenin had greatly diminished catalytic activity against uridylyl-(3' --> 5')-adenosine (UpA), and slightly increased catalytic activity as an inhibitor of translation in vitro. Both catalytic behaviors are characteristic of angiogenin. This is one of the first examples of an engineered external loop in a protein. Further, these results are complementary to those recently obtained from the complementary experiment, where residues 59-70 of RNase were inserted into angiogenin [Harper and Vallee (1989) Biochemistry, 28, 1875-1884]. Thus, the external loop in residues 63-74 of RNase A appears to behave, at least in part, as an interchangeable 'module' that influences substrate specificity in an enzyme in a way that is isolated from the influences of other regions in the protein.

A set of carbocyclic nucleoside analogs has been prepared using a novel modification of the Mitsunobu reaction. This approach helps solve an important synthetic problem in the preparation of carbocyclic analogs of nucleosides.