The aim of this Base Pair Directory
is to compile structural information on nucleic acid base pairs.

This is a work in progress. We start out from the usual
canonical and noncanonical base pairs with two or three hydrogen bonds
and will eventually include more recently discovered unusual base pairs
with only one standard hydrogen bond and additional C-H...O or C-H...N
contacts, water-mediated pairs, and even base pairs with no standard
hydrogen bond at all. Examples of these latter pairs include:

the adenine-difluorotoluene base pair with no
standard hydrogen bond in a DNA duplex (PDB code: 1bw7),

the Calcutta UU base pair with one standard hydrogen
bond and one C-H...O contact in an RNA hexamer with a 5'-UU overhang
(PDB code: 1osu).

Basic Information

Nucleic acids are polymers made up of repeated units,
nucleotides, comprising three components:

phosphate,

a sugar (2'-deoxyribose in DNA, ribose in RNA),

and one of four heterocyclic bases.

Formally, a nucleic acid strand is generated by linking nucleotides
through 3'-5' phosphodiester bonds, i.e. bonds between the O3' atom of
one nucleotide and the phosphorus atom of the next. This is, however,
only a formal structural description; the actual chemical reaction is
more complicated. The well-known double helix is obtained by connecting
two strands via hydrogen bonding between their bases.

These images show a nucleic acid double helix structure in an ideal B
conformation. Nucleic acids can, however, occur in different conformations.
The bases are colored in the following manner: A
- red, T - yellow, C - blue, G - green.

The bases correspond to the colored plates in the side view and are
located in the interior of the helix in the top view.

Base pairing via hydrogen bonds as shown in the detailed view is of
utmost importance for the structure of nucleic acids.

Note, however, that interactions within the
sugar-phosphate backbone and base stacking are also relevant for
nucleic acid structure.

The base pairs are formed from the two purine bases adenine
(A) and guanine (G) and from the two pyrimidine bases cytosine
(C) and uracil (U) or thymine (T).

- purine bases

adenine - A

guanine - G

- pyrimidine bases

uracil - U

thymine - T

cytosine - C

Uracil is used in RNA and thymine in DNA. The standard or canonical Watson-Crick
base pairs are A-U(T) and G-C. More information on these base pairs can
be found here.
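The pairing rules above can be expressed in a few lines of code. The following Python sketch is purely illustrative and not part of the directory itself; the names `is_canonical` and `complementary_strand` are hypothetical. It encodes the purine/pyrimidine classes and the canonical partners A-U(T) and G-C, and uses the fact that the two strands of a duplex run antiparallel, so the partner strand written 5'->3' is the reversed complement.

```python
# Purine/pyrimidine classification of the standard bases (as listed above).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}

# Canonical Watson-Crick partners; thymine replaces uracil in DNA.
WATSON_CRICK = {
    "RNA": {"A": "U", "U": "A", "G": "C", "C": "G"},
    "DNA": {"A": "T", "T": "A", "G": "C", "C": "G"},
}


def is_canonical(base1: str, base2: str, kind: str = "RNA") -> bool:
    """Return True if base1-base2 is a canonical Watson-Crick pair."""
    partners = WATSON_CRICK[kind]
    # Unknown bases map to None and therefore never count as canonical.
    return partners.get(base1.upper()) == base2.upper()


def complementary_strand(seq: str, kind: str = "DNA") -> str:
    """Return the Watson-Crick partner strand, written 5'->3'.

    The strands of a duplex are antiparallel, so the partner strand is
    the complement read in reverse order. Raises KeyError for bases
    that have no canonical partner in the chosen nucleic acid type.
    """
    partners = WATSON_CRICK[kind]
    return "".join(partners[b] for b in reversed(seq.upper()))


if __name__ == "__main__":
    print(is_canonical("G", "C"))        # True
    print(is_canonical("G", "U"))        # False (non-canonical)
    print(complementary_strand("ATGC"))  # GCAT
```

Every canonical pair defined this way combines a purine with a pyrimidine; a G-U pair, although common in RNA, is reported as non-canonical by this check.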

In addition, many non-canonical base pairs have been found; these are
also called mismatches. Since many of them occur in RNA structures,
often only uracil, not thymine, is taken into account.
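To see how many base combinations are candidates for non-canonical pairs when only uracil is considered, the following Python sketch (the helper `classify_combinations` is hypothetical) enumerates all unordered combinations of A, C, G and U and separates the two canonical ones from the mismatches. It counts base combinations only, not base pair structures: a single combination such as G-U can form several different pairs, which is why the compilations below list considerably more than ten base pairs.

```python
from itertools import combinations_with_replacement

# Only uracil is considered, as is common for RNA.
RNA_BASES = "ACGU"
# The two canonical Watson-Crick combinations (order does not matter).
CANONICAL = {frozenset("AU"), frozenset("CG")}


def classify_combinations(bases: str = RNA_BASES) -> dict[str, list[str]]:
    """Split all unordered base combinations into canonical and mismatch."""
    result: dict[str, list[str]] = {"canonical": [], "mismatch": []}
    # combinations_with_replacement also yields homo-pairs such as AA.
    for b1, b2 in combinations_with_replacement(bases, 2):
        key = "canonical" if frozenset((b1, b2)) in CANONICAL else "mismatch"
        result[key].append(b1 + b2)
    return result


if __name__ == "__main__":
    groups = classify_combinations()
    print(groups["canonical"])  # ['AU', 'CG']
    print(groups["mismatch"])   # ['AA', 'AC', 'AG', 'CC', 'CU', 'GG', 'GU', 'UU']
```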

Canonical and non-canonical base pairs with at least two standard hydrogen bonds

In reference 1, 28 base pairs with at least four H-bond heavy-atom
donor/acceptor sites have been enumerated. Compilation 2 also includes
examples with only three H-bond heavy-atom donor/acceptor sites and lists
38 base pair structures. On the other hand, compilation 2 does not
consider base pairs involving H-bonds with N3 of purines. The
classification by Leontis and Westhof provides new and more
comprehensive information.

In the following, a comprehensive compilation is presented. The total
number of possible base pairs with at least two standard H-bonds and
four heavy-atom donor/acceptor sites is 32. This means that four
additional pairs are included compared with the Tinoco compilation
(2 x GU, 1 x GG, 1 x GC). They were probably discarded for steric
reasons. However, a comprehensive search for all base pairs occurring
in the currently known RNA structures has shown that this is not
justified in all cases.

It is important to note that the compilations given above and below
are based on simple structural rules. It cannot be excluded that a few of
the listed base pairs do not correspond to an energy minimum. In addition,
it should be kept in mind that in a nucleic acid structure stacking and
backbone restraints may affect base pair geometries.

All possible base pairs with at least two standard H-bonds

The numbers in parentheses give the number of possible base pair
structures with four/three heavy-atom donor/acceptor sites
(x stands for data coming soon).

The backbone may impose steric restraints on base pairing. Therefore,
the complete nucleotides are shown in the preceding tables. The backbone
geometry corresponds to a standard A-RNA conformation. The base pair
geometries were generated manually. The two bases are located
approximately in a common plane, and the hydrogen bond H...O or H...N
distances are approximately 2 Å. The structures shown correspond to
neither optimized nor experimental geometries.
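The two geometric criteria used for the model structures, approximate coplanarity of the bases and H...O or H...N distances of about 2 Å, can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the procedure used to build the directory: the function names, the 0.3 Å distance tolerance, the 20° limit on the angle between base planes and all coordinates are hypothetical, and each base plane is approximated by the plane through three of its ring atoms.

```python
import math

Vec = tuple[float, float, float]


def hbond_distance_ok(h: Vec, acceptor: Vec,
                      target: float = 2.0, tol: float = 0.3) -> bool:
    """True if the H...O/H...N distance (in Å) lies within tol of target.

    target and tol are assumed values chosen for illustration.
    """
    return abs(math.dist(h, acceptor) - target) <= tol


def plane_normal(a: Vec, b: Vec, c: Vec) -> Vec:
    """Unit normal of the plane through three ring atoms a, b, c."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    # Cross product u x v is perpendicular to the plane.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    return (n[0] / length, n[1] / length, n[2] / length)


def interplanar_angle(n1: Vec, n2: Vec) -> float:
    """Angle in degrees between two base planes (0 = parallel)."""
    # abs() makes the result independent of the normals' orientation;
    # min() guards against rounding slightly above 1.0.
    dot = abs(sum(x * y for x, y in zip(n1, n2)))
    return math.degrees(math.acos(min(1.0, dot)))


def roughly_coplanar(n1: Vec, n2: Vec, max_angle: float = 20.0) -> bool:
    """Orientation check only; an out-of-plane offset is not tested."""
    return interplanar_angle(n1, n2) <= max_angle


if __name__ == "__main__":
    # Hypothetical coordinates in Å, not taken from any real structure.
    print(hbond_distance_ok((0.0, 0.0, 0.0), (1.9, 0.2, 0.0)))  # True
    n1 = plane_normal((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    n2 = plane_normal((5.0, 0.0, 0.1), (6.0, 0.0, 0.1), (5.0, 1.0, 0.3))
    print(round(interplanar_angle(n1, n2), 1))  # 11.3
    print(roughly_coplanar(n1, n2))             # True
```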

Both the canonical and non-canonical base pairs mentioned
above were formed from standard nucleotides/bases. Modified
nucleotides/bases also occur. A few of those found in transfer
RNA are shown here. A comprehensive compilation of
modified nucleotides in RNA can be obtained from the RNA
Modification Database.