Introduction to the DNP database

The Chapman & Hall/CRC Chemical Database is a structured database holding
information on chemical substances. It includes descriptive and numerical data
on chemical, physical and biological properties of compounds; systematic and
common names of compounds; literature references; structure diagrams and
their associated connection tables. The Dictionary of Natural Products Online
is a subset of this database and includes all compounds contained in
the Dictionary of Natural Products (Main Work and Supplements).

The Dictionary of Natural Products (DNP) is the only comprehensive and
fully-edited database on natural products. It arose as a daughter product of the
well-known Dictionary of Organic Compounds (DOC) which, since its
inception in the 1930s, has through successive editions always been a leading
source of natural product information.

In the early 1980s, following the publication of the Fifth Edition of DOC, the
first to be founded on database methods, the Editors and contributors for the
various classes of natural products embarked on a programme of enlargement,
rationalisation and classification of the natural product entries, while at the same
time keeping the coverage up-to-date. In 1992 the results of this major project,
which had grown to match DOC in size, were separately published in both book
(7 volumes) and CD-ROM format, leaving DOC with coverage of only the most
widely distributed and/or practically important natural products. DNP compilation
has since continued unabated by a combination of an exhaustive survey of
current literature and of historical sources such as reviews to pick up minor
natural products and items of data previously overlooked.

The compilation of DNP is undertaken by a team of academics and
freelancers who work closely with the in-house editorial staff at Chapman &
Hall. Each contributor specialises in a particular natural product class (e.g.
alkaloids) and is able to reorganise and classify the data in the light of new
research so as to present it in the most consistent and logical manner possible.
Thus the compilation team is able to reconcile errors and inconsistencies.

The resulting on-line version represents an extremely well organised dictionary documenting virtually every
known natural product.

A valuable feature of the design is that closely related natural products (e.g.
where one is a glycoside or simple ester of another) are organised into the same
entry, thus simplifying and bringing out the underlying structural and
biosynthetic relationships of the compounds. Structure diagrams are drawn and
numbered in the most consistent way according to best stereochemical and
biogenetic relationships. In addition, every natural product is indexed by
structural/biogenetic type under one of more than 1000 headings, allowing the
rapid location of all compounds in the category, even where they have undergone
biogenetic modification and no longer share exactly the same skeleton.

There is extensive (but not complete) coverage of natural products of
unknown structure, and the coverage of these is currently being enhanced by
various retrospective searches.

Data presentation and organisation

Derivatives and variants

In the database, closely related compounds are grouped together to form an
entry. Stereoisomers and derivatives of a parent compound are all listed under
one entry. The compounds in the Dictionary of Natural Products are grouped
together into approximately 40,000 entries. The structure of an entry is shown
below.

Entry (parent compound)
    Derivatives
    Variants (stereoisomers or other closely-related compounds)
        Derivatives of the variant

A simple entry covers one compound, with no derivatives or variants. A
composite entry will start with the entry compound, then may have:

one or more derivatives at entry level

one or more variants of the entry

one or more derivatives of the variant.
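The three levels described above can be pictured as a small nested data structure. The following is only an illustrative sketch: the class names, field names and the example compound are hypothetical and do not reflect the actual database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Derivative:
    name: str  # e.g. a glycoside, ester, salt or hydrate of its parent


@dataclass
class Variant:
    name: str  # e.g. "(R)-form" or "endo-form"
    derivatives: List[Derivative] = field(default_factory=list)


@dataclass
class Entry:
    name: str  # the entry (parent) compound
    derivatives: List[Derivative] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)

    def is_simple(self) -> bool:
        """A simple entry covers one compound, with no derivatives or variants."""
        return not self.derivatives and not self.variants

    def compound_count(self) -> int:
        """Count every compound held in the entry, at all three levels."""
        total = 1 + len(self.derivatives) + len(self.variants)
        return total + sum(len(v.derivatives) for v in self.variants)


# Hypothetical composite entry, for illustration only.
example = Entry(
    name="Parent compound",
    derivatives=[Derivative("Ac")],
    variants=[Variant("(R)-form", [Derivative("Me ether")]), Variant("(S)-form")],
)
```

Here the hypothetical entry holds five compounds: the parent, one entry-level derivative, two variants and one derivative of a variant.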

Variants may be stereoisomers, e.g. (R)-form or endo-form, or members of a
series of natural products with closely related structures, such as antibiotic
complexes.

For example, Trienomycins are often treated as variants although their
structures may be more varied.

Derivatives may include hydrates, complexes, salts, classical organic derivatives,
substitution products, oxidation products, etc. Derivatives may exist
on more than one functional group of an entry compound.

The following techniques are among those used to bring together related
substances in the same entry:

Glycosides are given as derivatives of the parent aglycone, except for
those glycosides which have an extensive literature in their own right (e.g.,
Digoxin)

Acyl derivatives are extremely common and are listed under the parent
compound, again unless the derivative has an extensive literature of its own

N-Alkyl and O-Alkyl derivatives such as methyl ethers of phenols are
similarly given under the parent compound.

Data Types

The format of a typical entry is given in Fig. 1, and shows the individual types
of data that may be present in an entry.

Chemical names and synonyms

All the names discussed below can be searched using the Chemical Name field.
Compounds have been named so as to facilitate access to their factual data by
keeping the nomenclature as simple as possible, whilst still adhering to good
practice as determined by IUPAC (the International Union of Pure and Applied
Chemistry). A great deal of care has been taken to achieve this aim as nearly as
possible. Some intentional departures from IUPAC terminological principles are
occasionally made to clarify the nomenclature of natural products. For example,
compounds containing both lactone and -COOH groups are often named using
two principal functional groups:

Fig. 1. Sample entry from database

There are many instances in the primary literature of compounds being
named in ways which are gross violations of good IUPAC practice, e.g., where
the substituents are ordered non-alphabetically. These have been corrected.

The number of trivial names used for acylating substituents has been kept
to a minimum but the following are used throughout.

Many other trivial appellations have from time to time appeared in the
literature for other acyl groups (e.g., Senecioyl = 3-methyl-2-butenoyl,
Feruloyl = 3-(4-hydroxy-3-methoxyphenyl)-2-propenoyl or
4-hydroxy-3-methoxycinnamoyl) but the systematic forms are usually employed except in
a few cases where the shortened form is used to abbreviate a very long and
unwieldy derivative descriptor as much as possible (e.g., for some of the
complex flavonoid glycosides).

The term prenyl for the common 3-methyl-2-butenyl substituent,
(H3C)2C=CHCH2-, is used throughout.

Names which are known to be duplicated within the chemical literature
(not necessarily within DNP), are marked with the sign.

CAS Registry Numbers

CAS Registry Numbers are identifying numbers allocated to each distinctly
definable chemical substance indexed by the Chemical Abstracts Service since
1965 (plus retrospective allocation of numbers by CAS to compounds from the
sixth and seventh collective index periods). The numbers have no chemical
significance but they provide a label for each substance independent of any
system of nomenclature.
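Although the numbers carry no chemical significance, their format is well defined: two to seven digits, a hyphen, two digits, a hyphen and a single check digit. The check-digit rule used below is the one published by CAS and is not described elsewhere in this text; the function name is hypothetical. The sketch can be used to catch transcription errors before searching on a number.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(number: str) -> bool:
    """Return True if the string is a well-formed CAS Registry Number."""
    # Accept the bracketed form used in DNP entries, e.g. "[488-59-5]".
    match = CAS_PATTERN.match(number.strip().strip("[]"))
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check
```

A well-formed number says nothing about which substance it belongs to; it only shows that the digits are internally consistent.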

In DNP, much effort has been expended to ensure that accurate CAS numbers
are given for as many substances as possible.

If a CAS number is not given for a particular compound, it may be (a)
because CAS have not allocated one, (b) very occasionally, because an editorial
decision cannot be made as to the correct number to cite, or (c) because the
substance was added to the DNP database at a late stage in the compilation
process, in which case the number will probably be added to the database soon.

At the foot of the DNP entry, immediately before the references, may be
shown additional registry numbers. These are numbers which have been
recognised by the DNP editors or contributors as belonging to the entry
concerned but which cannot be unequivocally assigned to any of the compounds
covered by the entry. Their main use will be in helping those who need to carry
out additional searches, especially online searches in the CAS or other
databases, and who will be able to obtain additional hits using these numbers.
Clearly, discretion is needed in their use for this purpose.

Additional registry numbers may arise for a variety of reasons:

A number may refer to stereoisomers or other variants of the main entry
compound or its derivatives, which may or may not be mentioned in the entry
but for which no physical properties or other useful information is available.
For example, the DNP entry for Carlic acid [56083-49-9] states that it has so
far been obtained in solution as a mixture of (E) and (Z)-forms. The additional
registry numbers given are those of the (E) and (Z) isomers [67381-73-1] and
[67381-74-2].

A CAS number may refer to a mixture, in which case it is added to the
DNP entry referring to the most significant component. It may refer to a
hydrate, salt, complex, etc. which is not described in detail in the DNP entry.

Replaced numbers, duplicate numbers and other numbers arising from
CAS indexing procedure or, occasionally, from errors or inconsistencies by
CAS, are also reported. For example, the DNP entry scyllo-Inositol [488-59-5]
contains an additional registry number for D-scyllo-Inositol [41546-32-1]. Since
scyllo-Inositol is a meso-compound, the number is erroneous. More generally,
CAS frequently replace a given number with one that more accurately
represents what they now know about a substance, and the replaced number
remains on their files and is given in DNP as an additional number.

In the case of compounds with more than one stereogenic centre,
additional registry numbers frequently refer to levels of stereochemical
description which cannot be assigned to a particular stereoisomer described in
the entry.

For example, the CHCD entry for 2-Amino-3-hydroxy-3-phenylpropanoic
acid (β-Hydroxyphenylalanine, 9CI) has a general CAS number [1078-17-7]
and CAS numbers for all four optically active diastereoisomers [7352-06-9,
32946-42-2, 109120-55-0, 6524-48-4] as well as the two possible racemates
[2584-74-9] [2584-75-0]. However, among the additional registry numbers
quoted are the following:

[7687-36-7] - number for erythro-β-Hydroxyphenylalanine
[50897-27-3] - number for β-Hydroxy-L-phenylalanine
[68296-26-4] - number for β-Hydroxy-D-phenylalanine
[39687-93-9] - general number for the methyl ester, hydrochloride which
cannot be placed under any of the individual stereoisomers of
this compound described in the entry.

Numbers may refer to derivatives similar to those described in the DNP
entry for which no data is available, or which have not yet been added to the
entry.

Some DNP entries refer to families of compounds, such as the entry for
Calcitonin where only the porcine and human variants are described in detail.
The additional registry numbers given in this entry are those of a number of
other species variants which appear to have been identified according to CAS
but for which no attempt has been made to collate full data for DNP.

Diagrams

In each entry display there is a single diagram which applies to the parent entry.
Separate diagrams are not given for variants or derivatives.

Every attempt has been made to present the structures of chemical substances
as accurately as possible according to current best practice and IUPAC
recommendations. In drawing the formulae, as much consistency as possible
between closely related structures has been aimed at. Thus, for example, sugars
have been standardised as Haworth formulae and, wherever possible in complex
structures, the rings are oriented in the standard Haworth manner so that
structural comparisons can quickly be made. In formulae the pseudoatom
abbreviations Me, Et and Ac for methyl, ethyl and acetyl respectively, are used
only when attached to a heteroatom. Ph is used throughout whether attached to
carbon or to a heteroatom. Other pseudoatom abbreviations such as Prⁱ for
isopropyl and Bz for benzoyl are not used in DNP.

Care must be taken with the numbering of natural products, as problems may
arise due to differences in systematic and non-systematic schemes. Biogenetic
numbering schemes which are generally favoured in DNP may not always be
contiguous, e.g., where one or more carbon atoms have been lost during
biogenesis.

Structures for derivatives can be viewed in Structure Search, but remember
that these structures are generated from connection tables and may not always
be oriented consistently.

Stereochemical conventions

Where the absolute configuration of a compound is known or can be inferred
from the published literature without undue difficulty, this is indicated. Where
only one stereoisomer is referred to in the text, the structural diagram indicates
that stereoisomer. Wherever possible, stereostructures are described using the
Cahn-Ingold-Prelog sequence-rule (R,S) and (E,Z) conventions but, in cases
where these are cumbersome or inapplicable, alternatives such as the
α,β-system are used instead. Alternative designations are frequently presented
in such cases.

The structure diagrams for compounds containing one or two chiral centres
are given in DNP as Fischer-type diagrams showing the stereochemistry
unequivocally. True Fischer diagrams in which the configuration is implied by
the North-South-East-West positions of the substituents are widespread in the
literature; they are quite unambiguous but need to be used with caution by the
inexperienced. They cannot be reoriented without the risk of introducing errors.

Where only the relative configuration of a compound containing more than
one chiral centre is known, the symbols (R*) and (S*) are used, the
lowest-numbered chiral centre being arbitrarily assigned the symbol (R*).

For racemic modifications of compounds containing more than one chiral
centre the symbols (RS) and (SR) are used, with the lowest-numbered chiral
centre being arbitrarily assigned the symbol (RS). The racemate of a compound
containing one chiral centre only is described in DNP as (±)-.
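The two labelling conventions just described can be sketched as follows. This is an illustration only: the function names and the "locant plus label" output format are assumptions, not the form in which DNP stores or prints descriptors. Inverting every centre gives the enantiomer, which has the same relative configuration, so the lowest-numbered centre can always be brought to R.

```python
from typing import Dict

_FLIP = {"R": "S", "S": "R"}


def _normalised(centres: Dict[int, str]) -> Dict[int, str]:
    """Invert all centres if needed so that the lowest-numbered one is R."""
    lowest = min(centres)
    if centres[lowest] == "R":
        return dict(centres)
    return {locant: _FLIP[label] for locant, label in centres.items()}


def relative_descriptor(centres: Dict[int, str]) -> str:
    """Relative configuration: lowest-numbered centre is written as R*."""
    norm = _normalised(centres)
    return "(" + ",".join(f"{loc}{norm[loc]}*" for loc in sorted(norm)) + ")"


def racemate_descriptor(centres: Dict[int, str]) -> str:
    """Racemate: (±) for one centre, otherwise RS/SR with the lowest centre RS."""
    if len(centres) == 1:
        return "(±)"
    norm = _normalised(centres)
    pair = {"R": "RS", "S": "SR"}
    return "(" + ",".join(f"{loc}{pair[norm[loc]]}" for loc in sorted(norm)) + ")"
```

For a compound known as (2S,3R), relative_descriptor({2: "S", 3: "R"}) gives "(2R*,3S*)" and racemate_descriptor gives "(2RS,3SR)".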

In comparing CAS descriptors with those given in DNP, it is important to
remember that the order of presentation of the chirality labels in CAS is itself
based on the sequence rule priority and not on any numbering scheme. For
example, the CAS descriptor for the structure illustrated is [S-(R*,S*)].

The relative stereochemical label (R*,S*) is first applied with the R* applying
to the chiral centre of higher priority (C-3). The absolute stereochemical
descriptor (S)- is then applied changing R* to S for the chiral centre of higher
priority and S* to R for the chiral centre of lower priority (C-2). For further
details, see the current CAS Index Guide.

For simplicity, the enantiomers of bridged-ring compounds, such as camphor,
are described simply as (+)- and (-)-. Although camphor has two chiral centres,
steric restraints mean that only one pair of enantiomers can be prepared.

Where appropriate, alternative stereochemical descriptors may be given using
the D, L or α,β-systems. For a fuller description of these systems, consult
The Organic Chemist's Desk Reference (Chapman & Hall, 1995).

Molecular formula and molecular weight

The elements in the molecular formula are given according to the Hill
convention (C, H, then other elements in alphabetical order). The molecular
weights given are formula weights (or more strictly, molar masses in daltons)
and are rounded to one place of decimals. In the case of some high molecular
mass substances such as proteins the value quoted may be that taken from an
original literature source and may be an aggregate molar mass.
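As a minimal sketch of the two conventions above, the following orders a formula by the Hill convention and rounds the formula weight to one decimal place. The function names are hypothetical, the atomic weights are an abridged table of standard values supplied for illustration, and the rule for carbon-free formulae (all elements alphabetical) comes from the full Hill system rather than from this text.

```python
from typing import Dict

# Abridged table of standard atomic weights (illustrative, not from DNP).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
                  "S": 32.06, "Cl": 35.45}


def hill_formula(counts: Dict[str, int]) -> str:
    """Write element counts as a formula in Hill order."""
    if "C" in counts:
        # C first, then H, then the remaining elements alphabetically.
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # Full Hill system: without carbon, everything is alphabetical.
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def formula_weight(counts: Dict[str, int]) -> float:
    """Formula weight rounded to one place of decimals."""
    return round(sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items()), 1)
```

For example, the element counts N 1, C 9, H 11, O 3 are written as C9H11NO3.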

Molecular formulae are included in DNP for all derivatives which are natural
products and so are readily searchable, whether they are documented as
derivatives or have their own individual entry. Molecular formulae are not in
general given for salts, hydrates or complexes (e.g. picrates) nor for most
"characterisation" derivatives such as acetates and methyl ethers of complex
natural products.

Where a derivative appears to have been characterised only as a salt, the properties
of the salt may be given under the heading for the derivative. In such cases the
data is clearly labelled, e.g., Mp 179° (as hydrochloride).

Source

The taxonomic names for organisms given throughout are in general those given
in the primary literature. Standardisation of minor orthographical variations has
been carried out. Data in this field may be searched under Source/Synthesis or
All Text. Standards used are: Brummitt, R.K. (1992) Vascular Plant Families
and Genera, Royal Botanic Gardens, Kew; Willis, J.C. (1973) A Dictionary of
the Flowering Plants, Cambridge University Press, Cambridge; Gozmany, L.
(1990) Seven Language Thesaurus of European Animals, Chapman & Hall
London; Chemical Abstracts Service.

Importance/use

Care has been taken to make the information given on the importance and uses
of chemical substances as accurate as possible. Data in this field may be
searched under Use/Importance or All Text.

Type of Compound

All natural products are classified under one of more than 1050 headings
according to structural type, e.g., daucane sesquiterpenoid, pyrrolizidine
alkaloid, withanolide. Each structural type is assigned a type of compound
code, e.g., VG0300, VX0150. Type of compound words and type of compound
codes may both be searched in Menu and Command search.

The full type of compound code index is given in Table 3, page 128 of the
printed User Manual, and in the Description of Natural Product Structures that
follows, each descriptive paragraph is followed by its Type of Compound
code(s).

Physical Data

Appearance

Natural products are considered to be colourless unless otherwise stated. Where
the compound contains a chromophore which would be expected to lead to a
visible colour, but no colour is mentioned in the literature, the DNP entry will
mention this fact if it has been noticed by the contributor.

An indication of crystal form and of recrystallisation solvent is often given
but these are imprecise items of data; most organic compounds can be
crystallised from several solvent systems and the crystal form often varies. In
the case of the small number of compounds where crystal behaviour has been
intensively studied (e.g. pharmaceuticals), it is found that polymorphism is a
very common phenomenon and there is no reason to believe that it is not
widespread among organic compounds generally.

Melting points and boiling points

The policy followed in the case of conflicting data is as follows:

Where the literature melting points are closely similar, only one figure
(the highest or most probable) is quoted.

Where two or more melting points are recorded and differ by several
degrees (the most likely explanation being that one sample was impure), the
lower figure is given in parentheses, thus: 139° (134-135°).

Where quoted figures differ widely and some other explanation such as
polymorphism or incorrect identity seems to be the most likely explanation,
both figures are quoted without parentheses, thus: Mp 142°, Mp 205-206°.

Known cases of polymorphism or double melting point are noted.

Boiling point determination is less precise than that of melting points and
conflicting boiling point data is not usually reported except when there appears
to be a serious discrepancy between the different authors.
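The melting point policy can be sketched roughly as a three-way decision. In DNP the choice rests on editorial judgement; the numerical thresholds below, the function name and the uniform "Mp" prefix are assumptions introduced purely for illustration.

```python
def format_melting_points(first: float, second: float,
                          close_limit: float = 2.0,
                          wide_limit: float = 20.0) -> str:
    """Format two conflicting literature melting points (values in degrees).

    close_limit and wide_limit are hypothetical thresholds standing in for
    "closely similar" and "differ widely".
    """
    high, low = max(first, second), min(first, second)
    gap = high - low
    if gap <= close_limit:
        # Closely similar: quote only the highest figure.
        return f"Mp {high:g}°"
    if gap < wide_limit:
        # Several degrees apart: lower figure in parentheses.
        return f"Mp {high:g}° ({low:g}°)"
    # Widely different: quote both without parentheses.
    return f"Mp {low:g}°, Mp {high:g}°"
```

A fixed threshold cannot tell an impure sample from a polymorph, which is why the real decision is left to the contributor.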

Optical rotations

These are given whenever possible, and normally refer to what the DNP
contributor believes to be the best-characterised sample of highest chemical and
optical purity. Where available an indication of the optical purity (op) or
enantiomeric excess (ee) of the sample measured now follows the specific
rotation value.

Specific rotations are dimensionless numbers and the degree sign which was
formerly universal in the literature has been discontinued.

Densities and refractive indexes

Densities and refractive indexes are now of less importance for the identification
of liquids than has been the case in the past, but are quoted for common or
industrially important substances (e.g. monoterpenoids), or where no boiling
point can be found in the literature.

Densities and refractive indexes are not quoted where the determination
appears to refer to an undefined mixture of stereoisomers.

Solubilities

Solubilities are given only where the solubility is unusual. Typical organic
compounds are soluble in the usual organic solvents such as ether and
chloroform, and virtually insoluble in water. The presence of polar groups (OH,
NH2 and especially COOH, SO3H, NR+) increases water solubility.

pKa values

pKa values are given for both acids and bases. The pKb of a base can be
obtained by subtracting its pKa from 14.17 (at 20°) or from 14.00 (at 25°).
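The conversion is a single subtraction, sketched here with the two constants quoted above. The function and table names are hypothetical, and only the two temperatures given in the text are supported.

```python
# Constants quoted in the text, keyed by temperature in degrees Celsius.
PKW = {20: 14.17, 25: 14.00}


def pkb_from_pka(pka: float, temperature_c: int = 25) -> float:
    """Return the pKb of a base from the quoted pKa value."""
    if temperature_c not in PKW:
        raise ValueError("only 20° and 25° are covered by the text")
    return round(PKW[temperature_c] - pka, 2)
```

For a quoted pKa of 9.25, this gives a pKb of 4.75 at 25° and 4.92 at 20°.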

Spectroscopic data

Spectroscopic data such as UV wavelengths and extinction coefficients are given
only where the spectrum is a main point of interest, or where the compound is
unstable and has been identified only by spectroscopic data.

In many other cases, spectroscopic data can be rapidly located through the
references quoted.

Hazard and toxicity information

General

Toxicity and hazard information is highlighted by the sign , and has been
selected to assist in risk assessments for experimental, manufacturing and
manipulative procedures with chemicals.

The field of safety testing is a complex, difficult and rapidly expanding one,
and while as much care as possible has been taken to ensure the accuracy of
reported data, the Dictionary must not be considered a comprehensive source on
hazard data. The function of the reported hazard data is to alert the user to
possible hazards associated with the use of a particular compound, but the
absence of such data cannot be taken as an indication of safety in use, and the
Publishers cannot be held responsible for any inaccuracies in the reported
information, neither does the omission of hazard data in DNP imply an absence
of this data from the literature. Widely recognised hazards are included
however, and where possible key toxicity reviews are identified in the
references. Further advice on the storage, handling and disposal of chemicals is
given in The Organic Chemist's Desk Reference.

Finally, it should be emphasised that any chemical has the potential for harm
if it is carelessly used. For many newly isolated materials, hazardous properties
may not be apparent or may not have been cited in the literature. In addition, the
toxicity of some very reactive chemicals may not have been evaluated for
ethical reasons, and these substances in particular should be handled with
caution.

Many entries in DNP contain one or more RTECS® Accession Numbers. Possession
of these numbers allows users to locate toxicity information on relevant
substances from the NIOSH Registry of Toxic Effects of Chemical Substances,
which is a compendium of toxicity data extracted from the scientific literature.
For each Accession Number, the RTECS® database provides the following
data when available: substance prime name and synonyms; date when the
substance record was last updated; CAS Registry Number; molecular weight
and formula; reproductive, tumorigenic and toxic dose data; and citations to
aquatic toxicity ratings, IARC reviews, ACGIH Threshold Limit Values,
toxicological reviews, existing Federal standards, the NIOSH criteria document
program for recommended standards, the NIOSH current intelligence program,
the NCI Carcinogenesis Testing Program, and the EPA Toxic Substances
Control Act inventory. Each data line and citation is referenced to the source
from which the information was extracted.

Bibliographic References

The selection of references is made with the aim of facilitating entry into the
literature for the user who wishes to locate more detailed information about a
particular compound. Thus, in general, recent references are preferred to older
ones, particularly for chiral compounds where optical purity and absolute
configuration may have been determined relatively recently. The number of
references quoted cannot therefore be taken as an indication of the relative
importance of a compound, and the references quoted for important substances
may not be the most significant historically.

References are given in date order except for references to spectroscopic
library collections, which sort at the top of the list, and those to hazard/toxicity
sources which sort at the bottom.
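The ordering rule can be expressed as a two-part sort key: a group rank first, then the year. This is only a sketch; the class, the "kind" labels and the placeholder citations are hypothetical, and sorting by year within the two special groups is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    citation: str
    year: int
    kind: str = "general"  # "spectral_library", "general" or "hazard"


# Spectroscopic library collections first, hazard/toxicity sources last.
_GROUP_RANK = {"spectral_library": 0, "general": 1, "hazard": 2}


def sort_references(refs: List[Reference]) -> List[Reference]:
    """Return references in the display order described above."""
    return sorted(refs, key=lambda r: (_GROUP_RANK[r.kind], r.year))


# Placeholder references, for illustration only.
refs = [
    Reference("Ref C", 1991),
    Reference("Ref D", 1985, kind="hazard"),
    Reference("Ref A", 1993, kind="spectral_library"),
    Reference("Ref B", 1978),
]
ordered = [r.citation for r in sort_references(refs)]
```

With these placeholders the resulting order is Ref A, Ref B, Ref C, Ref D.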

The content of most references is indicated by means of suffixes, known as
reference tags. A list of the most common ones is given in Table 4, p. 145 of the
printed User Manual. For references describing a minor natural product which
has been included in DNP as a derivative of a parent compound, the reference
tag may be the identifying name of the natural product, e.g. (Laciniatoside II).

Some reference suffixes are now given in boldface type, where the editors
consider the reference to be particularly important, for example the best
synthesis giving full experimental details and often claiming a higher yield than
previously reported methods.

In some entries, minor items of information, particularly the physical
properties of derivatives, may arise from references not cited in the entry.

Journal abbreviations

In general these are uniform with the Chemical Abstracts Service Source Index
(CASSI) listing except for a short list of very common journals:

DNP ABBREVIATION                            CASSI

Acta Cryst. (and sections thereof)          Acta Crystallogr. (and sections thereof)
Annalen                                     Justus Liebigs Ann. Chem.
Chem. Comm.                                 J. Chem. Soc., Chem. Commun.
J.A.C.S.                                    J. Am. Chem. Soc.
J.C.S. (and various subsections thereof)    J. Chem. Soc. (and various subsections thereof)
J. Het. Chem.                               J. Heterocycl. Chem.
J.O.C.                                      J. Org. Chem.
Tet. Lett.                                  Tetrahedron Lett.
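When DNP references are matched against CASSI-style citations from other sources, the exceptions listed above can be held in a small lookup table. The sketch below is illustrative: the function name is hypothetical, and carrying any section suffix over unchanged is an assumption about how sections are written.

```python
# DNP abbreviation -> CASSI abbreviation, taken from the list above.
DNP_TO_CASSI = {
    "Acta Cryst.": "Acta Crystallogr.",
    "Annalen": "Justus Liebigs Ann. Chem.",
    "Chem. Comm.": "J. Chem. Soc., Chem. Commun.",
    "J.A.C.S.": "J. Am. Chem. Soc.",
    "J.C.S.": "J. Chem. Soc.",
    "J. Het. Chem.": "J. Heterocycl. Chem.",
    "J.O.C.": "J. Org. Chem.",
    "Tet. Lett.": "Tetrahedron Lett.",
}


def to_cassi(abbreviation: str) -> str:
    """Translate a DNP journal abbreviation to its CASSI form."""
    # Try longer keys first and keep any trailing section text unchanged.
    for dnp in sorted(DNP_TO_CASSI, key=len, reverse=True):
        if abbreviation.startswith(dnp):
            return DNP_TO_CASSI[dnp] + abbreviation[len(dnp):]
    # All other journals already follow the CASSI listing.
    return abbreviation
```

Journals outside the short exception list are returned unchanged, since their DNP abbreviations are already uniform with CASSI.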

Entry under review

The database is continually updated. When an entry is undergoing revision at
the time of an on-line release (for example by the addition of further
derivatives or references), this is indicated by a message at the head of the entry.

*RTECS® Accession Numbers are compiled and distributed by the National Institute for Occupational Safety and Health Service
of the U.S. Department of Health and Human Services of The United States of America. All rights reserved. (1996)