Guidance on Searching for Chemical Information and Data

Looking for information
on chemicals is often complicated by the fact that a chemical may be called
different things by different people. Because of the complexity of searching
for data on chemicals, it is recommended that CAS Registry
Numbers be used as the primary key when searching electronic databases.

Searches using chemical
nomenclature may also be appropriate in some cases, although such searches
can miss relevant studies because a different chemical synonym
was used. Such searches may also return
a large number of "false drops" because other chemicals share many
parts of the desired chemical's name. In addition, some studies
may have looked at the effects of a chemical as part of a mixture or formulated
product.
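
The two failure modes described above can be illustrated with a minimal
sketch. The article titles below are hypothetical and are not drawn from
any real database; the function name_search is likewise an illustrative
helper, not part of any vendor's system. A plain name search for "benzene"
picks up two other chemicals whose names contain that string (false drops)
and misses a study indexed under the older synonym "benzol".

```python
# Hypothetical article titles, invented for illustration only.
titles = [
    "Acute toxicity of benzene in rats",
    "Chlorobenzene degradation in soil",
    "Ethylbenzene exposure and hearing loss",
    "Benzol inhalation study in mice",  # "benzol" is an older synonym for benzene
]


def name_search(term, records):
    """Return the records whose text contains the term, ignoring case."""
    needle = term.lower()
    return [r for r in records if needle in r.lower()]


hits = name_search("benzene", titles)

# Two of the three hits are false drops (chlorobenzene, ethylbenzene),
# and the "benzol" study is missed entirely.
for title in hits:
    print(title)
```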

CAS
Registry Numbers are preferred because they were created specifically
to function as unique identifiers to help eliminate the confusion caused
by the variety of synonyms that could be used for the same chemical.
They have since been adopted by many government agencies and other organizations
as a standard. Searching by CAS Registry Numbers for individual
chemical substances enables one to specifically identify a substance without
needing to know the particular synonym an author or database publisher
may have used for the chemical.
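
Before running a search, it can be worth checking that a Registry Number
was transcribed correctly. The sketch below is not part of this guidance;
it relies on the published CAS format, in which the final digit is a check
digit computed from the other digits. The function name is_valid_cas is
an illustrative choice. Passing this check only means the number is well
formed; it does not confirm that the number belongs to the chemical of
interest.

```python
import re


def is_valid_cas(cas_rn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number.

    The number has the form NNNNNNN-NN-N (two to seven digits, two digits,
    one check digit). The check digit equals the weighted sum of the other
    digits modulo 10, with weights 1, 2, 3, ... assigned from right to left.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    weighted_sum = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return weighted_sum % 10 == check_digit


print(is_valid_cas("7732-18-5"))  # water: True
print(is_valid_cas("7732-18-4"))  # wrong check digit: False
```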

Limitations of searching by CAS Registry Number

Some articles or studies
do not include CAS Registry Numbers, and some publishers
do not index by CAS Registry Number. In those cases searchers will need to search
by some other means. The other possibilities are to search by synonym
or by chemical structure (if a system has that capability).

If a database indexes
studies by CAS Registry Number there will be less need to do a search
by synonym, but a sponsor may want to try a variety of synonyms just to
be on the safe side. Most chemical databases index by CAS Registry Number
and allow fielded searching by Registry Number, but some do not.
For example, while most of the records within the National
Library of Medicine's TOXLINE database are indexed by CAS Registry Number,
some are not. Library catalogs and similar databases will generally not
be indexed by Registry Number, but will usually be searchable on the title
and subject headings. Searchers should check a system's documentation to
see whether controlled vocabulary terms relevant to their search are used.

Another point to
consider in developing a search strategy is the possible value of looking
for "other forms" of the chemical. For example, if the HPV chemical
is a labile salt of an acid or an amine, there may be value in looking
for relevant information on other salt forms or the free acid or amine.
One way of approaching this task is via chemical substructure searching
to identify the CAS Registry Numbers of other salt forms of the target
chemical. These Registry Numbers and chemical identities could then
be run through the search strategy to identify data which may be applicable
via a Structure Activity Relationship (SAR)-based argument. (Note
that the identity of the counter ion may be an important element to consider
in evaluating the relevance of the data. For example, compare
data on a lead (Pb) salt of an HPV acid with data
on the sodium salt of that acid.)

Other issues surrounding searching chemical databases

In addition to problems
created by authors using different synonyms for the same chemical, another
problem arises because database publishers and vendors can construct databases
differently and the software used to create the database can operate in
different ways, especially in terms of the syntax used to construct search
statements. Would-be searchers should realize that advanced training
is required to search some databases.

Because of this need
for training, interested parties have two basic options: learn to
search the databases themselves, or have someone do it for them.
We will describe some of the basic approaches you can take in searching.
Because of the complexity of the systems, we have included
information on contacting the vendors regarding system documentation and
appropriate training.

For those people
who plan on doing large amounts of searching, it is recommended that you
refer to one of the many volumes written on this subject (Maizell, Ridley,
Wiggins, Wexler).

General search strategies for identifying relevant studies

Searchers may want
to search databases that are free or relatively inexpensive first. Some
chemicals may have had many studies and articles written about them, but
it is still possible for there to be little or no data about particular
SIDS endpoints. Starting such searches on more expensive databases
could result in large charges for little relevant data.

One of the best
places to start is with a "pointer" database such as NLM's ChemID database.
Pointer databases provide an indication of the specific databases where
information on the chemical can be found.

Some systems take
that one step further by providing the capability to do a search across
groups or clusters of databases. This way searchers can determine
whether there are any articles or records concerning the chemical in
question in a particular database; if there are none, there is no need to search
that database. EPA's Chemical Hazard Data Availability Study relied
on such an element in its strategy for identifying which databases in
the Chemical Information System (CIS) might contain relevant studies by
submitting a list of CAS Registry Numbers to CIS's Structure and Nomenclature
Search System (SANSS).
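
The idea of screening databases before searching them can be sketched as
follows. This is a minimal illustration under stated assumptions, not a
description of how ChemID, CIS, or SANSS actually work: the database names
(DB_A, DB_B, DB_C), the record list, and the function hits_per_database
are all hypothetical.

```python
from collections import Counter

# Hypothetical cross-database index: one (database, CAS Registry Number)
# pair per record. The database names are placeholders.
records = [
    ("DB_A", "71-43-2"),
    ("DB_A", "71-43-2"),
    ("DB_B", "71-43-2"),
    ("DB_B", "64-19-7"),
    ("DB_C", "64-19-7"),
]


def hits_per_database(cas_rn, records, databases):
    """Count the records for one CAS Registry Number in each database."""
    counts = Counter(db for db, rn in records if rn == cas_rn)
    return {db: counts.get(db, 0) for db in databases}


counts = hits_per_database("71-43-2", records, ["DB_A", "DB_B", "DB_C"])

# Databases with zero hits can be skipped for this chemical.
worth_searching = [db for db, n in counts.items() if n > 0]
print(counts)
print(worth_searching)
```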

Once searchers have
an idea of the number of articles in question, they can decide whether
they need to narrow the scope of their search to a particular endpoint.
For example, if a search on a particular chemical only gives three "hits",
it would be easier to just display those citations rather than intersect
that search set with terms for the various endpoints.

Most systems have
a variety of print and display options. These can include "citation
only"; "citation and abstract"; "citation, abstract, and subject headings";
or, in some systems, even the entire text of the article. The cost
associated with these options varies by vendor and by database. Searchers
should consult their documentation for information on fees.

It will be clear
from the titles of some articles what endpoints a study was investigating
while for others it may not be particularly clear whether any included
data are relevant to the HPV Challenge Program.

In some cases, a
search on a particular chemical may result in dozens, even hundreds, of
hits. As noted above, just because one aspect of a chemical has
been well documented does not mean that there will be data for all the
endpoints requested under the HPV Challenge Program. Searchers can
take one of two approaches here.

One, most systems
have the capability to display a list of the titles of the articles.
(Note: It will sometimes be helpful to include at least the names
of journals in citation displays since they may provide clues as to what
endpoint was being investigated.) Searchers will need to determine
which studies to examine for possible relevance.

Two, the search
set can be combined with appropriate terminology for a particular endpoint.
For example, a search for articles concerning reproductive effects might
include a search for the words "REPRO?" or "TERATO?" (where "?" represents
a wildcard character to allow for different forms of the words--different
search engines may use different characters as wildcards). Some
vendors or database producers might even have indexed the articles cited
in their databases using controlled vocabulary terms or special classification
codes. Since it is not possible to include all the details for that
here, searchers should consult the users' guide or other documentation
for the particular system they are using. If there are few hits, combining
search sets will probably not be needed.
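
Combining a search set with truncated endpoint terms can be sketched as
shown here. This is an illustration only: the titles are hypothetical,
"?" is assumed to be the truncation character (real systems differ, as
noted above), and the helper names truncation_to_regex and narrow are
not taken from any vendor's software.

```python
import re


def truncation_to_regex(term, wildcard="?"):
    """Turn a right-truncated term such as 'REPRO?' into a whole-word regex.

    A trailing wildcard matches any run of word characters; a term with no
    wildcard matches only that exact word.
    """
    stem = term.rstrip(wildcard)
    suffix = r"\w*" if term.endswith(wildcard) else ""
    return re.compile(r"\b" + re.escape(stem) + suffix + r"\b", re.IGNORECASE)


def narrow(search_set, endpoint_terms):
    """Keep the records that match at least one endpoint term."""
    patterns = [truncation_to_regex(t) for t in endpoint_terms]
    return [r for r in search_set if any(p.search(r) for p in patterns)]


# Hypothetical result set from a search on one chemical.
chemical_hits = [
    "Reproductive toxicity of compound X in rats",
    "Teratogenic potential of compound X",
    "Vapor pressure of compound X",
    "Compound X: a report on production volumes",
]

repro_hits = narrow(chemical_hits, ["REPRO?", "TERATO?"])
for title in repro_hits:
    print(title)
```

Matching on word boundaries keeps "report" and "production" from being
picked up by "REPRO?", which a simple substring test would not guarantee
for every term.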

One way of identifying
relevant search terms is to identify one relevant article, then view the
terms and classification codes used to index it. Some database producers
include thesauri containing controlled vocabulary terms and related terms.
Experienced searchers can use those to narrow or broaden their searches.

Searchers should
also be aware that studies found searching one database may duplicate
those found while searching another database. In cases where a database
contains subfiles, duplicates may show up in search results. In
such cases the number of hits may not reflect the number of studies actually
cited.
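
One way to estimate how many distinct studies a combined result list
contains is to remove duplicates on a normalized citation key. The sketch
below assumes that title plus publication year is enough to recognize a
duplicate, which will not always hold; the records, field names, and
helper functions are hypothetical.

```python
import re


def citation_key(record):
    """Build a comparison key from a lowercased, punctuation-free title and year."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record["year"])


def deduplicate(records):
    """Keep the first record seen for each citation key, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = citation_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical hits from two databases; the first two cite the same study.
results = [
    {"db": "DB_A", "title": "Acute Toxicity of Compound X.", "year": 1987},
    {"db": "DB_B", "title": "Acute toxicity of compound X", "year": 1987},
    {"db": "DB_B", "title": "Biodegradation of compound X", "year": 1991},
]

unique = deduplicate(results)
print(len(results), "hits,", len(unique), "distinct studies")
```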

Verifying relevance of studies identified in a search

It is important
to verify that the articles or studies found do indeed deal with the chemical
of interest. There is always the possibility that a study did
not focus on the chemical searched for, but was indexed
under that chemical because it was used as a solvent or a substrate. Verifying
the chemical's identity is especially important when searching by synonym
or by chemical structure.

Obtaining copies of studies

For the full text
of articles not available electronically, the original publishers may
be able to provide copies (for a price). Document delivery services
and information brokers can also obtain copies of many articles for a
charge. Companies with libraries may be able to get copies through
interlibrary loan. Note: With the exception of U.S. government publications,
most of the articles searchers will find will be covered by U.S. copyright
laws. Searchers are responsible for making sure they are in compliance
with these laws.

Basic procedure for searching for chemical information in databases

Examples: ChemID, TOXLINE

Identify the correct CAS Registry
Number for the chemical in question.

Use the database's
search function to search for the CAS Registry Number. (Note:
If a database is not indexed on CAS Registry Numbers, searchers will
need to use synonyms.)

If the search results in many
hits you will need to narrow the search by combining the search set
with appropriate terms.
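
The steps above can be sketched as a single function. This is a minimal
illustration run against an in-memory list, not the interface of ChemID,
TOXLINE, or any other system. The record fields ("cas_rn", "title"), the
function name run_search, and the cutoff of 20 hits for "many" are all
assumptions made for the example.

```python
def run_search(records, cas_rn, synonyms=(), endpoint_terms=(), many=20):
    """Search by CAS Registry Number, fall back to synonyms, then narrow.

    records: list of dicts with a "title" and, if indexed, a "cas_rn".
    many: assumed cutoff above which the result set is narrowed.
    """
    # Search on the Registry Number if the database is indexed by it.
    indexed = any(r.get("cas_rn") for r in records)
    if indexed:
        hits = [r for r in records if r.get("cas_rn") == cas_rn]
    else:
        # Otherwise fall back to searching titles for known synonyms.
        names = [s.lower() for s in synonyms]
        hits = [r for r in records
                if any(n in r["title"].lower() for n in names)]

    # Narrow with endpoint terms only when there are many hits.
    if len(hits) > many and endpoint_terms:
        terms = [t.lower() for t in endpoint_terms]
        hits = [r for r in hits
                if any(t in r["title"].lower() for t in terms)]
    return hits


# Hypothetical indexed database with 30 records for one chemical.
big = [
    {"cas_rn": "71-43-2",
     "title": "Reproductive effects" if i < 2 else f"Other study {i}"}
    for i in range(30)
]
print(len(run_search(big, "71-43-2", endpoint_terms=["repro"])))
```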

Basic procedure for searching for chemical information in publications

Examples: Merck Index

Identify the correct CAS Registry
Number for the chemical in question.

Check to see whether the publication
has a CAS Registry Number index (many do, though not all).

If the publication has a CAS
Registry Number index, use that to determine under what entry data about
the chemical is listed. This will often be by name.

If the publication does not
have a CAS Registry Number index, searchers will need to use the general
index or table of contents to determine where in the publication information
about the chemical is located.
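
The lookup order described above can be sketched with two small
dictionaries standing in for a publication's indexes. The index contents
and the function locate_entry are hypothetical and do not reproduce the
indexes of the Merck Index or any other publication.

```python
def locate_entry(cas_rn, names, cas_index=None, general_index=None):
    """Find the entry for a chemical, preferring the CAS Registry Number index.

    cas_index maps Registry Numbers to entry names; general_index maps
    lowercased chemical names to entry names. Returns None if nothing matches.
    """
    if cas_index and cas_rn in cas_index:
        return cas_index[cas_rn]
    if general_index:
        for name in names:
            entry = general_index.get(name.lower())
            if entry is not None:
                return entry
    return None


# Hypothetical indexes for a publication that lists entries by name.
cas_index = {"64-19-7": "Acetic acid"}
general_index = {"acetic acid": "Acetic acid", "ethanoic acid": "Acetic acid"}

# With a CAS index, the Registry Number leads straight to the entry name.
print(locate_entry("64-19-7", [], cas_index=cas_index))

# Without one, the search falls back to names in the general index.
print(locate_entry("64-19-7", ["Ethanoic acid"], general_index=general_index))
```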

Disclaimer

IMPORTANT: The fact
that a resource is included in this guide does not mean that EPA is endorsing
that resource. Nor does it mean that EPA will automatically accept
data included in or referenced by that resource. Studies and
data will need to meet the requirements spelled out in the guidance
document on data adequacy in order to be accepted under the HPV
Challenge Program.