PubChem Substance Tags

PUBCHEM_EXT_DATASOURCE_REGID

Your Catalog Number or Other Unique Registry Identifier External to PubChem.

This mandatory identifier never changes. It can't be duplicated within one submission.
If we see a given RegId again in another submission, we treat it as a replacement
update for the record and create a new version. You can also use it to revoke records
with the PUBCHEM_REVOKE_SUBSTANCE tag. Please use only ASCII text characters in
the identifier.
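
The RegId rules above (mandatory, ASCII only, unique within one submission) can be checked before upload. The following is a minimal sketch, not PubChem's own validation: the function name `check_regids` and the shape of its input (a plain list of RegId strings from one submission) are assumptions made for illustration.

```python
from collections import Counter


def check_regids(regids):
    """Return human-readable problems for the RegIds of one submission.

    Hypothetical helper: it only mirrors the three rules stated in the text.
    """
    problems = []
    for regid, count in Counter(regids).items():
        if not regid.strip():
            problems.append("empty RegId (the tag is mandatory)")
        elif not regid.isascii():
            problems.append(f"non-ASCII RegId: {regid!r}")
        if count > 1:
            problems.append(f"duplicate RegId: {regid!r} ({count} occurrences)")
    return problems
```

An empty result means no problem was found by these checks; it does not guarantee that a submission will be accepted.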

PUBCHEM_SUBSTANCE_SYNONYM

Synonyms, including common names, registry IDs (like CAS), chemical names and trade names.
This tag can hold multiple synonyms: put each synonym on a separate line, or
repeat the tag once per synonym.

Example entry (one synonym per line):

aspirin
50-78-2
2-acetoxybenzoic acid

Synonyms are the primary keywords by which a substance is known and found via text search.
They are collected to name our aggregated PubChem Compound records. Your records are
most frequently discovered through your high-quality synonyms and chemical structure.
Synonyms must be ASCII text; as with other input, all special characters and HTML tags
will be converted to text or stripped out as appropriate.
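
As an illustration of the one-synonym-per-line form, the sketch below formats a data item in the general SD file layout: a `> <TAG>` header line, the value lines, then a blank line that ends the item. The helper name `sdf_data_item` is hypothetical, and the snippet covers only the data item, not the structure block or the `$$$$` record terminator.

```python
def sdf_data_item(tag, values):
    """Format one SD file data item with one value per line.

    Hypothetical helper; values are written as given, with no escaping.
    """
    lines = [f"> <{tag}>"]
    lines.extend(values)
    lines.append("")  # the blank line terminates the data item
    return "\n".join(lines) + "\n"


synonym_item = sdf_data_item(
    "PUBCHEM_SUBSTANCE_SYNONYM",
    ["aspirin", "50-78-2", "2-acetoxybenzoic acid"],
)
print(synonym_item, end="")
```

Because a blank line ends a data item, the values themselves must not contain empty lines.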

PUBCHEM_SUBSTANCE_COMMENT

Comments, annotations or keywords available for indexed PubChem text searches.

Multiple lines of comments help PubChem users learn about, for example, the
biological activity, safety data or special availability of a substance.
For a submitter, comments give you another opportunity to have your data
discovered through keyword text searches.

We reserve the right to suppress or eliminate unsuitable or excessive comments.
Comments must consist of printable ASCII characters.
URLs should be provided without any HTML tags such as "...". All HTML
tags are stripped out of the text.

PUBCHEM_EXT_SUBSTANCE_URL

A specific substance webpage (URL) relevant to this record within your organization
(external to PubChem).

PUBCHEM_REVOKE_SUBSTANCE

Remove a Substance Record From Search Results of Live Records.

To use this tag, your record must contain only two tags: the PUBCHEM_EXT_DATASOURCE_REGID
tag identifying the record to revoke, and this tag, whose value is a short comment stating
the reason for the revocation.

Note: The Substance will remain in the PubChem archive; however, there will be no direct
links to this substance from within PubChem. Effectively, this deletes
the record from public view.
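
The two-tag rule for revoke records can be expressed as a small check. This is a sketch under assumptions: the record is represented as a dict mapping SD tag names to value strings, the function name `is_valid_revoke_record` is hypothetical, and treating an empty reason as invalid is an interpretation of "a short comment stating a reason", not a documented rule.

```python
REGID_TAG = "PUBCHEM_EXT_DATASOURCE_REGID"
REVOKE_TAG = "PUBCHEM_REVOKE_SUBSTANCE"


def is_valid_revoke_record(tags):
    """Check that a record carries exactly the RegId and revoke tags.

    `tags` is a dict of SD tag name -> value string (assumed representation).
    """
    if set(tags) != {REGID_TAG, REVOKE_TAG}:
        return False
    # Assumption: both values must be non-empty after trimming whitespace.
    return bool(tags[REGID_TAG].strip()) and bool(tags[REVOKE_TAG].strip())
```

For example, a record that also carries PUBCHEM_SUBSTANCE_SYNONYM fails this check, because a revoke record may contain only the two tags.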

PUBCHEM_EXT_DATASOURCE_SMILES

SMILES string specifying the chemical structure.

This tag is ignored if the CTAB section of the same SDF record also provides a
chemical structure with atoms.

PUBCHEM_EXT_DATASOURCE_INCHI

InChI string specifying the chemical structure.

This tag is ignored if the CTAB section of the same SDF record also provides a
chemical structure with atoms. Only a single InChI string is allowed for a given
Substance. The expected format is a single line of text containing a valid InChI
string, which can be standard or non-standard.

PUBCHEM_EXT_DATASOURCE_CID

PubChem Compound Identifier (CID) specifying the chemical structure.

This tag is ignored if the CTAB section of the same SDF record also provides a
chemical structure with atoms. Only a single CID is allowed for a given Substance.
The expected format is a single line of text containing a valid PubChem Compound
identifier. The CID cannot be on-hold.

PUBCHEM_HOLD_UNTIL_DATE

Optional Hold-Until Date: the earliest day on which the record can become public
in PubChem. Absent this tag, public release is allowed once the submission is committed
by the submitter.

You may wish to coordinate the public release of your data with a journal publication, a
patent application or other grant-related administrative deadlines. Each substance
record has its own hold-until date. Public records cannot be put on-hold. See our
help doc for complete details.

A single date is expected using the international standard date notation ISO 8601 such as:

YYYY-MM-DD (e.g., 1997-07-16)

Note that a hold-until date specified by the submitter is the first day on which
public release is allowed. Upload pipelines typically add a further delay of
24 hours, and sometimes more.
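
A date string can be checked against the YYYY-MM-DD form before submission. The sketch below uses only the Python standard library; the function names `parse_hold_until` and `is_valid_hold_until` are hypothetical, and rejecting unpadded values such as 1997-7-16 is a strict reading of the format shown above, not a documented PubChem rule.

```python
import re
from datetime import date

# Strict YYYY-MM-DD: four-digit year, two-digit month, two-digit day.
_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_hold_until(value):
    """Return a datetime.date for a YYYY-MM-DD string, or raise ValueError."""
    match = _DATE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD form: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as 1997-02-30.
    return date(year, month, day)


def is_valid_hold_until(value):
    """Boolean wrapper around parse_hold_until."""
    try:
        parse_hold_until(value)
    except ValueError:
        return False
    return True
```

The check covers the format only; it does not test whether the date lies in the future or whether the record is already public.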

PUBCHEM_EXT_DATASOURCE_URL

The main webpage (URL) for your organization. NOTE: You don't need to set this
because it is auto-populated with your account's URL.

You might choose to set this field only to override your account's URL. For example,
you might have a summary webpage for a class of compounds to which you wish to refer.

PUBCHEM_BONDANNOTATIONS

Substance Bond Annotations. This tag is offered as a convenient alternative
to directly encoding the information in the SDF format.

Bond Annotations will affect how the Substance is interpreted and validated within
PubChem. Multiple Bond Annotations may be provided for a Substance. The allowed
format for a Bond Annotation is three unsigned numbers, separated by white-space,
per line, representing the AtomIDs of the two atoms, followed by the annotation
ID, respectively. Only a single Bond Annotation may be provided per line. The atoms
do not have to be explicitly bonded in the SD file format to have a bond annotation.
Nonsensical annotations will be suppressed.

Atom-Atom Annotation list is in the format:

AtomID AtomID AnnotationID

where AtomID and AnnotationID are unsigned integer numbers.

AnnotationID  Meaning
------------  --------------------------------------------------
1             Crossed Bond, a non-specific stereo double bond
2             Dashed Bond, a 3-D hydrogen bond
3             Wavy Bond, a non-specific stereo single bond
4             Dotted Bond, a complex or fractional bond
5             Wedge-up Bond, a solid wedge stereo bond
6             Wedge-down Bond, a dashed wedge stereo bond
7             Arrow Bond, a dative bond
8             Aromatic Bond, an aromatic bond
9             Resonance Bond, a resonating bond
10            Bold Bond, a thick bond
11            Fischer Bond, use Fischer stereo conventions
12            Close Contact, a 3-D atom-atom close contact
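
The line format described above (two AtomIDs and an AnnotationID, separated by white-space, one annotation per line) can be parsed as follows. This is a minimal sketch, not PubChem's parser: the name `parse_bond_annotations` is hypothetical, and both skipping blank lines and rejecting unknown AnnotationIDs are assumptions, since the text says only that nonsensical annotations are suppressed.

```python
# AnnotationID -> meaning, copied from the list above.
BOND_ANNOTATIONS = {
    1: "Crossed Bond, a non-specific stereo double bond",
    2: "Dashed Bond, a 3-D hydrogen bond",
    3: "Wavy Bond, a non-specific stereo single bond",
    4: "Dotted Bond, a complex or fractional bond",
    5: "Wedge-up Bond, a solid wedge stereo bond",
    6: "Wedge-down Bond, a dashed wedge stereo bond",
    7: "Arrow Bond, a dative bond",
    8: "Aromatic Bond, an aromatic bond",
    9: "Resonance Bond, a resonating bond",
    10: "Bold Bond, a thick bond",
    11: "Fischer Bond, use Fischer stereo conventions",
    12: "Close Contact, a 3-D atom-atom close contact",
}


def parse_bond_annotations(text):
    """Parse a tag value into a list of (atom_id, atom_id, annotation_id)."""
    annotations = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != 3 or not all(f.isascii() and f.isdigit() for f in fields):
            raise ValueError(f"line {line_no}: expected three unsigned integers")
        atom1, atom2, annotation_id = (int(f) for f in fields)
        if annotation_id not in BOND_ANNOTATIONS:
            raise ValueError(f"line {line_no}: unknown AnnotationID {annotation_id}")
        annotations.append((atom1, atom2, annotation_id))
    return annotations
```

For instance, the value `1 2 5` on one line annotates the atom pair 1 and 2 with a wedge-up bond.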

PUBCHEM_NONSTANDARDBOND

Substance Non-Standard Bonds. This tag is offered as a convenient alternative
to directly encoding the information in the SDF format.

Non-Standard Bonds will affect how the Substance is interpreted and standardized
within PubChem. Multiple Non-Standard Bonds may be provided for a Substance. The
allowed format for a Non-Standard Bond is three unsigned numbers, separated by white-space,
per line, representing the AtomIDs of the two atoms, followed by the bond type ID,
respectively. Only a single Non-Standard Bond may be provided per line. The atoms
do not have to be actually bonded in the SD file format to have a nonstandard bond.
If the atoms are already bonded in the SD file format, the non-standard bonds provided
using this SD tag will supersede that interpreted from the SD file format.

Atom-Atom Non-Standard Bond list is in the format:

AtomID AtomID BondTypeID

where AtomID and BondTypeID are unsigned integer numbers.

BondTypeID  Meaning
----------  -----------------
1           Single Bond
2           Double Bond
3           Triple Bond
4           Quadruple Bond
5           Dative Bond
6           Complex Bond
7           Ionic Bond
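
Going the other direction, a list of non-standard bonds held in memory can be written out as the tag value, one bond per line. This sketch assumes the bonds are given as (AtomID, AtomID, BondTypeID) tuples; the name `format_nonstandard_bonds` is hypothetical, and the checks reflect only the "unsigned integer" wording and the bond type list above.

```python
# BondTypeID -> meaning, copied from the list above.
BOND_TYPES = {
    1: "Single Bond",
    2: "Double Bond",
    3: "Triple Bond",
    4: "Quadruple Bond",
    5: "Dative Bond",
    6: "Complex Bond",
    7: "Ionic Bond",
}


def format_nonstandard_bonds(bonds):
    """Return the tag value for (atom_id, atom_id, bond_type_id) tuples."""
    lines = []
    for atom1, atom2, bond_type in bonds:
        if atom1 < 0 or atom2 < 0:
            raise ValueError("AtomIDs must be unsigned integers")
        if bond_type not in BOND_TYPES:
            raise ValueError(f"unknown BondTypeID {bond_type}")
        lines.append(f"{atom1} {atom2} {bond_type}")  # one bond per line
    return "\n".join(lines)
```

The result is the text that would follow the PUBCHEM_NONSTANDARDBOND header in the SD record; whether AtomIDs must match the atom numbering of the CTAB section is not checked here.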

PUBCHEM_DEPOSITOR_RECORD_DATE

This date is not related to PubChem submission or processing; rather it is intended
to be the date the substance was last changed in your internal database (mapping to PubChem
Entrez search field "SourceReleaseDate"). PubChem provides its own date when the
record is added or updated ("DepositDate"). For date format, see Hold-Until Date.

PUBCHEM_STRUCTURE (not SDF tag; info only)

Substance structure may be provided as a PubChem CID (such as "2244"), a SMILES
string (such as "C1C(CCC1)CCC"), an InChI string (such as "InChI=1S/C8H16/c1-2-5-8-6-3-4-7-8/h8H,2-7H2,1H3"),
or a PubChem SID (such as "123").

Alternatively, a structure can be input via the PubChem Sketcher, either by drawing
it or by uploading a file in one of the chemical formats that the PubChem Sketcher
accepts. To input or edit a structure using the Sketcher, click the Edit button or
the structure image area.