However, molecular formula search implemented in some databases, including PubChem Chemical Structure Search, has an option to allow other elements in returned hits (e.g., C6H6O or C6H6N2O for the “C6H6” query).

Why does it have an option to allow other elements in returned hits when we type C6H6 or any other molecular formula?
Amita
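
One plausible reading of the option, sketched below: the element counts given in the query must match exactly, and the "allow other elements" switch only decides whether a hit may also contain elements the query does not mention. With the switch on, C6H6 also matches C6H6O (for example phenol) and C6H6N2O, as in the quoted example. The function names are hypothetical, this is not PubChem's code, and the toy parser ignores parentheses, charges and isotopes.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count elements in a simple formula such as 'C6H6N2O'.
    Parentheses, charges and isotopes are not supported in this sketch."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def formula_matches(query, candidate, allow_other_elements=False):
    """Hypothetical matcher: query elements must match exactly; extra
    elements in the candidate are accepted only when the option is on."""
    q, c = parse_formula(query), parse_formula(candidate)
    # Every element named in the query must appear with exactly the same count.
    if any(c[element] != count for element, count in q.items()):
        return False
    extra_elements = set(c) - set(q)
    return allow_other_elements or not extra_elements


print(formula_matches("C6H6", "C6H6O"))                             # False
print(formula_matches("C6H6", "C6H6O", allow_other_elements=True))  # True
```

With the option off, the search behaves like an exact formula lookup; turning it on widens the result to every compound that shares the same C and H counts.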

The most common types of molecular fingerprints are structural keys, which encode structural information of a molecule into a binary string (that is, a string of 0’s and 1’s). The position of each number in this string corresponds to a particular fragment. If the molecule has a particular fragment, the corresponding bit position is set to 1, and otherwise to 0. Note that there are many different ways to design molecular fingerprints, depending on what fragments are included in the fingerprint definition. PubChem uses its own fingerprint called PubChem subgraph fingerprints.

I am confused about the binary string and the fingerprint. How do they work to recognize molecules?
Amita
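
A toy illustration of structural keys, assuming a hypothetical list of seven keys of the form "at least N atoms of element X". This is loosely modelled on the element-count keys at the start of PubChem's fingerprint; the real definition has 881 bits, many of them ring and substructure patterns. Each key owns one bit position, so a molecule becomes a string of 0s and 1s, and two molecules can be compared bit by bit. The comparison here uses the Tanimoto coefficient (bits set in both divided by bits set in either), a common similarity score that the quoted text does not itself mention.

```python
# Hypothetical keys: (element, minimum count). Bit k is 1 when key k holds.
KEYS = [("C", 1), ("C", 6), ("H", 4), ("N", 1), ("O", 1), ("O", 2), ("Cl", 1)]


def fingerprint(element_counts):
    """Turn element counts into a list of 0/1 bits, one per key."""
    return [1 if element_counts.get(element, 0) >= minimum else 0
            for element, minimum in KEYS]


def tanimoto(a, b):
    """Shared 1-bits divided by bits set in either fingerprint."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0


benzene = fingerprint({"C": 6, "H": 6})
phenol = fingerprint({"C": 6, "H": 6, "O": 1})
print(benzene)                    # [1, 1, 1, 0, 0, 0, 0]
print(phenol)                     # [1, 1, 1, 0, 1, 0, 0]
print(tanimoto(benzene, phenol))  # 0.75
```

The fingerprint does not identify a molecule uniquely (different molecules can set the same bits); it is a fast way to screen and rank candidates before a detailed structure comparison.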

On the contrary, superstructure search returns molecules that comprise or make up the provided chemical structure query (that is, substructures that are contained in the query superstructure). It should be noted that substructure search does not give you substructures of the query and that superstructure search does not return superstructures of the query.
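
The two search directions differ only in which structure must fit inside which, and fingerprint screening is a common way to apply that before any atom-by-atom comparison. The sketch below assumes each molecule has already been reduced to a set of fragment keys (the fragment names are made up). A target can only contain the query if every query key is also present in the target; for a superstructure search the test is reversed. Passing the screen is necessary but not sufficient, so a real search still has to confirm each surviving candidate.

```python
def passes_substructure_screen(query_keys, target_keys):
    """Target may CONTAIN the query: all query keys must be in the target."""
    return query_keys <= target_keys


def passes_superstructure_screen(query_keys, target_keys):
    """Target may be CONTAINED IN the query: all target keys must be in the query."""
    return target_keys <= query_keys


# Hypothetical fragment keys for two molecules.
benzene = frozenset({"C", "six-membered ring", "aromatic ring"})
phenol = benzene | {"O", "aromatic OH"}

# Substructure search for benzene can return phenol, not the other way round.
print(passes_substructure_screen(benzene, phenol))    # True
# Superstructure search for phenol can return benzene.
print(passes_superstructure_screen(phenol, benzene))  # True
```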

Therefore, some records in PubChem can persist with outdated (or incorrect) data. To help identify such cases, we are introducing a “legacy” indication for contributors and their records. Please note that this does not mean that data identified as “legacy” is without value. Quite to the contrary, some legacy collections successfully collected valuable scientific data for the research community, and are simply no longer updating the information.

How can we determine whether the data designated as “legacy” are valuable or not?
Amita

TOXNET (http://toxnet.nlm.nih.gov/) [22-25], maintained by the National Library of Medicine (NLM) at NIH, is a group of databases covering toxicology, hazardous chemicals, toxic releases, environmental and occupational health, and risk assessment.

Does TOXNET also deal with nanomaterials and environmental pollution?
Amita

Actually, it is very common that there are a lot of SMILES strings that represent the same structure, whether it has a ring or not, because one can start with any atom in a molecule to derive a SMILES string. Therefore, it is necessary to select a “unique SMILES” for a molecule among many possibilities. Because this is done through a process called “canonicalization”, this unique SMILES string is also called the “canonical SMILES”.
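
To make canonicalization concrete, here is a brute-force sketch limited to small acyclic molecules with single bonds only (an assumption that keeps the code short). It writes a SMILES string starting from every atom and in every branch order, then picks one by a fixed rule: shortest first, then alphabetical. The rule itself is arbitrary; what matters is that it depends only on the structure, so the same molecule entered with its atoms in a different order yields the same string. Production toolkits rank atoms with graph invariants instead of enumerating every string, and their canonical forms will generally differ from this one.

```python
from itertools import permutations


def dfs_smiles(atoms, adj, current, parent):
    """Yield every SMILES string for the branch rooted at `current`,
    one per ordering of its child branches (acyclic molecules only)."""
    children = [n for n in adj[current] if n != parent]
    for order in permutations(children):
        partials = [atoms[current]]
        for position, child in enumerate(order):
            is_last = position == len(order) - 1
            new_partials = []
            for sub in dfs_smiles(atoms, adj, child, current):
                # All but the last branch are wrapped in parentheses.
                piece = sub if is_last else f"({sub})"
                new_partials.extend(p + piece for p in partials)
            partials = new_partials
        yield from partials


def canonical_smiles(atoms, bonds):
    """Pick one SMILES per structure: shortest, ties broken alphabetically.
    Assumes a single connected, acyclic molecule with single bonds only."""
    if len(bonds) != len(atoms) - 1:
        raise ValueError("this sketch only handles acyclic, connected molecules")
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    candidates = (s for start in adj for s in dfs_smiles(atoms, adj, start, None))
    return min(candidates, key=lambda s: (len(s), s))


# Ethanol entered with two different atom orders gives one string.
print(canonical_smiles(["C", "C", "O"], [(0, 1), (1, 2)]))  # CCO
print(canonical_smiles(["O", "C", "C"], [(0, 1), (1, 2)]))  # CCO
```

For ethanol the enumeration produces CCO, OCC, C(C)O and C(O)C, all valid SMILES for the same molecule; the fixed rule selects CCO every time.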

All InChIs currently are prefixed with “InChI=”. Following this, a designator of “1/” or “1S/” indicates whether the InChI is non-standard or standard, respectively (a standard InChI is generated with fixed, standardized options in the software).

Can one compound or molecule have both standard and non-standard InChIs?
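
A small helper showing how the designator after “InChI=” can be read; the function name is hypothetical. Both example strings describe methane, which illustrates that a single structure can be written either way: “1S” marks the standard InChI, while “1” alone marks a non-standard one, whose exact layers depend on the options chosen when it was generated.

```python
def inchi_kind(inchi):
    """Classify an InChI string by the version designator after 'InChI='."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    designator = inchi[len(prefix):].split("/", 1)[0]  # e.g. '1S' or '1'
    return "standard" if designator.endswith("S") else "non-standard"


print(inchi_kind("InChI=1S/CH4/h1H4"))  # standard
print(inchi_kind("InChI=1/CH4/h1H4"))   # non-standard
```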

Using the full stop (“.”), which overrides the implicit single bond between adjacent atoms, we can make some exotic variants on SMILES:
C1C.CC1 (butane)
C1CC2.C1CC2 (cyclohexane)

This section explains the notation for molecules with an even number of carbon atoms, such as butane, but not for molecules with an odd number, such as heptane, cycloheptane or cycloheptene. How can we write the notation for these compounds?
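
Nothing in the dot notation depends on an even number of atoms. The sketch below parses a tiny SMILES subset (carbon atoms, “=”, ring-closure digits and “.”) into atoms and bonds, then counts rings as bonds minus atoms plus connected pieces. It confirms that C1C.CC1 is a four-carbon chain and applies the same trick to seven carbons. The dotted strings C1CC.CCCC1 (heptane) and C1CC2.C2CCC1 (cycloheptane) are our own illustrations; the ordinary forms are CCCCCCC, C1CCCCCC1 and C1=CCCCCC1 (cycloheptene). The parser is a toy, not a general SMILES reader.

```python
def parse_toy_smiles(smiles):
    """Parse a tiny SMILES subset: 'C', '=', ring-closure digits and '.'.
    Returns (atom_count, bonds) with bonds as (atom_i, atom_j, order)."""
    atom_count = 0
    bonds = []
    prev = None      # atom the next chain bond attaches to
    pending = 1      # order of the next bond
    open_rings = {}  # ring digit -> (atom index, bond order)
    for ch in smiles:
        if ch == "C":
            atom = atom_count
            atom_count += 1
            if prev is not None:
                bonds.append((prev, atom, pending))
            prev, pending = atom, 1
        elif ch == "=":
            pending = 2
        elif ch == ".":
            # The dot cancels the implicit bond to the next atom.
            prev, pending = None, 1
        elif ch.isdigit():
            if prev is None:
                raise ValueError("a ring-closure digit must follow an atom")
            if ch in open_rings:
                other, order = open_rings.pop(ch)
                bonds.append((other, prev, max(order, pending)))
            else:
                open_rings[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported character {ch!r}")
    if open_rings:
        raise ValueError("unclosed ring bond(s): " + ", ".join(open_rings))
    return atom_count, bonds


def describe(smiles):
    """Count carbons, bonds, disconnected pieces, rings and double bonds."""
    n, bonds = parse_toy_smiles(smiles)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in bonds:
        parent[find(i)] = find(j)
    pieces = len({find(i) for i in range(n)})
    return {
        "carbons": n,
        "bonds": len(bonds),
        "pieces": pieces,
        "rings": len(bonds) - n + pieces,  # cyclomatic number
        "double_bonds": sum(1 for _, _, order in bonds if order == 2),
    }


print(describe("C1C.CC1"))
# {'carbons': 4, 'bonds': 3, 'pieces': 1, 'rings': 0, 'double_bonds': 0}
```

The ring-closure digit is what reconnects the two halves across the dot, so any chain or ring can be split this way regardless of its atom count.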

Resonance
Run-of-the-mill delocalization presents some of the same problems as aromaticity, but there is no conventional label for (non-aromatic) delocalized electrons, such as the delocalized negative charge and pi system in benzoate (VII and VIII). The connection tables will simply represent one resonance structure or the other.

In figures MOL VII and MOL VIII, a value of 5 is inserted in the V2000 file format. What is the significance of this value, and how do we know which value to insert for other resonance structures?
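
Assuming the 5 sits in the charge column (ccc) of an atom line, it is the V2000 charge code for a formal charge of -1, which fits the negatively charged oxygen of benzoate. V2000 stores charges as codes rather than numbers: 1, 2 and 3 mean +3, +2 and +1; 5, 6 and 7 mean -1, -2 and -3; 0 means uncharged and 4 a doublet radical. For another resonance structure, the code goes on whichever atom carries the charge in that drawing. The helper names below are hypothetical; the column positions follow the CTfile atom-line layout.

```python
# V2000 atom-block charge codes (ccc column). Code 4 is a doublet radical,
# not a charge, so it is left out of this mapping.
CHARGE_CODE_TO_CHARGE = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}
CHARGE_TO_CHARGE_CODE = {q: code for code, q in CHARGE_CODE_TO_CHARGE.items()}


def v2000_atom_line(symbol, x, y, z, charge):
    """Build a minimal 69-character atom line; other fields are left as 0."""
    code = CHARGE_TO_CHARGE_CODE[charge]
    # x, y, z (10 chars each), blank, symbol (3), mass difference (2), charge (3)
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}{0:2d}{code:3d}" + "  0" * 10


def atom_symbol_and_charge(atom_line):
    """Read the symbol (columns 32-34) and decode the charge (columns 37-39)."""
    symbol = atom_line[31:34].strip()
    code = int(atom_line[36:39])
    return symbol, CHARGE_CODE_TO_CHARGE[code]


line = v2000_atom_line("O", 1.0, 0.5, 0.0, -1)
print(repr(line[30:39]))             # ' O   0  5'
print(atom_symbol_and_charge(line))  # ('O', -1)
```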

We would need to add an additional field to the atom and/or bond table to handle chirality (SCT VI, VII). We could do so in a chemically sophisticated way, by annotating the atom property; in a chemically naive translation of a diagram feature, by annotating the bond configuration; or both.

In this section, SCT VI is used for the R stereoisomer and SCT VII for the S isomer, but they look similar, so how can we recognize which one is which? The authors say we can recognize them in a chemically sophisticated way, but they do not fully explain it.
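
One way to picture the atom-property approach: store on the stereocentre a flag that is only meaningful together with the order in which its neighbours are listed, which is the idea behind SMILES “@”/“@@” and the MDL atom parity. Two tables can then look identical except for that flag, and swapping any two neighbours in the list inverts its meaning. The sketch below (hypothetical function names) re-expresses a SMILES-style tag for a new neighbour order using permutation parity. Turning “@”/“@@” into R or S additionally needs CIP priorities, which it does not attempt.

```python
def permutation_parity(perm):
    """0 if `perm` (a list of distinct indices) is an even permutation, else 1."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


def retag(neighbours, tag, new_order):
    """Return the '@'/'@@' tag describing the same centre when its
    neighbours are listed in `new_order` instead of `neighbours`."""
    if sorted(neighbours) != sorted(new_order):
        raise ValueError("new_order must contain the same neighbours")
    perm = [neighbours.index(n) for n in new_order]
    if permutation_parity(perm) == 0:
        return tag
    return "@@" if tag == "@" else "@"


# Swapping one pair of neighbours flips the tag; swapping two pairs does not.
print(retag(["F", "Cl", "Br", "H"], "@", ["Cl", "F", "Br", "H"]))  # @@
print(retag(["F", "Cl", "Br", "H"], "@", ["Cl", "F", "H", "Br"]))  # @
```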

Of course, chemical structures and topological graphs are not entirely equivalent: a connection table is akin to a description of a single valence bond structure and does not take account, for example, of delocalized bonds.

The connection table cannot represent delocalized bonds, yet many organic compounds contain them. How can we understand these bonds through databases, and how can we say that the connection table is a good method?
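
A toy example of how records can still agree despite this limitation. Benzene drawn as either Kekulé structure gives a different bond list, but if a program perceives the alternating ring and relabels its bonds as aromatic (type 4 in the V2000 bond block, where that code is intended for queries), both drawings normalize to the same table. The function below handles only one six-membered ring listed in ring order, an assumption that keeps it short; real aromaticity models are considerably more involved and differ between toolkits.

```python
AROMATIC = 4  # V2000 bond-type code for "aromatic"


def normalize_six_ring(bonds):
    """bonds: (atom_i, atom_j, order) tuples going once around a 6-ring.
    Alternating single/double rings are relabelled as aromatic; atom
    pairs are sorted so equivalent tables compare equal."""
    orders = [order for _, _, order in bonds]
    alternating = len(bonds) == 6 and all(
        {orders[k], orders[(k + 1) % 6]} == {1, 2} for k in range(6))
    return sorted(
        (min(i, j), max(i, j), AROMATIC if alternating else order)
        for i, j, order in bonds)


# The two Kekulé structures of benzene as raw connection tables.
kekule_a = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
kekule_b = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 0, 2)]

print(kekule_a == kekule_b)                                          # False
print(normalize_six_ring(kekule_a) == normalize_six_ring(kekule_b))  # True
```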