ChemSpider as an integration hub for interlinked chemistry data

The internet has provided access to unprecedented quantities of data. In chemistry specifically, over the past decade the web has become populated with tens of millions of chemical structures and related properties, both experimental and predicted, together with tens of thousands of spectra and syntheses. The data have, to a large extent, remained disparate and disconnected. In recent years, with the wave of Web 2.0 participation, any chemist can contribute to both the sharing and validation of chemistry-related data, whether via Wikipedia, the online encyclopedia, or one of the multiple public compound databases. Toxicologists commonly wish to source data for reference purposes or to support the development of models; when experimental data are not available, predicted data will suffice. This presentation will offer a perspective on the type and quality of chemistry data available today, our experiences of building the ChemSpider public compound database to link together chemistry on the internet, and our efforts to both encourage and enable even greater integration and connectivity of chemistry data for the community.

Transcript

2.
How Much Data Online?
• How much data regarding environmental
toxicology and chemistry is online?
• How can it all be mapped together?

3.
A Grand Challenge….
• Let’s map together all historical chemistry
data and build systems to integrate new data
• Let’s integrate chemistry, toxicology and
biology data and add in disease data too
• Let’s model the data and see if we can
extract new relationships – quantitative and
qualitative
• Let’s make it all available on the web

4.
What about this….
• We’re going to map the world
• We’re going to take photos of as many
places as we can and link them together
• We’ll let people annotate and curate the map
• Then let’s make it available free on the web
• We’ll make it available for decision making
• Put it on Mobile Devices, Give it Away

7.
ChemSpider
• Build a HUB connecting as many data
sources as possible
• NOT to harvest all data from each data source
• Today we have >29 million unique chemicals
from >500 data sources
• Focus on improving data quality
• Allow users to enhance, curate and annotate
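
The hub idea on this slide (connect many data sources without harvesting all of their data) can be sketched in a few lines of Python. This is only an illustrative sketch and not ChemSpider's actual implementation: the `Hub` and `SourceLink` names, the opaque `structure_key` string, and the source names are all hypothetical. The point it shows is that the hub stores one entry per unique chemical and keeps only pointers back to the records held by each source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLink:
    """A pointer back to one record in an external data source (hypothetical)."""
    source: str      # name of the data source
    record_id: str   # identifier of the record inside that source


class Hub:
    """Maps a unique structure key to links into sources, not to copied data."""

    def __init__(self):
        # structure key -> set of SourceLink; a set ignores repeated deposits
        self._links = defaultdict(set)

    def deposit(self, structure_key, source, record_id):
        # The key is treated as an opaque string; only whitespace is trimmed.
        self._links[structure_key.strip()].add(SourceLink(source, record_id))

    def links_for(self, structure_key):
        links = self._links.get(structure_key.strip(), set())
        return sorted(links, key=lambda link: (link.source, link.record_id))

    def unique_chemicals(self):
        return len(self._links)

    def sources(self):
        return {link.source for links in self._links.values() for link in links}
```

Depositing the same key from two sources yields one chemical with two outbound links, which is the behaviour the "unique chemicals from many data sources" bullet describes.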

50.
Text Mining
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride
(5 ml) and benzene (50 ml) were charged into a glass
reaction vessel equipped with a mechanical stirrer,
thermometer and reflux condenser.
The reaction mixture was heated at reflux with stirring for a
period of about one-half hour.
After this time the benzene and unreacted thionyl chloride
were stripped from the reaction mixture under reduced
pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a
solid residue.
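
As a rough illustration of what text mining a passage like this involves, the sketch below pulls reagent names and quantities out of a sentence with a regular expression. It is a deliberately naive heuristic, not the method used in the presentation: the pattern, the unit list, and the `LEADING_NOISE` stop-word set are assumptions made for this example, and systematic names such as the urea derivatives above need dedicated chemical name recognition rather than a regex.

```python
import re

# A run of letters, hyphens and spaces, followed by "( <number> <unit> )".
# Spaces inside the parentheses are optional, so "(5 ml)" and "( 5 ml )" both match.
QUANTITY = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z\-\s]*?)\s*"
    r"\(\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mmol|mol|mg|ml|g)\s*\)"
)

# Words that may be swept up in front of a reagent name (assumed list).
LEADING_NOISE = {"and", "with", "the", "of", "in"}


def extract_quantities(text):
    """Return (reagent, amount, unit) tuples found in the text."""
    results = []
    for match in QUANTITY.finditer(text):
        words = match.group("name").split()
        # Drop leading connectives such as "and" in "and benzene (50 ml)".
        while words and words[0].lower() in LEADING_NOISE:
            words.pop(0)
        if words:
            amount = float(match.group("amount"))
            results.append((" ".join(words), amount, match.group("unit")))
    return results


sentence = (
    "thionyl chloride (5 ml) and benzene (50 ml) were charged "
    "into a glass reaction vessel"
)
print(extract_quantities(sentence))
```

On the sentence shown, this recovers thionyl chloride and benzene with their volumes. Anything preceded by other words in the same clause would be captured along with those words, which is one reason production text mining relies on trained entity recognizers.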

53.
Conclusions
• There are some amazing online resources for
environmental toxicology and chemistry already!
• ChemSpider has an important role in quality
data and linking resources
• Crowdsourced deposition, validation and
curation works
• Standards are an important part of data linking
• MORE collaboration and data sharing can
benefit us all