Anyway, when this paper reached the most viewed paper position in the JChemInf journal, and I tweeted that event, I was asked for an update of the linked data graph (the darker nodes are the twelve the LODD task force worked on). A good questions indeed, particularly if you consider the name, and that not all of the data sets were really Open (see some of the things on Is It Open Data?). UMLS is not open; parts of SIDER and STICH are, but not all; CAS is not at all, and KEGG Cpd has since been locked down. Etc. A further issue is that the Berlin node in the LODD network is down, which hosted many data sets (Open or not). Chem2Bio2RDF seems down too.

Bio2RDF is still around, however (doi:10.1007/978-3-642-38288-8_14). At this moment, it is a considerable part of the current Linked Drug Data network. It provides 28 data sets. It even provides data from KEGG, but I still have to ask them what they had to do to be allowed to redistribute the data, and whether that applies to others too. Open PHACTS is new and integrated a number of data sets, like ChEMBL, WikiPathways, ChEBI, a subset of ChemSpider, and DrugBank. However, it does not expose that data as Linked Data. There is also the new (well, compared to three years ago :) Linked Life Data which exposes quite a few data sets, some originating from the Berlin node.

I am aggregating data in a Google Spreadsheet, but obviously this needs to go onto the DataHub. And a new diagram needs to be generated. And I need to figure out how things are linked. But the biggest question is: where are all the triples with the chemistry behind the drugs? Like organic syntheses, experimental physical and chemical data (spectra, pKa, logP/logD, etc), crystal structures (I think COD is working on a RDF version), etc, etc. And, what data sets am I missing in the spreadsheet (for example, data sets exposed via OpenTox)?

Search This Blog

Loading...

This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!

About Me

Assistant professor at Maastricht University, studying biology at an unsupervised but atomic level. Open science is my main hobby resulting in participation in, among many others, Bioclipse, CDK and Wikipathways.

Cookies

In the EU there is a directive upcoming requiring websites to warn people about HTTP cookies. This website uses the Blogger.com platform, Google Adsense (not that is it actually paying anything significantly), and a few scripts to count how often a blog post was tweeted, using Topsy and LinkedIn. These services undoubtedly make use of cookies, which you can disallow in your browser.