
Friday, September 10, 2010

Pulling out data as JSON from XHTML+RDFa

I am keen on RDFa and RDF in general; that should not be a surprise. RDFa is a serialization of RDF triples embedded in (X)HTML. I recently posted about chemical examples of XHTML+RDFa. Now, the reason for putting data in HTML as RDFa is that we can easily pull it out again, e.g. with this distiller. But the fun goes on, and we can actually also run SPARQL directly on it, for example with RDFaDev which I recently blogged about.

Now, consider that we have all these nice visualization tools written in JavaScript which can visualize data from JSON sources. The mashup then requires a JSON serialization of the data embedded in HTML pages. I have no experience with the cool JavaScript tools, and hope someone can help me out here, but I already got help with the JSON bit before on SemanticOverflow (thanks to Comment Bot!). The service mentioned there no longer works, but there are plenty of alternatives.

Now, Peter is creating this nice data set about green solvents from patents, and it would be great if that data ended up online as RDFa, so that we can easily visualize the trends in solvent use over the years. But as I do not have this data as XHTML+RDFa yet, you will have to make do with another example: boiling points.

So, let's consider the data on this page, relating paraffin molecules to boiling points. We'll take a complexity descriptor (w0, the Wiener descriptor) and the boiling point (t0), so we get this SPARQL query:

The point is, I am sure at least one of my readers knows how to visualize the data in this JSON with, for example, Google Chart, particularly because all the mashing up is embedded in the just linked-to, though obscure, URL. And, if it helps, you can use the CSV or TSV output instead. That output is even simpler (CSV):

w,p
56,4
286,9
35,3
220,8
20,2
84,5
10,1
165,7
120,6
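
For those who would rather start from the CSV than from the JSON, here is a minimal sketch in Python of how the rows above could be turned into a JSON array that most charting libraries accept as input. The names `CSV_TEXT` and `csv_to_points` are my own for illustration, the CSV is pasted in rather than fetched from the service, and I am assuming both columns hold integers, as they do in the output above.

```python
import csv
import io
import json

# CSV text copied from the output shown above (pasted in, not fetched).
CSV_TEXT = """w,p
56,4
286,9
35,3
220,8
20,2
84,5
10,1
165,7
120,6
"""


def csv_to_points(text):
    """Parse the CSV into a list of {"w": int, "p": int} records, sorted by w."""
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: both columns are integers, as in the sample output.
    points = [{"w": int(row["w"]), "p": int(row["p"])} for row in reader]
    return sorted(points, key=lambda point: point["w"])


points = csv_to_points(CSV_TEXT)
# Serialize as JSON, ready to hand to a JavaScript plotting library.
print(json.dumps(points))
```

Sorting by `w` is optional for a scatter plot, but it makes the JSON easier to read and to reuse for a line chart.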

The first one who can use one of the above URLs to extract the data from that XHTML+RDFa page and create a scatter plot in an HTML page with some JavaScript library wins a free mention in my blog! ;)
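
To give an idea of what the end result might look like, here is a rough sketch that writes the scatter plot as inline SVG in an HTML page. Note that this is a stand-in and not an answer to the challenge: it uses plain Python instead of a JavaScript library, and it hard-codes the values from the CSV output instead of pulling them from one of the URLs. The function name `scatter_html` and the plot dimensions are arbitrary choices of mine.

```python
# (w, p) pairs copied from the CSV output earlier in this post.
POINTS = [(56, 4), (286, 9), (35, 3), (220, 8), (20, 2),
          (84, 5), (10, 1), (165, 7), (120, 6)]


def scatter_html(points, width=400, height=300, margin=30):
    """Return an HTML page with an inline SVG scatter plot of the points."""
    max_x = max(x for x, _ in points)
    max_y = max(y for _, y in points)
    circles = []
    for x, y in points:
        # Scale data values to pixels; SVG's y axis points down, so flip it.
        cx = margin + x / max_x * (width - 2 * margin)
        cy = height - margin - y / max_y * (height - 2 * margin)
        circles.append(f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="4"/>')
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(circles) + "</svg>"
    )
    return f"<!DOCTYPE html><html><body><h1>w versus p</h1>{svg}</body></html>"


html = scatter_html(POINTS)
print(html)
```

Saving the printed output as a `.html` file and opening it in a browser should show nine dots, with the axes scaled to the largest value in each column.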


This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!

About Me

Assistant professor at the Dept of Bioinformatics - BiGCaT at NUTRIM, Maastricht University, studying biology at an unsupervised and atomic level. Open Science is my main hobby resulting in participation in, among many others, Bioclipse, CDK and WikiPathways. ORCID:0000-0001-7542-0286. Posts on G+ are personal.
