A Look Into Learning SPARQL With Author Bob DuCharme

The second edition of Bob DuCharme’s Learning SPARQL debuted this summer. The Semantic Web Blog connected with DuCharme – who is director of digital media solutions at TopQuadrant, the author of other works including XML: The Annotated Specification, and a welcome speaker at both the Semantic Technology & Business Conference and our Semantic Web Blog podcasts – to learn more about the latest version of the book.

Semantic Web Blog: In the two years or so since the first edition was published, what have been the most significant changes in the ‘SPARQL space’ – or the semantic web world at large – that make this the right time for an expanded edition of Learning SPARQL?

DuCharme: The key thing is that SPARQL 1.1 is now an actual W3C Recommendation. It was great to see it so widely implemented so early in its development process, which justified the release of the book’s first edition so long before 1.1 was set in stone, but now that it’s a Recommendation we can release an edition of the book that is no longer describing a moving target. Not much in SPARQL has changed since the first edition – the VALUES keyword replaced BINDINGS, with some tweaks, and some property path syntax details changed – but it’s good to know that nothing in 1.1 can change now.
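To give a sense of the change DuCharme mentions: the VALUES keyword lets a query constrain a variable to an inline list of allowed values, replacing the BINDINGS keyword from earlier SPARQL 1.1 drafts. A minimal sketch, using a made-up prefix and properties:

```sparql
# Restrict ?craft to an inline list of values; the prefix and
# property names here are hypothetical illustrations.
PREFIX d: <http://example.org/data#>

SELECT ?person ?craft
WHERE {
  ?person d:craft ?craft .
  VALUES ?craft { "potter" "glassblower" }   # earlier 1.1 drafts used BINDINGS here
}
```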

Semantic Web Blog: Will the new edition be equally applicable to newbies to the semantic web as well as old hands?

DuCharme: The new edition has more for both. All the original introductory material, suitable for people who’ve barely heard of SPARQL, is still there, and the website at learningsparql.com has the sample code for both editions. The new cookbook chapter, instead of using queries made up to demonstrate specific SPARQL features with my sample data (as the introductory chapters do), shows a range of real-world queries that are useful in a variety of situations, and both beginners and advanced users have told me that they appreciated that.

Also, the new chapter on “RDF Schema, OWL, and Inferencing” explains the relationship between SPARQL and these topics, a subject that often confuses new SPARQL users who are first learning about semantic web technology.
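One common point of confusion the chapter addresses is that a plain SPARQL query does not perform RDFS inferencing by itself; a property path, however, can retrieve the results that inferencing over rdfs:subClassOf would imply. A sketch of that pattern, with a hypothetical vocabulary prefix:

```sparql
# Find every resource typed as d:Mammal or as any subclass of it
# (at any depth), without relying on an RDFS inferencing engine.
# The d: vocabulary is a hypothetical illustration.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX d:    <http://example.org/vocab#>

SELECT ?animal
WHERE {
  ?animal rdf:type/rdfs:subClassOf* d:Mammal .
}
```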

Semantic Web Blog: Everyone in the world today is dealing with Big Data, or so it seems. How can this book help them deal with that Big Data when running SPARQL queries?

DuCharme: The classic “Three Vs” of Big Data are Variety, Volume, and Velocity, and the combination of SPARQL and the RDF data model deals with variety especially well: together they let users look for patterns in data aggregated from multiple disparate sources – even sources that are not stored as RDF but converted for the query dynamically using middleware – and looking for patterns in dynamically aggregated data is a classic Big Data use case. With IBM’s DB2 database manager and the Urika product from Cray subsidiary YarcData both supporting SPARQL, those companies have clearly seen that this standard’s ability to handle variety is a good fit with their products’ ability to handle volume and velocity.

Semantic Web Blog: You’ve made reference in your blog to expanding the App Dev chapter quite a bit. Can you give us a little taste as to what it gets into now?

DuCharme: The first edition of the book didn’t even mention the SPARQL 1.1 Graph Store HTTP protocol, a SPARQL specification for reading, adding, deleting, and updating entire graphs (that is, sets) of RDF triples with simple HTTP commands. This can make many basic operations of an application happen much more easily, so I added coverage of that. Also, in the book’s first edition, this chapter was more focused on web-based applications, so in the new edition it includes more about SPARQL’s potential role in applications for a wider variety of platforms. It also describes the good fit of RDF-related technology to model-driven development, which has always been an important principle at TopQuadrant.
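The whole-graph operations that the Graph Store HTTP Protocol maps onto HTTP verbs also have counterparts in SPARQL 1.1 Update. A sketch of those Update operations, with a hypothetical graph URI and source file:

```sparql
# Whole-graph operations in SPARQL 1.1 Update; the Graph Store HTTP
# Protocol exposes comparable capabilities through plain HTTP verbs
# aimed at a graph's URI. The URIs below are hypothetical.

# Read a Turtle file into a named graph
LOAD <http://example.org/staff.ttl> INTO GRAPH <http://example.org/graphs/staff> ;

# Empty the graph's triples but keep the graph itself
CLEAR GRAPH <http://example.org/graphs/staff> ;

# Remove the graph entirely (roughly what an HTTP DELETE does
# in the Graph Store Protocol)
DROP GRAPH <http://example.org/graphs/staff>
```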

Semantic Web Blog: You also reference a 23 percent reduction in mentions of the semantic web in a bigger book – why is that a good thing?

DuCharme: It was a bit of a joke, but true. When I studied up on the popular NoSQL databases, I found that people were applying non-relational, unstandardized technologies to some very modern challenges and having plenty of success. The semantic web world has often been about selling a vision first and the related technology second, so that if people don’t buy into the vision you may never get to tell them about the technology. The limited success of that vision helped inspire the idea of Linked Data, a related vision that rebranded many of the same ideas.

I think that skipping past the visions and getting right to the benefits of the non-relational (but in this case, standardized) RDF and SPARQL technology can help more people to appreciate its benefits – the same people who have been investigating HBase, neo4j, Hadoop, and related technologies. These people often use these tools to address a different buzzphrase vision such as Big Data, and as I described earlier, SPARQL can make a great contribution there as well.

Semantic Web Blog: What are some of the cool things you learned while adding the new material?

DuCharme: I especially learned a lot when writing the new chapter “Query Efficiency and Debugging.” I read all the academic papers I could find on SPARQL optimization, but through more informal research I gained a much better appreciation of the value of reducing the search space as early as possible to make queries faster.

In other words, if you give a processor a few conditions for things you want it to look for, it’s best to tell it to start with the conditions that will minimize the choices it has to evaluate when it deals with the later conditions, because then it can do its job faster. The same principle of reducing the search space early applies with any kind of computerized searches, whether it’s in a relational database or a web search; in a SPARQL query, it can mean that simply moving your eighth triple pattern up before the first seven can make your query run in a quarter of the time. This has already been a big help in my work at TopQuadrant.
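The ordering principle DuCharme describes can be sketched in a small query; the data and property names here are hypothetical, and many SPARQL processors reorder patterns automatically, but with a non-optimizing processor the difference can be dramatic:

```sparql
# Hypothetical example: find email addresses of employees named "Garcia".
# The literal match on the last name is the most selective pattern, so
# placing it first shrinks the set of candidate ?person bindings before
# the broader pattern is evaluated.
PREFIX d: <http://example.org/hr#>

SELECT ?email
WHERE {
  ?person d:lastName "Garcia" .   # selective: matches few triples
  ?person d:email    ?email .     # evaluated only for those bindings
}
```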

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.