RDF, the Resource Description Framework, is a W3C standard for giving more meaning to published information. It is one of the foundations of what many call the "Semantic Web." The purpose behind RDF is simple: let people describe the essence of resources using a syntax more powerful than standalone keywords, for the benefit of machine analysis. In doing so, RDF creates a need of another nature: how to search and access such statements. Next, you will learn about another standard developed by the W3C that aids in this process: SPARQL, the query language for RDF.

To understand what SPARQL solves, it's necessary to take a sidestep into RDF's syntax. RDF operates on the premise of triples: expressions composed of a subject, a predicate and an object that together provide more context about a resource. This is why RDF is often considered a metadata markup language. To support this metadata, RDF promotes the use of vocabularies, pre-defined sets of properties that can cover any topic imaginable. One such vocabulary is VCard RDF, used to represent personal contacts. Listing 1.1 shows a VCard RDF structure with a supplemental, custom vocabulary named peopleInfo.
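As a rough illustration of the triple model, the sketch below represents a contact as plain Python tuples. It is not the article's Listing 1.1: the contact, the resource URI and the `describe` helper are invented for this example, the vCard namespace URI is the commonly used 2001 vCard RDF one, and the `#` appended to the peopleInfo namespace is an assumption.

```python
from typing import NamedTuple, Union

# Namespaces. VCARD is the 2001 vCard RDF namespace; the trailing '#'
# on the custom peopleInfo namespace is an assumption for this sketch.
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"
INFO = "http://www.example.com/peopleInfo#"


class Triple(NamedTuple):
    subject: str                 # the resource being described
    predicate: str               # the property, taken from a vocabulary
    obj: Union[str, int]         # the value of that property


# Hypothetical resource URI and values, invented for illustration.
john = "http://www.example.com/people/JohnDoe"

triples = [
    Triple(john, VCARD + "FN", "John Doe"),
    Triple(john, VCARD + "TITLE", "Architect"),
    Triple(john, INFO + "seniority", 12),
]


def describe(subject, graph):
    """Collect every predicate/object pair stated about one subject."""
    return {t.predicate: t.obj for t in graph if t.subject == subject}


for predicate, obj in describe(john, triples).items():
    print(predicate, "->", obj)
```

Each tuple is one statement about the resource, and mixing predicates from two namespaces is all it takes to combine the VCard vocabulary with a custom one.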

What an RDF structure like the one illustrated in Listing 1.1 achieves is the ability to publish machine-consumable, or Semantic Web, resources. The same approach can be extended to vocabularies covering Web documents, exposing resource metadata like author, publication date and copyright, or to more exotic vocabularies that expose metadata on things like flights, movies or music.

RDF documents are interesting by the mere fact that they expose the essence of a resource without the need for sophisticated data mining algorithms. The more interesting question, however, is how to search for information in an RDF structure. With RDF being a markup language, the simple answer would be to use the same tools used to search XML or HTML structures, like DOM or SAX, but in reality the relationships in an RDF ontology can become too complex to rely on those tools. Stepping in to fill this void is SPARQL.

SPARQL has an SQL-like syntax, similar to the one used to query relational databases, and supports modifiers like ORDER BY, DISTINCT and LIMIT, bringing the same kind of query power to searches on markup language structures. That said, let's analyze a few SPARQL queries used to search an RDF structure like the one in Listing 1.1. Listing 1.2 contains such queries and the corresponding results.

The first SPARQL query is as simple as it gets: it asks for the resource whose #FN value in the VCard RDF namespace is "John Doe", with ?Homepage being the variable to which the matching result is bound. The output for this query is the matching rdf:about value, which is the contact's main page.
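To make the idea of a variable in a triple pattern more concrete, here is a minimal Python sketch of what a query like the first one does conceptually. It is not a SPARQL engine and not the article's listing: the `match` function, the two contacts and their URIs are hypothetical, and the SPARQL text in the comment is a reconstruction from the description above.

```python
# Conceptually similar SPARQL (reconstructed from the prose, not the original listing):
#   SELECT ?Homepage
#   WHERE { ?Homepage <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Doe" }

VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Hypothetical resources, invented for illustration.
john = "http://www.example.com/people/JohnDoe"
jane = "http://www.example.com/people/JaneRoe"

triples = [
    (john, VCARD + "FN", "John Doe"),
    (jane, VCARD + "FN", "Jane Roe"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None plays the role of a variable."""
    def fits(wanted, actual):
        return wanted is None or wanted == actual

    return [
        (s, p, o)
        for (s, p, o) in graph
        if fits(subject, s) and fits(predicate, p) and fits(obj, o)
    ]


# The subject is left as a "variable"; predicate and object are fixed.
homepages = [s for (s, _, _) in match(triples, predicate=VCARD + "FN", obj="John Doe")]
print(homepages)
```

Leaving the subject unset mirrors the role of ?Homepage: whatever value makes the pattern match a stored triple becomes the result.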

The second SPARQL query uses the PREFIX keyword to bind an RDF namespace to a short prefix that is later used inside the query. This particular query uses the supplemental VCard RDF value of seniority in the http://www.example.com/peopleInfo namespace, as well as SPARQL's FILTER qualifier. The query returns all the VCard RDF contacts with a seniority greater than 10, with the results again pointing to each matching contact's rdf:about value. The last SPARQL query simply performs a search on both namespaces included in the VCard RDF structure and outputs every contact's name, seniority and title.
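The behavior of the second and third queries can be sketched in plain Python as follows. This is only an approximation of what a SPARQL engine does, assuming a hypothetical two-contact data set; the function names `seniors` and `contact_rows`, the URIs and the `#` on the peopleInfo namespace are all assumptions made for the example.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"
INFO = "http://www.example.com/peopleInfo#"   # '#' separator is an assumption

# Hypothetical contacts, invented for illustration.
john = "http://www.example.com/people/JohnDoe"
jane = "http://www.example.com/people/JaneRoe"

triples = [
    (john, VCARD + "FN", "John Doe"),
    (john, VCARD + "TITLE", "Architect"),
    (john, INFO + "seniority", 12),
    (jane, VCARD + "FN", "Jane Roe"),
    (jane, VCARD + "TITLE", "Analyst"),
    (jane, INFO + "seniority", 7),
]


def seniors(graph, minimum):
    """Subjects whose seniority exceeds `minimum` (the role FILTER plays)."""
    return sorted(
        s for (s, p, o) in graph
        if p == INFO + "seniority" and o > minimum
    )


def contact_rows(graph):
    """One (name, seniority, title) row per subject that has all three values."""
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, {})[p] = o
    wanted = (VCARD + "FN", INFO + "seniority", VCARD + "TITLE")
    return [
        tuple(props[p] for p in wanted)
        for _, props in sorted(by_subject.items())
        if all(p in props for p in wanted)
    ]


print(seniors(triples, 10))
print(contact_rows(triples))
```

The second function shows why both namespaces matter: a row is only produced when predicates from the VCard vocabulary and the custom vocabulary are all present for the same subject.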

To execute SPARQL queries, you can rely on a tool like Jena, a Semantic Web framework written in Java that includes a SPARQL query engine. Jena's SPARQL query engine lets you supply both an RDF structure and a SPARQL statement, and either print the query results to a standalone console or integrate them into a larger Java application.

It is also worth mentioning that SPARQL queries are often run against RDF written in Turtle (Terse RDF Triple Language), a compact plain-text alternative to the XML form of RDF. Turtle expresses the same triples with far less markup, and SPARQL's own triple-pattern syntax closely resembles it.
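To give a sense of how compact Turtle is, the sketch below writes a couple of triples out in Turtle form using only the Python standard library. It is a simplified illustration rather than a complete serializer: the `to_turtle`, `shorten` and `literal` helpers are hypothetical, the contact data is invented, the `#` on the peopleInfo namespace is an assumption, and every object is treated as a plain string or integer literal.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"
INFO = "http://www.example.com/peopleInfo#"   # '#' separator is an assumption
PREFIXES = {"vcard": VCARD, "info": INFO}

# Hypothetical contact, invented for illustration.
john = "http://www.example.com/people/JohnDoe"
triples = [
    (john, VCARD + "FN", "John Doe"),
    (john, INFO + "seniority", 12),
]


def shorten(uri):
    """Abbreviate a URI with a known prefix, else wrap it in angle brackets."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


def literal(value):
    """Render an object as a Turtle literal (integers bare, strings quoted)."""
    if isinstance(value, int):
        return str(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(graph):
    """Group triples by subject and emit prefix declarations plus statements."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"{shorten(p)} {literal(o)}" for p, o in pairs)
        lines.append(f"<{s}>\n    {body} .")
    return "\n".join(lines)


turtle_text = to_turtle(triples)
print(turtle_text)
```

The prefix declarations at the top play the same role as PREFIX in a SPARQL query, which is one reason the two notations sit comfortably side by side.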

Though RDF and SPARQL aren't mainstream compared to other markup and query languages, they set an important precedent for organizations wanting to make their content Semantic Web friendly and, with it, facilitate the creation of custom data mining applications. Keyword data mining via search engines and their supplemental Web services will never go out of style because they are simply too easy to use, but it's important to realize that major search engines can take months or years to extract relevant meaning from certain content, and often never do. With RDF, a resource's meaning can become dramatically clearer, aiding any data mining effort, and with SPARQL providing the foundation for searching such structures, any organization can build more intelligent data mining applications by leveraging both technologies.

About the author: Daniel Rubio is an independent technology consultant with over 10 years of experience in enterprise and web-based software. He blogs regularly on these and other software areas.
