Maps, JSON and SPARQL – A Peek Under the Hood of MarkLogic 7.0

Blogging while attempting to run a full-time consulting job can be frustrating – you get several weeks of one article a week with some serious momentum, then life comes along, you're pulling sixty-hour weeks, and your dreams begin to resemble IDE screens. I've been dealing with a personal tragedy – my mother succumbed to a form of blood cancer last month, less than a year after being diagnosed – and have also been busy with a semantics project for a large television network.

I've also had a chance to preview the MarkLogic 7 pre-release, and have been happily making a pest of myself on the MarkLogic forums as a consequence. My opinion of the new semantics capability is mixed but generally positive. I think that once 7.0 is released, MarkLogic's capabilities in that space should catapult the company into being a major force in semantics, at a time when the field finally seems to be getting hot.

As I was working on code for a client recently, though, I came to a sudden, disquieting realization: of all the code I was writing, surprisingly little – almost none, in fact – was involved in manipulating XML. Instead, I was spending a lot of time working with maps, JSON objects, higher-order functions, and SPARQL. The XQuery code was still the substrate for all of this, mind you, but this was not an XML application – it was an application that worked with hash tables, assertions, factories and other interesting ephemera that seem to be intrinsic to coding in the 2010s.

There are a few interesting tips that I picked up that illustrate what you can do with these. For instance, I first encountered the concat operator – “||” – just recently, though it seems to have sneaked into ML 6 when I wasn’t looking. This operator eliminates (or at least reduces) the need for the fn:concat function:
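A quick sketch of the difference (the variable names here are my own):

```xquery
let $name := "MarkLogic"
let $version := "7.0"
(: the old way – every piece is an argument to fn:concat :)
let $old := fn:concat("Running ", $name, " version ", $version)
(: the new way – "||" chains the pieces without nested parentheses :)
let $new := "Running " || $name || " version " || $version
return $old = $new  (: both expressions produce the same string :)
```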

XQuery has a tendency to be parenthesis heavy, and especially when putting together complex strings, trying to track whether you are inside or outside the string scope can be an onerous chore. The || operator seems like a little win, but I find that in general it is easier to keep track of string construction this way.

Banging on Maps

Another useful operator is the simple map operator “!”, also known as the “bang” operator. This one is specific to ML7 [Author Update: I’ve been informed that this was also available in ML6, so if you have that release, bang away!], and you will find yourself using it a fair amount. The map operator in effect acts like a “map” operation (for those familiar with map/reduce functionality) – it iterates through a sequence of items, establishing each item in turn as the context for the expression that follows. For instance, consider a sequence of colors and how these could be wrapped up in <color> elements:
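A minimal sketch of the kind of expression described (the variable name is illustrative):

```xquery
let $colors := ("red", "blue", "green")
(: "." holds each color in turn as the bang operator walks the sequence :)
return $colors ! <color>{.}</color>
(: => <color>red</color><color>blue</color><color>green</color> :)
```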

The result is equivalent to a for $color in $colors return expression, save that it is not necessary to declare a specific named variable for the iterator.

This can come in handy with another couple of useful functions – map:entry() and map:new(). The map:entry() function takes two arguments – a key and a value – and, as expected, constructs a single-entry map from these, while map:new() merges a sequence of such entries into one map:
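A sketch along the lines the next paragraph describes (the $colors map contents and the <span> output are my own guesses at the original example):

```xquery
(: build a map from color names to hex values :)
let $colors := map:new((
  map:entry("red",   "#ff0000"),
  map:entry("blue",  "#0000ff"),
  map:entry("green", "#00ff00")
))
return
  (: first bang: "." is each color name; bind it to $key for later use :)
  ("red", "blue", "green")
    ! (let $key := . return
        (: second bang: "." is now the hex value pulled from the map :)
        map:get($colors, $key)
          ! <span style="color:{.}">{$key}</span>)
```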

Here, the context changes – after the first bang operator the dot context holds the keys (“red”, “blue” and “green” respectively). After the second bang operator, the dot context holds the values those keys retrieve from the $colors map: (“#ff0000”, “#0000ff”, “#00ff00”). These are then used in turn to set the color of the text. Notice that you can also bind the context to a variable (and use it further along, so long as you remain within the XQuery scope of that variable) – here $key is bound to each of the respective color names.

Again, this is primarily a shorthand for a for $item in $sequence return expression, but it’s a very useful shortcut.

Maps and JSON

MarkLogic maps look a lot like JSON objects. Internally they are similar, though not quite identical – the primary difference being that maps are intrinsically hashes, while JSON objects may be sequences of hashes. MarkLogic 7 supports both of these objects, and can use the map() functions and the bang operator to work with internal JSON objects.

For instance, suppose that you set up a JSON string (or import it from an external data call). You can use the xdmp:from-json() function to convert the string into an internal ML JSON object:
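A sketch of what that conversion might look like, reusing the character data from the output that follows (the exact JSON string is my reconstruction):

```xquery
let $json := '{"Aleria":"half-elf", "Gruarg":"half-orc", "Huara":"human"}'
(: xdmp:from-json() turns the string into an internal JSON object,
   which behaves like a map :)
let $characters := xdmp:from-json($json)
return
  map:keys($characters)
    ! ("=> " || . || " [" || map:get($characters, .) || "]")
```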

```
=> Aleria [half-elf]
=> Gruarg [half-orc]
=> Huara [human]
```

The xdmp:to-json() function will convert an object back into the corresponding JSON string, making it handy to work with MarkLogic in a purely JSONic mode. You can also convert JSON objects into XML:

```xquery
<map>{$characters}</map>/*
```

This format can then be transformed to other XML formats, a topic for another blog post.

SPARQL and Maps

These capabilities are indispensable for working with SPARQL. Unless otherwise specified, SPARQL queries in MarkLogic return their results as a sequence of JSON-style maps, one per solution, and take a regular map for passing in parameter bindings. For instance, suppose that you load the following turtle data:
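As an illustration only – the predicate IRIs, literals, and the $bindings values below are my own invention, assuming character triples along these lines are already loaded – a query of this shape might look like:

```xquery
(: assumes triples such as
   ch:aleria ch:name "Aleria" ; ch:gender "female" ;
             ch:species "half-elf" ; ch:vocation "mage" .
   are already in the database :)
let $bindings := map:new((
  map:entry("gender",  "female"),
  map:entry("species", "half-elf")
))
let $character-maps := sem:sparql('
  PREFIX ch: <http://example.com/characters/>
  SELECT ?name ?vocation WHERE {
    ?char ch:gender   ?gender ;
          ch:species  ?species ;
          ch:name     ?name ;
          ch:vocation ?vocation .
  }', $bindings)
(: each solution is a map keyed by the SELECT variable names :)
return $character-maps ! (map:get(., "name") || " – " || map:get(., "vocation"))
```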

The query passes a map with two entries, one specifying the gender label, the other the species label. Note that in this case we’re not actually passing in the IRIs, but using a text match. These are then used to determine, via SPARQL, the associated character name and vocation label, with the output being a sequence of JSON hash-map objects. This result is used by the bang operator to retrieve the values for each specific record.

It is possible to get SPARQL output in other formats by using the sem:query-results-serialize() function on the SPARQL results with an option of “xml”, “json” or “triples”, such as:

sem:query-results-serialize($character-maps, "json")

which is especially useful when using MarkLogic as a SPARQL endpoint, but internally, sticking with maps for processing is probably the fastest and easiest way to work with SPARQL in your applications.

Summary

There is no question that these capabilities are changing the way that applications are written in MarkLogic. They represent a shift in the server from being primarily an XML database (though it can certainly still be used this way) into being increasingly its own beast – something capable of working just as readily with JSON and RDF, employing a more contemporary set of coding practices.

In the next column, I’m going to shift gears somewhat and look at higher order functions and how they are used in the MarkLogic environment.


Kurt Cagle is the Principal Evangelist for Semantic Technology with Avalon Consulting, LLC, and has designed information strategies for Fortune 500 companies, universities and Federal and State Agencies. He is currently completing a book on HTML5 Scalable Vector Graphics for O'Reilly Media.

About Avalon Consulting, LLC

Avalon Consulting, LLC transforms data investments into actionable business results through the visioning and implementation of Big Data, Web Presence, Content Publishing, and Enterprise Search solutions. We are the trusted partner to over one hundred clients, primarily Global 2000 companies, public agencies, and institutions of higher learning.