MarkLogic Semantics

Built-in RDF Triple Store for More Connected Data

As a multi-model database, MarkLogic combines the benefits of a document store and an RDF Triple Store. This approach is ideal for integrating and accessing all of your data. JSON and XML documents provide incredible flexibility for modeling entities, while RDF triples, the data format for semantic graph data, are ideal for storing relationships. MarkLogic Semantics is a great fit for storing metadata, improving data integration, and building applications on that integrated, highly connected data. Popular use cases leveraging MarkLogic Semantics include advanced search apps, recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.

It works, it just works.

“I’m excited because MarkLogic claims to have brought RDF, XML, and other data together, including their various indexes, and they deliver. It’s hard, but they went ahead and did it, and my experience is that it works great.”

Dean Allemang, CEO and Principal Consultant at Working Ontologist

Semantic Data and Querying: RDF and SPARQL

Graph databases have risen quickly in popularity in recent years, and RDF Triple Stores, where semantic data is stored, are considered a type of graph database. When data takes on a graph structure, in which entities (people, places, and things) and the relationships between them matter most, semantics is a better fit because it captures the context of your data.

The standard way to represent semantic data is with RDF (Resource Description Framework) triples, and the standard query language is SPARQL. Each triple is a subject-predicate-object statement about an entity (a person, place, or thing) and its relationships. One example is “John lives in London.” Another is “London is in England.” Combining these two facts, a new fact can be inferred: “John lives in England.”
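The inference described above can be sketched in a few lines of Python. This is a minimal illustration only, not MarkLogic's API and not SPARQL: the triples are plain tuples, and the predicate names `livesIn` and `isIn` and the helper `infer_lives_in` are hypothetical names chosen for this example.

```python
# Each fact is a (subject, predicate, object) triple.
# The predicate names are illustrative, not a standard vocabulary.
facts = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def infer_lives_in(triples):
    """Apply one rule until no new facts appear:
    (x livesIn y) and (y isIn z)  =>  (x livesIn z)."""
    known = set(triples)
    while True:
        derived = {
            (x, "livesIn", z)
            for (x, p1, y) in known if p1 == "livesIn"
            for (y2, p2, z) in known if p2 == "isIn" and y2 == y
        }
        if derived <= known:  # nothing new was derived
            return known
        known |= derived


all_facts = infer_lives_in(facts)
print(("John", "livesIn", "England") in all_facts)
```

The two stored facts stay unchanged; the third fact exists only because the rule connects them, which is the sense in which triples can be used to infer new facts.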

In this way, simple facts can all be linked together to form a graph of hundreds of billions of facts and relationships. Such knowledge graphs power applications you use every day, including Google’s search and LinkedIn’s “people you may know” feature.

“RDF (and by extension SPARQL) becomes more important as the data models themselves become more complex, more associational, and more heterogeneous, simply because the variety of information will dominate over factors such as volume or velocity.”

Kurt Cagle, “Why SPARQL Is Poised To Set the World on Fire,” June 4, 2016

Why RDF Triples?

It’s simple: triples add context to your data, which improves data integration.

Triple stores have an advantage over relational databases for many use cases involving relationships: you don’t need to worry about foreign keys, nested queries, and complex joins.

DID YOU KNOW?

Triples are universally understood and can be easily searched and shared

Triples connect together to form graphs that are machine readable, and can even be used to infer new facts

Common standards are defined by W3C for RDF triples and the query language, SPARQL

Triple stores can scale to hundreds of billions of facts and relationships

Triple stores can leverage ontologies to organize and categorize data (ontologies are like taxonomies, but are richer and more useful)

“Using RDF triples allows us to create real time connections between data, such as organization structures and relationships between documents and data… We have scaled our platform to more than 40,000 documents per hour with an inventory of 50 million data points in 8 million documents. We have yet to reach a MarkLogic limitation.”

Michael Henry, Principal, KPMG

The MarkLogic Advantage: Multi-Model Database for Documents, Data, and Triples

When to Use Semantics

Not every use case requires the MarkLogic Semantics option. Sometimes, the document model alone will work just fine. But, when your data involves relationships that you need to store and query at scale, then Semantics provides a great addition. Here are some of the most popular use cases for MarkLogic Semantics.

MarkLogic Semantics acts as the glue for master data, providing an ideal model for reference data and metadata (provenance, lineage, etc.). MarkLogic stores entity data such as Customers and Orders as documents, and can store the relationships between those entities as RDF Triples. You can also describe metadata such as when a document was created, or how it relates to other documents using an ontology. With MarkLogic’s multi-model capabilities, these semantic relationships can be stored inside the documents themselves, or as standalone RDF Triples.
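The two storage options mentioned above, relationships kept inside a document or as standalone triples, can be pictured with a small Python sketch. The document layout, the `triples` key, the identifiers, and the helper `all_triples` are all assumptions made up for this illustration; they are not MarkLogic's actual storage format or API.

```python
import json

# An Order entity modeled as a JSON-like document. The "triples" key is a
# made-up convention for this sketch, not MarkLogic's storage format.
order_doc = {
    "orderId": "order-1001",
    "total": 250.0,
    "triples": [
        {"subject": "order-1001", "predicate": "placedBy", "object": "customer-42"},
    ],
}

# Relationships kept outside any document, as standalone triples.
standalone_triples = [
    ("customer-42", "memberOf", "loyalty-program"),
]


def all_triples(docs, standalone):
    """Combine triples embedded in documents with standalone triples."""
    combined = set(standalone)
    for doc in docs:
        for t in doc.get("triples", []):
            combined.add((t["subject"], t["predicate"], t["object"]))
    return combined


graph = all_triples([order_doc], standalone_triples)
print(json.dumps(sorted(graph), indent=2))
```

Either way, the relationships end up in one graph that can be queried together, while the entity data stays in the document.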

MarkLogic Semantics can help deliver personalized, real-time recommendations and intelligently expand search queries. Graphs are all about highly connected data, and with MarkLogic Semantics, you can leverage those relationships to suggest related people, products, questions, or anything else in the graph to improve the front-end user experience. You can also intelligently expand searches based on semantic ontologies. Even if a document doesn’t mention the keyword you searched for, you still get an expanded set of relevant results. It’s a smarter way to build search apps.
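As a rough sketch of ontology-based search expansion, the Python below widens a search term with narrower terms before matching documents. Everything in it is hypothetical: the `hasNarrower` predicate, the sample terms and documents, and the `expand` and `search` helpers are illustrative names, and the substring matching stands in for a real search index.

```python
# Hypothetical "narrower term" relationships, stored as triples.
ontology = {
    ("vehicle", "hasNarrower", "car"),
    ("vehicle", "hasNarrower", "truck"),
    ("car", "hasNarrower", "sedan"),
}

documents = {
    "doc1": "The new sedan gets better mileage.",
    "doc2": "Truck sales rose last quarter.",
    "doc3": "Bicycle lanes are expanding downtown.",
}


def expand(term, triples):
    """Return the term plus every narrower term reachable from it."""
    terms, frontier = {term}, [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "hasNarrower" and o not in terms:
                terms.add(o)
                frontier.append(o)
    return terms


def search(term, docs, triples):
    """Return ids of documents containing the term or any expanded term."""
    terms = expand(term, triples)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(t in text.lower() for t in terms)
    )


print(search("vehicle", documents, ontology))
```

Neither matching document contains the word "vehicle"; they are found only because the ontology links "vehicle" to "sedan" and "truck".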

MarkLogic Semantics makes it possible for financial services firms to examine relationships between parties and counterparties to uncover liability exposure or potential fraud. Or, for insurance companies, it makes it possible to uncover crime rings and fraudulent claims since there are usually connections that can be drawn between billing addresses, known associates, and historical records. Often these connections are lost in un-integrated or un-indexed data. Semantics brings it all to the surface — quickly.

Intelligence data can be significant in volume and complex in structure, and it streams in from multiple sources in different formats and types. To make sense of it all, it needs to be integrated. And, to analyze it all, you need to understand the relationships. MarkLogic Semantics enables you to connect data and visualize the relationships in order to draw conclusions. Whether it is a person of interest that the military is tracking, or police forces tracking neighborhood wrongdoing, MarkLogic Semantics makes it easier than ever before to use your data more intelligently.

MarkLogic Semantics makes it possible to leverage the trillions of triples that describe all sorts of things about the world. These facts are freely available from sources such as DBpedia, the CIA World Factbook, and GeoNames. Or, you can use your own. Either way, those triples can form the fabric of a knowledge graph to help improve search and discovery. For example, it may be helpful to surface facts about London when a user searches for London, or facts about who owns a company and what its subsidiaries are when a user searches for that company. There are limitless possibilities with the world of linked data.

MarkLogic Semantics helps manage IT assets, or really any assets, across large organizations. Most large organizations have hundreds, if not thousands, of IT assets. They are valuable but require lots of ongoing maintenance. Consider the racks and servers in a data center. With MarkLogic Semantics, you can store the data about them as triples and run a simple query such as “show me a list of all the Dell servers that are more than two years old” and get an instant result.
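The Dell server question can be sketched over a handful of triples in plain Python. This is only an illustration of the idea, not SPARQL or MarkLogic code: the asset identifiers, predicate names, dates, and the helpers `objects` and `old_dell_servers` are all made up for the example.

```python
from datetime import date

# Hypothetical asset facts as (subject, predicate, object) triples.
triples = [
    ("server-01", "type", "Server"),
    ("server-01", "manufacturer", "Dell"),
    ("server-01", "installedOn", date(2019, 3, 1)),
    ("server-02", "type", "Server"),
    ("server-02", "manufacturer", "Dell"),
    ("server-02", "installedOn", date(2023, 6, 15)),
    ("server-03", "type", "Server"),
    ("server-03", "manufacturer", "HP"),
    ("server-03", "installedOn", date(2018, 1, 10)),
]


def objects(subject, predicate):
    """All objects stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def old_dell_servers(as_of, min_age_years=2):
    """Dell servers installed more than min_age_years before as_of."""
    # Compare (year, month, day) tuples so leap days need no special case.
    cutoff = (as_of.year - min_age_years, as_of.month, as_of.day)
    subjects = sorted({s for s, _, _ in triples})
    return [
        s for s in subjects
        if "Server" in objects(s, "type")
        and "Dell" in objects(s, "manufacturer")
        and any((d.year, d.month, d.day) < cutoff
                for d in objects(s, "installedOn"))
    ]


print(old_dell_servers(as_of=date(2024, 1, 1)))
```

Adding a new kind of fact, such as a rack location, only means adding more triples; the existing facts and the query above do not need to change.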

Ontology Driven Entity Extraction

This unique MarkLogic Semantics feature improves search and classification by automatically identifying entities (people, places, and things) in free-flowing text. It can then return a list of those entities (extraction) or mark them up in the document (enrichment).

Entities are defined in a user-maintained dictionary, or you can build a dictionary automatically from a SKOS Ontology. If you need NLP (Natural Language Processing) to define an entity, you can use third-party tools such as Smartlogic, PoolParty, Expert System, NetOwl, or Calais.
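The difference between extraction and enrichment can be shown with a small dictionary-driven sketch in Python. It is an assumption-laden illustration, not the MarkLogic feature itself: the dictionary contents, the `extract` and `enrich` helpers, and the XML-style element names used for markup are all hypothetical.

```python
import re

# Hypothetical entity dictionary: surface text -> entity type.
dictionary = {
    "London": "Place",
    "England": "Place",
    "John": "Person",
}

# One pattern matching any dictionary entry as a whole word;
# longer entries come first so they win over shorter overlapping ones.
entries = sorted(dictionary, key=len, reverse=True)
pattern = re.compile(r"\b(" + "|".join(re.escape(e) for e in entries) + r")\b")


def extract(text):
    """Extraction: list (entity, type) pairs in order of appearance."""
    return [(m.group(1), dictionary[m.group(1)]) for m in pattern.finditer(text)]


def enrich(text):
    """Enrichment: wrap each entity in an element named after its type."""
    def mark_up(match):
        entity = match.group(1)
        tag = dictionary[entity]
        return f"<{tag}>{entity}</{tag}>"
    return pattern.sub(mark_up, text)


text = "John lives in London."
print(extract(text))
print(enrich(text))
```

A dictionary built from an ontology would play the role of `dictionary` here, with each concept's labels as the keys.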

Powered by MarkLogic and Using Semantics

KPMG’s Digital Labor Automation Platform (DLAP), built on MarkLogic, is at the forefront of Intelligent Automation (IA), enabling KPMG to apply deep subject matter expertise on tax and regulatory reporting to their client data in a highly automated manner. They use semantics to enrich documents as they are ingested, and improve search.

MarkLogic makes it possible to bring together different systems and capabilities to enable information sharing and knowledge creation. The end result is that the OECD can build and promote policies that improve global economic and social well-being. The OECD chose MarkLogic as their core database platform to create a data hub for their new OECD Network Environment, or “ONE” platform.

Pearson provides global education curricula across multiple levels. They use MarkLogic to store, manage, and deliver all of their digital content through a rich, interactive experience. They leverage the power of MarkLogic’s multi-model capabilities with semantics to connect data in course assessments and students’ relationships to instructional content as a whole.