Epinomy LT Quick Tutorial

Hi and welcome to the Epinomy LT overview and tutorial.

Epinomy LT is a taxonomy manager, autotagger and faceted search tool for MarkLogic. It is intended to get you started in the world of semantic enrichment with a powerful and easy-to-use tool for your enterprise documents.

In this brief tutorial, we will be covering the major functions of Epinomy LT.

So let's get started.

When you first start up Epinomy LT, the default landing page is an empty search page. Don't worry, we're going to put a taxonomy and some documents into the application.

Let's go to the Registry page first. The Registry is where you load or create taxonomies. In this case, let's load the Integrated Public Sector Vocabulary. The IPSV is an open taxonomy created by the UK government.

To upload it, click on the "Select File" button and choose the IPSV XML file that is shipped with Epinomy LT. It will take a few seconds to load around 4000 terms from the file.

That's all there is to it. You can also create a taxonomy from scratch, but we won't be talking about that in this tutorial.

Let's have a look at the IPSV taxonomy we just loaded.

Just click on the taxonomy name to be brought to the Concept Manager. In Epinomy LT, we call taxonomy nodes "concepts". You may hear the word "term" or "node" or "taxon" or "concept". In Epinomy LT, they are pretty much the same thing.

The Concept Manager page has four panels: the Concept Tree View, the Concept Details panel, the Document View panel and the List of Documents panel.

Let's start with the Concept Tree View. This is kind of like an explorer view of the taxonomy. If you want to work on a different taxonomy, select it from the drop down at the top of the Concept Tree View. Since we only have one taxonomy loaded now, it isn't very interesting.

Large taxonomies can be tough to navigate. You can use the Find Term box to locate a term in the taxonomy. Just type the first few letters of the concept and you can pick from a drop-down list of matching terms. For example, type C O N S to find Conservation Areas.

The tree itself behaves pretty much like you would expect a tree editor to behave. You can right-click to get a menu of operations for a selected node, you can drag and drop nodes around in the hierarchy, and you can cut, copy and paste nodes to your heart's content.

Let's make a simple change. Since this is a product of the United Kingdom, it contains the British spelling for all of the terms. Defence is one of them. Now that's fine with me. They invented the language, so they get to say how the spelling works. But for illustration, let's change the spelling of Defence with a "c" to Defense with an "s". There are a few ways to do that, but let's use it as a segue to talk about the Concept Details panel.

Click on "International affairs and defence". To change the name, we simply modify it in the Concept Name field. While we're here, let's also make the classification rule much broader. Rather than matching the exact phrase "International Affairs and Defense", we want to tag any document that contains any variation on the phrase "international affairs" OR "defense", no matter which spelling is used.

Click Save to apply the changes.
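To make the idea of the broader rule concrete, here is a minimal Python sketch. It is not Epinomy LT's rule syntax, which this tutorial doesn't show; it is just a regular expression, with the hypothetical names BROAD_RULE and is_tagged, that accepts "international affairs" or either spelling of "defense". It does not cover stemmed word forms.

```python
import re

# Assumption: a plain regular expression standing in for the broadened rule.
# The real rule is written in Epinomy LT's own rule language.
BROAD_RULE = re.compile(
    r"\binternational\s+affairs\b|\bdefen[cs]e\b",
    re.IGNORECASE,
)

def is_tagged(text):
    """Return True if the text contains either phrase, in either spelling."""
    return BROAD_RULE.search(text) is not None

print(is_tagged("Ministry of Defence annual report"))   # British spelling
print(is_tagged("A briefing on international affairs"))
print(is_tagged("Minutes of the parks committee"))
```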

We'll come back to this panel in a minute, but first, let's add a few documents. One of the best features of Epinomy LT is that it allows you to dynamically enrich the metadata attached to documents stored in MarkLogic.

In addition to the IPSV, we also bundle some Microsoft Word documents that contain business glossary information.

To upload them, click on the Upload New Documents button. This will open a dialog that allows you to drag and drop or just open multiple documents to add to Epinomy LT and MarkLogic.

Pick all of the Glossary documents and add them. What Epinomy LT is doing now is executing all of the concept rules in the IPSV and tagging the documents based on which concept rules match each document.

You will notice that after we loaded and tagged all of the documents, the hit counts in the Concept Tree were updated. For each term that has children, there are two numbers. The first is the number of documents tagged by the concept's own rule. The second is the number of unique documents matched by all of its child concepts.

You can expand the tree and see that leaf concepts, concepts without children, only have one count. That is the number of documents that match that node.
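If it helps to see the two numbers worked out, here is a minimal Python sketch of how such counts could be computed. It is an illustration only, not Epinomy LT's code: the Concept class, the document IDs and the "People" and "Managers" concepts are made up, and it assumes the second number covers every level below the concept, counting each document once.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    doc_ids: set = field(default_factory=set)    # documents tagged by this concept's own rule
    children: list = field(default_factory=list)

def descendant_docs(concept):
    """Union of documents tagged by any concept below this one."""
    docs = set()
    for child in concept.children:
        docs |= child.doc_ids
        docs |= descendant_docs(child)
    return docs

def hit_counts(concept):
    """Leaf concepts get one count; concepts with children get two."""
    own = len(concept.doc_ids)
    if not concept.children:
        return (own,)
    return (own, len(descendant_docs(concept)))

# Hypothetical data: "glossary-n" is tagged by both children but counted once.
directors = Concept("Directors", {"glossary-d", "glossary-n"})
managers = Concept("Managers", {"glossary-n", "glossary-m"})
people = Concept("People", {"glossary-p"}, [directors, managers])

print(hit_counts(people))     # (1, 3)
print(hit_counts(directors))  # (2,)
```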

Let's double-click on "Directors". This does a few things. First, it populates the Concept Details with the "Directors" concept information. Second, it populates the search box on top of the List of Documents with the "Directors" concept rule.

The search box is a handy way to test queries without having to actually update a node. You can enter any search expression in this box and see what documents will be returned.

Also, clicking the "Test" button in the Concept Details panel will do the same thing as double-clicking on the concept name.

We see that 8 documents out of our 26 glossary documents contain the word "Director" or a stemmed variation of it. Let's click on Business Glossary N to see why.

You see when you click on Business Glossary N, the Preview panel is populated with a highlighted view of the text from the document. This allows you to see exactly why Business Glossary N was tagged with Directors. The orange bars down the right side of the panel let you jump to specific instances of the concept rule terms.

You will notice that the rule "Directors", plural, also matched the singular "Director". This is called stemming, and will find other word forms as well, like tenses for verbs.
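For a rough feel of what stemming does, here is a toy Python sketch. It is a deliberately naive suffix stripper with hypothetical names (toy_stem, matches), not the stemmer MarkLogic actually uses, and it will get many English words wrong. It only shows why the plural rule and the singular word end up comparing as equal.

```python
# Assumption: a tiny, hand-picked suffix list; real stemmers are
# language-aware and far more careful than this.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    """Lower-case the word and strip the first matching suffix, if any."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def matches(rule_word, text):
    """True if any word in the text shares a stem with the rule word."""
    target = toy_stem(rule_word)
    return any(toy_stem(tok.strip('.,;:"()')) == target for tok in text.split())

print(toy_stem("Directors"), toy_stem("Director"))
print(matches("Directors", "The Director signs the report."))
```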

Other tabs on the View pane include Properties, which allows you to view and add new metadata to documents in MarkLogic. You can see the typical metadata that a document has. Go to the bottom and you can add a new property to the document. Let's create a property called "Publisher" and give it the value "Penguin". That value is instantly indexed and searchable as we'll see when we talk about the search page.

Finally for the view pane, let's look at the Links tab. This tab contains all of the Concepts that have been tagged to this document. You can see that IPSV has a lot of common concepts that have been applied to this document. Let's click on "Directors" and see that the concept Directors is highlighted in the Tree View Editor. Everything is live and linked together.

You can manually tag a document by dragging a concept from the Taxonomy Tree over to the drop area on the Links list. This is handy if you have a concept that should be applied to a document, but a tagging rule for it would be awkward to write. You can also unceremoniously delete a tag from the document by clicking on the red X next to it.

Now let's go back to the Concept Details. If you click on the Links tab, you will see all of the properties of the selected Concept. It has all the properties you would expect, like name, date modified and Rule. It also has the list of all documents that are tagged with this item. You can click on the documents here, just like in the List of Documents.

Scroll down to the bottom, and you can add new links. Since the technology underlying Epinomy LT is RDF and triples, you can specify two different kinds of links. The first kind is a string - which is just a value. The second type is a Concept reference. This is a very powerful feature of triples where you define a relationship between two objects.

The value on the left is called the "Predicate". This is the kind of link you are creating. You can create any kind of link you like here, and as you create new ones, they are added to the global list of predicate types. Let's create a relationship between Directors and Filmmaking.

We create a new predicate called WorksIn. This defines the kind of relationship between Directors and the job that they do. If you start typing F I L M, you see that Films and Filmmaking is a concept in this taxonomy. Let's select that to create the link.

We now have a very powerful relationship stored in the MarkLogic triple store. We won't take a deep dive into how triples work, but if you are into RDF, I hope you recognize how interesting this feature is.
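If triples are new to you, here is a minimal Python sketch of the two kinds of links. It uses plain tuples rather than MarkLogic's triple store or any RDF library. The names ConceptRef, Triple and related_concepts are hypothetical, and so are the "Note" predicate and its value. A triple is a subject, a predicate and an object, where the object is either a string value or a reference to another concept.

```python
from typing import NamedTuple, Union

class ConceptRef(NamedTuple):
    name: str  # stands in for a concept identifier

class Triple(NamedTuple):
    subject: ConceptRef
    predicate: str
    obj: Union[str, ConceptRef]  # a plain string value or a concept reference

triples = [
    # Concept reference: relates one concept to another.
    Triple(ConceptRef("Directors"), "WorksIn", ConceptRef("Films and Filmmaking")),
    # String link: just a value attached to the concept (made-up example).
    Triple(ConceptRef("Directors"), "Note", "Reviewed for the tutorial"),
]

def related_concepts(store, subject_name, predicate):
    """Names of concepts linked from the subject by the given predicate."""
    return [
        t.obj.name
        for t in store
        if t.subject.name == subject_name
        and t.predicate == predicate
        and isinstance(t.obj, ConceptRef)
    ]

print(related_concepts(triples, "Directors", "WorksIn"))  # ['Films and Filmmaking']
```

The point of the concept reference is that the object is itself something you can follow to more links, which a plain string is not.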

Now let's have a look at the search page. I'll be frank with you. The search page works fine - but we figure that most people will already have a search page or will want to build their own search page to integrate with their system. We're happy to talk about professional services, or to help your developers jack in to our data structures.

First, let's execute a broad search for "work". It brings back 23 documents and updates the Concepts Tree. The Concepts Tree on this page resembles the Tree Editor on the surface, but it behaves differently. In the Concept Manager, the tree always displays all terms in the taxonomy, regardless of whether any documents are tagged with that term. On this Search page, we only display the terms that actually exist in the current search results list. Click on a document to see a preview window with highlights.

Let's do a much more restrictive query to illustrate this. Remember we entered "Penguin" in the Publisher property of one of our documents? Let's search for "Penguin". Only one document is found, and our list of concepts has shrunk. If we expand a few, notice that the maximum count of any of our Concept Nodes is one, and that any Concepts that have no documents are not displayed.
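Here is a minimal Python sketch of that behavior, for anyone planning to build their own search page. It is not Epinomy LT's implementation: the function name facet_counts, the document IDs and the "Business" concept are invented for illustration. It counts concepts only across the documents in the current result list, so concepts with no matching documents never appear.

```python
from collections import Counter

def facet_counts(tags_by_doc, result_ids):
    """Count, per concept, how many documents in the result list carry it."""
    counts = Counter()
    for doc_id in result_ids:
        counts.update(set(tags_by_doc.get(doc_id, ())))
    return dict(counts)

# Hypothetical tagging data: document ID -> concepts tagged on it.
tags_by_doc = {
    "glossary-n": {"Directors", "Business"},
    "glossary-d": {"Directors"},
    "glossary-w": {"Business"},
}

# A broad search returns all three documents.
broad = facet_counts(tags_by_doc, ["glossary-n", "glossary-d", "glossary-w"])
# A restrictive search returns one document, so no count can exceed one.
narrow = facet_counts(tags_by_doc, ["glossary-d"])

print(broad)
print(narrow)  # {'Directors': 1}
```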

So, that's it for our little tour of Epinomy LT. I hope you enjoyed it and will take it for a spin.

To find out more, please visit Epinomy.com. Contact us through the contact page. Follow us on Facebook, Twitter or LinkedIn.