DOCstore 1.2 – the semantic search tool is released and live

Search just got a whole lot more powerful.

It’s here! DOCstore v1.2 is ready with a whole host of new features, including an all-new Connectors package. Powered by Elastic, DOCstore provides faceted, semantic search for unstructured data. The ideal tool for a range of roles, from bench scientists to business analysts, it allows you to:

Create a highly enriched, more analytical in-house version of Medline.

Combine multiple data sources, such as grants, trials and literature, into a broad literature search tool.

We thought it would be the perfect time to share the new features with you, so here’s a rundown of what you can expect.

Advanced Search

Customisation of the User Interface

Optional Google Analytics Integration

Better memory profile

DOI field now searchable

API additions

Text attributes added to the data model

Connectors – automated data management pipeline

These developments are a direct result of our customers’ feedback – as always at SciBite, we love hearing from clients about how we can make things even better.

Let’s look at each one a bit more closely.

Advanced Search

Search just got a whole lot more powerful with the ability to add multiple queries on multiple fields. So, for example, if I wanted to search for documents that mention ‘PDE5A’ in the title, but also mention ‘university’ in the organisation field I can now do this using the Advanced Search feature.

I can also filter on publication date or document index date.

And that’s not all. Additional filters exist for the SubSource, Project ID and SubProject ID fields if they’re populated. Each of these fields represents a way to sub-categorise a document. They’re set at DOCstore load time and allow you to hold multiple copies of the same underlying document, perhaps indexed in different ways or under different contexts.
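To make the idea concrete, here’s a rough sketch of how a multi-field search with a date filter could be expressed as an Elasticsearch-style bool query. This is not DOCstore’s actual API or its internal query format: the helper function and the field names (title, organisation, publication_date) are assumptions made purely for illustration.

```python
import json
from typing import Optional


def build_advanced_query(
    field_terms: dict[str, str],
    published_from: Optional[str] = None,
    published_to: Optional[str] = None,
) -> dict:
    """Combine per-field terms and an optional date range into one bool query."""
    # Every field/term pair must match (logical AND across fields).
    must = [{"match": {field: term}} for field, term in field_terms.items()]

    # The date range goes in "filter": it narrows results without affecting scoring.
    filters = []
    date_range = {}
    if published_from:
        date_range["gte"] = published_from
    if published_to:
        date_range["lte"] = published_to
    if date_range:
        # "publication_date" is a hypothetical field name.
        filters.append({"range": {"publication_date": date_range}})

    return {"query": {"bool": {"must": must, "filter": filters}}}


# 'PDE5A' in the title AND 'university' in the organisation field,
# restricted to documents published from 2015 onwards.
query = build_advanced_query(
    {"title": "PDE5A", "organisation": "university"},
    published_from="2015-01-01",
)
print(json.dumps(query, indent=2))
```

Keeping the date range separate from the term matches mirrors the split in the feature itself: the field queries decide what a document is about, while the filters simply narrow the set.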

Customisation of the User Interface

You can now customise the user interface by supplying custom HTML snippets in the configuration. Examples include adding bespoke links to the ‘Explore’ dropdown, or adding an icon to the results panels that links back to the source document.

Additionally, DOCstore is now able to serve static content such as original PDF files. If you’ve indexed PDF files, you can now link to the original ones and have them show up in your browser.

Optional Google Analytics Integration

If you want to integrate Google Analytics into your DOCstore server to monitor patterns of usage, you can. It’s an extremely useful way to see what DOCstore is being used for and when, which we know matters to organisations.

Better memory profile

DOCstore can now be run in 2 GB of memory with Medline and ct.gov data. That’s roughly a sixfold saving compared to DOCstore v1.1 for this dataset.

DOI field now searchable

The DOI field is now included in the list of searched fields, making it possible to find an article by searching for its DOI.

API additions

The REST API now has more comprehensive input validation and better error messages. Co-occurrence Matrix API calls can also return the unique identifiers of the top 200 documents (sorted by publication date) that fulfil the co-occurrence criteria.

You can also run document-level or sentence-level searches and retrieve only document metadata, such as IDs and sources, rather than the entire documents themselves. This reduced payload option is ideal if you only need a small part of the data from each returned result set. You get just the data you require, without a large amount of other information that would only cost transfer time and slow down your processing.

A new operation was added to support this at the sentence level, and a new parameter at the document level.

Text attributes added to the data model

Attributes in the ‘attributes’ section of the TERMite output are now stored as key-value text in DOCstore. They’re not yet searchable (you’ll have to wait for the next version for that), but they’re returned in the data, and if you apply the user interface customisations described above, you can display them there.

Colour-coded entities in the User Interface

The User Interface now gives visual cues to entity types, such as differently coloured underlines.

Connectors

Now you can take the pain out of maintaining your data pipeline into DOCstore. Imagine being able to automate the management of that data pipeline without manually setting up, checking and approving each update, or having to outsource the task.

With Connectors, this is all handled via a web-based user interface with no command line access required.

And that’s not all. The parameters required to run TERMite vary according to the data source. Connectors helps here too, suggesting the appropriate parameters for each data source.

Connectors moves the burden of loading DOCstore from the hands of the IT technician to the scientist.

Other features include:

Run updates regularly or every day, with the option to schedule more precisely

Control for you – you define the pipeline and the number of steps

Intelligent – utilising a checkpoint system, Connectors will return to the last sound update point, should anything go awry

Non-expert and expert modes

Extensible architecture – simply plug your own code into the pipeline
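For readers curious about what a checkpointed pipeline looks like in practice, here is a minimal, generic sketch. It is not Connectors’ implementation: the run_pipeline function, the step names and the JSON checkpoint file are all hypothetical, and the sketch resumes from the last completed step rather than reproducing whatever recovery logic Connectors uses internally.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_pipeline(steps: list[tuple[str, Callable[[], None]]], checkpoint: Path) -> list[str]:
    """Run steps in order, skipping any that a previous run already completed."""
    done: list[str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    executed = []
    for name, step in steps:
        if name in done:
            continue  # finished in an earlier run, so no need to redo it
        step()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        checkpoint.write_text(json.dumps(done))  # record progress after each success
        executed.append(name)
    return executed


# Hypothetical three-step pipeline in which the middle step fails once.
attempts = {"annotate": 0}


def fetch() -> None:
    pass


def annotate() -> None:
    attempts["annotate"] += 1
    if attempts["annotate"] == 1:
        raise RuntimeError("annotation service unavailable")


def load() -> None:
    pass


steps = [("fetch", fetch), ("annotate", annotate), ("load", load)]
checkpoint_file = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_pipeline(steps, checkpoint_file)  # first run fails at "annotate"
except RuntimeError:
    pass

resumed = run_pipeline(steps, checkpoint_file)  # second run skips "fetch"
print(resumed)  # ['annotate', 'load']
```

Writing the checkpoint after every successful step means a failure never costs more than the step that was in progress.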

And that’s DOCstore 1.2. Faster, more efficient search, allowing you to cut out the noise of unwanted information and customise what you see. And with Connectors, once again, our developments are democratising data management for the life sciences.

To find out more about how DOCstore and the rest of the SciBite platform can transform your data, contact us at info@scibite.com.

Article info

Posted: 22/01/18

Author: Abbas Arezoo

Category: Technology
