Walk through the process of building a knowledge base by mining the information stored in documents

Making machines understand the data in documents

One of the biggest challenges in the industry today is making machines understand the data in a document the way humans grasp its context and intent by reading it. The first step toward this goal is to convert the unstructured information (free-floating text and text in tables) into a semi-structured format that can be processed further. That’s where graphs play a major role: they give shape and structure to the unstructured information present in documents.

This code pattern describes in detail how to build a domain-specific knowledge graph. It covers every aspect of the process, from the challenges that you can face while building the knowledge graph and how to resolve them, to how to fine-tune the code pattern to meet your requirements. The code pattern uses Watson NLU, the Extend Watson text Classification code pattern to augment the entities picked by Watson NLU, and the Correlate documents from different sources code pattern to augment the relations picked by Watson NLU. In effect, it combines the best of both worlds: rule-based extraction and dynamic Watson NLU. The results are then filtered to meet the needs of the domain.
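The combination described above (NLU output augmented by rule-based extraction, then filtered for the domain) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code pattern's actual implementation: the entity and relation dictionaries are hypothetical stand-ins rather than the Watson NLU response schema, and the names `RULES`, `rule_based_entities`, `merge_entities`, and `build_graph` are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for what an NLU service might return.
# These field names are illustrative, NOT the Watson NLU response schema.
nlu_entities = [
    {"text": "IBM", "type": "Organization"},
    {"text": "Armonk", "type": "Location"},
]
nlu_relations = [
    {"subject": "IBM", "predicate": "basedIn", "object": "Armonk"},
]

# Hypothetical rule set: entity type -> regular expression.
RULES = {
    "Product": re.compile(r"\bWatson \w+\b"),
}


def rule_based_entities(text, rules):
    """Find entities that the rules recognize in the raw text."""
    found = []
    for entity_type, pattern in rules.items():
        for match in pattern.finditer(text):
            found.append({"text": match.group(0), "type": entity_type})
    return found


def merge_entities(*sources):
    """Merge entity lists; the first source to name an entity sets its type."""
    merged = {}
    for source in sources:
        for entity in source:
            merged.setdefault(entity["text"], entity["type"])
    return merged


def build_graph(entities, relations, allowed_types):
    """Keep entities of domain-relevant types and the edges between them."""
    kept = {name: t for name, t in entities.items() if t in allowed_types}
    graph = defaultdict(list)
    for rel in relations:
        if rel["subject"] in kept and rel["object"] in kept:
            graph[rel["subject"]].append((rel["predicate"], rel["object"]))
    return kept, dict(graph)


text = "IBM, based in Armonk, develops Watson Studio."

# Augment the NLU entities with the rule-based ones.
entities = merge_entities(nlu_entities, rule_based_entities(text, RULES))

# Augment the NLU relations with a hypothetical rule-derived relation.
rule_relations = [
    {"subject": "IBM", "predicate": "develops", "object": "Watson Studio"},
]
relations = nlu_relations + rule_relations

# Filter to the entity types this (hypothetical) domain cares about.
nodes, graph = build_graph(entities, relations, {"Organization", "Product"})
print(nodes)
print(graph)
```

In this sketch the graph is a plain adjacency dictionary, and filtering drops any edge whose endpoint falls outside the allowed entity types. A real pipeline would likely need entity normalization and a proper graph store, which are out of scope here.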