Watson Knowledge Studio elaborating Entity Types/Subtypes

My question is about use cases for entity types and subtypes and how they are relevant to data extraction. In my case, I am working with unstructured product reviews. Say a review comes in as a document that looks like this:

"This camera is great. I really enjoy the picture quality. I think it is a little expensive, however."

What's important here is: a) recognizing document sentiment. I would say that, overall, this review is most likely positive, which we can get from sentiment analysis. No problems here. b) recognizing specific product features and the opinions expressed about them, i.e. feature x -> opinion y ("picture quality" -> "enjoy", or "price" -> "expensive", where "price" would have to be inferred from "expensive").

Assuming this, would it be best for machine learning to extract this relevant data as:

1 reply

WKS does not discover entity types and subtypes; rather, it facilitates (1) annotating documents in a type system the user gives it, (2) assessing inter-annotator agreement to help the user determine e.g. where types are confusing or are understood inconsistently among annotators, and (3) training and testing a machine-learning information-extraction model using those annotated documents and that type system.
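To make point (2) concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa, computed over two annotators' labels for the same mentions. This is an illustration only: it is not necessarily the score WKS itself reports, and the function name, label names, and sample data are all hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Illustrative only; not necessarily the agreement score WKS reports.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same type.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for the same six mentions.
annotator_1 = ["FEATURE", "OPINION", "FEATURE", "PRICE", "OPINION", "FEATURE"]
annotator_2 = ["FEATURE", "OPINION", "PRICE", "PRICE", "OPINION", "FEATURE"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

A low score for a particular pair of types is a hint that the boundary between them is unclear to annotators and the guidelines (or the type system) may need revisiting.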

The choice between making all entity-type distinctions at the top level with no subtyping, vs. making fewer entity-type distinctions and then subtyping some or all of them, is relatively arbitrary. The machine-learning model will function the same way, as it detects entity types and subtypes conjoined. If you have a lot of types, you may find it advantageous to use the subtype feature, in order to keep the top-level entity-type menu a manageable size. However, most people find it more advantageous to keep all entity types at the top level, sparing the annotators all the clicking required to drill repeatedly into the subtype menus. Plus, if you have so many types that a single-level menu is unwieldy, you might consider how (un?)likely it is that your annotation team will understand all those types and the boundaries among them consistently. Consider starting with a small corpus and analyzing inter-annotator agreement early and often.
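As a rough illustration of why the two layouts are equivalent from the model's point of view, the sketch below flattens a nested type system into conjoined labels and compares it with a flat one. The type names, the dotted label format, and the `conjoin` helper are hypothetical; this is not how WKS represents a type system internally.

```python
# Hypothetical nested type system: top-level entity types with subtypes.
nested_types = {
    "PRODUCT_FEATURE": ["PICTURE_QUALITY", "PRICE"],
    "OPINION": ["POSITIVE", "NEGATIVE"],
}

# The same distinctions expressed as a flat, top-level-only type system.
flat_types = [
    "PRODUCT_FEATURE.PICTURE_QUALITY",
    "PRODUCT_FEATURE.PRICE",
    "OPINION.POSITIVE",
    "OPINION.NEGATIVE",
]


def conjoin(types_with_subtypes):
    """Turn each (type, subtype) pair into a single conjoined label."""
    labels = []
    for entity_type, subtypes in types_with_subtypes.items():
        if not subtypes:
            # A type without subtypes is already a single label.
            labels.append(entity_type)
        for subtype in subtypes:
            labels.append(f"{entity_type}.{subtype}")
    return labels


conjoined = conjoin(nested_types)
# Both layouts give the learner the same set of classes to predict.
print(set(conjoined) == set(flat_types))
```

Under this assumption the difference between the two layouts is only in the annotation interface (one long menu vs. nested menus), which is why annotator convenience is the deciding factor.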