What’s missing? DLs, OWL & the Ecology of Semantic Systems or “Ontologies don’t make the tea” or “There’s more to KR than ontologies, or even logic” Alan.


1
What’s missing? DLs, OWL & the Ecology of Semantic Systems
or “Ontologies don’t make the tea”
or “There’s more to KR than ontologies, or even logic”
Alan Rector
BioHealth Informatics Group, University of Manchester
rector@cs.manchester.ac.uk
Copyright University of Manchester 2012. Licensed under Creative Commons Attribution Non-commercial Licence v3

5
Problems I am trying to solve (III)
►How to reconcile ICD’s traditional classification and legacy with new requirements
►Retain stability with previous versions
   A classification – not an ontology
   Fixed depth; mutually exclusive and exhaustive at every level
   ‣ Every patient event counted exactly once at every granularity
►Overcome shortcomings of previous versions
   Shorten 20-year revision cycle & support Social Computing approaches
   Reconcile with modern knowledge
   Support multiple views & new requirements
►Multi-layered structure
►Ontology layer – hopefully reconciled with SNOMED
►Foundation layer – lots more around the “skeleton” of the ontology
►“Linearizations” – traditional classifications linked to Foundation layer

6
Problems I am trying to solve (IV)
►How to create an “Ontology of Clinical Research” that fits into standards
►Must ultimately integrate with UML
►Must carry many arbitrary “rules” and “calculations”
   Mix of formal and text
   E.g.
   ‣ Criteria for inclusion and exclusion of patients
   ‣ Algorithms for calculation of statistics
►Must provide a way of
   Indexing and discovering trials as a whole based on their characteristics
   Representing or linking to detailed trial protocols
   ‣ Complex contingent transition networks / plans
   Recording “journeys” of individual patients through those protocols
   ‣ Which may or may not conform to the protocols
      - And can describe the reasons for deviations from protocol

15
Is this sensible?
Use DLs to provide
►A rigorous logical model for the index
►…but
►No model for
►The information indexed
…or for the
►Metadata required for processing it
…or for the
►Editorial data required to authenticate it

18
Axioms & Templates: Fundamentally different
►Axioms restrict
►The more you know the less you can say
   If there are no axioms, you can say anything
   Hard to find what it is permitted to say
►Violations of axioms → unintended inferences (often of unsatisfiability)
   Global
►Over-riding impossible – monotonic
►Open world – Must be closed for instance validation
   Often impossible in practice (or requires nonstandard “constraints”)
►Templates permit
►The more you know the more you can say
   If there is no field/slot in the template you just can’t say it
   Represents what it is permitted to say directly (“sanctioning” easy)
►Violations of templates → validation errors
   Local
►Over-riding natural – usually non-monotonic
►Closed world – Instance validation natural
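The template side of this contrast can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Template` class, the slot names and the bird/penguin example are hypothetical and do not come from the talk. It shows the two properties the slide attributes to templates: local over-riding of inherited slots (non-monotonic) and closed-world validation, where an undeclared slot is simply an error.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Template:
    """A frame-style template: each slot maps to a set of permitted values."""
    name: str
    parent: Optional["Template"] = None
    slots: dict = field(default_factory=dict)

    def permitted(self, slot):
        # A local definition over-rides the inherited one (non-monotonic).
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:
            return self.parent.permitted(slot)
        return None  # closed world: an undeclared slot cannot be used

    def validate(self, instance):
        """Return a list of validation errors; empty means the instance is valid."""
        errors = []
        for slot, value in instance.items():
            allowed = self.permitted(slot)
            if allowed is None:
                errors.append(f"no slot '{slot}' in template {self.name}")
            elif value not in allowed:
                errors.append(f"'{value}' not permitted for '{slot}'")
        return errors


# Hypothetical example: Penguin over-rides the locomotion slot inherited from Bird.
bird = Template("Bird", slots={"locomotion": {"flies"}})
penguin = Template("Penguin", parent=bird, slots={"locomotion": {"swims", "walks"}})
```

With axioms the equivalent over-ride would produce an unsatisfiable class rather than a local exception, which is the asymmetry the slide points at.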

19
Templates are fundamental to Knowledge Acquisition: No one likes staring at a blank page
Or screen

20
(Unconstrained development is hard)
►Most KB development in two stages
►“Gurus” set up schemas/templates
►Domain experts fill in domain information
►Most domain experts expect prompts/forms based on the schemas/templates (“Sanctioning”)
►The list of properties that apply to each class
►The permitted values for that property for that class
►The template for annotations for the class
►Immediate notification if they make an error
►…they don’t expect / won’t tolerate
►Delayed feedback… especially as incomprehensible inferences &/or misclassifications
   Number 1 reason given for avoiding OWL and using frames or similar

21
Templates also intuitive for instance validation / value sets
►Straightforward query
►Does the instance satisfy the constraints?
   Closed world
   Easy to indicate missing values
   ‣ Unknown values from existential quantification don’t count
   Quick
►Fits into notion of “Constraints”
   ‣ Motik et al, 2007
   But notion of “constraint” not fully integrated into a system of templates
   And not part of any standard
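A closed-world constraint check of this kind is easy to sketch. The following Python fragment is an illustrative assumption, not the talk's implementation: the `check_instance` function, the constraint format and the slot names are all hypothetical. The point it shows is that only explicitly recorded values count, so a required slot with no recorded value is reported as missing rather than being satisfied by some unknown value.

```python
def check_instance(instance, constraints):
    """Closed-world check: only explicitly recorded values count."""
    report = {"missing": [], "invalid": []}
    for slot, rule in constraints.items():
        value = instance.get(slot)
        if value is None:
            # Under open-world semantics an existential axiom would be
            # satisfied by an unknown value; here it is simply missing.
            if rule["required"]:
                report["missing"].append(slot)
        elif value not in rule["values"]:
            report["invalid"].append(slot)
    return report


# Hypothetical constraints for a clinical finding.
constraints = {
    "laterality": {"required": True, "values": {"left", "right", "bilateral"}},
    "severity": {"required": False, "values": {"mild", "moderate", "severe"}},
}
```

For example, `check_instance({"severity": "huge"}, constraints)` reports `laterality` as missing and `severity` as invalid in a single pass over the constraints.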

24
Close to UML
Take advantage of good diagramming tools
► Plus a bit of effort to sort out the multiplicities and cardinalities
► If we use subproperties & property paths & a bit of external checking, we can produce a bridging property, which can be transitive
► has_cause ⊑ inv(hasTopicC) o hasObjectC
[Diagram labels: Pneumonia, Bacterium, Cause; hasTopicC, hasObjectC; Association, Domain Entity, Top]
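The bridging property can be sketched outside the reasoner. The Python fragment below is a minimal sketch, assuming each reified association is stored as an (association, topic, object) triple; the `Sepsis` row and the function name are hypothetical additions for illustration. It composes inv(hasTopicC) with hasObjectC and then takes the transitive closure, as the slide suggests the bridging property can be transitive.

```python
# (association id, hasTopicC value, hasObjectC value): hypothetical data
associations = [
    ("cause1", "Pneumonia", "Bacterium"),
    ("cause2", "Sepsis", "Pneumonia"),
]


def has_cause(associations):
    """Compose inv(hasTopicC) with hasObjectC, then close transitively."""
    # x has_cause y whenever some association has topic x and object y.
    pairs = {(topic, obj) for _, topic, obj in associations}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs
```

On the toy data this derives the indirect pair (Sepsis, Bacterium) in addition to the two direct pairs.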

25
Reifying associations: An approach to “particulars”
►Natural representation for “some”/“may”
►FAQ: “How do I say ‘may’ in OWL” – E.g. Pneumonia may be caused by Bacteria?
►As useful an approximation as the usual FoL for “some”
   ∃xy. C(x) & D(y) & p(x,y)
   (Reified associations slightly weaker: do not assert existence of any instances)
►Natural attachment point for strengths of association
►FAQ: “How do I represent probabilities in OWL”
   Attach them to the associations
►Natural representation for “sanctioning”
►Just ask for minimal non-redundant set of associations with a given topic
   Number 1 complaint from users converting from Protégé frames to Protégé OWL – “Where is the list of properties”
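The “minimal non-redundant set of associations with a given topic” can be sketched as follows. This is a toy Python illustration under assumptions: the class hierarchy, the association tuples and the strength labels are hypothetical, and a real system would ask a reasoner for subsumption instead of walking asserted parents. An inherited association is treated as redundant when another applicable association uses the same property with a more specific object.

```python
# Toy asserted hierarchy (hypothetical): class -> direct superclasses
parents = {
    "BacterialPneumonia": ["Pneumonia"],
    "Pneumonia": ["LungDisease"],
    "LungDisease": [],
    "Bacterium": ["Organism"],
    "Organism": [],
    "Lung": [],
}


def ancestors(cls):
    """All strict superclasses of cls in the toy hierarchy."""
    seen, stack = set(), list(parents.get(cls, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


# (topic, property, object, strength): hypothetical reified associations
associations = [
    ("LungDisease", "has_location", "Lung", "always"),
    ("Pneumonia", "may_have_cause", "Organism", "sometimes"),
    ("BacterialPneumonia", "may_have_cause", "Bacterium", "always"),
]


def sanctioned(topic):
    """Associations that apply to topic, minus those implied by a more specific one."""
    scope = ancestors(topic) | {topic}
    found = [a for a in associations if a[0] in scope]

    def redundant(a):
        # Redundant if another association has the same property and a
        # strictly more specific object.
        return any(b is not a and b[1] == a[1] and a[2] in ancestors(b[2])
                   for b in found)

    return [a for a in found if not redundant(a)]
```

The strength travels with the association, which is the attachment point the slide proposes for probabilities.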

27
Value sets: Awkward or impossible to express as DL queries: Choices
►Use “Most specific” strategy & Knowledge Exploration
►Most specific value set for a given association with a given topic
   And any other qualifications
►Represent all values as individuals
►But this sacrifices most of the benefits of ontology and inference
   Baby gone with bathwater
►Represent values as the classes, per se
►Create a second layer of meta-individuals (puns?) for classes
►Can form queries easily
   But complicated
   Any errors lead to “Unsatisfiable ontology”
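The “most specific” strategy can be sketched as a walk up the hierarchy from the topic. The Python fragment below is a simplified illustration: it assumes a single-parent hierarchy, and the class names, the `severity` property and the value sets are hypothetical. With multiple inheritance a real implementation would have to decide how to combine value sets found on different branches.

```python
# Hypothetical single-parent hierarchy: class -> direct superclass (or None)
parents = {
    "BacterialPneumonia": "Pneumonia",
    "Pneumonia": "LungDisease",
    "LungDisease": None,
}

# (topic, property) -> permitted values; hypothetical value sets
value_sets = {
    ("LungDisease", "severity"): {"mild", "moderate", "severe"},
    ("Pneumonia", "severity"): {"moderate", "severe"},
}


def most_specific_value_set(topic, prop):
    """Walk up from topic; the first value set found is the most specific."""
    cls = topic
    while cls is not None:
        if (cls, prop) in value_sets:
            return cls, value_sets[(cls, prop)]
        cls = parents[cls]
    return None, set()
```

Here `BacterialPneumonia` has no value set of its own, so it picks up the one declared on `Pneumonia` and never sees the broader one on `LungDisease`.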

28
… and beware of brittle reasoning performance
►Restrictions that are very efficient in an isolated model may stop the classifier if included deeply nested in expressions
►Which is where you find value sets
►A reason for not using DL reasoning for value sets
►Or for finding a way to partition the reasoning

30
Key approaches III: Hybrid queries and visualisation methods
►What does it mean to Quality Assure the Content of an ontology?
►That all expected inferences are made
►That only acceptable inferences are made
►How do we know what is expected and acceptable? How do we know what’s there?
►Compare labels/names and inferences against experts, external sources, and consequences in applications
►Requires
   Visualisation up the hierarchy as well as down
   Mixed queries – lexical, syntactic, “exploration”, DL (& linguistic)

31
Visualisation for QA: Look up the hierarchy as well as down
►Most subsumption lattices fan in upwards
►Easy to see unintended inferred subsumptions
►Within the given signature
►Experts have no trouble deciding which things they don’t want
   … and often even spot what’s missing
►Example…

34
Combining lexical, exploratory, syntactic and semantic search
►Hard to spot what is missing
►Hypertensive disorders included some complications as well as kinds of hypertension. Did it contain them all?
►Using OPPL2
►?C:CLASS= MATCH (“.*[Hh]ypertensive.*”)   ← Lexical
   SELECT ?C SubClassOf Finding   ← Semantic
   WHERE FAIL ?C SubClassOf “Hypertensive disorder”   ← “Exploratory” (Closed world)
►Syntactic queries (Missing from OPPL so far)
►Replace all occurrences at any level of nesting of “Hypertension” with “Hypertension OR is_caused_by SOME Hypertension”
   Or vice versa
   Find all occurrences at any level of nesting of a DL/OWL entity, expression, …
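The shape of the combined query can be imitated on a toy hierarchy. The Python sketch below is not OPPL and does not call a reasoner; the class names, the `parents` table and the helper functions are hypothetical, and subsumption is approximated by walking asserted parent links. It applies the same three filters: a lexical match, a semantic test, and a closed-world failure test.

```python
import re

# Toy asserted hierarchy (hypothetical): class -> direct superclasses
parents = {
    "Finding": [],
    "Substance": [],
    "Hypertensive disorder": ["Finding"],
    "Hypertensive renal disease": ["Hypertensive disorder"],
    "Hypertensive retinopathy": ["Finding"],
    "Antihypertensive drug": ["Substance"],
}


def is_subclass(cls, ancestor):
    """Reflexive, transitive subclass test over the asserted parent links."""
    if cls == ancestor:
        return True
    return any(is_subclass(p, ancestor) for p in parents.get(cls, []))


def candidates():
    """Classes that look hypertensive, are findings, but are not yet classified as such."""
    pattern = re.compile(r".*[Hh]ypertensive.*")
    return sorted(
        c for c in parents
        if pattern.fullmatch(c)                          # lexical
        and is_subclass(c, "Finding")                    # semantic
        and not is_subclass(c, "Hypertensive disorder")  # closed-world "FAIL"
    )
```

On this data the query returns only `Hypertensive retinopathy`: the drug matches lexically but is not a finding, and the renal disease is already under `Hypertensive disorder`.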

35
Lexical Search is only heuristic
Many false positives
►Only the highlighted classes are really “hypertensive disorders”
►Others just contain the string “hypertensive”
   But if I can reduce the search space from 300,000 to 11, it helps

36
Understanding this ontology
Knowledge Exploration: What do I know about this class/concept (Needed in many applications)
►Semantic Information that is present but hard to get
►What’s a non-redundant set of what’s known about this class?
   What are the least named subsumers for this expression?
   What are the non-redundant “interesting” inferred restrictions of this class in this ontology?
   What’s asserted / inferred about this class?
►What is not provable in this ontology
   Difference operator between query results
►What’s the canonical form for this class?
   What practical notions of “canonical” are possible?
►What’s the difference between these two classes?
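Two of these exploration questions have a simple set-based reading, sketched below in Python. The hierarchy and class names are hypothetical, and a real tool would take inferred subsumers from a reasoner rather than asserted parents. “Least named subsumers” are the subsumers that are not themselves ancestors of another subsumer, and a crude “difference” between two classes is the set difference of their subsumers.

```python
# Toy hierarchy (hypothetical): class -> direct superclasses
parents = {
    "MyocardialInfarction": ["HeartDisease", "IschemicDisease"],
    "HeartDisease": ["Disease"],
    "IschemicDisease": ["VascularDisease"],
    "VascularDisease": ["Disease"],
    "Disease": [],
}


def ancestors(cls):
    """All strict superclasses of cls."""
    seen, stack = set(), list(parents.get(cls, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def least_named_subsumers(subsumers):
    """Drop every subsumer that is an ancestor of another subsumer in the set."""
    return {s for s in subsumers
            if not any(s in ancestors(t) for t in subsumers)}


def difference(a, b):
    """Named subsumers of a (including a itself) that b does not share."""
    return (ancestors(a) | {a}) - (ancestors(b) | {b})
```

For `MyocardialInfarction` the four named subsumers reduce to the two most specific ones, which is the non-redundant view a user would want to see.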

38
Beware! Literally logical may not be clinically correct
►There are subdural hematomas that are not in the head
►But they are very rare, and always described as “spinal”
   SNOMED is literally, logically right but clinically wrong
   Use in a rule would be life threatening
►Some think “Post operative MIs” are not caused by ischemia
►But again, always qualified
   “Myocardial infarction” on its own always means “ischemic”
   SNOMED has probably used an old name
   ‣ Modern name is “Infarction equivalent” or “Infarction-like event”

40
Things we need (i): Probabilities and DLs
►Many of my collaborators use adaptive Bayesian networks
►Major breakthroughs in last 10 years
   Faster algorithms
   Use of distributions instead of point probabilities
   Specify initial models
►Well developed and mature theory
   E.g. see Chris Bishop http://research.microsoft.com/en-us/um/people/cmbishop/m

41
Interwork or Integrate?
►Statisticians are not about to abandon 25 years of development
   … nor are logicians
►Can DLs & Bayes models work together?
►Use DLs to structure the models according to context?
   Different models for the elderly, adult, child, newborn, …
►…and then use Bayesian statistics to propagate probabilities?
►Plausible
►…but some nasty problems to get models that are both logically satisfiable and statistically consistent
   And computationally efficient
►A Grand Challenge
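One very simple reading of “use DLs to structure the models according to context” is that classification selects which statistical model applies, and Bayes’ rule then does the numerical work. The Python sketch below only illustrates that division of labour: the age thresholds, the prior probabilities and the function names are invented for the example, and `classify_context` merely stands in for a DL classifier. It does not address the hard problems the slide raises about joint consistency.

```python
# Hypothetical context-specific priors for some disease.
priors = {"Newborn": 0.001, "Child": 0.005, "Adult": 0.02, "Elderly": 0.10}


def classify_context(age_years):
    """Stand-in for a DL classifier assigning a patient to a context class."""
    if age_years < 0.1:
        return "Newborn"
    if age_years < 18:
        return "Child"
    if age_years < 65:
        return "Adult"
    return "Elderly"


def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' rule for P(disease | positive test)."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


def risk_after_positive_test(age_years, sensitivity=0.9, false_positive_rate=0.1):
    """Pick the prior by context, then update it on a positive test result."""
    context = classify_context(age_years)
    return context, posterior(priors[context], sensitivity, false_positive_rate)
```

The same positive test yields a higher posterior for an elderly patient than for an adult simply because the logical context selected a different prior.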

43
Things we need (iii)
►Rich meta data and (limited) higher order reasoning
►Metadata for payloads needs to be modeled
►One route to calculations, alternative reasoning schemes
►Multi-level reasoning
►Queries we need to be able to form:
►Two drugs of the same type
►Prevalence, incidence, etc. of diseases
   Property of disease itself, not the instances
►Size of a class
►“Ipsilateral” / “Contralateral”
   Same side, different sides
   ‣ Key concept in many medical rules

44
Broader perspective: We use DLs, but most people don’t (& we have problems)
Some reasons why
►Can’t say what they need to say!
►Or find the information they need
►Not integrated with software engineering tools
►Especially UML/Model Based Architectures
►Lack of tooling & specialist user-facing representations
►Even where solutions are known, implementations are hard to find
►Users need higher levels of abstraction – OWL/DLs are low level
►Inference still unstable and unpredictable
►Although improving

45
Some notable oddities
►OWL/DLs hardly mentioned in the Ontology Summit 2012 Communique
►http://ontolog.cim3.net/OntologySummit/2012/communique.html
►OWL is not in the list of “well known vocabularies” for Linked Open Data
►www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
   Although the use of owl:sameAs is “borrowed”
►Deep Knowledge Representation/Project HALO 2012 challenge questions do not mention OWL (& are not answerable with OWL alone)
►sites.google.com/site/2nddeepkrchallenge/
►There remain as many (or more) users of Protégé-frames as Protégé-OWL
►There is no routine adequate transformation between OWL & UML
►Little discussion of the fundamental differences
►Few OWL ontologies follow OWL-DL semantics (even if they claim to)

48
Two key events
►Development of description logics
►With enormous progress in past 20 yrs
►… but despite being originally called “Frame languages” had almost nothing to do with “frames”
   And focused on universal rather than existential restrictions
►Better answers to narrower questions
   No answers to the easy questions I have unless answers to the hard questions I don’t have
►Borrowing of the word “Ontology” by Gruber & others
►Brought recognition & popularity
   “Ontology” ≡ “Good”
►… but confused the universal & particular, any world & this world
►…and invited philosophers to both clarify and confuse

55
Face out: Make DLs part of a KR “Ecology”
►Focus on use
►The questions users have rather than the ones we can answer
►Take “annotations” seriously
►Interact effectively with other KR communities
►Don’t be afraid of heuristic solutions or approximations
►Factor the problem: Identify where DLs add value (& don’t)
   Are DLs all of the answer? Part of the answer? Not relevant?
►Extend DLs / OWL where practical & sensible
   Make it easier to get the information that’s there implicitly
   Layered models, metadata, constraints, modules…
►Fit DLs/OWL into hybrid systems where not
   Software engineering/UML, possibilities, probabilities, terminology binding and value sets,…
►Make it easy to build user facing & problem-specific UIs / intermediate representations
►Transformations, scripts, …
►Make it available in standard tools, APIs, Services…