Transcription

2 + Content management
Mars was photographed by the Hubble Space Telescope in August 2003 as the planet passed closer to Earth than it had in nearly 60,000 years. Image credit: NASA, J. Bell (Cornell U.) and M. Wolff (SSI)
This 360-degree panorama from NASA's Curiosity Mars rover shows the surroundings of a site on lower Mount Sharp where the rover spent its 1,000th Martian day, or sol, on Mars, in May. Image credit: NASA/JPL
A sunset on Mars creates a glow due to the presence of tiny dust particles in the atmosphere. This photo is a combination of four images taken by Mars Pathfinder after it landed on Mars. Image credit: NASA/JPL
This May 29, 2015, view of a Martian sandstone target called "Big Arm" covers an area about 1.3 inches wide, in detail that shows differing shapes and colors of sand grains in the stone. Image credit: NASA/JPL
The Planetary Data System (PDS) is a distributed repository of 40+ years of imagery and data taken by a range of instruments on many diverse missions, available for scientific research.

3 + Smart search
- Provenance/sources for tracking family members in the 19th century include early census data (often error-prone), military records, passenger and immigration lists, and online documents (e.g., county histories, church histories)
- Historical/forensic research requires cross-domain search of a wide variety of resources within a given geospatial/temporal context
- Similar capabilities are essential for business intelligence, law enforcement, and government applications; all require terminology reconciliation

7 + Historical Context: Knowledge Representation
- Cross-disciplinary field with historical roots in philosophy, linguistics, computer science, and cognitive science
- Goal is to represent the meaning of knowledge unambiguously, so that it can be understood, shared, and used by computational agents acting on behalf of people to accomplish some task
- Philosophical origins
  - Socrates' questioning; Plato's studies of epistemology (the nature of knowledge)
  - Aristotle's shift to terminology; development of logic as a precise method for reasoning about knowledge
  - Arguments for the existence of God dating back to Anselm of Canterbury
  - Medieval theories of reference and of mental language; Scholastic logic
Image: Plato and Aristotle at the School of Athens, by Raphael

8 + Historical Context: Brain Cells for Grandmother
- Neuroscientists continue to debate today how we store memories
- One theory, documented as recently as this month in a Scientific American article, suggests that single neurons hold our memories, as concept cells
- Each concept (each person or thing in everyday experience) may have a set of corresponding neurons assigned to it
From Rodrigo Quian Quiroga, Itzhak Fried and Christof Koch, February 2013 issue, Scientific American:
- A relatively few neurons, numbering in the thousands or perhaps even less, constitute a sparse representation of an image.
- Our brain may use a small number of concept cells to represent many instances of one thing as a unique concept: a sparse and invariant representation
- What is important is to grasp the gist of particular situations involving persons and concepts that are relevant to us, rather than remembering an overwhelming myriad of meaningless detail.
- The full recollection of a single memory episode requires links between different but associated concepts
- If two concepts are related, some of the neurons encoding one concept may also fire to the other one.
Image: Visual perception - Neural coding, from the University of Leicester, Department of BioEngineering

9 + Definitions
- An ontology is a specification of a conceptualization. (Tom Gruber)
- Knowledge engineering is the application of logic and ontology to the task of building computable models of some domain for some purpose. (John Sowa)
- Artificial Intelligence can be viewed as the study of intelligent behavior achieved through computational means. Knowledge Representation then is the part of AI that is concerned with how an agent uses what it knows in deciding what to do. (Brachman and Levesque, KR&R)
- Knowledge representation means that knowledge is formalized in a symbolic form, that is, to find a symbolic expression that can be interpreted. (Klein and Methlie)
- The task of classifying all the words of language, or what's the same thing, all the ideas that seek expression, is the most stupendous of logical tasks. Anybody but the most accomplished logician must break down in it utterly; and even for the strongest man, it is the severest possible tax on the logical equipment and faculty. (Charles Sanders Peirce, letter to editor B. E. Smith of the Century Dictionary)

10 + What is an ontology?
An ontology specifies a rich description of the
- Terminology, concepts, nomenclature
- Relationships among and between concepts and individuals
- Sentences distinguishing concepts, refining definitions and relationships (constraints, restrictions, regular expressions)
relevant to a particular domain or area of interest.
* Based on the AAAI '99 Ontologies Panel: McGuinness, Welty, Uschold, Gruninger, Lehmann

11 + Logic and ontological commitment
Logic can be more difficult to read than English, but is more precise:
    (forall ((x FloweringPlant))
      (exists ((y Bloom) (z BloomColor))
        (and (haspart x y) (hascharacteristic y z))))
Translation: Every flowering plant has a bloom which is a part of it, and which has a characteristic bloom color.
Language: ISO Common Logic, CLIF syntax
- Logic is a simple language with few basic symbols
- The level of detail depends on the choice of predicates
  - the predicates represent an ontology of the relevant concepts
  - different choices of predicates represent different ontological commitments
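To make the quantifier structure of the CLIF sentence concrete, the following minimal sketch evaluates it over a small finite model. The individuals (rose1, bloom1, red, fern1) and the function name are hypothetical, introduced only for illustration; a real reasoner works over axioms rather than by enumerating a closed model.

```python
# Hypothetical toy model: each individual is assigned one type.
TYPES = {
    "rose1": "FloweringPlant",
    "bloom1": "Bloom",
    "red": "BloomColor",
    "fern1": "Plant",
}
HAS_PART = {("rose1", "bloom1")}
HAS_CHARACTERISTIC = {("bloom1", "red")}


def every_flowering_plant_has_colored_bloom(types, has_part, has_characteristic):
    """Check: forall x:FloweringPlant, exists y:Bloom, z:BloomColor
    such that haspart(x, y) and hascharacteristic(y, z)."""
    def of_type(t):
        return [name for name, kind in types.items() if kind == t]

    return all(
        any(
            (x, y) in has_part and (y, z) in has_characteristic
            for y in of_type("Bloom")
            for z in of_type("BloomColor")
        )
        for x in of_type("FloweringPlant")
    )


if __name__ == "__main__":
    print(every_flowering_plant_has_colored_bloom(TYPES, HAS_PART, HAS_CHARACTERISTIC))
```

Choosing different predicates (for example, a single hasBloomColor relation between plant and color) would describe the same situation under a different ontological commitment.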

12 + Ontology-based technologies
- Ontologies provide a common vocabulary for use by independently developed resources, processes, services
- Agreements between organizations sharing common services can be specified as formal ontologies, or ontologies with rules, to
  - assist in enforcing explicitly stated policies
  - evaluate usage criteria for services
  - ensure that the meaning of relevant concepts is expressed unambiguously
- By composing/mapping ontologies and mediating terminology across participating events, resources and services, independently developed services can work together to share information and processes consistently, accurately, and completely
- Ontologies also ensure
  - Valid conversations among agents to collect, process, fuse, and exchange information
  - Accurate searching by ensuring context using concept definitions and relations in addition to statistical relevance
  - Policies and rules are consistent with one another, to assist in semi-automated policy analysis and enforcement

13 + KR language features
- Vocabulary
  - Domain-independent logical symbols and reserved terms
  - Domain-dependent constants, identifying individuals, properties, or relations in the application domain or universe of discourse
  - Variables, whose range is governed by quantifiers
  - Punctuation that separates or groups other symbols
- Syntax
  - rules for combining the symbols into well-formed expressions
  - rules may be stated in a linear grammar, graph grammar, or independent abstract syntax
- Semantics
  - a theory of reference that determines how the constants and variables are associated with things in the universe of discourse
  - a theory of truth that distinguishes true statements from false ones
- Rules of Inference
  - rules that determine how one pattern can be inferred from another
  - if the logic is sound, the rules of inference must preserve truth as determined by the semantics

14 + Classifying logics
Logics vary from classical FOL along a number of dimensions:
- syntax
- the subsets of FOL they implement: for example, propositional logic, which has no quantifiers; Horn-clause logic (as in Prolog), which excludes disjunctions in conclusions; and terminological or definitional logics, containing additional restrictions
- their proof theory:
  - Intuitionistic logic and relevance logic rule out certain extraneous information
  - Non-monotonic logics allow introduction of default assumptions
  - Access-limited logic restricts the number of times a proposition can be used in a proof; linear logic allows a proposition to be used only once
  - Modal logic incorporates modal auxiliaries (◇p means p is possibly true; □p means p is necessarily true); temporal logic extends modal logic to include always, sometimes
  - Intensional logics express concepts such as need, ought, hope, fear, wish, believe, know, expect, and intend
- their model theory, which determines how expressions in the language are evaluated with respect to some model of the world: classical FOL is two-valued; a three-valued logic introduces unknowns; fuzzy logic uses the same notation as FOL but with an infinite range of certainty factors (0.0 to 1.0)
- ontology frameworks may include support for built-in components, such as set theory or time
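The difference in model theory can be illustrated with a short sketch. It assumes Kleene's strong three-valued connectives (with None standing for "unknown") and the min/max connectives often used for fuzzy logic; both are common choices, not the only ones, and the function names are hypothetical.

```python
def k_not(a):
    """Three-valued negation: unknown stays unknown."""
    return None if a is None else (not a)


def k_and(a, b):
    """Three-valued conjunction: a single False settles the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Three-valued disjunction: a single True settles the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def f_not(a):
    """Fuzzy negation over certainty factors in [0.0, 1.0]."""
    return 1.0 - a


def f_and(a, b):
    """Fuzzy conjunction (min connective)."""
    return min(a, b)


def f_or(a, b):
    """Fuzzy disjunction (max connective)."""
    return max(a, b)


if __name__ == "__main__":
    print(k_and(True, None), k_and(False, None), k_or(True, None))
    print(f_and(0.7, 0.4), f_or(0.7, 0.4), f_not(0.25))
```

Restricted to True/False (or to 0.0 and 1.0), both sets of connectives agree with classical two-valued logic.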

15 + Description logic
- A family of logic-based Knowledge Representation formalisms
  - Descendants of semantic networks and frame-based languages such as KL-ONE
  - Describe domain in terms of concepts (classes), roles (relationships), and individuals (instances)
- Distinguished by
  - Formal semantics
    - Decidable fragments of FOL
    - Closely related to propositional, modal, and dynamic logics
  - Provision of inference services
    - Sound and complete decision procedures for key problems
    - Implemented systems (highly optimized)
- Applications include
  - Configuration: product configurators, consistency checking, constraint propagation; first significant industrial application (e.g., CLASSIC)
  - Question answering and recommendation systems, for suggesting sets of responses or options depending on the nature of the queries
  - Model engineering applications, including those that involve analysis of the ontologies or other kinds of models to determine whether or not they meet certain methodological or other design criteria

16 + Knowledge bases, databases, and ontology
- An ontology is a conceptual model of some aspect of a particular universe of discourse (or of a domain of discourse)
  - Typically, ontologies contain only rarefied or special individuals (metadata), representing elemental concepts critical to the domain
- A knowledge base is a persistent repository for either
  - Ontology and metadata representing individuals, facts, and rules about how they can be combined or relate to one another, or
  - Metadata, individuals, facts and rules only; in some applications and frameworks the ontology is separately maintained
- Most inference engines (including commercially available reasoners) require in-memory deductive databases for efficient reasoning
- A knowledge base may be implemented in a physical, external database, such as a relational database, but reasoning is typically done on a subset (partition) of that knowledge base in memory

17 + Reasoning and truth maintenance
- Reasoning is the mechanism by which the assertions made in an ontology and related knowledge base are evaluated by an inference engine.
- In classical logic, the validity of a particular conclusion is retained even if new information is asserted in the knowledge base.
- This may change if some of the preconditions are actually hypothetical assumptions invalidated by the new information.
- The same idea applies for arbitrary actions: new information can make preconditions invalid.
- Reasoners work by using the rules of inference to look for the deductive closure of the information they are given. They take the explicit statements and the rules of inference and apply those rules to the explicit statements until there are no more inferences they can make.
- When some kind of logical inconsistency is uncovered, the reasoner must determine, from a given invalid statement, whether or not others are also invalid.
- The housekeeping associated with tracking the threads that support determining which statements are invalidated is called truth maintenance.
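The idea of computing a deductive closure can be sketched as naive forward chaining over propositional rules. This is a deliberately simplified illustration, assuming each rule is a (premises, conclusion) pair of plain strings; the facts, rules, and function name are hypothetical, and production reasoners use far more efficient algorithms.

```python
def deductive_closure(facts, rules):
    """Apply rules to the known statements until nothing new can be inferred.

    facts: iterable of statements (strings).
    rules: iterable of (premises, conclusion) pairs, premises being a
           collection of statements that must all be known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    facts = {"rose is a flowering plant"}
    rules = [
        (["rose is a flowering plant"], "rose has a bloom"),
        (["rose has a bloom"], "rose has a bloom color"),
        (["rose is a tree"], "rose has a trunk"),
    ]
    for statement in sorted(deductive_closure(facts, rules)):
        print(statement)
```

The loop stops at a fixed point: the third rule never fires because its premise is never established.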

18 + Negation
- If all new information is positive (monotonic), then all prior conclusions will, by definition, remain valid
- Problems arise if new information negates a prior assumption, causing it to be withdrawn
  - Conclusive information is not available?
  - The assumption cannot be proven?
  - The assumption is not provable using certain methods?
  - The assumption is not provable given a fixed quantity of time?
- The answers to these questions can result in different approaches to negation and differing interpretations by non-monotonic reasoners.
- Solutions include chronological and intelligent backtracking algorithms, heuristics, circumscription algorithms, and justification- or assumption-based retraction, depending on the reasoner and methods used for truth maintenance.
- Reasoning efficiency is dependent, in part, on the algorithms applied for truth maintenance.
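As a much-simplified illustration of justification-based retraction (not a full truth maintenance system), the sketch below records which premises justify each conclusion and, when an assumption is withdrawn, reports every belief that loses its support. The statements and function names are hypothetical.

```python
def supported(assumptions, justifications):
    """Return every statement that is an assumption or has at least one
    justification whose premises are all themselves supported."""
    believed = set(assumptions)
    changed = True
    while changed:
        changed = False
        for conclusion, premises in justifications:
            if conclusion not in believed and set(premises) <= believed:
                believed.add(conclusion)
                changed = True
    return believed


def retract(assumption, assumptions, justifications):
    """Withdraw one assumption; return (remaining beliefs, withdrawn beliefs)."""
    before = supported(assumptions, justifications)
    after = supported(set(assumptions) - {assumption}, justifications)
    return after, before - after


if __name__ == "__main__":
    assumptions = {"bird(tweety)", "normal(tweety)"}
    justifications = [
        ("flies(tweety)", ["bird(tweety)", "normal(tweety)"]),
        ("can_reach_roof(tweety)", ["flies(tweety)"]),
        ("has_feathers(tweety)", ["bird(tweety)"]),
    ]
    remaining, withdrawn = retract("normal(tweety)", assumptions, justifications)
    print(sorted(remaining))
    print(sorted(withdrawn))
```

Recomputing support from scratch keeps the sketch short; the efficiency concern raised on the slide is exactly that practical systems avoid this by updating support incrementally.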

19 + Explanations and proofs
- When a reasoner draws a particular conclusion, many users and applications want to understand why
- Primary motivations include interoperability, reuse, trust, and debugging
- Understanding the provenance of the information and results is crucial, especially when web-based information is involved
  - What information sources were used (source)
  - How recently they were updated (currency)
  - How reliable these sources are (authoritativeness)
  - Was the information directly available or derived, and if derived, how (method of reasoning)
- Methods used to explain why a reasoner reached a particular conclusion include explanation generation and proof specification

20 + Analysis approaches
- Domain analysis: the systematic development of a model of some area of interest for a particular purpose
- The analysis process, including the specific methodology and level of effort required, depends on
  - the context of the work
  - the requirements and use cases relevant to the project
  - the target deliverables
- Approaches to analysis range from high-level mind mapping and brainstorming to detailed collaboration, dialog, and information modeling to support knowledge sharing
- Common capabilities include
  - drawing a picture that includes concepts and relationships between them
  - producing sharable artifacts, which vary depending on the tool, often including web-sharable drawings

26 + Considerations
- Intended use of ontologies, including domain requirements (e.g., scientific and engineering apps require formulas, units of measure, and computations that may be challenging to represent)
- Intended use of the KRSs that implement them, including reasoning requirements and questions to be answered
- For distributed environments, the number and kinds of resources, processes, and services requiring ontologies: how distributed, how unique, developed collaboratively or independently, dynamic community participation or static
- What kinds of transformations are required among processes, resources, and services to support semantic mediation
- Ontology and KRS alignment / de-confliction / ambiguity resolution requirements
- Ontology and KRS composition requirements: dynamic vs. static composition, in what environment and under what constraints
- Performance, sizing, timing requirements of the target environment

27 + A little methodology
- Requirements, domain and use case analysis are critical
  - Develop initial source/reference material
  - Focus on system or application requirements
- Iterative development starting with a thread that covers basic capabilities can ground the work and prioritize decisions
- Need to understand and communicate
  - Architectural trade-offs, cost and technical benefits
  - The nature of the information and kinds of questions that need to be answered drive the architecture, scope and design
- Use discipline from formal domain analysis and use case development to
  - document and explain requirements
  - identify information sources and models (including modularity) needed
  - limit scope creep
- Reuse standards and well-tested, available models whenever possible

28 + What to look for
- A controlled vocabulary: Royal Horticultural Society Color Chart
- A hierarchical or taxonomic structure: Linnaean taxonomy; the Plant List from the Royal Botanic Gardens, Kew, and Missouri Botanical Garden
- Knowledge supporting structured queries (especially for web site, database, information-oriented applications): Find deciduous trees native to northern California that are drought tolerant, resistant to oak root fungus, and grow no taller than feet
- Requirements specifications
- Reference materials that support domain analysis
- Corporate standards for modeling style as well as content
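A structured query like the tree example becomes straightforward once each criterion is captured as an explicit property. The sketch below is purely illustrative: the catalog entries ("tree A" and so on), their property values, and the 40 ft height limit are all hypothetical placeholders (the source does not give a height figure), and a real application would query an ontology-backed store rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    name: str
    deciduous: bool
    native_regions: frozenset
    drought_tolerant: bool
    resists_oak_root_fungus: bool
    max_height_ft: float


def find_trees(trees, region, height_limit_ft):
    """Names of deciduous, drought-tolerant, fungus-resistant trees
    native to `region` that grow no taller than `height_limit_ft`."""
    return [
        t.name
        for t in trees
        if t.deciduous
        and region in t.native_regions
        and t.drought_tolerant
        and t.resists_oak_root_fungus
        and t.max_height_ft <= height_limit_ft
    ]


# Hypothetical catalog entries, not real horticultural data.
CATALOG = [
    Tree("tree A", True, frozenset({"northern California"}), True, True, 30.0),
    Tree("tree B", False, frozenset({"northern California"}), True, True, 25.0),
    Tree("tree C", True, frozenset({"northern California"}), True, True, 80.0),
    Tree("tree D", True, frozenset({"southern California"}), True, True, 20.0),
]

if __name__ == "__main__":
    print(find_trees(CATALOG, "northern California", 40.0))
```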

29 + Using use cases to gather requirements
A good summary for every use case should include:
- A description of the basic business requirement / need the use case is intended to support
- Primary goals
- Scope: identify any known boundaries as a starting point
- Pre-conditions and post-conditions: any assumptions you know about the state of the system/world before and after
- Actors and interfaces: identify primary actors, information sources, interfaces to existing or new systems
- Triggers: what kicks off the use case (any particular series of events, situation, activity, etc.), and any that affect the flow
- Performance requirements, including any sizing or timing constraints, "-ilities", etc.

30 + Use case content
- Outline the major process steps for both the normal, or primary, scenario and alternative flows, such as if things don't go well
- Use case and activity diagrams: typically done in UML, but could be Visio, PowerPoint, or whatever tools your team is comfortable with
- Usage scenarios: you should have at least two narrative stories that describe how one of the main actors would experience the use case, with the intent of identifying additional requirements
- Competency questions: identify as many of the questions you want the knowledge base / application to answer as possible
- Resources: describe any known contributing resources and repositories, and other external systems that participate in the use case, to the degree possible

34 + Capturing definitions
- Lay out a high-level architecture for key ontologies and ontology elements
- Identify the relationships among elements: roles, domain, interface, process, utility
- Define an approach for gathering content from subject matter experts, possibly based on IDEF5 (Integrated Definition Methods) Ontology Capture Method Analysis, that includes
  - Understanding and documenting source materials
  - An interview template
  - Traceability back to your use cases
- For each ontology element
  - Describe its domain and scope, and how it will be used
  - Identify example questions and anticipated/sample answers for the application(s) it will support
  - Identify key stakeholders, ownership, maintenance, resources for instance knowledge
  - Describe anticipated reuse/evolution path
  - Identify critical standards, resources that it must interoperate with, dependencies
- Resources

35 + Terminology analysis
- ISO 704, Principles & Methods for Terminology Work, provides a methodology for describing concepts and terms
  - Uses ISO 1087 for terminology
  - Uses ISO 860 for terminology harmonization (alignment) methods
  - Basis for typical methods used for taxonomy development today
- Describes how to flesh out definitions
  - Aristotelian genus/differentia structure
  - For classes: "a <parent class> that ...", including text that provides content you'll later express as restrictions or other refinement
  - For properties: "a <parent property> relation [between <domain> and <range>] that ..."
  - SBVR-style formal definitions (Semantics of Business Vocabularies and Rules)
- Recommendations: strategies for relating terms to one another using standard vocabulary
- ISO 1087: great resource for language to describe kinds of relationships, acronyms and other designations, preferred vs. deprecated terms, etc.
- ISO 860 augments this with recommendations for vocabulary comparison
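The genus/differentia templates for classes and properties can be sketched as two small helpers. The function names and example terms are hypothetical, and the helpers only assemble text; they do not check that the wording is a sound definition.

```python
def class_definition(parent_class, differentia):
    """Build 'a <parent class> that <differentia>'."""
    return f"a {parent_class} that {differentia}"


def property_definition(parent_property, differentia, domain=None, range_=None):
    """Build 'a <parent property> relation [between <domain> and <range>] that ...'.

    The 'between' clause is included only when both domain and range are given.
    """
    between = f" between {domain} and {range_}" if domain and range_ else ""
    return f"a {parent_property} relation{between} that {differentia}"


if __name__ == "__main__":
    print(class_definition("plant", "produces flowers"))
    print(property_definition(
        "part-whole",
        "links a plant to one of its blooms",
        domain="flowering plant",
        range_="bloom",
    ))
```

Keeping the differentia as a separate argument makes it easy to revisit later, when that text is turned into formal restrictions.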

36 + Successful strategies
- Successful ontology/vocabulary development model reflects a small development team with a broader user Community of Interest (COI) / stakeholders, especially for reusable content models
  - Provide readable documentation, even for small communities
  - State maintenance policies clearly
  - Identify versions
  - Publish models for ease of accessibility by COI members
- Where different stakeholder communities have unique requirements, explicitly specified models can be mapped to one another and translation services developed
- Common Terminology Services (CTS2) standard from the OMG for healthcare

38 + Separation of concerns / modularity
- Critical dimensions aid in determining module boundaries
  - separate business-related content from technical detail
  - separate aspects of the business content, such as marketing and branding, from others describing products or manufacturing processes
  - separate disciplines into independent modules
- Other considerations include separation based on
  - back-end store / source repositories
  - application boundaries, system interfaces
  - distributed resources
  - the need to reason over some parts of the knowledge base but not others to answer sets of critical questions
  - performance requirements, for reasoning, query answering, etc.
  - asserted vs. inferred content

39 + Naming and versioning
- Naming conventions and versioning policies are critical for every organization
- Namespace definitions for ontologies are critical, along with policies for their management that are well understood
  - A namespace pattern ending in <... in YYYYMMDD form>/filename.extension is common practice in some organizations, with content negotiation pointing to the latest version
  - Levels of hierarchy may be added for large organizations or modularized ontologies
- Namespace prefixes (abbreviations) for individual modules, especially where there are multiple modules, can be important
- If you post something at a URL, the idea is that it should be permanent, so namespace design and governance is essential

40 + Best practices in namespace development
- Availability: people should be able to retrieve a description about the resource identified by the URI from the network (albeit internally)
- Understandability: there should be no confusion between identifiers for networked documents and identifiers for other resources
  - URIs should be unambiguous
  - URIs are meant to identify only one of them, so one URI can't stand for both a networked document and a real-world object
  - Separation of concerns in modeling subjects or topics and the objects in the real world they characterize is critical, and has serious implications for designing and reasoning about resources
- Simplicity: short, mnemonic URIs will not break as easily when shared for collaborative purposes, and are typically easier to remember
- Persistence: once a URI has been established to identify a particular resource, it should be stable and persist as long as possible
  - Exclude references to implementation strategies, as technologies change over time (e.g., do not use .php or .asp as part of the URI scheme), and organization lifetime may be significantly shorter than that of the resource
- Manageability: given that URIs are intended to persist, administration issues should be limited to the degree possible
  - Some strategies include inserting the current year or date in the path so that URI schemes can evolve over time without breaking older URIs
  - Create an internal organization responsible for issuing and managing URIs, and corresponding namespace prefixes
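Some of these guidelines lend themselves to simple automated checks. The sketch below flags two of them: implementation-specific extensions (persistence) and overly long URIs (simplicity). The function name, the 80-character threshold, and the example.org URIs are assumptions made for illustration; only the .php/.asp examples come from the guidance above.

```python
import posixpath
from urllib.parse import urlparse

# Extensions named in the guidance above; a real policy might list more.
IMPLEMENTATION_EXTENSIONS = {".php", ".asp"}


def uri_issues(uri, max_length=80):
    """Return a list of guideline violations found in `uri`.

    max_length is a hypothetical threshold for the 'simplicity' guideline.
    """
    issues = []
    path = urlparse(uri).path
    extension = posixpath.splitext(path)[1].lower()
    if extension in IMPLEMENTATION_EXTENSIONS:
        issues.append(f"persistence: path exposes implementation technology ({extension})")
    if len(uri) > max_length:
        issues.append(f"simplicity: URI is longer than {max_length} characters")
    return issues


if __name__ == "__main__":
    for candidate in (
        "http://example.org/ontology/2015/plants",
        "http://example.org/ontology/plants.php",
    ):
        print(candidate, uri_issues(candidate))
```

Checks such as availability or unambiguous identification depend on how the URI is served and used, so they are out of reach for a purely syntactic test like this one.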

41 + Model element naming
- For naming entities in an ontology there are some rules of thumb that vary by community of practice
  - data modelers often use underscores at word boundaries, or spaces in names, which semantic web tools may not handle well (OK in labels)
  - some name properties <domain><predicate><range>, others <predicate><range>, and others <predicate>, but not necessarily consistently
  - semantic web practitioners typically use camel case (upper camel case for classes, lower camel case for properties)
  - using verbs to the degree possible for property naming, without incorporating the domain/range, is preferable
  - for very large ontologies, some use unique identifiers to name concepts and properties, with human-readable names in labels only; this depends on tooling that helps people interpret the content
- The key is to establish and consistently use guidelines tailored to your organization
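The camel-case convention can be applied mechanically when importing names from a data model that uses underscores or spaces. This is a minimal sketch with hypothetical helper names; it assumes words are separated by whitespace, underscores, or hyphens and leaves existing internal capitals untouched.

```python
import re


def _words(name):
    """Split a name on whitespace, underscores, and hyphens."""
    return [w for w in re.split(r"[\s_\-]+", name.strip()) if w]


def upper_camel(name):
    """Class-style name, e.g. 'flowering plant' -> 'FloweringPlant'."""
    return "".join(w[:1].upper() + w[1:] for w in _words(name))


def lower_camel(name):
    """Property-style name, e.g. 'has_part' -> 'hasPart'."""
    joined = upper_camel(name)
    return joined[:1].lower() + joined[1:]


if __name__ == "__main__":
    print(upper_camel("flowering plant"))
    print(upper_camel("bloom_color"))
    print(lower_camel("has_part"))
    print(lower_camel("Has Characteristic"))
```

The original, human-friendly form (with spaces) is still worth keeping as the element's label.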

42 + Metadata for ontologies and elements
- Metadata should be standardized at both the model level and the element level for every model (not just ontologies)
- Model-level metadata can reuse properties from the Dublin Core Metadata Terms, the Simple Knowledge Organization System (SKOS), the ISO Metadata Registry standard with ISO 1087 terminology support, the W3C PROV-O vocabulary for provenance, and others
- Most annotations should be optional at the element level, but a minimal set, including names, labels, and formal text definitions, is important for reusability and collaboration
- Consistent use of the same annotations (properties, tags) improves readability, facilitates automated documentation generation, and enables better search over ontology repositories
- Model-level metadata may reuse organization-specific taxonomies to enable better search, through RDFa tagging, for example
- The latest version of an OMG Architecture Board recommended vocabulary for specification metadata, and related annotations from the Financial Industry Business Ontology (FIBO, OMG), are available (see AnnotationVocabulary.rdf)
