
Ontology and Reasoning in MUMIS

Towards the Semantic Web

Atanas Kiryakov

ver. 1.2

This document presents a study on the formal representation of the MUMIS ontology and the reasoning components in relation to the Semantic Web. It outlines directions for further work to bring the MUMIS results in sync with the Semantic Web and to develop an ontology-aware open hypermedia system on top of it. The latter task is discussed in the light of an existing Semantic Web extension of a subset of the MUMIS system, allowing automatic semantic annotation, indexing, and retrieval.

The rest of this section provides a quick introduction to the nature of MUMIS, followed by a basic discussion of the Semantic Web. The next section provides an overview of approaches related, one way or another, to ontology-aware multilingual, multimedia information extraction. In section three, the knowledge representation currently used in MUMIS is briefly presented and discussed. Next, in the fourth section, some basic semantic extensions of GATE are presented, followed by a presentation of a richer semantic approach in section 5. Finally, in section 6, the necessary reengineering of the domain ontology and the lexicon is briefly commented on.

Information extraction from English, Dutch, and German (with three different systems) is carried out on textual sources and on information extracted from transcribed spoken commentaries from radio and television broadcasts. The three IE systems target a shared domain and multilingual lexicon of the football domain. As the information is extracted from multiple sources describing the same events in various ways, a merging component is in charge of solving conflicts and fusing information. There is a user interface allowing professional users to query a database of annotations and play video fragments matching the query (e.g., “all goals scored by Owen”).

The textual sources used for this project are taken from reports of the Euro2000 Championships: ticker reports that give a minute-by-minute objective account of the match; match reports that also give a full account of the match but may be subjective; and comments that give general information such as player profiles. English reports are drawn from a variety of online media sources (BBC-online, Press Association, The Guardian, etc.). These sources report the same events in different ways: as an illustration, a source may say “Substitute Westerveld comes on for van der Sar” while another may say “van der Sar (Westerveld 65)” to refer to a substitution event. The elements to be extracted that are associated with the events are: players, teams, times, scores, and locations on the pitch. The system extracts the information and produces XML output. The extraction of temporal information is essential to the task because it is the key for locating interesting fragments in the video material.
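As a rough illustration of how the two phrasings of a substitution quoted above can be reduced to one event description, the following minimal sketch uses two regular expressions. The patterns, the `Substitution` record, and the function name are assumptions made for this example only; the actual MUMIS IE systems rely on much richer linguistic analysis.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Substitution:
    player_in: str
    player_out: str
    minute: Optional[int]  # None when the report gives no time


# Hypothetical patterns covering only the two example phrasings.
NAME = r"[A-Za-z][A-Za-z' ]*?"
PROSE = re.compile(
    rf"^Substitute (?P<player_in>{NAME}) comes on for (?P<player_out>{NAME})$"
)
TICKER = re.compile(
    rf"^(?P<player_out>{NAME}) \((?P<player_in>{NAME}) (?P<minute>\d+)\)$"
)


def extract_substitution(text: str) -> Optional[Substitution]:
    """Map either phrasing to the same event structure, or return None."""
    text = text.strip()
    m = PROSE.match(text)
    if m:
        return Substitution(m.group("player_in"), m.group("player_out"), None)
    m = TICKER.match(text)
    if m:
        return Substitution(
            m.group("player_in"), m.group("player_out"), int(m.group("minute"))
        )
    return None
```

Both example sentences yield the same pair of players; only the ticker-style report contributes the minute, which is the kind of complementary information a merging step would have to fuse.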

1.2. The Semantic Web

The Semantic Web² is the abstract representation of data on the World Wide Web, based on the RDF³ standards and other standards to be defined. It is being developed by the W3C, in collaboration with a large number of researchers and industrial partners. As presented in [Berners-Lee et al. 2001], "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

1

MUMIS is a project within the 5th Framework Programme IST of the European Union

The spirit and the development approach behind the Semantic Web (SW) require as much formal data/knowledge as possible to be provided in formats that others can read and interpret for unforeseen purposes. In other words:

• Automatically processable meta-data;

• Presented in a standard form;

• Allow flexible and dynamic interpretation for unforeseen purposes.

1.3. MUMIS and the Semantic Web

Due to the clear decoupling of the different analysis phases and components in MUMIS, its results can be easily aligned with the latest trends of the SW with modifications which can be limited to only a single stage, namely the storage of the merged event descriptions and the domain ontology in a central database with relevant meta-data. Although the information extraction and merging components could improve their performance through better handling of the formal knowledge they use, this is an optional path for improvement rather than a requirement for SW compatibility. The key point is to store the meta-data (the results, the knowledge that has been extracted and distilled) in an SW-compliant format, so that it is easily accessible through the UI (and other) tools developed outside MUMIS.

There is a lot of formal knowledge used for different tasks within MUMIS, most heavily for multilingual extraction and merging. In the ideal case, MUMIS would have been using SW standard knowledge/ontology representation for those tasks. This would make possible the reuse of many existing tools such as editors, reasoners, ontology middleware, etc. In the same ideal case, there would be no need to convert the results from an internal format to a suitable SW format. However, this ideal scenario turns out to be unrealistic for a number of reasons:

• At the present stage of the project it is too late to reorganize the internal KR;

• When the project started, the SW was more a concept than something you could really use or align to. This opinion is shared by a number of researchers with a good overview of the real state of the field, for instance, [Davies et al. 2002], [Ewalt, 2002], [Ossenbruggen et al. 2002];

• Even now, the SemWeb tools are not mature. For instance, there is no single comprehensive user-friendly RDF(S) editor. Also, there is no single reasoner covering the full DAML+OIL semantics, and even with various limitations in complexity the existing reasoners do not scale for real-world instance reasoning;

2

As defined by its inventor and authority, the W3C consortium, at http://www.w3.org/2001/sw/

2002], the authors consider each document as bearing a static semantics (the one corresponding to the author's intention and understanding) and multiple dynamic semantics, determined by the usage patterns and emotions of the users of the documents. This sub-symbolic view of social semantics is close to the ideas of collaborative filtering. The authors' approach uses latent semantic analysis (see [Deerwester et al. 1990]) of short browsing sub-paths (in a web context, of course) for capturing the dynamic semantics of the documents. This interesting work is at a proof-of-concept stage, partially due to difficulties with gathering browsing paths at the necessary scale. Even with these limitations, it is important because its approach addresses both the dynamic and the multimedia Semantic Web.

[Ossenbruggen et al. 2002] provides a broad overview of the relations between the Semantic Web and hypermedia. One important issue discussed there is the tradeoff between embedded linking (mostly used in the current web) and open hypermedia systems, which encode “virtual” links externally to the documents being linked; the latter is also the MUMIS approach. This leads quite directly to the dynamic aspect of the Semantic Web, already mentioned above: embedded links are static, which is a constraint on user annotations and imposes serious limits on link complexity. Luckily, RDF(S), the basic structuring paradigm for the Semantic Web, is an external linking language.

Semantic annotation of documents with respect to some ontology and a knowledge base with instances is discussed in [Carr et al. 2001] and [Kahan et al. 2001]. Although they present interesting and ambitious approaches, they do not specifically concern the use of information extraction for automatic annotation. Semantic annotation is also used in the S-CREAM project presented in [Handschuh et al. 2002]; the approach there is to use machine learning techniques for extraction of relations between the entities being annotated. A similar approach is also taken within the MnM project (see [Vargas-Vera et al. 2003]), where the semantic annotations can be stored as “virtual” links (see above) to an ontology and KB server (WebOnto), which can be accessed via a standard API. All the semantic annotation techniques referred to above lack upper-level ontologies and a critical mass of world knowledge to serve as a trusted and reusable basis for automatic recognition and annotation, as in the approach presented in [Bontcheva et al. 2003] and discussed below.

An overview of the different languages and standards for ontology and knowledge representation was made at the beginning of the MUMIS project and reported in [Ursu et al. 2000]. This provides a broad comparison of the different XML-based approaches. A more visionary overview of the “heavy” ontology languages can be found in [Fensel, 2001], which provides the rationale behind OIL together with its evolution through DAML+OIL into OWL. From those and other publications, it becomes evident that there is little consensus on anything beyond RDF(S).

Finally, when discussing multimedia on the web, it is mandatory to mention the Synchronized Multimedia Integration Language (SMIL, see [Hoschka, 1998]), which can be seen as an HTML extension in XML syntax that allows integration of a set of independent multimedia objects into a synchronized multimedia presentation. Using SMIL, an author can (i) describe the temporal behaviour of the presentation, (ii) describe the layout of the presentation on a screen and (iii) associate hyperlinks with media objects. The latter two allow pretty much what can be done via HTML for static objects, say images, but augmented with further behavioural attributes. SMIL is not directly relevant to MUMIS, as the latter is more concerned with the analysis of the multimedia content than with its presentation.

3. The KR Currently Employed in the Project

The analysis refers to the key deliverables on the relevant issues, with the purpose of accounting for what is already in place and better understanding the evolution necessary.

D2.1 "Multilingual Lexicons"

The approach for aligning to the ontology is straightforward and clear: each lexicon entry is related to an ontology concept. For each concept in the ontology there is a main term, i.e. the best candidate out of all the entries related to the concept.

D2.2 "Domain Ontology"

It represents a good analysis of the domain, formalized, however, in a semantically poor language (see [Kokkinakis et al. 2002]). The XML representation of the ontology has two main problems:

• The XML schema fulfils its restrictive functions, but is missing predictive power. There is no formal semantics defined for XML (Schema), i.e. nothing to enable interpretation of the syntactic structure. That is the reason why there are no XML reasoners.

• XML is not a standard way of representing ontologies (or any other sort of logically formalized knowledge). This leads to quite direct disadvantages, such as (i) it is impossible to use most of the publicly available tools within the project and (ii) it is impossible for other people to make use of the MUMIS results within their tools and projects.

D6 "Merging Component"

D6 is interesting with respect to the use of formal knowledge for consistency checking. However, section 3.2 of the deliverable can be extended further to better justify the usage of such a powerful language and reasoner (known to have incomplete inference⁴).

KR used for Information Extraction

A custom knowledge representation formalism called XI (see [Gaizauskas and Humphreys, 1996]) is used to support the IE work for English (WP2). It is a specific kind of semantic network (implemented as an extension of PROLOG) that has much in common with the so-called description logics (DL). In contrast to a typical DL language, XI does not employ number restrictions, but only uses functional attributes⁵. XI allows quite complex instance reasoning. Although this formalism is well suited for co-reference resolution in English, it has some limitations when it comes to capturing the necessary domain knowledge. A typed feature-structure knowledge representation is used to support IE in German.

4

In other words, following model-theoretic semantics, the system is not able to syntactically infer all results that are semantically expected.

5

In a way similar to what OWL Lite does, see [Patel-Schneider et al. 2003]

4. Ontology-aware Information Extraction

We present here a relatively simple and straightforward approach for aligning an IE framework with the Semantic Web. A deeper but also more complex approach is discussed in the next section.

For the latest two releases of GATE (2.0 and 2.1), a number of extensions were made in order to make possible more “ontology-aware” language engineering. Here we will just sketch a few of the issues, which are more extensively presented in [Bontcheva et al. 2003].

First of all, a rather simple Ontology interface was added to the GATE framework, which allows manipulation of some basic semantic primitives common to RDF(S) and DAML+OIL without getting deep into some arguable features of both of those languages. In essence, the Ontology interface provides support for class hierarchy, relations, domain and range restrictions. There is an implementation of this interface which allows DAML+OIL ontologies to be imported and exported. A base-level Ontology Editor is also provided to enable visualization and editing of ontologies accessible through implementations of the Ontology

provided with GATE (e.g., countries, cities) to be mapped to their corresponding class in the user’s ontology (see the figure below). The ontological information assigned by the OntoGazetteer can be used by the later NLP modules either directly or taking benefit from the changes to the pattern matching engine (JAPE). The latter can now consider the class subsumption (a task “sub-contracted” to the knowledge server through the Ontology API) while evaluating the subsumption of the feature maps of the annotations. Finally, the class information can be used during DAML+OIL export, another new feature allowing the annotations to be exported in this format.
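The idea of taking class subsumption into account while comparing feature maps can be illustrated with the minimal sketch below. It is not the JAPE implementation and does not use the GATE Ontology API: the class hierarchy is a hypothetical in-memory dictionary standing in for the knowledge server, and the feature name `class` is an assumption made for the example.

```python
from typing import Dict, Mapping, Optional

# Hypothetical toy class hierarchy: child class -> parent class.
SUPERCLASS: Dict[str, Optional[str]] = {
    "Entity": None,
    "Location": "Entity",
    "Country": "Location",
    "City": "Location",
}


def is_subclass_of(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or is a (transitive) sub-class of it."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERCLASS.get(current)
    return False


def subsumes(pattern: Mapping[str, str], features: Mapping[str, str]) -> bool:
    """Check whether an annotation's feature map satisfies a pattern.

    Every feature required by the pattern must be present; the "class"
    feature is compared by ontology subsumption, all others by equality.
    """
    for key, wanted in pattern.items():
        if key not in features:
            return False
        if key == "class":
            if not is_subclass_of(features[key], wanted):
                return False
        elif features[key] != wanted:
            return False
    return True
```

With this check, a rule written for `Location` also fires on annotations whose class is `City` or `Country`, so patterns can be stated once at the most general appropriate level.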

KIM (http://www.ontotext.com/kim) is a platform for semantic annotation, indexing, and retrieval. It allows (semi-)automatic annotation and ontology population for the Semantic Web, using Information Extraction (IE) technology. KIM is based on two major platforms; it combines GATE⁶ and Sesame/OMM⁷ in order to bridge the gap between current IE results and the requirements of the Semantic Web.

The key objectives can be outlined as follows:

• To make the formal knowledge that IE extracts from the text semantically well-founded. Technically, this means creating annotations related to a formal ontology of classes and instances, expressed in RDF(S) (or a compatible language);

• To make possible the retrieval of text documents based on world knowledge, satisfying an information need which is currently served in an inconsistent fashion by three different technologies: DBMS, information retrieval, and IE. An example is a query with the following precise definition: “give me, ranked by relevance, all documents referring to a company involved in an accident in France which took place in November 2002”;

• To provide means for implementation of the Dynamic Semantic Web: KIM allows automatic annotation of the content at the server or, at access time, at the reader’s site.

6

One of the most mature language engineering platforms, specifically tuned and well developed for information extraction, http://gate.ac.uk

7

An RDF(S) repository allowing storage and retrieval of formal knowledge in a scalable and reliable fashion, see [Broekstra and Kampman, 2001]. OMM (the Ontology Middleware Module) is an extension of Sesame which provides multi-protocol access (RMI, SOAP), as well as tracking of changes in the repository, security, and meta-information. For more information, see [Kiryakov et al. 2002], http://sesame.aidministrator.nl and http://www.ontotext.com/omm. Both Sesame and OMM were developed in the course of the On-To-Knowledge project (http://www.ontoknowledge.org)

To achieve the above goals, KIM relies on a large volume of instance data and appropriate lexical (thesauri) information represented in RDF(S). The system is based on an upper-level ontology named KIMO, having about 200 classes (discussed later), which covers the most important entity types in a semantically sound fashion and provides ground for (i) expansion to include more complex knowledge like relations, scenarios, events⁸, (ii) domain or task-specific knowledge and (iii) integration with third-party/customer information systems.

KIM is presented extensively here because it was driven by objectives quite similar to those of a further MUMIS development towards the Semantic Web, and it could serve as a technological background or useful experience for an alternative system combining an IE platform and a Semantic Web backend.

5.1. Semantic Annotation

The semantic annotations offered by KIM are quite close to the output of the named-entity recognition offered by many existing IE systems. The major difference is that proper semantic information is kept for the type of the entity (via a URI to an ontology class), combined with a reference to formal meta-data about the entity itself, as illustrated in the diagram below.

Although different conventions for encoding the annotation types are present in IE systems, those usually lack a proper and consistent knowledge representation, as well as a comprehensive taxonomy. This is the problem which was targeted and resolved in KIM via extension and minor reengineering of GATE.

8

This feature will be extensively used for the MUMIS implementation.

As presented in the figure, the annotations for the entities have references, namely URIs, to the proper resources in the RDF(S) repository bearing the KIM Ontology, the KIM World KB, and all the knowledge about additional entities, either imported from a different formal source or extracted automatically from the text.
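A semantic annotation of this kind can be thought of as a stand-off record that points from a text span to two URIs: one for the ontology class and one for the entity description. The sketch below is only an illustration of that shape; the namespaces, the `SemanticAnnotation` record, and the naive string search are assumptions for the example and do not reflect the actual KIM data model or recognition process.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical namespaces, used only for illustration.
KIMO = "http://example.org/kimo#"
KB = "http://example.org/kb#"


@dataclass(frozen=True)
class SemanticAnnotation:
    start: int         # character offset where the mention begins
    end: int           # character offset just past the mention
    class_uri: str     # URI of the ontology class (entity type)
    instance_uri: str  # URI of the entity description in the repository


def annotate(text: str, mention: str, class_uri: str,
             instance_uri: str) -> List[SemanticAnnotation]:
    """Create one stand-off annotation per occurrence of `mention` in `text`."""
    annotations = []
    pos = text.find(mention)
    while pos != -1:
        annotations.append(
            SemanticAnnotation(pos, pos + len(mention), class_uri, instance_uri)
        )
        pos = text.find(mention, pos + len(mention))
    return annotations


text = "Owen scored early. Later Owen was substituted."
owen = annotate(text, "Owen", KIMO + "Person", KB + "Person.1")
```

Because the annotations live outside the text and carry only offsets and URIs, the same document can be re-annotated or linked to additional knowledge without being modified.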

5.2. KIM Front-ends

The KIM front-ends deliver the benefits of KIM to the end user in a simple and intuitive shape. They require zero or minimal installation and make use of the KIM Server, which co-operates with Sesame and uses our GATE-based IE tools to process the documents.

Those tools demonstrate how, once the documents are semantically annotated (which could be just a change in the output format of the IE involved in MUMIS), general-purpose visualization, navigation, and querying tools can be used in addition to the specialized UI components.

5.2.1. Highlight and Explore Entities

The KIM Plug-in for Internet Explorer can highlight the entities in the currently loaded web page, in colours corresponding to their classes. Hyperlinks are put at the annotations, which pop up the KIM Explorer. The latter is a straightforward meta-data⁹ browser, allowing the user to surf over the knowledge about the entity, following its RDF(S) representation with a few readability abstractions.

9

It could just as easily be presented as an ontology, knowledge, or semantic browsing tool.

Technically, the plug-in sends the page content to a KIM Server, which processes it and returns the annotations to be displayed. This way the plug-in is quite a tiny client module, with minimal requirements towards the client application and easy installation. Since all the real processing is done on the server, upgrades and reinstallations at the client site are not necessary, while the system can still evolve on the server.

For each entity the explorer presents (i) the most specific classes it belongs to (in the case above, City), (ii) its properties and relations to other entities, and finally (iii) the entities related to it. All the other entities are hyperlinked, so they can be explored further. The abstractions over the “native” RDF(S) representation include:

• the resources are presented with their labels, rather than with their URIs;

• a number of “auxiliary” properties are filtered out.

Recall that the KIM Explorer pane pops up when the hyperlinks of the entities annotated in the KIM Plug-in are followed. This provides a smooth transition from the text to the formal knowledge available. The future plans for development of the explorer also include showing the documents in which the entity is referred to.

5.2.2. KIM Semantic Query

KIM Semantic Query allows queries for entities according to arbitrary patterns over the existing “world knowledge”. An example could be the query

Give me all companies X, whose name contains “Bahn”, involved in accidents in Europe in the period 5-10.11.2002

The user interface takes the form of a Dynamic HTML page, as in the snapshot below.

The Query Restrictions

The interface concept considers patterns involving up to three¹⁰ entities, referred to with the variables X, Y, and Z. The user chooses the classes to which the entities belong from the combo-boxes, which present the valid part of the class hierarchy. The name of each of the entities can be given (partially or exactly) or left unspecified.

10

The number three here is chosen as a balance between power and complexity; it can be easily increased.

Further, the entities in the pattern can be connected via relations corresponding to their types, offered in the corresponding combo-box. Conversely, the classes of the entities also depend on the possible values of the previously selected properties. For instance, when the user selects the class for X to be Company, then in the combo-box offering the X-to-Y relation, only the relations applicable to Companies (and their super-types) are offered. Next, when the relation between X and Y is selected, the classes offered for Y are only those which are valid participants in the X-to-Y relation. In the case above, the Companies can be involved in any sort of Happenings, including Accidents. The last relation can be either a relation between X and Z or one between Y and Z. All those dependencies are taken from the domain and range restrictions on the properties in the KIMO ontology.

The interface also allows a number of attribute restrictions to be given (see the next sub-section for a discussion of attributes). Before starting the search, the user can specify which of the entities in the pattern are of interest to them, so that only they appear in the result. In the above example the user is interested in both the Companies (X) and the corresponding Locations¹¹ (Z) of the accidents.
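The way the choices offered in the combo-boxes follow from domain and range restrictions can be sketched as below. The class hierarchy and the relation names are hypothetical and are not taken from KIMO; the sketch only shows the filtering logic under the assumption of a single-inheritance hierarchy.

```python
from typing import Dict, List, NamedTuple, Optional

# Hypothetical fragment of a class hierarchy (child -> parent).
SUPERCLASS: Dict[str, Optional[str]] = {
    "Entity": None,
    "Object": "Entity",
    "Happening": "Entity",
    "Agent": "Object",
    "Company": "Agent",
    "Location": "Object",
    "Accident": "Happening",
}


class Relation(NamedTuple):
    name: str
    domain_class: str  # class the subject must belong to
    range_class: str   # class the object must belong to


# Hypothetical relations; the names are not taken from KIMO.
RELATIONS = [
    Relation("involvedIn", "Agent", "Happening"),
    Relation("tookPlaceIn", "Happening", "Location"),
    Relation("locatedIn", "Object", "Location"),
]


def ancestors(cls: str) -> List[str]:
    """Return cls followed by all of its super-classes."""
    chain = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = SUPERCLASS[current]
    return chain


def relations_for(subject_class: str) -> List[str]:
    """Relations whose domain is the subject class or one of its super-classes."""
    valid = set(ancestors(subject_class))
    return [r.name for r in RELATIONS if r.domain_class in valid]


def classes_for_object(relation_name: str) -> List[str]:
    """Classes that fall under the range of the chosen relation."""
    rel = next(r for r in RELATIONS if r.name == relation_name)
    return sorted(c for c in SUPERCLASS if rel.range_class in ancestors(c))
```

In this toy setting, choosing Company for X offers `involvedIn` and `locatedIn`, and choosing `involvedIn` then restricts Y to Happening and its sub-classes, which mirrors the Company and Accident example in the text.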

5.3. Relations vs. Attributes in RDF(S)

Here we present a short discussion of one of the often criticized aspects of RDF(S), which has some importance for both KIM and MUMIS. Within RDF(S) there is a single notion of Property, defined in [Lassila and Swick, 1999] as follows:

A property is a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning, defines its permitted values, the types of resources it can describe, and its relationship with other properties ...

In contrast to this broad notion, gathering in a single class all sorts of binary predicates, there are many other paradigms distinguishing at least the following two sorts:

• Attribute: a characteristic of an object or entity which is in a sense asymmetric, related much more to the entity in the first place of the relation than to any other entity. An easy formal definition of attribute would be “a property with literal values”; this is the notion used in the KIM Semantic Query above. Formally, in RDF(S) those are properties with rdfs:range defined as rdfs:Literal. Within OWL (see [Dean et al. 2002]), the attributes are distinguished as datatype properties;

• Relations: binary predicates relating two objects/entities. Those are distinguished in OWL as object properties.

Since the above distinction is well recognized in the community and supported in the higher-level ontology standard OWL, we have no doubts about maintaining it in KIM. This distinction is also important within the MUMIS domain model, as we will see later on.
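The formal criterion above, that an attribute is a property whose rdfs:range is rdfs:Literal, can be turned into a simple partition over schema triples. The sketch below assumes the schema is available as plain (subject, predicate, object) tuples; the `ex:` property names are hypothetical and do not come from KIMO or the MUMIS ontology.

```python
from typing import Dict, List, Tuple

RDFS_RANGE = "rdfs:range"
RDFS_LITERAL = "rdfs:Literal"

# Hypothetical schema triples (subject, predicate, object).
SCHEMA: List[Tuple[str, str, str]] = [
    ("ex:hasName", RDFS_RANGE, RDFS_LITERAL),
    ("ex:shirtNumber", RDFS_RANGE, RDFS_LITERAL),
    ("ex:playsFor", RDFS_RANGE, "ex:Team"),
    ("ex:tookPlaceIn", RDFS_RANGE, "ex:Location"),
]


def split_properties(schema: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Partition properties into attributes (literal range) and relations."""
    result: Dict[str, List[str]] = {"attributes": [], "relations": []}
    for prop, predicate, value in schema:
        if predicate != RDFS_RANGE:
            continue  # only range declarations matter for this distinction
        kind = "attributes" if value == RDFS_LITERAL else "relations"
        result[kind].append(prop)
    return result


kinds = split_properties(SCHEMA)
```

A query interface could use such a partition to decide which properties are offered as attribute restrictions and which as relations between entities.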

5.4. KIMO Ontology

KIMO covers the most general 200 classes of entities and 40 relations, with the following objectives:

• basic level of intelligence/recognition power for general text analysis;

• best performance for business and political news;

• to provide a well-structured base for extension with domain- and application-specific resources.

11

Some properties in the ontology are subject to special handling in the queries. For instance, the “took place in” relation is transitive with respect to location inclusion. This means that if something took place in Paris, it is also considered to have taken place in France and even in Europe. So, the above query will return accidents which took place in any location which is a part of Europe. However, in the result the specific location will be provided.

The “true” ontology consists of the classes under the kimo:Entity class and all the semantics related to their descriptions and relations. It can be considered a quite typical upper-level ontology which tries to combine:

• Some well-known (say, since Aristotle) philosophical distinctions;

• The experience from a number of existing upper-level ontologies, such as Upper Cyc¹² and DOLCE (see [Masolo et al. 2002]);

• The experience from lexical knowledge bases, such as WordNet and EuroWordNet, including the top ontology of the latter and “ontological” refinements of the former, such as the OntoClean project (see [Oltramari et al. 2002]).

Those were combined in a pragmatic fashion, sacrificing distinctions which seem irrelevant for IE applications for the sake of simplification and in order to avoid the involvement of “expensive” semantic primitives and axioms. Thus, finally, the top-level distinctions are:

• kimo:Object: entities for which it could be said that they exist. Objects can play some role in some Happenings. Objects could be material (such as the Eiffel Tower or the body of Lenin) or immaterial (say, an electrical current between two points). One of their important characteristics is that they can occupy some region in space.

• kimo:Happening: entities for which it could be said that they happen. A happening can be either dynamic, such as "drawing a circle", or static, such as "being a president". In all cases, the events have some location in time, in the simplest case start and end points.

• kimo:Abstract: entities which neither happen nor exist, e.g. a Currency, a Theorem or a sort of Sport.

5.5. KIM World Knowledge Base

The KIM World KB was built with the goal of almost exhaustive coverage of the most important entities in the world, their names, relations, and properties.

The World KB is used in KIM in a fashion quite similar to the way gazetteers are used in classical IE systems. For each of the entities a number of aliases are maintained with the corresponding information about them, for instance characteristics such as “language”, “short/long”, “official”, “old”, etc.

12

See http://www.cyc.com/cyc-2-1/cover.html and the new and extended version published as a part of the OpenCyc project, http://www.opencyc.org

It is no surprise that such extensive gazetteer-like information boosts the recall of the named-entity recognition phase, but if it remains unhandled it brings levels of ambiguity which can lower the precision to quite unacceptable levels. To solve this problem, KIM employs a Hidden Markov Model learner, which once trained over a manually annotated corpus¹³

5.6. Lexical Resources in KIM

The lexical resources in KIM are stored and maintained as a part of the RDF(S) repository. There is a separate branch in the KIMO ontology underneath the kimo:LexicalResource class dedicated to lexica of different sorts. This is the KIM approach to presenting any sort of information usually stored in gazetteer lists or lexicons. For each lexical resource, the following properties are relevant:

• rdf:label: this property is expected to bear the character string, i.e. the actual phonology or surface realization transcribed in Unicode;

• kimo:language: the natural language for which this is a valid lexical entity;

• kimo:status: the universal holder of any meta-information related to the specific resource.

A number of specific classes of lexica are specified at present, drawing on the best experiences from a number of GATE applications, particularly ANNIE and MUSE. One such sub-class, for instance, is OrgLexica, having in turn the sub-classes OrgBase, OrgKey, OrgPre, and OrgSpur. The properties listed above can easily be extended with new ones relevant either to all sorts of lexical resources or to specific sub-classes.

5.7. Entity Aliases

There is one sub-class of kimo:LexicalResource which deserves a closer look: kimo:Alias. The instances of this class are special in that they represent names or aliases of some named entities. The entities are linked to their aliases via the kimo:hasAlias property, which is a one-to-many relationship. In cases when two entities share one and the same alias (for instance the country Brazil and its capital), those are kept as separate lexical resources, although having one and the same phonology. kimo:hasAlias has an important sub-property, kimo:hasMainAlias, denoting the most important alias of the entity, the one used by default when the entity should be referred to in generated text or in a user interface. Each entity is expected to have a single main alias.
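The alias model described above can be sketched with a handful of triples. The identifiers below are illustrative only, the example uses the country Luxembourg and its capital of the same name, and the plain `label` property stands in for the label property of the lexical resources; none of this is copied from the KIM repository.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical triples; entity and alias identifiers are illustrative.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Country.1", "kimo:hasMainAlias", "Alias.1"),
    ("Country.1", "kimo:hasAlias", "Alias.2"),
    ("City.1", "kimo:hasMainAlias", "Alias.3"),
    ("Alias.1", "label", "Luxembourg"),
    ("Alias.2", "label", "Grand Duchy of Luxembourg"),
    ("Alias.3", "label", "Luxembourg"),  # same string, separate resource
]

# hasMainAlias is a sub-property of hasAlias, so both link entities to aliases.
ALIAS_PROPERTIES = {"kimo:hasAlias", "kimo:hasMainAlias"}


def build_gazetteer(triples: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each alias string to the entities that carry it."""
    labels = {s: o for s, p, o in triples if p == "label"}
    gazetteer = defaultdict(list)
    for s, p, o in triples:
        if p in ALIAS_PROPERTIES:
            gazetteer[labels[o]].append(s)
    return dict(gazetteer)


def main_alias(entity: str, triples: List[Tuple[str, str, str]]) -> str:
    """Return the label of the entity's single main alias."""
    labels = {s: o for s, p, o in triples if p == "label"}
    mains = [o for s, p, o in triples
             if s == entity and p == "kimo:hasMainAlias"]
    if len(mains) != 1:
        raise ValueError(f"{entity} must have exactly one main alias")
    return labels[mains[0]]


gazetteer = build_gazetteer(TRIPLES)
```

Looking up the shared string returns both entities, which is exactly the ambiguity that a later disambiguation step has to resolve.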

Here follows a diagram presenting a snapshot of a KIM repository; what can be seen is an entity with its aliases. A company with one of its aliases in English is given. To demonstrate the commonalities and the differences with the representation of the rest of the lexical resources, one of the so-called OrgBases is shown; those are just tokens used to recognize unknown organizations, i.e. those for which no alias can be matched.

13

The learner delivers acceptable results even when trained on a corpus as small as 30 documents.

6. Adapting the MUMIS Ontology and Lexicons

As already mentioned above, MUMIS can be easily put in sync with the Semantic Web by means of refactoring the ontology and converting the central database with the event descriptions, without major changes to the existing components. With the KIM-based approach proposed here, a more ambitious target is pursued: to let the IE and merging components benefit from richer world and domain knowledge in order to achieve higher performance.

Although most of the classes in the existing ontology will maintain their place in the new taxonomy, the upper level will have to be reconstructed. The definition of Entity currently mixes both abstract entities and objects. This is a problem because there is no proper level for encoding common-sense knowledge relevant to objects, such as a locatedIn relation to a Location, which is not appropriate for abstract entities. It can also be noticed that some useful classes are missing, such as, for instance, the class Agent, to be used as a common super-class of both Person and Organization. This is important from an information extraction point of view, because there are many linguistic patterns such as “XXX offered …” where it is obvious that XXX is a sort of agent, but impossible to classify it further. So, in the absence of a common super-class for all sorts of agents, either no annotation should be assigned or two ambiguous ones should be placed.
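The benefit of an Agent class can be made concrete by computing the most specific class that covers all candidate readings of an ambiguous mention. The sketch below assumes a single-inheritance hierarchy stored as a child-to-parent dictionary; both hierarchies are hypothetical simplifications and not the MUMIS or KIMO taxonomies.

```python
from typing import Dict, Iterable, List, Optional


def ancestors(cls: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return cls followed by its super-classes, most specific first."""
    chain = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def most_specific_common_class(candidates: Iterable[str],
                               parents: Dict[str, Optional[str]]) -> str:
    """Most specific class that subsumes every candidate class."""
    candidates = list(candidates)
    common = ancestors(candidates[0], parents)
    for cls in candidates[1:]:
        others = set(ancestors(cls, parents))
        common = [c for c in common if c in others]
    return common[0]


# Hypothetical hierarchies (child -> parent), without and with Agent.
WITHOUT_AGENT: Dict[str, Optional[str]] = {
    "Entity": None, "Person": "Entity", "Organization": "Entity",
}
WITH_AGENT: Dict[str, Optional[str]] = {
    "Entity": None, "Agent": "Entity",
    "Person": "Agent", "Organization": "Agent",
}
```

For a mention that could be a Person or an Organization, the hierarchy with Agent yields the informative annotation `Agent`, whereas the one without it can only fall back to the uninformative top class.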

Apart from the changes to the upper level, following the mechanism demonstrated in the previous section, the multilingual lexicon can be kept together with the ontology and the world knowledge base, thus allowing for better consistency and all-in-one viewers and editors.

[Figure: KIM Ontology & Lexica. Nodes: Company, Company.1, Company.1.1, Alias, LexicalResource, English, “XYZ Corporation”, OrgBase, OrgBase.1, “Committee”; edge labels: type, hasAlias, label, language, subClassOf.]

6.1. Extending the KIM World KB with MUMIS-specific knowledge

The MUMIS case study with the Euro2000 Championships is quite a good example of a task and domain where a fairly limited volume of information needs to be handled. All the information about the teams, players, coaches, matches, and locations can be easily entered and structured in an RDF(S) repository, thus enabling high-quality recognition and indexing on the one hand and more advanced access to the information about the entities and the documents referring to them on the other.
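To give an impression of how little is needed to structure such domain knowledge, the following sketch stores a few facts as triples and retrieves them with a wildcard pattern match. It is a toy in-memory stand-in for an RDF(S) repository: the identifiers, the `playsFor` property, and the `match` helper are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Illustrative facts with hypothetical identifiers and property names.
KB: List[Triple] = [
    ("Team.England", "rdf:type", "Team"),
    ("Team.Netherlands", "rdf:type", "Team"),
    ("Player.Owen", "rdf:type", "Player"),
    ("Player.Owen", "playsFor", "Team.England"),
    ("Player.vanDerSar", "rdf:type", "Player"),
    ("Player.vanDerSar", "playsFor", "Team.Netherlands"),
    ("Player.Westerveld", "rdf:type", "Player"),
    ("Player.Westerveld", "playsFor", "Team.Netherlands"),
]


def match(kb: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in kb
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


dutch_players = [s for s, _, _ in match(KB, p="playsFor", o="Team.Netherlands")]
```

Once players and teams are represented this way, a recognized mention of a player can be linked to its team, which supports queries that combine entity knowledge with the extracted events.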

7. Conclusion

This study on the extension of the knowledge representation, reasoning and ontologies used in MUMIS towards the Semantic Web provides interesting ideas for further development. It outlined how, given proper decoupling and design, the multimedia, semantic and natural language aspects can benefit from each other without being bound to specific technologies or solutions. Semantic Web knowledge representation standards and technologies can be used for representation of the ontology and the central event database without the need for major changes in the multimedia (A/V) processing and the Information Extraction modules.

The representation of the MUMIS ontology in RDF(S), based on a well-defined upper-level ontology, can provide an easy transition to a Semantic Web Information Extraction platform, facilitating better dissemination of the results, more efficient information extraction, and the usage of a richer knowledge engineering infrastructure.