The Semantic Desktop needs a well-thought use of ontologies and ontology
languages. Existing ontology languages like RDF/S, OWL, SKOS and Topic Maps are
very well suited for certain application areas, but do not fulfill some of the
requirements given on the Semantic Desktop. In this report, a new ontology
language, extending RDF/S, is proposed, the PIMO ontology language, which addresses the
requirements of the Semantic Desktop and uses existing solutions as an
inspiration to build a suitable solution. The language contains a core upper
ontology, defining basic classes for things, concepts, resources, persons, etc.
and also stops at these basic entities. Extending the ontology definitions of
classes and relations is possible by PIMO-domain ontologies. The core
application area of the PIMO-language is to allow individual persons to express
their own mental models in a structured way, the different mental models can
then be integrated based on matching algorithms or on domain ontologies. Based
on the core upper ontology elements, each user can extend his personal mental
model in an open manner.

Accompanying to the description of the ontology is a RDF/S version of the
ontology language, created using the popular Protégé tool. An example of the
mental model of a user is given, the fictional user "Paul" is further described.
With the gnowsis-beta open source software, an implementation based on the PIMO
language exists, that allows validating ontology files. A web-service for that
will be provided soon. The ontology language, this document, the open-source
reference implementation and the example documents can provide a stable basis
for discussions on this topic and allow you to extend your own work.

Purpose and target audience

This report contains a description of the problems faced with personal
ontologies, gathers requirements and shows a possible ontology language for
personal ontologies. Example data and a user interface mockup give more
practical hints. Common problems in the field are not ignored but addressed and possible
solutions suggested. The target
audience are ontology authors or application developers that are interested in
creating ontologies for the purpose of personal information management, further
researchers from the Semantic Desktop field will find a summary about the
practical and theoretical background of our work. The
gnowsis developers at DFKI are a good example of such a group.

As technical architecture we will reference the Semantic Web, so you should
be familiar with the RDF(S), and OWL standards and know how to build
ontologies using OWL or RDF(S). We will also reference some semantic and
philosophical ideas from Topic Maps, to illustrate the problems and
solutions from a different angle. The reader is expected to have a level of
familiarity with
RDF(S) that at least corresponds to the tutorial material in [RDF-PRIMER],
[RDFS], [OWL]. Knowledge about Topic Maps
would be very helpful to understand our ideas [TM-empolis]. For a background on
the technical aspects of the Semantic Desktop, you may refer to [Sauermann2003].
We will also refer to philosophical findings, giving an introduction to these in
this document.

Traditionally, Personal Information Management(PIM) is about managing
calendars, contacts, and e-mails. If we make a reality check, every piece of information a person encounters has to be managed.
Hence PIM in a digital world is about everything accessed by a person through
his or her computer, not just calendars, contacts, and e-mails.

Googling for PIM, we find products like the
KDE-PIM, solving similar problems like Microsoft Outlook, synchronisation
tools like
amphire or intellisync, and niché products like the text processor
treepad. On wikipedia, the
personal
information manager roughly covers what can be categorised as "Microsoft
Outlook lookalike". To manage information as such, ideas and thoughts, we
find products that allow us to express our ideas using bubbles and arrows,
mind-manager or mind-mapping tools. They might even be standardized like topic
maps [XTM] or use the Resource Description Framework [RDF-PRIMER]
like mindraider does. There are
integrated research prototypes like
haystack which also aim at PIM in a very elegant way.

In 2003, Leo Sauermann started the gnowsis project, which aims at building a
so called "Semantic Desktop", a platform that extends the operating system to
allow an enhanced PIM, based on Semantic Web technology.

Authoring and using ontologies for personal information management
is not enough, in the gnowsis project we aimed at bringing the ontology into the
daily applications of the end user. Here is a short description of the main
ideas behind gnowsis.

The Semantic Web is a way of enriching the existing WWW with data.
This technology may also be used on a personal computer. Similar to the Web, a
server application publishes the available data. This local Semantic Web Server
can be used to access the information stored on the PC. Files and other
resources can be used like Web resources. They are identified by their URI and
links can be created from one document to another. The existing data is
available in the RDF syntax and can be used by any application. Interaction
between the applications is simplified by these standards.
By taking the Semantic Web technologies and using them on a personal computer, a
semantic layer in system architecture is created. Parallel to the existing
access to resources using already installed applications, they can be accessed
through the Semantic Web server. The resources are identified independent of
their file format. Based on this layer, a system-wide classification system can
be created. Concepts like private or business are represented once in the
system and can be reused in all applications. The class my XML Lecture is
represented similar to a group in a web directory (like yahoo), the class can
contain files, emails, webpages, etc. All resources can be identified by their
URI and can be opened using a browser like program. A crawler similar to google
can be built on top of such a system to create a full text index of all existing
text data.
The links between local resources are bidirectional, this is an extension of the
common WWW principle of unidirectional links. A relation between two resources
can be entered at one place into the system and can be found from all connected
resources.
As shown by these examples, bringing web technologies to a personal computer can
be used in many ways. Accessing the data on the computer through the interface
of a Semantic Web server creates a new perspective of existing data, they can be
addressed by URIs and organized in web directories etc.

Existing applications are not replaced but instead enhanced with
semantic features. Users can create an ontology and use it from within their normal
e-mail client or text processor. Software plugins to these applications can
connect to it and relate the current file to another photo using semantic
web technology.

Using the standard RDF ontology languages, we came to a dead end when faced
with complex problems in the field of PIM. The RDF standard and especially OWL and RDF(S)
as ontology languages are not enough for personal information management. During
the last years of implementation and testing, these flaws in existing
approaches appeared:

We employed a mixture of RDF/S and OWL ontologies on the semantic
desktop, leading to confusion. For an integrated project, the ontology basis
should be the same. But public best-practice (like in foaf) is to mix RDF/S
and OWL elements.

the distinction between a property that just describes a resource and a
property that draws relations to other resources is not clear. ObjectProperties
of OWL give a syntactical mean how to write this down, but the semantics are not
clear.

the next problem that often occurs is when users are beginning to work
with Protégé and OWL ontologies, they sometimes come to a point where a tertiary
connection is needed (or, in database speak: a N-N relationship with additional properties)
and then have a conceptual problem and either use RDF-reification to annotate
relations or give up doing so with reification (its painful to implement,
cannot be described in ontologies, etc).

The representation of concepts is not clearly stated. How do we
represent "love". Or,
simpler, the "w3c homepage at www.w3.org" versus the document found
at "http://www.w3.org/". Ontology
engineers using RDF/S and OWL are not trained to separate between concepts
and resources, people from the Topic Map or SKOS community are.

What does a URI identify? What is the
right URI to identify the concept of "Love", and then again - as every person
sees the world in his/her own way, how to find this URI? Many different
approaches where suggested how to identify concepts using URIs, but no
solution can be recommended as best practice.

The goal at hand is now to create a simple, minimal ontology language that
fulfills our requirements for the Semantic Desktop and combines the best
existing solutions while avoiding the usual pitfalls. Luckily, others have already invested much time into creating ontologies for
different application areas, and we learned much from their efforts. These
findings can now be used by us to transfer best practice into the field of PIM.
We found these existing approaches very useful as input for this document and
will discuss them:

First and foremost, the Enquire system by Tim Berners-Lee. This
system is, from the semantic ideas and architecture, quite close to PIMO. Some
parts-of his ontology are resembled in our PIMO-language. Enquire led to the
WWW, led to the Semantic Web. One of our goals is to bring Enquire back to the
Semantic Web. Luckily, the manual of Enquire is online: [TimblEnquire].

The OWL ontology language. It features very precise properties like inverseFunctionalProperties or inverseProperties [OWL]

A paper by Jean Rhomer on 10 years of practical experience in PIM [Rhomer2005]
which claims that inverseProperties and other things are a prerequisite for
such systems.

A paper by Huiyong Xiao and Isabel F. Cruz describing a
Multi-Ontology Approach for Personal Information Management [Xiao2005].
They suggest to use different ontologies for resources and concepts and then
map between them.

The SKOS ontology language and the ongoing SKOS effort. It shows how
concepts can be modeled, starting in the thesauri and taxonomy domain but
extensible for other domains [SKOS-core,
SKOS-Effort].

The Topic Map standardisation effort and its end-products. This standard
offers - philosophically seen - much more ways to express mental models than
OWL or SKOS allow. Merging ontologies, representing documents seperated from
the ontology and identification issues are well defined here. The usability of
topic maps was proven in many products that are sold on a growing market [XTM,
TM-empolis, TAO,
TM-survey]

public available thesauri, aiming at representing the world in a
semantic way. One is the wordnet
project, mapped to RDF by Dan Brickely. Semantic explanations to most
day-to-day things are available here (example:
car). [Wordnet],
[Wordnet-RDF]

public available social information tools like wikipedia. Here we
also find nice explanations of the term
car. With the advantage,
that everybody can extend it.

The whole field of linguistics, cognitive science and philosophy. Sadly,
the findings of these important knowledge bases are not used in many Semantic
Web projects today and we have to look at ideas from them to fully understand
what problem we really want to solve with a PIM ontology. Gestalttheorie, a
constructivistic view of the world and other secrets like the relation between
things and their representation are such findings.

These existing approaches can help us very much to find an ontology for PIM
that meets our requirements. This report will start with listing the
requirements we have found for ontologies in PIM, listing features, our
experiences from the gnowsis project, our findings and solutions for
identification of things. Then we will focus on the background from the
information sciences and have a look at mapping and merging of ontologies. The
requirements we have found are then mapped to the existing ontology languages we
focus on (RDF-S, SKOS, XTM). Based on all these requirements we design our
solution for a PIM ontology. The solution will consist of an ontology language,
an upper ontology defining some basic classes and instances and one example of
how one person can use our solution. We will also show how a user interface
might look like to edit the ontology and propose ways how to integrate the
ontology to applications using plugins.

Finally, we face ourselves with frequently asked questions
that will typically come up in your mind after reading this report. We also try
to answer these questions.

As we cross different ontology languages (RDF, SKOS and Topic Maps) and also
reference to cognitive science, reading this document is not so simple if you do
not think in the same language as this document was written in.

thing

The top of our ontology, everything is a thing. Abstract concepts,
physical objects, people are things.

resource

A web page, a file on your hard-disk, any way information can manifest
inside a computer system. Resources are beyond normal files, in the RDF world,
a resource can also be an address book item, an entry in a database or an
appointment in your calendar. In Topic Map terms, this
is an occurrence.

concept

A thing that represent the mental conceptualization of a user inside his
information model, stored on a computer. The mental conceptualization is part
of the cognitive system of the person, the concept in the information model is
the point where such a mental conceptualization can be represented, identified
and visualized on a computer system. Note that they are subjective to the
person, as each human being has his own cognition, the concepts in the
computer are tightly coupled to that. The term grounding can be used here: the
mental conceptualization is grounded in the concept. (for grounding-see also [Haikonen2003]).
In Topic Map terms, this is a topic.

For the data handling tasks like calendars, address book, process support,
and e-mails we will not include a solution in our ontology. As
these documents are already well handled by existing applications, we will only
represent them on a very basic way. To be clear on this point: the suggested
ontology of this report does not provide the perfect Microsoft Outlook data
model, please use the software of your choice and their data model.
Concentration is on the things that are badly handled by current PIM
solutions: tagging and all the higher ontological features like inference and
relations.

Beyond the normal look at PIM, which is address book, calendar, todo lists
and e-mails, we want to help people use
computers to organize their information life. People organize their files in a very
personal way, and often they employ a categorization scheme for their file
folders or bookmark folders. The way of classifying files has
grown over years and cannot be reproduced by others - the same extent to which
people are different, their filesystems are different. On MS-Windows filesystems (without soft links) each file
remains in exactly one folder and folder hierarchies cannot loop. This
classification system only allows the creation of taxonomies, where one file can
be placed in one class. So typically, the folders have a partitive semantic:
subfolders are part-of the super-folder and cannot be part-of any other folder.

But these schemes only apply to file systems, with the semantic features of
RDF we can go beyond these basic features. Based on the previous work mentioned
above
and other existing ontologies [SKOS-Effort,
XTM], we
assume that people will need the following features on a Semantic
Desktop:

a representation of abstract concepts: Love, Rome, Acme Inc.

a representation of concrete, addressable resources: "w3c homepage at
www.w3.org"

a representation of documents: "the document at http://www.w3.org/"

multiple names for a thing: "Love", "Liebe",
"W3", "WWW"

same name for two different things: "Apache - helicopter",
"Apache - software".

class-subclass relations: a subclass has all properties of the superclass + its own

a representation of time: the document was created in 2005. The
project started on 1.1.2006

If you are familiar with Topic Maps, you will find the basic concepts of them
in above list, if you are yet not familiar with Topic Maps, we encourage you to
read the Topic Maps Handbook by Holger Rath, which is a very well written
and colorful introduction to XTM [TM-empolis].

For a detailed look about part-of and has-part relations, have
a look at the OWL W3C draft about that [OWL-partof].

allow the user to annotate any concept using free text and
also data properties

make the use as simple as a wiki and a blog combined

allow the linking of any concept to any other concept

populate the ontology automatically by analyzing the user's
data

relate existing file folders and email folders to
concepts

synchronize folder structures with ontology structures

import external ontologies by copying, connect the copied
concepts to the originals.

try to be compatible with the beauty and buzz of the FOAF-community

Based on our previous workshops[1] on the topic, and the discussions
there with gnowsis end users and developers, we found that several
features in personal information management are currently only available on
the web and now move to the desktop. Tagging items using keywords is one of those features.
It was requested again and again in end-user interviews and looking at related
desktop and web products. A similar feature that is also relevant is the
integration with existing folder and file hierarchies. Existing file folder
hierarchies can be used to populate the ontology, these folders say much about
the way people organize their mental models. So some parts of a PIMO can be
populated automatically by looking at existing file folders, and the file
folders are then linked to the created part in the ontology. Similar, structures
found in email folders or in flickr
tags can be used to generate. If the user has a flickr account or a weblog,
the categories and tags can be analysed and linked into the PIMO.

The personal file-system of the user is a key point on the Semantic Desktop.
There is much semantic to be found in folder structures: "I put project related files in the project folder. Inside the project
folder, there is a sub-folder for each project". Here we find first the
class "Project"
and then instances of this class. Another semantic relation may be that of time: "All
files created in the year 2005 are in the 2005 folder. Inside this folder, I
have sub-folders for each month. The folder January is then a part-of 2005".
A partition based on topics is possible: "I have a folder Topics
where I put documents by topic inside the folder. I have the topics NewProducts,
CRM, Sales, Accounting, Paragliding". Each file in the folder CRM would then
have the topic CRM (customer relationship management) .

These implicit semantic relations hidden in the file-system, only
known by the user can be expressed explicitly. Files inside a folder
called "Marketing" are automatically assigned the semantics of "files
having topic Marketing". At
DFKI KM Lab we also evaluated
different methods how to use this information for natural language processing
and text indexing, connecting text documents to ontologies opens many doors. The
creation of subclass or instance relationships can also be automated based on
folder structures. Assume the user has a folder /myFiles/Topics/
containing many subfolders and each subfolder represents exactly one topic,
instances of the class "Topic" can then be created automatically from
these folders.

So called Active Folders are a possible name for this. Users can write rules that say what is part of a folder and what not.
We see these in popular like iTunes, where you can create smart
playlists2. In the upcoming windows file
system winfs, we find a similar feature. From the scientific view, smart or
active folders have been discussed starting with semantic filesystems. [SemFs].
We will name these features Active Folders in the discussion.

To define and implement Active Folders, part of the PIMO has
to include classes and properties. An API-focused part of the ontology will deal
with these.

A very important need that has risen over the years is the
separation between resources and mental concepts. Right from the beginning of
the project, also in discussions with others, we had a strong feeling towards
the separation of resources (be they files, e-mails, web-pages) and concepts in the mental model.
Although files, web-sites, e-mails are very important, they are just artifacts
that contain and reference our mental models. This can also be seen in the Topic
Maps standard, where there are two manifestations of documents. Using the
language of this standard, it is possible to say:

First the web site "http://www.dfki.de/"
is a Topic of interest I work on. I represent it using the topic LeoTopics#www.dfki.de.
I can identify it using the field subjectIdentity
and resourceRef, an addressable subject. Second, the topic of
the company DFKI can be modeled using a topic LeoTopics#DFKI. As the dfki website (as a document)
mentions the topic DFKI, I can connect it via an occurrence.

You can find a detailed description of this topic in the Q&A
section here.

From practical experience we have
noticed that many topics are created with mistakes. At some point, the user may notice that "Rome" is not only
a Topic, but instead a city. Therefore, the class of a thing has to be
changeable. Also, the user may accidentally create two concepts that mean the
same. For example for "Rome" and "Roma". Then, it has to be possible to merge
different things, and to delete them. A need in all of these is checking the user's actions for validity. If "Rome"
has a property that only a "City" can have, then the type cannot be changed or
the property is removed during the process. This checking should be rule-based,
obviously.

When people start to use Semantic Web technology to model
ontologies, at a certain point they all hit a barrier, appearing concealed in question
like:

Which URI do I give to the concept "Love"?

The true ontologist wants an answer that fits "the world", meaning
that many other people will also agree to use the same URI to identify Love.
That is the spirita of shared understanding and that's also where the problems start. The question is: what URIs are good to identify things, documents,
concepts, persons, places, etc. On top of that, are these URIs useful in a
distributed scenario or are they only useful on a single computer. Do they resemble a public agreement, are they standardized?

A PIMO system has to cope with something known as the "web's
identity crisis" or "uri crisis". The problem is, that
we use URIs to identify all of the above, we have URIs for documents, URIs for
topics, URIs for everything. But what does a URI identify? Tim Berners-Lee wrote
an interesting article to clear this problem, here is an exercise [TimblUri2002]:

So basically, it is not so easy. Even if you think that you found a
URI that identifies something, it may identify too much. That's why the
article didn't solve the problem and people discussed in mailinglists forever,
creating the impression that we are all stuck in an infinite loop of arguments.
People stated that their URI scheme is better than yours. So
Tim Berners-Lee wrote another article in 2005 about the same problem [TimblUri2005].
The proposed solution from the technical architecture group (TAG) is basically
this (abbreviated):

The W3C Technical Architecture group eventually decided to resolve the
architectural problem that if an HTTP response code of 200 (a successful
retreival) was given, that indicated that the URI indeed was for an
information resource, but with no such response, or with a different code, no
such assumption could be made. This compromise resolved the issue, leaving a
consistent architecture.

This may serve well from the technical side, but the semantic
problem remains:

Where do we get URIs that identify topics/things/concepts?

Where is the URI for LOVE?

Shouldn't we place an HTML website at this URI so that people can
click on it? (which the TAG solution seems to discourage).

Over time, others have also shone more light on it and proposed
different solutions for this problem. David Booth wrote a very constructive
article which can be found here [Booth2003]. He
looks at the problem and then comes to the conclusion that there are
many options and it cannot be determined which is "perfect". Most important is
that we all use the same standard and things are compatible.

Steve Pepper has written an article on how to cure the web's identity crisis
[Pepper2004].
The argumentation there is sounds correct. He proposes to solve the distinction of Resource VS Concept
by using Topic Map syntax. Topic Maps do determine between
addressablesubjects and
non-addressable subjects. The use of resourceRef or subjectIndicatorRef state a
clear message about the topic in question: it is addressable (its a document) or
it is not addressable (it is a concept) and in the Topic Map Community, usually
one URI is not used in both uses. Also, the Topic Map community treats all
concepts in the context of a Topic Map
- every resource (be it addressable or not) is first re-instantiated in a local
Topic Map using a local identifier (usually something like <topic id="opera">).

Pepper suggests to expand the use of rdf:about with rdf:subject and
rdf:indicator to extend the identification. But Pepper didn't take the last needed
step: to create a new
view on RDF where each Resource is re-instantiated in the current ontology. He
does not propose to create topics versus resources. We more tend towards SKOS: there, each Topic is called a Concept and is
identified uniquely inside the SKOS ConceptScheme, using the URI of the concept
scheme [SKOS-Effort].

Summing the identification problem up:

To identify concepts, resources, people, books, ... we need to agree on
a URI to identify it. There may be multiple URIs possible to identify
something.

For every URI we use, there are multiple ideas the URI identifies.

All these issues of identification can
be considered harmful to any approach taken, and our solution should build on
the existing work of our colleagues, not to make the same mistakes again. Our
proposed solution is closely bound to the approach taken in Topic-Maps and can
be found below.

Seen through the lens of a constructivistic worldview, we assume that people abstract from things percepted
in the real world to a level of internal representation. A real world object is
perceived as sensory impression and then triggers an internal representation,
which might resolve to a mental concept. For example, the perception of the
color red in the real world leads to a signal in the eyes which every
human eye decodes the same. But each person may associate it a different
mental concept, which may be the same or not. At this point, perception
determines how the signal is matched against previous signals and other mental
concepts. At least communication of the
mental concept can happen via language - then the perception of the red lightwave is written or said as being "a red light", "rouge" or "ein
rotes Licht".

So we differ, from the
philosophical view between the following manifestations:

real world things (red light waves, a car, the chair I sit on right
now)

perception of real world things

abstract topics (time, "love", organisations) that people can
perceive and express

mental representation of things and topics

language we use to express our mental representation

documents containing this message

to come: semantic documents that contain the message better!

This model of information representation is relevant for the
Semantic Desktop for two reasons

each manifestation from above should be considered in the ontology

when two people communicate, a concept switches manifestations and
hopefully less information is lost when we do this right

In communication, facts about real world things or topics are
expressed using a language. In the classical model by Shannon, the sender has an
idea in his mind and expresses it using language, this results in a message. The
message is sent to the receiver who reads the message, triggering again mental
representations of the concept.

In semantic web communication, the communication process is
different. The message is much more precise than normal verbal or textual
communication. Single words from the text can be annotated using the relating
concepts from the sender's ontology. The annotation of certain words (or the whole
message) can be sent together with the message to the receiver. As an example,
Paul sends an email to Peter that mentions Paul's concept of Rome in the text.
In the packaged metadata sent within the email, Paul's concept of Rome is
described using Paul's way of thinking. paul:Rome is linked to a public
identifier from a website like wikipedia or wordnet.

Thus public thesauri like wikipedia or wordnet can play a role in
this scenario as representing the "Plationian Representation", namely a
public identifier that can be used to identify the concepts Bob and Alice talk
about. We assume that the cultural background of Bob and Alice is overlapping
enough that both recognize Wikipedia and can relate to the concepts described
there. But this requires that both users have some personal ontology at hand. We
hope that this way of communication is superior to normal e-mails today.

From the examples raised in the philosophical requirements, we see the need
of communication and merging of different PIMOs.

If I want to represent IMDB movies in my ontology, I would have to include a
movie ontology and the IMDB website somehow. The external movie ontology defines its own public identifieres, etc. When
representing the movies in my PIM, they should be linked to the IMDB website to
allow identification. When merging others and communicating, how to do this?

importing other RDF(S) ontologies and use their properties and
classes in some way

Why aren't the existing ontologies enough?

a new ontology language that can be used to express personal mental
models. This will be called PIMO-language

An initial fill of the PIMO-language with a few basic concepts that
many users will need and also programmers will like. It serves as "upper ontology" and contains people, places,
topics, documents, time.

an example ontology of a person called Paul.

instructions how to implement software with the ontology

user interface mockups that show how to interact with it

Namespace and Ontology language

The namespace of the PIMO-language is
http://ontologies.opendfki.de/repos/ontologies/pim/pimo#. We will
abbreviate this namespace with
pimo.

xmlns:pimo="http://ontologies.opendfki.de/repos/ontologies/pim/pimo#"

As ontology language RDF(S) is used. We also employ some extensions by the
Protégé ontology editor for InverseProperties and cardinality restrictions.

If the PIMO of a person is like a landscape, we can find different counties
and regions in this mental land. When the user starts to explore the land of his own knowledge, the first area
he reaches is Thing. There, the mental models are represented on an
conceptual level, it is the land of knowledge, where a user models concepts that
the computer can reflect, expressing ideas as near to the mental models as
possible. Going further he comes to ResourceManifestation where the
files and web/pages start. They are connected to the Things. Then there will be the
mysterious caves of metadata about the ontology,
where only programmers have to find their righteous path - InformationSystemConcept.

the class Thing and its subclasses: these serve as representation
of mental concepts of the user. Here you find people, places, organizations,
processes, documents, projects and topics (vulg: tags). The user can express
mental concepts using a semantic PIM application like mindraider, gnowsis or
any other application that supports it.

the class ResourceManifestation and its subclasses: instances of
ResourceManifestation handle resources found on digital systems. Files,
e-mails, webpages, address book items, bibtex items, bibtext files are all
living here. The user creates instances of these by working in native
applications like the filesystem or PIM applications like KDE-PIM
or MS Outlook.
Programmers are responsible to extend this part of the ontology.

the class InformationSystemConcept which belongs to the machine
room of the developers. Here metadata about the user and his PIMO are
collected, namespaces defined, classes for annotations, meta-classes etc.
Nobody should have the urge to change this part of the ontology, but
programmers are encouraged to use it.

These three separated areas fulfill our requirement to separate
resources from concepts. Abstract
concepts are represented as Things, documents as resources.

The subclasses of the
pimo:Thing and instances pimo:Thing are used to represent concepts that
come from the user's mental model. To the right you see the basic class
structure provided by the ontology. We find classes for a person, a group, a
location, document types, organization, event, task, project and topic.

If we have a person called Paul, we can create Paul as instance of
Person. If Paul wants to represent his friends Peter and Tim he can also create
them as being a Person. When they are working in the city Rome, then they can
create the thing Rome as being a city. When they plan to start a new
branch office of their company in Rome, Paul can create the project
BranchOfficeRome as being a project. If Paul has a meeting with others about the
status of the project, he can create the ProjectStatusMeetingRome. You may have noticed that the labels given
to these three things are words without funny characters nor spaces. For
labeling things, we suggest to use wiki-names3.
You might label things any way you want (from the technical point of view) but
for these examples and practical reasons, wiki-names are much better. In our
example, instances of the class pimo:Thing are:

Identifying things with URIs

Technically, the pimo:Things are RDF instances and therefore need
URIs to be identified. All things created by a user are identified using URIs
that come from a namespace personal to the user. That reflects the subjective
view of the world: the Thing Paul identifies as Rome is different from the Rome
of others, therefore the own URI. For Paul's namespace we use
gnowsis://paul@example.com/resources/pimo/. The namespace is abbreviated
using the name of the user, in our case
paul. More about it at the section about the
namespace. The names of things are therefore:

Your own classes

If Paul faces a thing that is not of any type defined in the
PIMO-language, he has to create a new class. Classes created by Paul have
to be subclasses of Thing directly or of existing subclasses of Thing. For
example, if Paul has the new concept of making business plans, he creates a new
class paul:BusinessPlan. As these are a kind of document,
paul:BusinessPlan is a subclass of pimo:Document. Technically seen, Paul
says:

paul:BusinessPlan rdfs:subclassOf pimo:Document

paul:BusinessPlanRomeBranch rdf:type paul:BusinessPlan

Again, the classes are identified using URIs that come from Paul's
personal namespace.

Things, classes and also properties can be labeled. The primary way
of naming things is the pimo:label. The use of pimo:label is not
restricted, you may use any string. There is exactly one pimo:label for a Thing,
it is mandatory. If a thing has more names, the pimo:altLabel property
can be used. These alternative names are useful during text search.

For various applications, a wiki-name of things may be useful. Wiki-names
have to be unique and should not contain any symbols besides alphabetical
letters and numbers, whitespace and funny characters should be avoided. Using
CamelCase to combine more words into a wiki-word is suggested. Wiki-names help
in tagging applications and other simple applications we envision for the
semantic web. Also, the wiki-name might be a good candidate as local name in a
URI.

To show a thing in a GUI, you can also provide icons to
optically represent a thing or all instances of a class. You can assign icons to
things, classes and properties. Use the pimo:hasIcon property to connect
to icons, the icons have to be identified by a URL pointing to a local file or a
web image.

Things can be, once defined, described in more detail using RDF
properties. The most important property is that of the label, which names
the thing and allows us to show it in a user interface. Labels are modeled using
the RDFS:label property. The label may be any string value, but should be
short and descriptive. For practical reasons, we also created a property
pimo:label that allows us to label things inside the Protégé editor. Every thing
must have a label. In the wiki context - when things are used inside text - wiki-names
are needed.

Things can also be described in more detail using wiki-text. For
this, use instances of the pimo:PimoWikiContent and connect them to things using
pimo:hasWikiContent. Although using plain text would be fairly enough to
describe things, wiki-text allows many researchers to create wonderful
applications on top of the PIMO that will surely surprise us all.

Other describing properties can be added by creating sub-properties
of pimo:DescribingProperty. Typically, the user will add properties to
add comments, more descriptions, etc.

For example, if the user wants to enter the fact that Rome has 2.8
mio inhabitants, he may say:

paul:Rome - inhabitants - "2.8 mio".

This will automatically:

create a new rdfs:Property, a sub-property of
pimo:describingProperty. The property is identified using the URI
paul:inhabitants and has an rdfs:label "inhabitants".

create the triple: paul:Rome paul:inhabitants "2.8 mio".

We do not go into details about data-types here, everything is
represented as a string at the moment. The main idea is:

The user creates describing properties on the go. If a new
property is needed, it is instantly created. There is no strong domain/range
typing, if the user wants to use the property in an "unintended" way, the user
is always right to do so.

Practically, this means that we make "ontology by example". If the
property paul:inhabitants is used primarily on cities, then when editing a city,
the user interface should suggest to enter the paul:inhabitants and also list
the existing values that Paul has entered ("2.8 mio").

The things inside the PIMO can have relations to each other,
semantic links. In RDF these are called triples, in Topic
Maps they are Associations. We decided to support both, but focus on the triples
first. A few basic relations between things are assumed:

partOf - one thing is part of another.

hasTopic - a thing is about another

related - a thing is related to another

For each property, we also define an inverse property that
tells us how the relation is from the other perspective. Here are
the inverse properties and their names.

pimo:partOf - pimo:hasPart

pimo:hasTopic - pimo:isTopicOf

pimo:related - pimo:related

Notice that pimo:related is the inverse of itself. This means it is
a symmetric property. If A is related to B, B is related to A. Looking at
Paul's PIMO and his project, we find that:

The meaning is, that the project is related to the city of Rome,
that the business plan is part of the project and that the topic discussed at
the meeting is Rome. If the user wants, he just might add all this as plain
relations using pimo:related, this depends on the intentions of the user.

Creating your own relations

If the user wants to connect people to projects using a more
precise semantic, he can create new properties. Any property created by the user
must be a sub-property of either partOf, hasTopic or related. For example, the
property "supervises" can be created to say that a person supervises
something. The property will be inside the namespace of the user and has to have
a super-property.

paul:supervises - rdf:type - rdf:Property

paul:supervises - rdfs:subPropertyOf - pimo:related

paul:supervises - rdfs:label - "supervises"

We also instantly create the inverse property, after asking the
user what it may be called. "is supervised by" is a good name.

paul:supervises - protege:inverseProperty - paul:isSupervisedBy

paul:isSupervisedBy - rdf:type - rdf:Property

paul:isSupervisedBy - rdfs:subPropertyOf - pimo:related

paul:isSupervisedBy - rdfs:label - "is supervised by"

Note that sub-properties of related do not have to be symmetric.

Inverse Properties are mandatory

As a rule, every property defined to connect one thing to another
has to have an inverse property attached to it. This was a requirement made by
Jean Rhomer that is realized here [Rhomer2005].
The inverse properties are needed to allow the user to make connections from
any thing to any other thing. As all properties are directed (with the
exception of pimo:related), the inverse properties are needed to express this.

If the user creates his own custom properties, each property has to
have an inverse property connected to it automatically. If the user creates a
property called "supervises" the inverse property must be created.
It can be initialized with a bogus label like "inverse of supervises"
which then can be changed by the user to "is supervised by".

Symmetric properties have themselves as inverse.

Relations are flexible - ontology by example

The user can create relating properties on the go. If a new
property is needed, it can be created instantly. There is no strong domain/range
typing, if the user wants to use the property in an "unintended" way, the user
is always right to do so.

Practically, this means that we make "ontology by example". If the
property paul:supervises is used primarily on people, then when editing a person,
the user interface should suggest to enter the paul:supervises relation and also
list the existing values that Paul has entered. The domains and ranges of
properties may be defined to something specific, but as an alternative every
property may be used on a pimo:Thing. The user interface may support a stronger
typing and may support the user by suggesting useful properties that fit a
certain type.

If binary relations are not enough, then an instance of the
pimo:Association class is needed. For example, the role of Peter in relation
to the branch-office project may be more complicated than just "manages the
project". Peter may have begun his job sometime during the project and his job
ended sometimes during the project, to be replaced by Jacob. Then the role Peter
has had was only true for a certain amount of time. For this, the pimo:Role
class helps, which is a sub-class of pimo:Association. By making an instance of
pimo:Role, you can connect a person to a project and tell more about the role
the person is having and also more metadata. For example, if another person is
monitoring if Peter is playing his role good, you may add another person to the
relation.

We will not go into further detail on associations, because we will
not support them in the gnowsis beta GUI. If you need more information, please
consult the XTM standard documents, where the concept of associations is
explained in more detail. [XTM,
TM-empolis, TAO].

Files
on the hard-disk, web-pages or any other resource from local information systems
can be integrated to the PIMO of a user. The ontology does not end at the border
of our PIM application, it can be used to connect outside resources. The main
class to represent resources is ResourceManifestation. The class itself
and its subclasses are considered to be addressable resources in the sense of
RDF. We deliberately did not use the term Resource here, as our
pimo:Thing is also a resource in the meaning of RDF. On the right you see
typical classes of resource manifestations. Note that this scheme can be
extended by programmers, if you write an application that brings in a new
resource-type, create a new class for it. Comprehensive and useful class
structures for these ResourceManifestations are currently developed in the
Aperture project, where also adapters and extractors are written that can gather
such resources from a system and from the web [Aperture].
The resource structure is deliberately held very simple to be open to such
extensions.

Instances of resources are not created by the user manually, they
should always be created automatically by the system itself. They are identified
using a URI that fits best - for a file, the file:// uri scheme, for an e-mail
the messageid or IMAP uri. You only must guarantee that you can find the
resource based on its URI and that the URI is unique on the desktop. You may
also assure, that the URI is unique globally (in the whole world). The uri may
be added to the resource as additional property pimo:uri but as the URI
used as RDF identifier anyway, this is just to simplify modeling in Protégé

Example of such resources are:

branchinrome - The folder for the project filesfile:///C:/Documents/Paul/documents/projects/branchinrome/

Picture23.jpg - A photo of the new office in Romefile:///C:/myDocuments/Paul/documents/projects/branchinrome/photos/image23.jpg

Meeting at 29.9.2005 - meeting in outlook, topic is the
branch office in Rome.gnowsis://paul@example.com/resources/outlook/appointment/00000000ECD4B99358B9814B9DAFE2255CD8AE9A44EF3000

Note that the third resource again has a funny looking URI, this
time created using an identifier from Microsoft Outlook in combination with a
Semantic Desktop URI scheme for a personal namespace (see
below).

The main relation between mental concepts, expressed as pimo:Thing
and files, expressed as pimo:ResourceManifestation are links of the type
pimo:occurrence. An occurrence connection means that the mental concept
expressed in the Thing X can be found in the document found in the
ResourceManifestation Y. The definition of occurrences and the idea behind are
taken from the XML Topic Maps standard, where an occurrence means [XTM].

An occurrence is any information that is specified as being
relevant to a given subject.

Adding an occurrence triple between a thing and a
ResourceManifestation expresses exactly that. The inverse property of
pimo:occurrence is pimo:isOccurrenceOf.

These triples are generated by the user quite frequently and can be
also created by the system automatically. For example, Paul notes that the
website of the city of Rome is an occurrence of his thing paul:Rome by saying:

paul:Rome - pimo:occurrence - <http://www.comune.roma.it/>

<http://www.comune.roma.it/> - rdf:type - pimo:Webpage

Identification via pimo:occurrenceRef

There is a special occurrences in the PIMO, that is
also used to identify a Thing: the pimo:occurrenceRef. It connects a
thing to a public resource that can be used to identify the thing. More on this
way of identification in the section about
identification.

Custom sub-properties of occurrence

It is possible and allowed to create sub-properties of
the occurrence property to express a certain kind of occurrence. We strongly
discourage this because:

keep it simple. Represent the complex in the land of
pimo:Thing

the occurrence is annotated anyway: the resourse has a
type and some other properties

The principle of tagging is that resources are annotated using keywords. This has led to so called
Folksonomies, where people tag web-pages or photos collaboratively. Using the PIMO, a simple
kind of tagging is also possible: things are treated as tags and can be used to annotate resources
using the occurrence relation. Any thing can be used to tag any resource, allowing the freedom of
folksonomies on the desktop. This is a mixture of ideas from topic maps, folksonomies and RDF.

Here an example of how a user interface might look like that supports
this kind of tagging of e-mails. The user enters a few letters of the intended tag,
and the system searches for possible things to tag with. The user interface should filter to
only show the most relevant tags, ie down to seven tags to show. In the gnowsis semantic
desktop this will be supported by text analysis components.

An e-mail is by definition a
pimo:ResourceManifestation which can only be annotated using the relations
pimo:occurrence and pimo:occurrenceRef. So when you want to just "tag" the
e-mail, use these two relations. What happens now if you want to say that this
e-mail is "related" to another thing or "part-of" this project? Or use more
custom properties you created?

Then you reify the e-mail into an instance of
type pimo:Thing (or a suitable subclass). So the first thing to do is ask
the user what kind of document this e-mail is. Is it just a pimo:Document or is
it a paul:BusinessPlan? Is it a paul:Invitation? Normally, you would reify text
resources as Documents and for example address book entries as persons. After
having it reified, the e-mail is now existing both as a ResourceManifestation
and as Thing. The thing representing the e-mail can be annotated as you like,
using your own properties etc.

During the process of reificating resources as things,
it is surely possible to create as much data attached to the thing as possible.
For example, you may relate other persons to the reified e-mail if they are
mentioned in the to: and from: fields of the e-mail and if they already are
existing as persons, or you might suggest to the user that the software can also
reify the persons. More about this below, in section
Automatic
Creation of Things from ResourceManifestations. Result is:

Every person has - as already explained in
philosophical background - a personal
view of the world. So the mental models inside one's mind are subjective,
personal. No two people conceptualize the concept "Rome" the same. This is
reflected in the PIMO by employing a personal namespace for each user. The
personal namespace could be any namespace that can be referenced and resources
in the namespace can be downloaded via a protocol.

Sadly, the HTTP protocol is not a good candidate for the personal
namespace. Why? Because if Paul would use the namespace
http://www.example.com/~paul/pimo# to represent his mental models, this
would require that he is always online while modeling. Or, if he models offline,
it would require him to have a website called
www.example.com which he has to buy and host. Then he would have to install
a web-service there that serves his ontology. In the real world, most people do
not have a website, and those who have do not keep it up to date. Also, if Paul
has a change in his life that separates him from
www.example.com, his data would be lost (at
least frozen in the last state).

For the semantic desktop, and our gnowsis implementation (at least the
March2006 version), we decided to use the gnowsis URI scheme to identify
concepts and resources of individuals. The reason is that we do not use http as
a communication protocol but instead use the peer-to-peer protocol XMPP
(jabber). This allows any user to host the RDF resources where ever they want.
If they are online, via the jabber protocol, if they are offline via the jabber
server. For this and other reasons Frank Osterfeld and Malte Kiesel created the advanced jabber server nabu [ref-nabu].
Even if the user's do not employ the possibility of P2P communication, the idea
of the namespaces are practical.

Things and classes of things inside the user's PIMO are identified using URIs
from the user's PIMO namespace. These are clearly identifiers for the concepts
behind. Refering to the article by Booth, we can say that RDF URIs used to
identify instances of pimo:Class, pimo:Thing, and pimo:Slot identify the
concept represented by these [Booth2003]. As
the URIs in the personal information management domain identify the concept
bound to the subjective view of the user, the namespace is bound to the person.
We suggested to use Jabber-ids as part of the namespace, they are a technical
identification of the user, like an e-mail address. See the section
above.

Public identification for communicating things via the internet can be
achieved using the pimo:occurrenceRef property. This points to a URI of a
resource that can be used to publicly identify the thing. For the personal Thing
"Rome (a City, a Location)" from Paul's PIMO, the website of
Rome and the wikipedia entry can identify the city:

If the identifier of a thing is not an addressable URI (=locator) but only an
identifier, we recommend to use the property pimo:identifierRef or
sub-properties of it. Books can be identified using their ISBN number in this
field. They can also be identified using an occurrenceRef to the webpage of the
book at the amazon.com website, to allow this common practice of identification.

To identify persons, either the property containing the pimo:jabberId
can be used or a pimo:mbox to the e-mail-address of a person. URIs for
e-mail addresses are inspired by the foaf-project. Which raises the problem of
spammers and the use of e-mail addresses, they suggest to use the sha1-sum of
the e-mail address to solve this problem. We added both properties to the
pimo:Ontology (as sub-properties) to be compatible with this important feature
of foaf. Additionally we suggest to always use the lowercase version of an
e-mail address. If you have two e-mail addresses, prefer the shortest one (in
character length) but try to add all. So people can be identified in one of
these ways:

The identification of resources (documents, e-mails) is by their
URI. The URI used in the RDF triples is the identifier of resources. Other
identifications (like identifierRef, etc) are not supported. It can be expected
that the problem of identifying files, e-mails, web-resources is already solved
by the URI standard. Although we are aware of some deficiencies (for example in
identifying local files in a filesystem), we will not suppose a solution here
but instead wait until others have solved these problems.

If there is no useful URI scheme for a resource, we suggest that you as
developer make up a URI scheme. Many successful software projects have done so
(for example, mozilla with their XUL URIs). If you need URIs to identify
resources, and have no clue what to do, you can use a sub-space inside the
gnowsis URI space, which was made up by us for the gnowsis system. See
above for more details and contact the
author.

The class pimo:PersonalInformationModel is used to represent instances
of a PIMO. For example, Paul's PIMO is also identified inside using an instance
of pimo:PersonalInformationModel. This instance is used to link the created
things and relations to the PIMO. Every instance of thing, for example
paul:Rome, was created inside the PIMO and to note this, a link between the
PIMO and the Thing can be made, called pimo:metaDefines.

The instance inside Paul's model is identified using a URI from Paul's
namespace: paul:PIMOInstance. Explicitly, we would find these triples:

For each property, thing or class defined by the user, a relation has to be
created that states that it was created in the context of the
pimo:PersonalInformationModel. Basically, this means that everything entered by
the user must be annotated as belonging to the pimo-instance using the
pimo:metaDefines property. The pimo:metaDefines relation is mandatory. It
is needed to separate different ontologies from each other and to safely mix
them. Separation just based on the namespace is not an option.

Luckily, you are not alone. Other people might write very useful classes and
relations for the PIMO language you might want to import. Also, the existing
RDFS vocabularies like
foaf-relationships may be interesting. We call these ontologies "domain
ontologies" as they are often called so in literature.

There are two ways to represent and get such a domain ontology:

the domain ontology is expressed using the PIMO - import it

the domain ontology is expressed using RDFS or OWL - convert it to PIMO,
then import it.

You see that it is not possible to "just include" RDFS ontologies and hope
that it will work. There are stronger constraints on ontologies than in RDFS or
OWL, for example each defined Class has to be a direct or indirect sub-class of
pimo:Thing and that all properties must have an inverse property. Metadata is
needed about the ontology and all elements have to be connected to the ontology
using the pimo:metaDefines property.

Let us look at the conversion process to PIMO first and then how you import a
PIMO domain ontology. The conversion from RDF(S) is straightforward, you have
to:

add an instance of pimo:DomainOntology, you may use the same existing
instance of OWL:Ontology if you like. For example, foaf already defines an
OWL:Ontology instance for its metadata.

For all properties, they have to be sub-properties from one of pimo:partOf,
pimo:hasTopic, pimo:related, or pimo:describingProperty.

Each property needs an inverse property.

Each class has to be subclassed from pimo:Thing. Please choose wisely to
be compatible with the current PIMO.

Each property and class has to have a pimo:metaDefines. Basically,
each and every mentioned resource that is used as a subject in a triple has to
have a pimo:metaIsDefinedBy.

Your domain ontologies may import other domain ontologies. That is a
nice feature of RDF in general, that you can mix ontologies and build larger
ontologies in that way. For example, the foaf-relationships domain ontology may
require the foaf domain ontology. To state that a domain ontology imports
another domain ontology, the relationship pimo:metaImports is used.

Importing a domain ontology into a user's PIMO is then practically done by
downloading the domain ontology and storing it into the user's PIMO storage. You
should be able to savely mix domain ontologies into the PIMO storage, as each
resource in a domain ontology must be tagged by pimo:metaIsDefinedBy. After
importing, you then add a relation between the user's instance of
PersonalInformationModel and the imported ontology using the
pimo:metaImportedDomainOntology to say that you imported it. Attention: if
the imported ontology X defines to need another ontology Y by
pimo:metaImports, then you must import Y also. All or nothing.

Note that you cannot import one person's PIMO into another person's PIMO, as
you cannot look at the thoughts of other people we do not support nor encourage
importing PIMOs. They are private to the user and if the user decides to tell
something to others, parts may be communicated. We discourage to import the
whole PIMO of Paul into the PIMO of Peter.

When domain ontologies are imported, the user might notice that things
represented in his ontology are also represented in the domain ontology. For
example, the class "Car" might be represented already in your ontology
and a domain ontology you download. As the two classes are kept in different
ontologies, it might not be suitable to merge them, then it is possible to say
that they are just "other conceptualizations" of the same concept. For
this we provide mappings that point to these other conceptualizations, namely to
map classes, instances and properties.

pimo:hasOtherConceptualization - maps classes

pimo:hasOtherRepresentation - maps instances

pimo:hasOtherSlot - maps properties (slots)

The interpretation of these statements should be in the sense of owl:sameAs
or
skos-mapping:exactMatch. A user interface should display the classes mixed
together, search engines should also mix the results of two different
conceptualizations together. Same with otherRepresentation and otherSlot.

User interfaces should show the user's representation primarily and link to
the other representations. The user's point of view is always the right one.

What happens if the user notices that two things are the same? This will
happen when algorithms create things automatically and fail during the search
for matching things. For example, if a person is extracted from the data of an
e-mail and created as an instance of pimo:Person while the person was already an
instance there (but labeled using another name).

Merging two things should reuse as much of the data as possible. If
two things A and B are merged into thing C:

One of A or B should be deleted, C should have the URI of A or B (for
simplicity). Preferably a URI that contains the new label as local name. The
URI of the deleted resource should be forgotten.

C should have all classes of A and B as a union.

C should have all relations, occurrences, etc of A and B

This leads us to changing the type of a thing and multiple types. Changing
the types of a thing may imply that the properties and relations are not valid
anymore, if the properties are strongly typed. It is suggested to keep existing
properties and relations, even if the ontology is not valid then. If the user
changes the type back to the old type, the properties are valid again and also
the user may correct the errors manually. The user interface should remind the
user of possible errors but should not enforce the correction. Weak typing and
ontology-by-examples are the main principles behind this strategy, also that
"the user is always right".

Deleting a class entails that instances are typed to the super-class
of the deleted class. The topmost class is pimo:Thing. Classes defined in
the pimo language cannot be deleted.

Deleting an instance also deletes all relations to and from this
instance, including occurrence relations. Note, that also
pimo:creationSupportedBy links may be deleted by this, which may result that an
algorithm soon creates the instance again (if the user somewhere configured to
created instances automatically). See below in section
Automatic Creation of Things from ResourceManifestations. In this case, it may be better to use the
pimo:metaHidden property to hide it.

Sometimes, things may be outdated or not needed anymore, but they should
remain in the PIMO to be searchable. To do this, the user may hide a
thing, class or property. Even the default classes like pimo:Person can be
hidden, with an exception: pimo:Thing must not be hidden.

The user interface should clearly state that "some things are hidden", to
allow the user to find the resources again. Also, search results may choose to
rank hidden objects very low or give indications about them. Hiding things may
also be initiated automatically, for example by an algorithm that hides things
that have not been used for a longer time. To hide something, use this triple:

paul:Tim pimo:metaHidden "true".

Note that all system-classes and slots have a property pimo:metaViewLevel.
This property indicates if or if not the object in question should be displayed
to the user in common data visualizations. The view-level SYSTEM-ONLY
means that the object in question should be hidden from the user and only be
available in appropriate configuration situations. SYSTEM-USER means that
the object may be shown in any visualization. Creating generic visualization
algorithms can make use of these values. Note that this property is required and
you can always check it in evaluations.

It is obvious that many things that a user can think of are already represented
in his resources. Paul already has many entries in his address book, aren't they all
candidates for pimo:Thing instances? We know that many algorithms and approaches
exist to automatically generate or extend ontologies based on data in text files
or other data sources. For the PIMO, one of the requirements from gnowsis states to create active folders, or
active things. Automaticly creating thing from ResourceManifestations is needed
to make the semantic desktop work. For this, we create a few new relations that connect
things to resources and define the existing relations better.

Semantic Relations between Things and ResourceManifestations

You are already familiar with pimo:occurrence and pimo:occurrenceRef relations.
These connect things to resources in which the thing appears. In the context of
automatically creating things, the created thing can be connected to the resource
via the pimo:metaCreationSupportedBy property.
If a PIM concept or instance was generated due to the existence of one ore more resources,
here is the link to these elements. Binds an address book card to the person instance
or a PDF file to the instance of "contract".
It is also important for scanning resources: if a new resource is found, did it already
support the creation of a PIMO-thing? The property pimo:metaCreationSupportedBy is
intentionally not a sub-property of pimo:occurrence. If the resource is also an occurrence,
use both properties to relate the two.

An interesting question remains how a URI and a resource correlate to the
thoughts and concepts in the mind of the user. From the user's point of view, it is
important to correlate a thought in the mind to a resource in the computer. First
there is a thought in the user's mind that leads to the decision to search for a
resource in a computer; then the user has to find this resource that his thought is
related to. This relation reaches beyond a simple occurrence
or a pimo:metaCreationSupportedBy. When the idea represented in the thing is also represented
in the resource, they are bound together semantically.
The idea that is represented by the pimo:Thing is the same idea represented by the pimo:ResourceManifestation.
For this, the property pimo:groundingOccurrence is reserved. This
relations should be used in a one-on-one manner: one pimo:Thing is linked
to one or zero pimo:ResourceManifestation and one pimo:ResourceManifestation is
linked to one or zero pimo:Thing using the pimo:groundingOccurrence.
The thought is like an electrical current that originates in the user's mind and
searches for a place to discharge, a resource that matches the thought.
We defined the grounding resource as follows [Sauermann2003]:

For a thought in the mind of the user there may exist a grounding resource. It
is the resource that is associated best to the thought. The symbolic
representation of a thought is the grounding resource.

Under discussion: There can only be one grounding resource for a thought?

The term grounding was also mentioned in Georg Dorffner's book
"Konnektionismus". Concepts and symbols can be designed in a way that they can
be experienced in states of a computer system. He noted that the term was used in
the community of artificial intelligence and knowledge theory researchers.
[Dorffner1991, page vi].
The reason to speak of a grounding is that the stream of thoughts continues to
flow when it has reached the grounding, now the user can follow the links in the
PIMO in addition to the associations in the mind. Technically, the
pimo:groundingOccurrence is a sub-property of pimo:occurrence,
hence a grounding is also an occurrence. The user interface should treat
the grounding resource in a special way, perhaps integrating the metadata of
the resource or highlighting the link.

The implementation of these features will follow, then we can also provide
an RDF-Schema to configure active folders in a general way.
The extension of the PIMO to represent these active folders will be
part of the upcoming PIMO-api ontology. Look out for it in the next months.
Contact the developers of gnowsis at some time.

The semantics of the PIMO language allow us to verify the integrity of the data. In
normal RDF/S semantics, verification is not possible. For example, setting the domain
of the property knows to the class Person, and then using this property on an instance
Rome Business Plan of class Document creates, using RDF/S, the new information that
the Document is also a Person. In the PIMO language, domain and range restrictions
are used to validate the data.
The following rules describe what is validated in the PIMO, a formal description is
given in the gnowsis implementation's PIMO rule file.

All relating properties need inverse properties.

Check domain and range of relating and describing properties.

Check domain and range for rdf:type statements

Cardinality restrictions using the protege statements

Rdfs:label is mandatory for instances of ”Thing” and classes

Every resource that is used as object of a triple has to have a rdf:type set. This is a
prerequisite for checking domains and ranges.
Above rules are checking semantic modeling errors, that are based on errors made
by programmers or human users.

Following are rules that check if the inference engine
correctly created the closure of the model:
–

All statements that have a predicate that has an inverse defined require another triple
in the model representing the inverse statement.

The rules work only, when the language constructs and upper ontology are part of
the model that is validated. For example, validating Paul’s PIMO is only possible when
the PIMO-Basic and PIMO-Upper is available to the inference engine, otherwise the
definition of the basic classes and properties are missing.
The validation can be used to restrict updates to the data model in a way that only
valid data can be stored into the database. Or, the model can be validated on a regular
basis after the changes were made. In the gnowsis prototype, validation was activated
during automatic tests of the system, to verify that the software generates valid
data in different situations. Ontologies are also validated during import to the ontology
store. Before validating a new ontology, it’s import declarations have to be satisfied.
The test begins by building a temporal ontology model, where first the ontology under
test and then all imported ontologies are added. If an import cannot be satisfied, because
the required ontology is not already part of the system, either the missing part
could be fetched from the internet using the ontology identifier as URL, or the user can
be prompted to import the missing part first. When all imports are satisfied, the new
ontology under test is validated and added to the system. A common mistake at this
point is to omit the PIMO-Basic and PIMO-Upper import declarations.
By using this strict testing of ontologies, conceptual errors show at an early stage.
Strict usage of import-declarations makes dependencies between ontologies explicit,
whereas current best practice in the RDF/S based semantic web community has many
implicit imports that are often not leveraged.

For real-life applications (like the gnowsis beta version) we suggest to
separate the storage of resources from the storage of the PIMO ontologies. All
automatically extracted information about resources (i.e. from a lokal desktop
search engine) should be in one store. All hand-crafted or algorithmically
created data from the PIMO should be stored in a capable RDF database.

An analysis of existing ontology languages shows that many problems are
solved, but in different ontologies and using different practical ways. The
experience from the gnowsis workshops, implementations and many other papers
have helped to create a useful ontology for the semantic destop. The suggested
PIMO ontology builds a simple and practical ground for future personal
information management applications. A practical example and guidelines how to
create instances of the PIMO ontology should allow you to implement the PIMO and
extend it according to your own ideas. The ontology itself and the according
example can be downloaded here: [PIMO-lang,
PIMO-example]

With the described PIMO ontology and its practical description, I hope to
give a good technical and philosophical ground for the upcoming gnowsis beta
implementation.

Some parts are still missing, but will be addressed in upcoming publications
on the PIMO.

First of all, thanks to my wife Ingrid Brunner, who allowed me to write this
report mostly on my weekends. Just now, she is waiting for me to come home to go
for dinner.

In the DFKI Knowledge Management group, we had
much input from the EPOS project
and its team. Many thanks to Ludger
Van Elst who created the PIM-Basic ontology which defines classes and topics
and the PIM-Upper ontology which defines detailed classes for people, groups,
events, process support, and possible associations to link instances of them.
Heiko Maus and
Sven Schwarz gave input in
many ongoing discussions of the existing ontology languages.
Lars Zapf is an expert on Topic
Maps.
Andreas Dengel, our director repeatedly insisted to keep it simple and
usable and is supervising our work. The scenario for the semantic annotation
of meetings was thoroughly researched by Man Luo, a diploma student of
Leo Sauermann. Also many thanks to Frank Osterfeld and Norberto Fernández for
proofreading and error-correction.

Input outside the DFKI and the core gnowsis project came from our friends in
Berlin, where we repeatedly committed gnowsis workshops. Leading there,
Richard Cyganiak and
Anja Jentzsch gave much input regarding what
they expect from gnowsis, thanks also to all other participants in the gnowsis
workshops. Max Völkel from Karlsruhe also
discussed with the authors.

Thank you. We are currently coding these ideas into an open-source project
called gnowsis. It will be part of other software, too, so you might find it
somewhere else. If you want to use the PIMO, just download the gnowsis beta 0.9
from www.gnowsis.org in April 2006. Do not
download anything before the beta, PIMO is not part of the alpha. If you
are a developer, go to
http://gnowsis.opendfki.de and see how the status of the project is. If you
want to use the PIMO in your own applications, go ahead. The gnowsis project
helps you as a java reference implementation.

You can also contact the author. The author is also happy if you cite his
other publications.

Q: This is all so technical, where is the big theory behind?

Please read first the recommended literature indicated in the abstract. [RDF-PRIMER],
[RDFS], [OWL], [TM-empolis].
The theoretical background is that we use the ideas of Topic Maps and remix them
with RDF, SKOS and personal experience from the EPOS project and gnowsis.

This is intentionally technical, because we think that ontologies only work
when you have covered different dimensions:

If one of those is missing or failing, the ontology will fail. It is not
usable to talk about ontologies only focusing on one level. So the big theory
behind is: ontologies only work when describing the meaning behind, the
ontology as RDF vocab, example data and code. For PIMO we provide all of
them.

Q: Why did you use RDF-Schema as basis and not OWL-Lite nor OWL-DL or SKOS?
Or Topic Maps?

Our reasons where all inspired by the principle to keep it simple.

OWL-DL would feature that something can be a class and an individual in
one. needed by us but simpler to achieve with RDF-S.

AnnotationProperty and Ontologyproperty were not needed

As we focus on a simple frame-like structure with explicit classes, we
don't use much of the existence/inference features of OWL-DL. or OWL full.

We were mostly fond of InverseProperty and the nice support of it in
Protégé already on RDF-S level

RDF-Schema does not support InverseProperty but this is the only real need
for OWL..

SKOS does not support properties (like mbox).

We want to make a good ontology for one application domain: one person
expressing ideas for personal information management. Connect the ideas to
files, e-mails, webpages and everything else.

Other ontologies were created out of other needs.

Q: You proposed to identify concepts like "Rome" using URIs from wikipedia.
This might be a sound solution from the technical side, but first of all these
URIs might change and second the URIs at wikipedia identify websites, and not the concepts behind. Isn't this half baked?

True, this solution is far from perfect. Don't suggest us to use the
namespace YOU have invented to identify concepts. And don't suggest us to just
"change the url of wikipedia a little, so that it is a conceptual uri". We won't
just add ".concept" at the end or some other silly thing at the front. People
always suggested different solutions for that during the last years and Leo
Sauermann gets nervous when you mention these suggestions.

The real problem is: how to identify
abstract concepts in software systems, that we use URIs as syntax is a start but
the scheme and meaning of the URIs has also to be considered. Others have
proposed solutions that also look nice, see [IdentificationIssues].
But from my point of view most of the other solutions either don't scale or are
too artificial to be understood and broadly used. What we propose is to use the
most popular web-sites can be used to say "this public resource can be used
to identify the concept X", because it is popular and people can relate the
individual pages on such a website to the concepts behind. For broad adoption of
the semantic web, using popular websites can help as normal, non-technical
people can relate to them. For movies, people can understand that a website on
the Internet Movie Database (IMDB) describes
exactly one movie and that this may be used for identification of the movie.
Many movie and cinema websites link to IMDB anyway when content authors want
their users to know more about a certain movie. Individual persons use the links
on their weblogs and inside e-mails, the sentence "Let us watch
Life Aquatic tonight" is much
easier to understand with the link to IMDB. The same accounts for wikipedia,
a website many people can relate to. Also, these popular websites have an
intrinsic need to cover most of their user's interest, so we can hope that they
are complete enough to cover most concepts we face in daily life. On wikipedia,
the user might even add new pages when they miss to identify concepts. We named
now two extremely popular services on the web. This is another side of
identification - when people interact with mass media and the web, the most
popular sources seem to catch an exponential amount of interest. Known as
Zipf's Law, this usually
means that the top ten items get about half of the world's interest. In the
music business, this means that the top ten songs in the charts are sold more
than the top 100, inside the top 10 the top 3 are sold more than the others, etc.
Once a service has a technical or social advantage above others, it gets boosted
by word-of-mouth. The success of google can be seen in this light. From the
technical side, the web 2.0 trend for web-services helps, it is often
easier to integrate services like wikipedia into custom software than other
public ontologies. So you see that the problem is not just of identifying things
with URIs but worldwide adaption, social interaction habits of people and
statistical indications.

David Booth suggested to either make the distinction between the "four things
a uri identifies" either by using different URIs for each context of to use the
same URI and a way to express the context. Leo Sauermann has bad feelings about
using special URIs to identify concepts, feelings that got stronger during four
years of Semantic Web hacking. He therefore chose the "Differentiation by
context" approach, that was similarly proposed by Steve Pepper and is also a
basis of the Topic Maps standard. So, we give it a try.

Our biggest mistake now would be to proclaim "we build a
completely new service to publicly identify concepts, called SEMANTICX
and better than anything". This service probably won't beat wikipedia (or
whatever is popular at the time) in terms of adoption, scalability and
performance and so it won't provide the needed functions. Over time, we expect
that different services will get popular, then these other services will be
integrated by application developers into semantic applications, we see this as
an evolutionary process. If you think our argumentation sounds good, copy
it.

<!-- this is the website I sometimes have to edit -->
<topic id="www.dfki.de">
<!-- to identify the website, I use its web address.
my colleagues will hopefully do the same -->
<subjectIdentity>
<resourceRef xlink:href="http://www.dfki.de/" />
</subjectIdentity>
</topic>
<!-- this is the DFKI company -->
<topic id="DFKI">
<!-- the company is mentioned on the webpage at the specified address -->
<occurrence>
<instanceOf><topicRef xlink:href="#webpage"/>
</instanceOf>
<resourceRef xlink:href="http://www.dfki.de/" />
</occurrence>
</topic>

So we see that the website can be connected to a concept/topic in two ways.
Either an occurrence or a subject identity. Separating concepts from documents
and linking to resources with two different ways is a needed approach.

Q: I want to tag an email or a webpage, how do I do that?

You find a useful name for the tag, like "SemanticDesktop". Then you connect
the e-mail to the thing that is named after this tag. This works like this:

find the URI of the e-mail. suppose its email:123

see if the e-mail is already represented as pimo:Email in the place where
you store all your ResourceManifestations. If not, save the email into this
place and add a type and label triple there.

now you search for the URI of the thing that you can use to tag.

is there a pimo:Thing that has "SemanticDesktop" as pimo:wikiName?

is there a pimo:Thing that has "SemanticDesktop" as pimo:label?

if not, create a new pimo:Topic with the pimo:label "SemanticDesktop" and
give it the URI paul:SemanticDesktop (if not taken already, otherwise
create any uri inside the user's namespace).

connect the pimo:Thing paul:SemanticDesktop with the email uri email:123
using the property pimo:occurrence and store the triple in the PIMO of
your user.

Q: I looked at the way of tagging things, this is so complicated and
not as cool as web.2.0. Can't we make it simpler?

You can automate the process completely. If you think it too complicated,
shortcut the search for pimo:Things and just always link to the URI matching the
tag-name paul:SemanticDesktop. But be warned, this may lead to some
quirks later on. Web 2.0? Knowledge isn't simple and you want to have a
long-term use of your annotations. At the end, we want to make a Semantic Web
and not just web 2.0. If we all tag like del.icio.us, at the end the most
important tag will be "important" which is rather dull.
Leo Sauermann blogged
that he expects, slowly, when people see these problems, the web 2.0 community
will embrace ontologies and favor them compared to folksonomies. The representation of
tags as things and semantic web resources allows the user to make one tag
"part-of" the other, which helps during search. Think of the semantic web as
investing to a better future.

Q: Why did you use your own way of importing ontologies, and
inverseProperty isn't OWL enough?

No, OWL is not enough. First: we don't use OWL, we use RDFS. We also won't
use "just a few relations from OWL" because we think that either use OWL as
intended or don't use it at all. Developers will thank us that they aren't
tempted to look at OWL and get nervous by this. We have experience in OWL and we
like OWL, just not for this special case here. Also read above at our longer
answer to RDFS, OWL disallows some things we needed.

Also, OWL doesn't state that the other ontology has to be at the URL given to
the ontology. This ignited the rather dull practice of ontology authors not to
put their ontologies on the URL indicated by the namespace. Some authors
invented funny namespaces for their ontologies, practices we discourage. Tim
Berners-Lee also discourages the sloppy behavior regarding namespaces and
putting documents there, this is a real stopper for the semantic web.

Wordnet for the Web. This
page documents the xmlns.com Wordnet vocabulary service, an RDF schema based
on version 1.6 of the Wordnet lexical database. Terms of wordnet can be
referenced using this namespace: http://xmlns.com/wordnet/1.6/

Nabu is a plugin for Jive
Messenger, a server implementation of the Jabber Instant Messaging protocol.
It provides server-side logging of chat conversations and related events. The
logged data is stored in a semantic graph, using the RDF W3C standard.

On the second gnowsis workshop in Berlin we had a vivid
discussion on the future of gnowsis and the semantic desktop in general.
Input from the Berlin group again showed that tagging and simplicity may
play a key role in the future systems.