This note is primarily intended to stimulate further work and debate across a
broad range of Information Technology disciplines and initiatives. As such it
does not contain any direct recommendations, but instead centers on ideas for
the application of the Semantic Web in Systems and Software Engineering.

This section describes the status of this document at the time of its publication.
Other documents may supersede this document. A list of current W3C publications
and the latest revision of this technical report can be found in the W3C
technical reports index at http://www.w3.org/TR/.

Publication as a Working Group Note does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or obsoleted by other
documents at any time. It is inappropriate to cite this document as other than
work in progress.

1. Introduction

Until recently, work on accepted practices in Systems and Software Engineering
has appeared somewhat disconnected from the ground-breaking work on formal
information representation on the World Wide Web (commonly referred to as the
Semantic Web initiative). Yet obvious overlaps between the two fields are apparent
and many now acknowledge merit in a hybrid approach to IT systems development
and deployment, combining Semantic Web technologies and techniques with more
established development formalisms and languages like the Unified Modeling Language
(UML). This is not only for the betterment of IT systems in general, but also
for the future good of the Web, as systems and Web Services containing rich
Semantic Web content start to come online.

2. Background

2.1 Composition and Reuse

Throughout the history of computing the concepts of component construction
and reuse have undergone a quite remarkable evolution. Over time many different
types of software and systems building block have been advocated, demonstrating
ever increasing levels of abstraction and encapsulation. In the earliest computer
systems, 'functions' were the predominant unit of composition, returning the
same results for a given set of inputs every time. However, it was soon realised
that this approach was somewhat clumsy, giving way to more substantial aggregation
mechanisms such as 'subroutines' and 'libraries' in the 1960s and 1970s. All
the same, even these approaches could not manage shared data and concurrency
effectively, resulting in systems of unnecessary complexity. Consequently,
the mid-1980s saw a number of advances in abstract systems thinking, culminating
in the introduction of classified components commonly known as 'Objects'.

The construction of objects effectively encapsulates both data and functionality
into usable bundles for processing, and today Object Oriented (OO) theory and
techniques are accepted as mature concepts in most areas of Information
Technology. As such they provide a number of acknowledged engineering
advantages, including component reuse.

By collecting objects together into meaningful 'artifacts' or 'assets', and
artifacts into 'systems' or 'applications', reuse can be achieved at higher
levels. In this context the terms 'object', 'artifact' and 'asset' are often
considered interchangeable, and also incorporate the idea of pure data as a
valid component type. The terms 'system' and 'application', however, generally
imply larger computational units, composed of both usable data and functionality.

2.2 A Heritage in Model Driven Architecture

Even with such advances in representation and composition, engineering systems
of any significance is still difficult. For this reason, in all well-established
engineering disciplines, modeling a common understanding of domains through
a variety of formal and semi-formal notations has proven itself essential to
advancing the practice in each such line of work. This has led to large sections
of the Software Engineering profession evolving around the concept of constructing
models of one form or another as a means to develop, communicate and verify
abstract designs in accordance with original requirements. Computer Aided Software
Engineering (CASE) and, more recently, Model Driven Architecture (MDA) provide
the most prominent examples of this approach. Here models are not only used
for design purposes; associated tools and techniques can also be used to
generate executable artifacts for later use in the Software Lifecycle. Nevertheless,
there has always been a frustrating paradox in the use of tooling in Software
Engineering, arising from the range of modeling techniques available and the
breadth of systems requiring design: engineering nontrivial systems demands
rigour and unambiguous statement of concept, yet the more formal the modelling
approach chosen, the more abstract the tools needed. This often makes methods
difficult to implement, limits the freedom of expression available to the
engineer, and proves a barrier to communication amongst less experienced practitioners.
For these reasons less formal approaches have seen mainstream commercial acceptance
in recent years, with the Unified Modeling Language (UML) currently being the
most favoured amongst professionals.

Nevertheless, approaches like the UML are by no means perfect. Although they
are capable of capturing highly complex conceptualisations, current versions
are far from semantically rich. Furthermore they can be notoriously ambiguous:
A standard isolated model from such a language, no matter how perfect, can still
be open to gross misinterpretation by those who are not overly familiar with
its source problem space. It is true that supporting annotation and documentation
can help alleviate such issues, but traditionally this has been a separate,
verbose and long-winded textual activity, often disconnected from the production
of the model itself. Furthermore, MDA does not currently support
automated consistency checking. What is needed in addition is a way to incorporate
unambiguous, rich semantics into the various semi-formal notations underlying
methods like the UML.

Fortunately, with the advent of Semantic Web technologies, semantically rich
formal languages are now available which are much less syntactically abstract
and imposing than those previously adopted for high-end specification purposes.
Therefore, it is now possible to construct models with rich and highly rigorous
semantics using relatively simple predicate constructs and naming vocabularies
closely resembling natural language. This compelling combination suggests that
highly formal semantic specifications, although still not easy to produce, could
be amenable to a much wider range of IT professionals and could realistically
start to increase the levels of formality prevalent in mainstream IT systems.
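
To make this concrete, consider a minimal sketch in Python using rdflib (an
open-source RDF library). The 'ex:' vocabulary below is invented purely for
illustration, yet the resulting statements carry formal, machine-processable
semantics while reading almost like natural language:

    # Hypothetical example vocabulary; not from any published ontology.
    from rdflib import Graph

    turtle = """
    @prefix ex:   <http://example.org/ontology#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .

    ex:Customer a owl:Class ;
        rdfs:subClassOf ex:Person ;
        rdfs:comment "A person who places orders." .

    ex:placesOrder a owl:ObjectProperty ;
        rdfs:domain ex:Customer ;
        rdfs:range  ex:Order .
    """

    g = Graph()
    g.parse(data=turtle, format="turtle")
    print(len(g), "formal assertions parsed")  # each triple is one assertion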

3. Proposed Ideas

3.1 Ontologies as Formal Model Specifications and
the Incorporation of Such Models in Semi-Formal Languages

In many respects semantic models (often referred to as 'ontologies') can be
simply considered as rigorous descriptive models in their own right, being akin
to existing conceptual modeling techniques like UML class diagrams or Entity
Relationship Models (ERM). As such, their purpose is to facilitate mutual understanding
between agents, be they human or computerised, and they achieve this through
explicit semantic representations using logic-based formalisms. Typically, these
formalisms come with executable calculi that allow querying and reasoning support
at runtime. This adds a number of advantages, specifically in the areas of:

Conformance and consistency checking

Rigorous classification and identification

Knowledge assertions and inferencing

Ease of formal specification. With the aid of graphical modeling
tools, levels of formal logic currently uncommon in everyday practice can
be achieved with relative ease.
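
To illustrate the querying and inference advantages noted above, the following
hedged sketch (Python with rdflib; all class and instance names are
hypothetical) uses a SPARQL 1.1 property path to classify an individual via
a subclass chain that was never directly asserted:

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex:   <http://example.org/ontology#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:Customer rdfs:subClassOf ex:Person .
    ex:alice a ex:Customer .
    """, format="turtle")

    # 'a/rdfs:subClassOf*' walks the class hierarchy, so ex:alice is
    # recognised as a Person even though only 'Customer' was asserted.
    result = g.query("""
    PREFIX ex:   <http://example.org/ontology#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    ASK { ex:alice a/rdfs:subClassOf* ex:Person }
    """)
    print(result.askAnswer)  # True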

Hence, given the semantically rich, unambiguous qualities of information embodiment
on the Semantic Web, the amenable syntax of Semantic Web languages, and the
universality of the Semantic Web's XML ancestry, there appears a compelling
argument for combining the semi-formal, model-driven techniques of Software Engineering
with approaches common to Information Engineering on the Semantic Web. This
may involve the implanting of descriptive ontologies directly into systems'
design models themselves, the referencing of separate semantic metadata artifacts
by such models, or a mixture of both. What is important is that mechanisms are
made available to enable cross-referencing and checking between design descriptions
and related ontologies in a manner that can be easily engineered and maintained,
improving systems' quality and reducing their cost.
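
As a simple illustration of such cross-referencing, the sketch below (Python
with rdflib; the ontology content and design references are invented) checks
that every ontology term a design model claims to use is actually defined,
flagging dangling references:

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL, RDF

    ontology = Graph().parse(data="""
    @prefix ex:  <http://example.org/ontology#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    ex:Customer a owl:Class .
    ex:Order    a owl:Class .
    """, format="turtle")

    # Class URIs a UML-style design model claims to use (one is misspelled).
    design_refs = [
        URIRef("http://example.org/ontology#Customer"),
        URIRef("http://example.org/ontology#Ordr"),
    ]

    for ref in design_refs:
        defined = (ref, RDF.type, OWL.Class) in ontology
        print(ref, "->", "ok" if defined else "NOT DEFINED in ontology")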

3.2 The Semantic Web in Systems and Software Engineering

Having raised the idea of using the Semantic Web in Software Engineering,
a commonly asked question arises, namely: how does one broadly characterise
the Semantic Web in terms of Systems or Software Engineering use? In attempting
to answer this question, consensus appears to be forming around two loose definitions:

As a 'classification', merely to group together related tools
and techniques for modeling rigorous semantics during specification and design
stages of the Software Lifecycle.

Primarily such tools and techniques should be viewed as being formally
descriptive in character, but there appears little reason to restrict this
definition other than for standards alignment. Therefore, it may also be relevant,
at some appropriate point in the Semantic Web's future, to include prescriptive,
invasive and/or other types of approach under this heading.

As a 'mechanism' for identifying and sharing artifacts amongst
discrete subsystems, systems and systems' design teams both during design
and at runtime.

In such circumstances the Semantic Web could be viewed as a single formalised
corpus of interrelated, reusable content, as discussed further in the
following section.

3.3 A Corpus of Reusable Content and the Use of Metadata
as Relational Data

Given that the Semantic Web uses triple-based data representation as its primary
mechanism for information storage, and that this is merely a specialisation of
the categorisation scheme employed for organising content in relational database
technologies, the attraction of considering the Semantic Web as a specialised
relational framework has been recognised for some time. In suggesting use
of the Semantic Web as a system for runtime information and component sharing,
there is an implicit need to provide means for clearly identifying participating
artifacts based on composites of characterising semantic properties (metadata
in the form of predicate-object pairs), and this differs from current
Semantic Web schemes for unique identification, such as FOAF sha1. In such frameworks
the Semantic Web can be seen as a truly global relational assembly of content
and, as with every relational model, issues of composite object identification
have to be addressed.
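
The notion of composite identification can be sketched as follows (Python with
rdflib; the 'ex:' metadata vocabulary and component descriptions are
hypothetical). Rather than one opaque identifier, an artifact is picked out
by the conjunction of several metadata properties, much as a composite key
identifies a row in a relational table:

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/meta#> .
    ex:comp1 ex:name "Billing" ; ex:version "2.1" ; ex:interface ex:IInvoice .
    ex:comp2 ex:name "Billing" ; ex:version "1.0" ; ex:interface ex:IInvoice .
    """, format="turtle")

    # All three property constraints must hold, like a composite key.
    for row in g.query("""
    PREFIX ex: <http://example.org/meta#>
    SELECT ?artifact WHERE {
      ?artifact ex:name "Billing" ;
                ex:version "2.1" ;
                ex:interface ex:IInvoice .
    }
    """):
        print(row.artifact)  # only ex:comp1 satisfies the full composite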

Such unique identification schemes should be capable of supporting both the
interlinking of broadly related ontologies into grander information corpora
(thereby implying formal similarities and relationships between discrete ontologies
and/or systems through their classifying metadata), and the transformation of
design-time component associations into useful runtime bindings. This would
realise metadata use across a broader spectrum of the Software Lifecycle.
In so doing, this approach carries a number of obvious implications for systems
employing such techniques:

That Semantic Web technologies could be used to formalise associations between
sub-components within a given system.

That the Semantic Web could be used as a framework for design-time data
and component sharing, including the concept of design models being
considered as valid, sharable artifacts in their own right.

That the Semantic Web could be used as a framework for runtime data and
component sharing between discrete and disparate systems.

That new forms of system could be created through the integration of discrete
and disparate information and functionality bearing semantically similar metadata.
This appears especially appealing given current advances in the areas of Web
Services and Service Oriented Architectures. If underlying metadata were used
as a basis for parameterised dynamic systems behaviour, there are further
intriguing possibilities in the areas of Web Service Choreography and autonomic
systems, as sketched below.
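
The final implication above can be sketched speculatively (Python with rdflib;
the service descriptions and 'ex:' vocabulary are invented): a consumer binds
to a component at runtime by querying shared metadata for one whose advertised
properties match its needs:

    from rdflib import Graph

    registry = Graph().parse(data="""
    @prefix ex: <http://example.org/meta#> .
    ex:svcA ex:offers ex:Payment ; ex:protocol "SOAP" ;
            ex:endpoint "http://svc-a.example.org/pay" .
    ex:svcB ex:offers ex:Payment ; ex:protocol "REST" ;
            ex:endpoint "http://svc-b.example.org/pay" .
    """, format="turtle")

    # Select an endpoint by required capability and protocol metadata.
    for row in registry.query("""
    PREFIX ex: <http://example.org/meta#>
    SELECT ?endpoint WHERE {
      ?svc ex:offers ex:Payment ; ex:protocol "REST" ; ex:endpoint ?endpoint .
    }
    """):
        print("would bind to:", row.endpoint)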

4. Previous Experience

Many, however, would argue that such approaches have been tried a number of
times before with only limited success, holding up numerous grandiose project
attempts at 'Corporate Enterprise Architecture' as classical examples of failure.
This may indeed be true, but it is important to remember that past attempts
have always been isolated to some degree. Standards-based formal semantic representation
targeted at hugely open problem spaces, like the Web, is, however, a new concept
and deliberately sets out to remove isolated problem solving from the equation.
It not only offers a number of distinct technical advantages, but it is also
available to a hitherto unprecedented global development community. Furthermore,
this community is steeped in a tradition of free and open knowledge exchange
and source distribution. And, if the history of the Web to date is anything
to go by, this community will eventually produce a groundswell of support and
enough impetus to kick-start a number of revolutionary changes in systems and
software engineering as a direct result of the Semantic Web. To recognise this
potential and provide early direction is hence considered to be a significantly
worthy initiative.

5. Issues

It is acknowledged that the Semantic Web still faces a number of well known
issues when attempting to implement public mechanisms for component sharing
via semantic metadata association:

Trust: How does a content consumer know if the provider of
any identified content or associated metadata is trustworthy, erroneous or
hostile?

Authority: Even if trust can be established, how does a content
consumer know if a content provider is authorised to supply the metadata
needed to accurately determine the relevance of the components being investigated?

Temporality: How does a content consumer know if metadata
is accurate and relevant at the current point in time?
