1. Introduction

This memo describes Swish, a software package that is intended
to provide a framework for building programs in Haskell that
perform inference on RDF [1] data.

Swish is primarily intended to be used as a starting point for
developing new RDF applications in Haskell, but it also includes
a stand-alone program that can be used to perform some simple
manipulation of RDF data.
Currently, only the Notation 3 [4]
serialization form is supported, but I'd like to add support
for full RDF/XML in due course.

The software can be downloaded from the web by following links
from http://www.ninebynine.org/Software/Intro.html.
This software is made generally available under the GNU General
Public Licence (GPL), version 2 (or later), a full copy of
which is available at
http://www.gnu.org/licenses/gpl.html, and
also as file GPL.TXT in the Swish software distribution.
Other licensing arrangements may be negotiated with the
author: see LICENSING.TXT in the Swish software distribution.

2. Background

The Swish software package was developed as a result of
experience using CWM [5],
an off-the-shelf general-purpose RDF processing program,
for evaluating simple inference rules on network access
control information expressed in RDF
[19].
Specifically, while the general inference capabilities
of CWM were almost sufficient for the network access
application, some capabilities were required that are
unlikely to be provided by any completely general-purpose
tool; e.g. analysis of IP network addresses by subnet
and host address.

Additionally, the framework for datatyped literals
currently proposed by the RDFcore working group
[8] is quite open-ended,
and it is not specified how generic applications may
provide support for new or non-standard datatypes.

In light of these considerations, I sought ways of combining
the full expressive capability of a general purpose programming
language with the declarative style of inference rules and
formal specifications.
Using Haskell [11],
a pure functional programming language,
is the approach adopted.

To use Haskell as a basis for performing inference on RDF data,
certain capabilities are neded:

I anticipate that the main use for Swish will be as a
support library for new utilities that apply predefined
RDF inference rules. Where CWM is a general-purpose tool
for manipulating RDF, I expect to use Swish as a toolkit for
creating tools to deal with specific RDF processing requirements.
In time, this may lead to identification of some useful
capabilities that can guide the design of future
general-purpose RDF processing tools.

The programming language Haskell was chosen for a number of
reasons:

it is a pure functional language supporting a style
of programming that is directly based on a specification
that is being implemented,

it is non-strict, employing lazy evaluation,
meaning that, expressions are generally not evaluated
unless and until their value is needed; this further
supports a style of programming that is close to the
typical form of mathematical-style of description
that is often used to specify RDF inference rules, and

it is relatively mature, with good-quality
implementations available for a variety of platforms,
supported by an active and extensive user community,
and performance for compiled code that is comparable
with that of more conventional programming languages.

More information about Haskell can be found at
[11].
A useful paper discussing some particular characteristics of
functional programming languages is [17].

Swish.hs, SwishMain.hs, SwishMonad.hs, SwishCommands.hs:
Swish is a command line utility that performas some simple
RDF processing tasks: checking Notation 3 syntax, comparing
RDF graphs, merging RDF graphs. The source code may also
be used as a basis for creating new RDF processing
utilities.

SwishTest.hs,
RDFGraphTest.hs,
N3ParserTest.hs,
N3FormatterTest.hs,
GraphTest.hs,
LookupMapTest.hs,
DateTimeTest.hs,
ParseURITest.hs,
ParseTest.hs,
URITest.hs:
these are all test modules, and are not part of any
swish application. They use the HUnit library to
run a series of unit tests on the various swish
components. Each of these compiles to a stand-alone
program that runs a series of test cases.

RDFGraph.hs: this defines data types and methods for
RDF graphs.
It defines a datatype RDFLabel (URIs, bnodes, RDF literals
and a couple of other extensions for future use),
and datatype NSGraph that implements the graph class
interface.
(NSGraph is similar to GraphMem but adds prefix and
formula lists to the content of a graph, and also defines
a graph merge function that merges graphs with renaming of
bnodes as needed to preserve semantics.)
RDFGraph is a type synonym for NSGraph RDFLabel.

N3Parser.hs: a Notation3 parser that converts string
into an RDFGraph value. The parser is built using the
Parsec library (except the URI parser that is currently
based on some private parsing components).

N3Formatter.hs: a Notation3 formatter that converts an
RDFGraph value to a string, or to a ShowS function.
The formatter uses a monad to track formatter state.
Currently, the formatter is pretty primitive, and is
intended to minimally satisfy the round-tripping of RDF
graphs via Notation3 format (also using N3Parser).
Future versions may try to generate more human-friendly
output.

GraphClass.hs: some basic definitions for a type class
for representing and manipulating labelled directed
graphs.

GraphMem.hs: a graph implementation based on a simple
in-memory list structure. There is much scope for
performance improvement by replacing the list structure
with a more efficiently accessed structure. (Ralf Hinze
http://www.informatik.uni-bonn.de/~ralf/
has some interesting options for this; I like the idea
of using a splay trees.)
This code needs a reworking to properly separate the
graph container structure from the graph node type.

GraphMatch.hs: graph comparison function. This function
tests two graphs for isomorphism, taking account of nodes
that are "variable" (i.e. may be freely renamed without
changing the fundmanetal graph structure). The code
implements an algorithm by Jeremy Carroll
[6].

URI.hs, ParseURI.hs: URI handling code, including
methods to deduce and process relative URI references.
The code is based on an early draft of the revised URI
specification [7], and may
be subject to revision as that specification progresses.

Parse.hs: Simple parser code, based on some example
programs in Simon Thompson's book
The Craft of Functional Programming
[18].
This was written to support the URI handler, which was
my very first program in Haskell, and suffers from my
early confusion about how to use Haskell.
In due course, the URI parser will be revised to use the
Parsec library, and this code will be retired.

LookupMap.hs: class and function definitions for lookup
tables, which are used in various ways throughout the
RDF graph handling code. (The Swish package would
benefit from greatly from this code using a hash table
or some other more serious lookup algorithm.)

ListHelpers.hs, MiscHelpers.hs: a few useful helper
functions.

Data/*.n3: subdirectory Data of the main installation
directory contains a number of Notation3 data files that
are used to test the Swish utility, and in particular
are assumed to be present by the SwishTest program.

ghcc.bat, MakeSwish.bat, TestSwish.bat:
these are files used to build and test the Swish software on
a Windows operating system,
using the Glasgow Haskell Compiler (GHC).
They have been tested under Windows 2000, and should work
on any recent Windows operating system (NT, 2K, XP, 98, 95)
ghcc.bat will need to be edited to reflect the local
installation of GHC and the support libraries.
Examination of these files should also provide any
information needed to build and run the software on
non-Windows operating systems.

A small number of other files contain further information
about the software and/or related topics.

Currently, the only supported RDF graph serialization format
is Notation3, but future developments may add support
for other formats. RDF/XML would clearly be most desirable.
Meanwhile, utilities such as CWM [5]
can be used convert RDF/XML to and from Notation 3 format.

The Swish utility is a command-line utility that performs
some simple RDF processing functions. The capabilities provided
are with a view to testing the underlying RDF library software
rather than performing any particular application purpose.

A Swish command contains a one or more command line options that
are processed from left-to-right.
The Swish program maintains an internal graph workspace, which
is updated or referenced as the command options are processed.

Swish command options:

-?

Displays a summary of the command line options.

-n3

Indicates that Notation3 be used for subsequent input
and output. (Currently, this is the only format option,
and is selected by default.)

-i[=file]

read file into the graph workspace,
replacing any existing graph.
If the filename is omitted, the graph is read from
standard input.

-m[=file]

read and merge file with the graph workspace.
Blank nodes in the input file are renamed as necessary
to avoid node identifiers already used by the existing
graph.
If the filename is omitted, the graph is read from
standard input.

-c[=file]

read file and compare the resulting graph with the workspace.
Graph comparison is done in a fashion that treats isomorphic
graphs as equivalence, and is insensitive to renaming of
blank nodes. This is intended to match the definition
of graph equivalence in the RDF abstract syntax
specification [10].
If the filename is omitted, the graph is read from
standard input.
If the graphs are unequal, the exit status code is 1.

-o[=file]

write the graph workspace to a file.
If the filename is omitted, the graph is written to
the standard output.

The Swish program terminates with a status code that
indicates the final status of the operation(s) performed.
Haskell distinguishes between a success status code whose
value is not specified, assumed to be system dependent,
and a failure code which is associated with an integer
value.
The status code values returned by Swish are:

swich -n3 -i=file1 -c=file2
Read 'file1' and 'file2' as Notation3, report any syntax
errors, and if both are OK, compare the resulting
graphs, indicating whether or not they are equivalent.

swish -n3 -i=file1 -o=file2
Read 'file1' as Notation3, report any syntax errors,
and output the resulting graph as reformatted Notation3.
(The output will generally be different form the input,
and may be unpretty.
It may be used to test round-tripping of Notation 3 data.)

4. Software installation

The Swish software is distributed as a single ZIP archive.
Start installation by creating an empty directory for the
software, and extracting the content of the ZIP archive into
that directory. Select the ZIP option that uses directory
information from the archive so that the sub-directory
structure is preserved.

The following sections deal with how get get the software
running in different Haskell environments. The instructions
relate to MS Windows operating systems, but it should be
fairly obvious how to adapt the procedures for Unix/Linux
systems.

Swish is written entirely in Haskell, and should work with
any Haskell system that supports Haskell 98 together with
the extensions noted below. The software has been tested
using Hugs [12] (version November 2002),
Glasgow Haskell Compiler (GHC) [13]
(version 5.04.3)
and the interactive version of GHC (GHCi).

The required extensions to standard Haskell-98 are:

Multi-parameter classes.

Hierarchical libraries.

Control.Monad.Trans and Control.Monad.State libraries,
as distributed with GHC or Hugs.

Some freely available additional Haskell libraries are used,
as described later. For convenience, these are included
with the Swish software distribution, but are not themselves
part of the Swish software for licensing purpose. More details
are given later.

My development has been performed mostly using Hugs on a
1.3GHz PC with 256Mb of memory.
For most purposes, this has been more than adequate.
Some of the larger test cases, and the more perverse graph
comparisons, may take several minutes to run on this platform
(SwishTest takes about 20 minutes). In practice, the
applications are likely to be more demanding than basic
requirements of Swish.

Running the Swish software under Hugs is straightforward.
The Hugs options -98 and +N must be specified.

Special steps that might help include:

Edit the Hugs registry key to include additional
directories for the Parsec, HUnit and Sort libraries.

Edit the Hugs registry key to increase the Hugs heap
size (Hugs -h option) for some of the test cases.
(Since doing that I've done some code tuning,
which may mean the number below is unnecessarily high.)

I have applied changes by using RegEdt32 to modify the
registry key
HKEY_CURRENT_USER\Software\Haskell\Hugs\Nov 2002\Options,
though some of these changes can be applied using the
Hugs :set command. By saving these changes in the
registry, I can run Hugs directly from my editor,
or by activiating a source file in file manager,
and have all the right options applied.

To run a program in Hugs, use the :load command to
load the main program module; if the library paths
are set correctly, all other files required will be
automatically located and loaded. When a program is
loaded, type evaluate 'main' to execute it.
The Swish program itself is an exception to the above:
it is not designed to be run from an interactive shell.
To achieve the same effect, load Swishtest into Hugs, and
evaluate

runSwish "command options"

noting that the command line must be supplied as a
Haskell string expression (e.g. in double quotes).

Running the Swish software under GHCi is almost as easy
as using Hugs. GHCi command line options used include
'-fglasgow-exts' and
'-iF:\Haskell\Lib\HUnit;F:\Haskell\Lib\Parsec;F:\Haskell\Lib\Sort'
(adjusted according to the directories containing the library
files).
Working under MS-Windows,
I find it convenient to create a desktop shortcut to run GHCi,
specifying the Swish source directory and other options as
properties of the shortcut.

To run a program in the GHCi command interpreter, follow
the same procedure that is described for running a program
under Hugs. The GHCi and Hugs command shells are very
similar.

There is a GHCi initialization file '.ghci' that
if placed in the appropriate startup directory is read
automatically by GHCi and defines some convenient commands
for running the non-interactive GHC compiler from within
the GHCi shell.

MS-Windows command scripts have been prepared to compile
and run the Swish software in an MS-Windows command
window. It should be straightforward to create Unix
equivalents using information from these.
The relevant files are:

MakeSwish.bat, to compile and link all the programs.

ghcc.bat, a helper file used by MakeSwish.bat to
compile and link a single program.

TestSwish.bat, to run the various test programs.

The file ghcc.bat assumes a standard GHC installation,
with the GHC compiler is on the current search path,
and will probably need to be edited to refelct the
actual locations of the support libraries used.

Once the programs have been compiled and linked,
they can be run in the usual way by using entering the
program name at a command prompt. The test programs
do not expect any command line options and run to completion.
The program Swish.exe takes command line options as
descriped above.

A Swish installation under GHC can be tested by running
the command script TestSwish.bat, and ensuring that all
tests complete with zero errors. On a 1.7GHz PC running
Windows 2000, the tests take a few minutes to complete.

To test the installation from an interactive shell, the test
programs need to be loaded and executed individually.
To confirm a successful installation, it is probably
sufficient to load and run RDFGraphTest, which should
complete quite quickly (about 30 seconds under Hugs
on a 1.3GHz PC),
then run SwishTest which takes about 20 minutes on
the same system.

Swish uses some additional libraries that are not part of the
swish software, but which are included with the Swish software
distribution for the convenience of users.

Please note that these support libraries are distributed under
their own licensing terms and conditions, which I have reproduced
below where available. Please contact the respective authors
for further information.

Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the following
conditions are met:

Redistributions of source code must retain the above
copyright notice, this list of conditions and the
following disclaimer.

Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the
following disclaimer in the documentation and/or other
materials provided with the distribution.

This software is provided by the copyright holders "as is" and
any express or implied warranties, including, but not limited
to, the implied warranties of merchantability and fitness for a
particular purpose are disclaimed. In no event shall the
copyright holders be liable for any direct, indirect,
incidental, special, exemplary, or consequential damages (
including, but not limited to, procurement of substitute goods
or services; loss of use, data, or profits; or business
interruption) however caused and on any theory of liability,
whether in contract, strict liability, or tort (including
negligence or otherwise) arising in any way out of the use of
this software, even if advised of the possibility of such
damage.

HUnit is Copyright (c) Dean Herington, 2002, all rights
reserved, and is distributed as free software under the
following license.

Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:

Redistributions of source code must retain the above
copyright notice, this list of conditions, and the
following disclaimer.

Redistributions in binary form must reproduce the above
copyright notice, this list of conditions, and the
following disclaimer in the documentation and/or other
materials provided with the distribution.

The names of the copyright holders may not be used to
endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (
INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.

5. Conclusions

Swish is very much a work-in-progress, and the present
release is a first step along a path with many
possible options for future developments.

The immediate next step is to use the Swish framework
in the construction of some special-purpose RDF
inference tools. My intent is to revisit my
earlier work [19], and learn
how that work may be served by using Haskell in place
of a packaged RDF inference tool.

The Swish code itself is far from perfect, and there
is much additional functionality and improvement that
can be made.
But it does pass an extensive array of tests, and I believe
it is sufficiently stable and functional for this initial
release.

The software distribution contains a file named TODO.TXT,
which lists a number of specific possible enhancements that
have been identified to date.

6. Acknowledgements

I would like to thank the following, whose previous work has
been most helpful to me (though, of course, they bear no
responsibility for the failings of my work):

The Haskell community [11] for
not just one, but at least two, really solid Haskell
language implementations, a good range of supporting
libraries, and many points of useful advice offered
though the Haskell mailing list.

Dean Herington for his HUnit testing framework.

Daan Leijen, for his Parsec library.

This document has been authored in XML using the format described
in RFC 2629 [3], and converted to HTML using
the XML2RFC utility developed by Marshall Rose
(http://xml.resource.org/).