Why has the pleasure of slowness disappeared? Ah, where have they gone, the
amblers of yesteryear? Where have they gone, those loafing heroes of folk song,
those vagabonds who roam from one mill to another and bed down under the
stars? Have they vanished along with footpaths, with grasslands and clearings,
with nature? There is a Czech proverb that describes their easy indolence by a
metaphor: ‘they are gazing at God’s windows.’ A person gazing at God’s windows
is not bored; he is happy. In our world, indolence has turned into having nothing
to do, which is a completely different thing: a person with nothing to do is
frustrated, bored, is constantly searching for an activity he lacks.

Slowness, Milan Kundera, 1995 (pp. 4-5).

GATE is a backplane into which specialised Java Beans plug. These beans are loosely coupled with
respect to each other: they communicate entirely by means of the GATE framework.
Inter-component communication is handled by model components (LanguageResources) and
events.

Components are defined by conformance to various interfaces (e.g. LanguageResource), ensuring
separation of interface and implementation.

The reason for adding to the normal bean initialisation mechanism is that LRs, PRs and VRs all have
characteristic parameterisation phases; the GATE resources/components model makes these
phases explicit.

Wherever users of the architecture may wish to extend the set of a particular type of entity, those
types should be expressed as components.

Another way to express this is to say that the architecture is based on agents. I’ve avoided this in
the past because of an association between this term and the idea of bits of code moving around
between machines of their own volition. I take this to be somewhat pointless, and probably the
result of an anthropomorphic obsession with mobility as a correlate of intelligence. If we drop this
connotation, however, we can say that GATE is an agent-based architecture. If we want to, that
is.

Many of the classes in the framework are components, by which we mean classes that conform to
an interface with certain standard properties. In our case these properties are based on the Java
Beans component architecture, with the addition of component metadata, automated loading and
standardised storage, threading and distribution.

All components inherit from Resource, via one of the three sub-interfaces LanguageResource
(LR), VisualResource (VR) or ProcessingResource (PR). VisualResources (VRs) are
straightforward – they represent visualisation and editing components that participate in
GUIs – but the distinction between language and processing resources merits further
discussion.

Like other software, LE programs consist of data and algorithms. The current
orthodoxy in software development is to model both data and algorithms together, as
objects [1]. Systems that adopt this approach are referred to as Object-Oriented (OO), and
there are good reasons to believe that OO software is easier to build and maintain than other
varieties [Booch 94, Yourdon 96].

In the domain of human language processing R&D, however, the terminology is a little more
complex. Language data, in various forms, is of such significance in the field that it is
frequently worked on independently of the algorithms that process it. For example: a
treebank [2] can be developed independently of the parsers that may later be trained from it;
a thesaurus can
be developed independently of the query expansion or sense tagging mechanisms that may later
come to use it. This type of data has come to have its own term, Language Resources (LRs)
[LREC-1 98], covering many data sources, from lexicons to corpora.

In recognition of this distinction, we will adopt the following terminology:

Language Resource (LR):

refers to data-only resources such as lexicons, corpora,
thesauri or ontologies. Some LRs come with software (e.g. Wordnet has both a user
query interface and C and Prolog APIs), but where this is only a means of accessing
the underlying data we will still define such resources as LRs.

Processing Resource (PR):

refers to resources whose
character is principally programmatic or algorithmic, such as lemmatisers, generators,
translators, parsers or speech recognisers. For example, a part-of-speech tagger is best
characterised by reference to the process it performs on text. PRs typically include
LRs, e.g. a tagger often has a lexicon; a word sense disambiguator uses a dictionary or
thesaurus.
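
The LR/PR distinction can be sketched in a few lines of Java. This is only an illustration and not the GATE API: the interface bodies, the SketchLexicon and SketchTagger classes and the tag names are all assumptions made for the example. It shows a data-only LR (a lexicon) being held and used by a PR (a tagger).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for the interfaces named in the text;
// the real gate.* interfaces declare many more members.
interface Resource { String getName(); }
interface LanguageResource extends Resource { }
interface ProcessingResource extends Resource { }

// An LR: data only. Here, a tiny lexicon mapping words to part-of-speech tags.
final class SketchLexicon implements LanguageResource {
  private final Map<String, String> entries;

  SketchLexicon(Map<String, String> entries) { this.entries = Map.copyOf(entries); }

  public String getName() { return "sketch-lexicon"; }

  // Pure data access: returns the stored tag, or "UNK" for unknown words.
  String lookup(String word) {
    return entries.getOrDefault(word.toLowerCase(Locale.ROOT), "UNK");
  }
}

// A PR: characterised by the process it performs, and holding an LR to do it.
final class SketchTagger implements ProcessingResource {
  private final SketchLexicon lexicon;

  SketchTagger(SketchLexicon lexicon) { this.lexicon = lexicon; }

  public String getName() { return "sketch-tagger"; }

  // Maps raw text to a list of word/TAG strings by consulting the lexicon.
  List<String> tag(String text) {
    List<String> tagged = new ArrayList<>();
    if (text.isBlank()) return tagged;
    for (String word : text.trim().split("\\s+")) {
      tagged.add(word + "/" + lexicon.lookup(word));
    }
    return tagged;
  }
}
```

The lexicon could be developed and replaced independently of the tagger, which is the point of keeping the two kinds of resource apart.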

Additional terminology worthy of note in this context: language data refers to LRs which are at
their core examples of language in practice, or ‘performance data’, e.g. corpora of texts
or speech recordings (possibly including added descriptive information as markup);
data about language refers to LRs which are purely descriptive, such as a grammar or
lexicon.

PRs can be viewed as algorithms that map between different types of LR, and which
typically use LRs in the mapping process. An MT engine, for example, maps a
monolingual corpus into a multilingual aligned corpus using lexicons, grammars,
etc. [3]

Further support for the PR/LR terminology may be gleaned from the argument in favour of
declarative data structures for grammars, knowledge bases, etc. This argument was current in the
late 1980s and early 1990s [Gazdar & Mellish 89], partly as a response to what has been seen as
the overly procedural nature of previous techniques such as augmented transition networks.
Declarative structures represent a separation between data about language and the algorithms that
use the data to perform language processing tasks; a similar separation to that used in
GATE.

Adopting the PR/LR distinction is a matter of conforming to established domain practice and
terminology. It does not imply that we cannot model the domain (or build software to
support it) in an Object-Oriented manner; indeed the models in GATE are themselves
Object-Oriented.

...divides an interactive application into three components. The model
contains the core functionality and data. Views display information to the user.
Controllers handle user input. Views and controllers together comprise the user
interface. A change-propagation mechanism ensures consistency between the user
interface and the model. [p.125]

A variant of MVC, the Document-View pattern,

...relaxes the separation of view and controller... The View component of
Document-View combines the responsibilities of controller and view in MVC, and
implements the user interface of the system.

A benefit of both arrangements is that

...loose coupling of the document and view components enables multiple
simultaneous synchronized but different views of the same document.
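
The loose coupling described in these quotations can be sketched as follows. The class names are hypothetical and the code is not taken from GATE; it assumes the simplest possible change-propagation mechanism, a list of callbacks held by the document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// The document (model): holds the content and notifies registered views of changes.
final class SketchDocument {
  private final List<Consumer<String>> listeners = new ArrayList<>();
  private String content = "";

  // Registers a view callback; the document knows nothing else about its views.
  void addListener(Consumer<String> listener) { listeners.add(listener); }

  // Change propagation: every registered view is told about the new content.
  void setContent(String newContent) {
    content = newContent;
    for (Consumer<String> listener : listeners) {
      listener.accept(content);
    }
  }

  String getContent() { return content; }
}

// One view: renders the text in upper case.
final class UpperCaseView {
  String rendered = "";

  UpperCaseView(SketchDocument doc) {
    doc.addListener(text -> rendered = text.toUpperCase(Locale.ROOT));
  }
}

// A different view of the same document: shows only a word count.
final class WordCountView {
  int words = 0;

  WordCountView(SketchDocument doc) {
    doc.addListener(text -> words = text.isBlank() ? 0 : text.trim().split("\\s+").length);
  }
}
```

Both views stay synchronised with the one document, yet neither knows the other exists.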

With a few exceptions (such as for utility classes), clients of the framework work with the gate.*
package. This package is mostly composed of interface definitions. Instantiations of these interfaces
are obtained via the Factory class.

The subsidiary packages of GATE provide the implementations of the gate.* interfaces that are
accessed via the factory. They themselves avoid directly constructing classes from other packages
(with a few exceptions, such as JAPE’s need for unattached annotation sets). Instead they use the
factory.
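
A minimal sketch of this arrangement follows. It does not reproduce the real Factory class: SketchFactory, its register and create methods, and the document interface are hypothetical names chosen for the illustration. The point is only that client code names an interface and never an implementation class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Client-facing interface, playing the role of a gate.* interface definition.
interface SketchDocumentResource {
  String getContent();
  void setContent(String content);
}

// An implementation of the kind that lives in a subsidiary package.
// Client code never refers to this class by name.
final class SketchDocumentImpl implements SketchDocumentResource {
  private String content = "";
  public String getContent() { return content; }
  public void setContent(String content) { this.content = content; }
}

// Hypothetical factory: maps an interface to a supplier of implementations.
final class SketchFactory {
  private static final Map<Class<?>, Supplier<?>> REGISTRY = new HashMap<>();

  static {
    // Assumption for the sketch: implementations are registered statically.
    register(SketchDocumentResource.class, SketchDocumentImpl::new);
  }

  private SketchFactory() { }

  static <T> void register(Class<T> type, Supplier<? extends T> supplier) {
    REGISTRY.put(type, supplier);
  }

  // Returns a new instance of whatever implementation is registered for the interface.
  static <T> T create(Class<T> type) {
    Supplier<?> supplier = REGISTRY.get(type);
    if (supplier == null) {
      throw new IllegalArgumentException("No implementation registered for " + type.getName());
    }
    return type.cast(supplier.get());
  }
}
```

A client would write SketchFactory.create(SketchDocumentResource.class) and work with the result through the interface alone, so the implementation can change without touching client code.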

When and how to use exceptions? Borrowing from Bill Venners, here are some guidelines (with
examples):

Exceptions exist to refer problem conditions up the call stack to a level at which they
may be dealt with. "If your method encounters an abnormal condition that it can’t
handle, it should throw an exception." If the method can handle the problem rationally,
it should catch the exception and deal with it.

Example: If the creation of a resource such as a document requires a URL as a parameter, the
method that does the creation needs to construct the URL and read from it. If there is
an exception during this process, the GATE method should abort by throwing its own
exception. The exception will be dealt with higher up the food chain, e.g. by asking
the user to input another URL, or by aborting a batch script.

All GATE exceptions should inherit from gate.util.GateException (a descendant of
java.lang.Exception, hence a checked exception) or gate.util.GateRuntimeException (a
descendant of java.lang.RuntimeException, hence an unchecked exception). This rule
means that clients of GATE code can catch all sorts of exceptions thrown by the system
with only two catch statements. (This rule may be broken by methods that are not
public, so long as their callers catch the non-GATE exceptions and deal with them
or convert them to GateException/GateRuntimeException.) Almost all exceptions
thrown by GATE should be checked exceptions: the point of an exception is that clients
of your code get to know about it, so use a checked exception to make the compiler
force them to deal with it. Exceptions to this guideline are described in the cases that follow.

Example: With reference to the previous example, a problem using the URL will be signalled by
something like an UnknownHostException or an IOException. These should be caught
and re-thrown as descendants of GateException.
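
The catch-and-re-throw pattern in this example might look like the sketch below. GateException and ResourceInstantiationException are declared here as stand-ins so that the code compiles without the GATE library; their constructors, and the DocumentLoaderSketch class with its loadContent method, are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Stand-ins for the GATE exception classes named in the text.
class GateException extends Exception {
  GateException(String message, Throwable cause) { super(message, cause); }
}

class ResourceInstantiationException extends GateException {
  ResourceInstantiationException(String message, Throwable cause) { super(message, cause); }
}

final class DocumentLoaderSketch {
  private DocumentLoaderSketch() { }

  // Hypothetical creation method: builds the URL, reads from it, and converts
  // any low-level failure into a GATE exception for the caller to handle.
  static String loadContent(String urlSpec) throws ResourceInstantiationException {
    try {
      URL url = new URL(urlSpec);               // may throw MalformedURLException
      try (InputStream in = url.openStream()) { // may throw UnknownHostException
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
    } catch (IOException e) {
      // MalformedURLException and UnknownHostException are both IOExceptions,
      // so one catch clause covers them; the cause is kept for diagnosis.
      throw new ResourceInstantiationException(
          "Could not read document from " + urlSpec, e);
    }
  }
}
```

A caller higher up can then catch GateException alone and decide whether to ask for another URL or abort, without knowing which low-level failure occurred.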

Where an exceptional condition indicates a bug in the GATE library, or in the
implementation of some other library, it is permissible to throw an unchecked
exception.

Example: If a method is creating annotations on a document, and before creating the annotations
it checks that their start and end points are valid ranges in relation to the content
of the document (i.e. they fall within the offset space of the document, and the end
is after the start), then if the method receives an InvalidOffsetException from the
AnnotationSet.add call, something is seriously wrong. In such cases it may be best to
throw a GateRuntimeException.
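
A sketch of this case, under stated assumptions: the exception classes are stand-ins declared locally, SketchAnnotationSet is a much-reduced hypothetical substitute for AnnotationSet with a two-argument add method, and AnnotatorSketch.annotate is an invented helper.

```java
// Stand-ins for the GATE classes named in the text.
class InvalidOffsetException extends Exception {
  InvalidOffsetException(String message) { super(message); }
}

class GateRuntimeException extends RuntimeException {
  GateRuntimeException(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical, much-reduced substitute for AnnotationSet.
interface SketchAnnotationSet {
  void add(long start, long end) throws InvalidOffsetException;
}

final class AnnotatorSketch {
  private AnnotatorSketch() { }

  // Adds an annotation if the range is valid for a document of the given length.
  // Returns false, adding nothing, when the range is invalid.
  static boolean annotate(SketchAnnotationSet set, long docLength, long start, long end) {
    // Validate first: offsets inside the document, end after start.
    if (start < 0 || end > docLength || end <= start) {
      return false;
    }
    try {
      set.add(start, end);
      return true;
    } catch (InvalidOffsetException e) {
      // The range was checked above, so reaching this point indicates a bug
      // rather than a condition the caller could sensibly recover from.
      throw new GateRuntimeException(
          "Validated offsets rejected: " + start + ".." + end, e);
    }
  }
}
```

Because the failure is unchecked, callers of annotate are not forced to write a handler for a situation that should never arise.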

Where you are inheriting from a non-GATE class and therefore have the exception
signatures fixed for you, you may add a new exception deriving from a non-GATE
class.

Example: The SAX XML parser API uses SAXException. Implementing a SAX parser for a
document type involves overriding methods that throw this exception. Where you want
to have a subtype for some problem which is specific to GATE processing, you could
use GateSaxException which extends SAXException.
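
As an illustration, the sketch below declares a local stand-in for GateSaxException and a handler for an imaginary document type. The handler class, the "page" element and the rule that it must carry an "id" attribute are all invented for the example; only org.xml.sax.SAXException and DefaultHandler come from the standard SAX API.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Stand-in for the GATE class named in the text: a GATE-specific subtype
// of the exception type that the SAX API fixes for us.
class GateSaxException extends SAXException {
  GateSaxException(String message) { super(message); }
}

// Hypothetical handler for an imaginary document type in which every
// <page> element must carry an "id" attribute.
final class SketchDocumentHandler extends DefaultHandler {
  int elementsSeen = 0;

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes)
      throws SAXException { // signature fixed by the SAX API
    elementsSeen++;
    if ("page".equals(qName) && attributes.getValue("id") == null) {
      // A problem specific to our processing, still a SAXException by type.
      throw new GateSaxException("<page> element without an id attribute");
    }
  }
}
```

Code that drives the parser can catch SAXException as usual, or catch GateSaxException first when it wants to treat the GATE-specific problem differently.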

Test code is different: in the JUnit test cases it is fine just to declare that each
method throws Exception and leave it at that. The JUnit test runner will pick up the
exceptions and report them to you. Test methods should, however, try to ensure that
the exceptions thrown are meaningful. For example, avoid null pointer exceptions in
the test code itself, e.g. by using assertNotNull.

"Throw a different exception type for each abnormal condition." You can go too far on this
one (a hundred exception types per package would certainly be too much), but in general
you should create a new exception type for each different sort of problem you
encounter.

Example: The gate.creole package has a ResourceInstantiationException, which deals with all
problems to do with creating resources. We could have had "ResourceUrlProblem" and
"ResourceParameterProblem" but that would probably have ended up with too many
exception types. On the other hand, just throwing everything as GateException is too coarse
(Hamish take note!).

Put exceptions in the package that they’re thrown from (unless they’re used in many
packages, in which case they can go in gate.util). This makes it easier to find them in the
documentation and prevents name clashes.

Example: gate.jape.ParserException is correctly placed; if it were in gate.util it might clash with,
for example, gate.xml.ParserException, if there were such a class.