GATE Version 4.0 release (July 2007)

1. Major new features

1.1. ANNIC

ANNotations In Context: a full-featured annotation indexing and retrieval
system designed to support corpus querying and JAPE rule authoring. It is
provided as part of an extension of the Serial Datastores, called Searchable
Serial Datastore (SSD)
(details).

1.4. OCAT

1.5. Alignment Tools

A new set of components (e.g. CompoundDocument and AlignmentEditor) that help
in building alignment tools and in carrying out cross-document processing
(details).

1.6. New HTML Parser

A new HTML document format parser, based on Andy Clark's
NekoHTML. This parser
is much better than the old one at handling modern HTML and XHTML constructs,
JavaScript blocks, etc., though the old parser is still available for existing
applications that depend on its behaviour.

1.7. Java 5.0 support

GATE now requires Java 5.0 or later to compile and run. This brings a number
of benefits:

Java 5.0 syntax is now available on the right hand side of JAPE rules
with the default Eclipse compiler
(details).

enum types are now supported for resource parameters. See
here for details on
defining the parameters of a resource.

AnnotationSet and the CreoleRegister take advantage of
generic types. The AnnotationSet interface is now an extension of
Set<Annotation> rather than just Set, which should make for
cleaner and more type-safe code when programming to the API, and the
CreoleRegister now uses parameterized types, which are
backwards-compatible but provide better type-safety for new code.
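
The practical effect of a typed annotation set can be sketched in plain Java.
This is a minimal illustration only: the Annotation interface below is a
hypothetical, much-reduced stand-in and not GATE's own interface, and the
class and method names are invented for the example. It contrasts iterating
over a raw Set (a cast per element) with iterating over a typed Set (no cast,
element type checked by the compiler).

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class TypedAnnotationLoop {
    /** Hypothetical stand-in for an annotation; NOT GATE's interface. */
    public interface Annotation {
        String getType();
    }

    /** Pre-generics style: a raw Set yields Object, so every element needs a cast. */
    @SuppressWarnings("rawtypes")
    public static int countTypeRaw(Set annotations, String type) {
        int n = 0;
        for (Object o : annotations) {
            Annotation a = (Annotation) o; // a wrong element type is only detected at run time
            if (a.getType().equals(type)) n++;
        }
        return n;
    }

    /** Generic style: the compiler knows the element type, so no cast is needed. */
    public static int countType(Set<? extends Annotation> annotations, String type) {
        int n = 0;
        for (Annotation a : annotations) {
            if (a.getType().equals(type)) n++;
        }
        return n;
    }

    /** Builds a stand-in annotation of the given type. */
    public static Annotation ann(final String type) {
        return new Annotation() {
            public String getType() { return type; }
        };
    }

    /** A small sample set: two "Token" annotations and one "Person". */
    public static Set<Annotation> sample() {
        Set<Annotation> set = new LinkedHashSet<Annotation>();
        set.add(ann("Token"));
        set.add(ann("Token"));
        set.add(ann("Person"));
        return set;
    }

    public static void main(String[] args) {
        System.out.println(countType(sample(), "Token"));
        System.out.println(countTypeRaw(sample(), "Token"));
    }
}
```

Both methods return the same count; the difference is that the typed version
cannot be handed a set of the wrong element type without a compile error.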

2. Other new features and improvements

Hiding the view for a particular resource (by right clicking on its tab
and selecting "Hide this view") will now completely close the associated
viewers and dispose of them. Re-selecting the same resource later will
re-create and display the necessary viewers. This has two advantages: firstly,
it offers a mechanism for disposing of views that are no longer needed without
actually closing the resource; secondly, it provides a way to refresh the view
of a resource when it becomes corrupted.

The DataStore viewer now allows multiple selections. This lets users load
or delete an arbitrarily large number of resources in one operation.

The Corpus editor has been completely overhauled. It now allows
re-ordering of documents as well as sorting the document list by either index
or document name.

Support has been added for resource parameters of type
gate.FeatureMap, and it is also possible to specify a default value
for parameters whose type is Collection, List or
Set (details).

(Feature Request
1446642)
After several requests, a mechanism has been added to allow overriding of
GATE's document format detection routine. A new creation-time parameter
mimeType has been added to the standard document implementation, which
forces a document to be interpreted as a specific MIME type and prevents the
usual detection based on file name extension and other information
(details).

A capability has been added to specify arbitrary sets of additional
features on individual gazetteer entries. These features are passed forward
into the Lookup annotations generated by the gazetteer
(details).

As an alternative to the Google plugin, a new plugin called yahoo has been
added to GATE to allow users to submit their query to the Yahoo search engine
and to load the found pages as GATE documents
(details).

It is now easier to run a corpus pipeline over a single document in the
GATE GUI -- documents now provide a right-click menu item to create a
singleton corpus containing just this document
(details).

A new interface has been added that lets PRs receive notification at the
start and end of execution of their containing controller. This is useful
for PRs that need to do cleanup or other processing after a whole corpus has
been processed
(details).
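
The start/end notification pattern described above can be sketched as
follows. This is a self-contained illustration under stated assumptions: the
interface, class and method names (RunListener, CountingStep, runOver) are
hypothetical and are not GATE's API. The sketch shows a processing step that
resets its state when a run starts and produces a summary only after the
whole corpus has been processed.

```java
import java.util.List;

public class ControllerCallbackSketch {
    /** Hypothetical callback interface; the names are illustrative only. */
    public interface RunListener {
        void runStarted();
        void runFinished();
    }

    /** A step that counts documents and summarises once the whole run is over. */
    public static class CountingStep implements RunListener {
        private int seen;
        private String summary = "";

        public void runStarted() {
            seen = 0;          // reset per-run state
            summary = "";
        }

        public void process(String doc) {
            seen++;            // per-document work would go here
        }

        public void runFinished() {
            summary = seen + " documents processed"; // whole-corpus cleanup or reporting
        }

        public String getSummary() {
            return summary;
        }
    }

    /** Minimal stand-in for a controller: notify, process each document, notify. */
    public static String runOver(List<String> corpus) {
        CountingStep step = new CountingStep();
        step.runStarted();
        try {
            for (String doc : corpus) {
                step.process(doc);
            }
        } finally {
            step.runFinished(); // called even if processing fails part-way
        }
        return step.getSummary();
    }

    public static void main(String[] args) {
        System.out.println(runOver(java.util.Arrays.asList("d1", "d2", "d3")));
    }
}
```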

The GATE GUI does not call System.exit() any more when it is closed.
Instead an effort is made to stop all active GATE threads and to release all
GUI resources, which leads to the JVM exiting gracefully. This is
particularly useful when GATE is embedded in other systems as closing the
main GATE window will not kill the JVM process any more.

The set of AnnotationSchemas that used to be included in the core
gate.jar and loaded as builtins have now been moved to the ANNIE plugin.
When the plugin is loaded, the default annotation schemas are instantiated
automatically and are available when doing manual annotation.

There is now support in creole.xml files for automatically creating
instances of a resource that are hidden (i.e. do not show in the GUI). One
example of this can be seen in the creole.xml file of the ANNIE plugin where
the default annotation schemas are defined.

A couple of helper classes have been added to assist in using GATE within
a Spring application
(details).

Improvements have been made to the thread-safety of some internal GATE
components, which mean that it is now safe to create resources in
multiple threads (though it is not safe to use the same resource instance in
more than one thread). This is a big advantage when using GATE in a
multithreaded environment, such as a web application
(details).
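
One way to respect the rule "create in any thread, but never share an
instance" is to build a fresh resource inside each task. The sketch below
uses only the JDK; the Tagger class is a hypothetical stand-in for a stateful
processing resource and is not a GATE class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerTaskResources {
    /** Hypothetical stand-in for a stateful resource that is unsafe to share. */
    static class Tagger {
        private final StringBuilder scratch = new StringBuilder(); // mutable internal state

        String process(String text) {
            scratch.setLength(0);
            scratch.append(text.toUpperCase());
            return scratch.toString();
        }
    }

    /** Processes each text in a worker thread, with one Tagger per task. */
    public static List<String> processAll(List<String> texts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String text : texts) {
                futures.add(pool.submit(new Callable<String>() {
                    public String call() {
                        Tagger own = new Tagger(); // created inside the task, never shared
                        return own.process(text);
                    }
                }));
            }
            List<String> results = new ArrayList<String>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results are collected in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(java.util.Arrays.asList("alpha", "beta"), 2));
    }
}
```

Where creating a resource is expensive, the same rule can be kept with a pool
of instances that are handed out to one thread at a time.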

Plugins can now provide custom icons for their PRs and LRs in the plugin
JAR file (details).

It is now possible to override the default location for the saved session
file using a system property
(details).

The TreeTagger plugin supports a system property to specify the location
of the shell interpreter used for the tagger shell script. In combination
with Cygwin this makes it much easier to use the tagger on Windows
(details).

The Buchart plugin has been removed; it is superseded by SUPPLE. The
probability finder plugin has also been removed, as it is no longer
maintained.

The bootstrap wizard now creates a basic plugin that builds with Ant.
Since a Unix-style make command is no longer required, the
generated plugin will build on Windows without needing Cygwin or MinGW.

The GATE source code has moved from CVS into Subversion. See
here for details of how to check
out the code from the new repository.

An optional parameter, keepOriginalMarkupsAS, has been added to the
DocumentReset PR which allows users to decide whether to keep the Original
Markups AS or not while resetting the document
(details).

3. Bug fixes and optimizations

The Morphological Analyser has been optimized. A new FSM-based
implementation, with minor alterations to the basic FSM algorithm, replaces
the previous one. Earlier profiling figures showed that the morpher, when
integrated with the ANNIE application, took up to 60% of the overall
processing time. The optimized version takes only 7.6% of the total
processing time
(details).

The ANNIE Sentence Splitter was optimised. The new version is about twice
as fast as the previous one. The actual speed increase varies widely
depending on the nature of the document.

The implementation of the OrthoMatcher component has been
improved. This resource now takes significantly less time on large documents.

The implementation of AnnotationSets has been improved. GATE now
requires up to 40% less memory to run and is also 20% faster on average.
The get methods of AnnotationSet return instances of
ImmutableAnnotationSet. Any attempt at modifying the content of
these objects will trigger an Exception. An empty
ImmutableAnnotationSet is returned instead of null.
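
The contract described above (read-only results, and an empty result rather
than null) has a close analogue in the plain JDK collections. The sketch
below illustrates that contract only and is not GATE's implementation; the
class and method names are hypothetical. The JDK wrappers throw
UnsupportedOperationException on modification; the exact exception type
raised by GATE is not stated here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ReadOnlyResults {
    private final Set<String> items = new HashSet<String>();

    public void add(String item) {
        items.add(item);
    }

    /** Returns a read-only set of the items starting with prefix; never null. */
    public Set<String> get(String prefix) {
        Set<String> hits = new HashSet<String>();
        for (String s : items) {
            if (s.startsWith(prefix)) hits.add(s);
        }
        if (hits.isEmpty()) {
            return Collections.<String>emptySet(); // empty result instead of null
        }
        return Collections.unmodifiableSet(hits);   // callers cannot modify the result
    }

    /** True if an attempt to add to the set is rejected with an exception. */
    public static boolean rejectsModification(Set<String> set) {
        try {
            set.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    /** Small fixture used by main and by callers that want sample data. */
    public static ReadOnlyResults sample() {
        ReadOnlyResults r = new ReadOnlyResults();
        r.add("Token");
        r.add("Person");
        return r;
    }

    public static void main(String[] args) {
        ReadOnlyResults r = sample();
        System.out.println(r.get("To").size());
        System.out.println(rejectsModification(r.get("To")));
        System.out.println(r.get("zzz").isEmpty());
    }
}
```

Callers that need a modifiable collection can copy the result into a new
HashSet and change the copy.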

The Chemistry tagger has been updated with a number of bugfixes and
improvements
(details).

The Document user interface has been optimised to deal better with large
bursts of events which tend to occur when the document that is currently
displayed gets modified. The main advantages brought by this new
implementation are:

The document UI refreshes faster than before.

The presence of the GUI for a document induces a smaller performance
penalty than it used to. Due to a better threading implementation, machines
benefiting from multiple CPUs (e.g. dual CPU, dual core or hyperthreading
machines) should only see a negligible increase in processing time when a
document is displayed compared to the situations where the document view is
not shown. In the previous version, displaying a document while it was
processed used to increase execution time by an order of magnitude.

The GUI is more responsive now when a large number of annotations are
displayed, hidden or deleted.

The strange exceptions that used to occur occasionally while working
with the document GUI should not happen any more.

And as always there are many smaller bugfixes too numerous to list here...