First make the two scripts executable as follows:

Then follow these steps:

I. Preprocessing-phase 1: Text-to-XML Conversion

(Here is an alternative to this step, which uses Charniak's full parser instead. If you use this alternative converter, skip 'Preprocessing-phase 1' and continue with 'Preprocessing-phase 2' below.)

Command:

txtToXML bbcexcerpt.txt bbcexcerpt.xml

Output:

None

Explanation:

The script does two things:
1. Calls another script, txtXMLPipeline, which produces (in this example) the file bbcexcerpt.xml
2. Runs class XMLTokeniser to produce (in this example) the file tagged.bbcexcerpt.xml

Note: The script txtXMLPipeline makes an external call to ltchunk, a chunker that is part of the LT-XML suite of tools
developed by the University of Edinburgh's LTG.
To my understanding, an evaluation copy may be requested from them, and an online demo is available at
http://www.ltg.ed.ac.uk/~mikheev/tagger_demo.html.

I presume the online demo could be used for this example: the text, as processed up to that point by the flow of
the script (the pipeline), would be pasted into the text box, and the resulting text (processed by ltchunk) would then be
passed on to the remaining steps of the script.

II. Preprocessing-phase 2: Syntactic heuristics

Command:

Output:

File: tagged.bbcexcerpt.xml
Number of NEs: 26

Explanation:

For every ne marked up by the chunker, this class does the following:
1. Adds four attributes: the np type and the three np agreement features (person, number and gender)
2. Marks up the premodifiers
3. Marks up the head
4. Marks up the postmodifiers
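
The four steps above can be pictured with a minimal Python sketch. It is not the system's code (the system itself is a Java class), and every attribute name, child tag name and feature value used here ("type", "premod", "head", "postmod", "definite", and so on) is a hypothetical placeholder, since the real MAS-XML names are not given in this document. Only the element name ne is taken from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical chunker output: one <ne> element wrapping a noun phrase.
chunked = "<s><ne>the former prime minister of Britain</ne> resigned</s>"


def annotate_ne(ne, np_type, person, number, gender, pre, head, post):
    """Add four attributes to an <ne> element and split it into three parts.

    All attribute and child tag names are illustrative assumptions,
    not the names used by the actual system.
    """
    # Step 1: the np type plus the three agreement features.
    ne.set("type", np_type)
    ne.set("person", person)
    ne.set("number", number)
    ne.set("gender", gender)
    # Steps 2-4: replace the flat text with premodifier, head and
    # postmodifier children (empty parts are skipped).
    ne.text = None
    for tag, words in (("premod", pre), ("head", head), ("postmod", post)):
        if words:
            child = ET.SubElement(ne, tag)
            child.text = words
    return ne


root = ET.fromstring(chunked)
ne = root.find("ne")
annotate_ne(ne, "definite", "3", "sg", "unknown",
            "the former prime", "minister", "of Britain")
print(ET.tostring(root, encoding="unicode"))
```

In the real module the feature values and the split into premodifiers, head and postmodifiers are computed by syntactic heuristics; here they are passed in by hand purely to show the shape of the resulting mark-up.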

III. Anaphora Resolution

Command:

Output:

Explanation:

This is the actual anaphora resolution module, which takes a MAS-XML compliant file as input and adds new mark-up
holding anaphoric information ( elements). The resulting file for this example is
processed.masxml.tagged.bbcexcerpt.xml.
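
To keep track of the files produced along the way, the naming pattern seen in this example can be sketched in Python as below. The function name stage_outputs is made up for illustration. The names bbcexcerpt.xml, tagged.bbcexcerpt.xml and processed.masxml.tagged.bbcexcerpt.xml come from the text above; the intermediate name masxml.tagged.bbcexcerpt.xml is only inferred from the final file name and is an assumption.

```python
def stage_outputs(text_file):
    """Return the file names expected after each stage, in order.

    The "masxml." prefix for the phase 2 output is an assumption
    inferred from the name of the final file.
    """
    stem = text_file.rsplit(".", 1)[0]
    xml_file = stem + ".xml"              # phase 1: txtXMLPipeline
    tagged = "tagged." + xml_file         # phase 1: XMLTokeniser
    masxml = "masxml." + tagged           # phase 2 (assumed name)
    processed = "processed." + masxml     # anaphora resolution
    return [xml_file, tagged, masxml, processed]


for name in stage_outputs("bbcexcerpt.txt"):
    print(name)
```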

IV. Anaphora Resolution Evaluation

This example does not make use of the evaluation module, which is also included in the jar file, because the text file
employed in the example has not been annotated and so no reference annotation is available. If a reference annotation is
available, it should come in a MAS-XML compliant file with additional elements holding the anaphoric information (the
format is exactly the same as for the elements added by the system, the only difference being the tag name
itself).

The following command may be issued to invoke the evaluation module:
java -cp gtar1.1.jar GTAR_Evaluation masxml.referenceAnnotation.xml > performance.log

Then performance.log may be imported into Excel using the symbol ^ as the field separator.
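
If Excel is not at hand, a file with ^ as field separator can also be read with a few lines of Python. This is only a sketch: the function name read_performance_log is made up, and the sample rows below are placeholders, because the actual columns written by GTAR_Evaluation are not described here.

```python
import csv
import io


def read_performance_log(lines):
    """Split each line on '^' and return the non-empty rows as lists."""
    reader = csv.reader(lines, delimiter="^", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Placeholder content standing in for a real performance.log.
sample = io.StringIO("field_a^field_b^field_c\n1^2^3\n")
rows = read_performance_log(sample)
for row in rows:
    print(row)
```

For a real log, pass an open file instead of the sample, for instance with open("performance.log", newline="") as f: rows = read_performance_log(f).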

Finally, here is the (fairly complete, but a bit out of date :-) API specification of the system (generated by javadoc, including private methods).