How to Use GATE

MPQA Scheme

goodFor/badFor Scheme

How to Annotate in GATE

In GATE, load the document: sample-documents/examples-untagged.xml

While it's possible to load a plain text file or a web document URL and to just begin annotating, some minimal preprocessing can make the annotation process a bit easier (see XXXX for how we preprocess a document in GATE).

Some Default Annotations

During preprocessing, a number of annotations are added to the document.

Open the Annotation Sets and Annotation List frames.

Select the 'agent' annotations for viewing. Two zero-length agent
annotations will be listed: one for the agent with id=implicit and one
for the agent with id=w (the writer). Because they are zero-length
annotations, they will not be visible in the document text.

Select the 'objective-speech' and 'sentence' annotations for viewing.
At the beginning of each sentence, a zero-length, implicit 'objective-speech'
annotation has been added for the writer. During annotation, the
'objective-speech' annotation can easily be changed to a 'direct-subjective'
annotation if the annotator feels that is the correct annotation.

There are also 'split' annotations added by GATE's sentence splitter.

Note that the 'direct-subjective' and 'expressive-subjectivity' annotation
types are not yet listed in the Annotation Sets frame. This is because the
document does not yet contain any annotations of these types. When the first
direct-subjective or expressive-subjectivity annotation is added to the
document, its type will be added to the Annotation Sets frame.

Creating an Annotation

First, in the Annotation Sets frame, click on MPQA. This will
select the MPQA annotation set and ensure that any annotations you
create will be properly listed under this set.

In the document text, highlight the span of text that you want to
annotate. Make sure that you do NOT accidentally include any spaces at
the beginning or end of the span of text you are annotating.

EXAMPLE: "China" in the sentence,
"China said on Tuesday a U.S. State Department
report that accused Beijing of suppressing religious freedom was full
of lies and urged Washington not to hold double standard in the war on
terrorism."
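The rule about stray spaces can also be checked mechanically. The sketch below is not part of GATE; it is a small standalone helper (the name trim_span is hypothetical) that assumes a span is given as character offsets into the document text, with the end offset exclusive, and shrinks it until it no longer starts or ends on white space.

```python
from typing import Tuple


def trim_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink the span [start, end) so it neither begins nor ends on white space.

    Hypothetical helper for illustration only; offsets are character
    positions into `text`, with `end` exclusive.
    """
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


sentence = "China said on Tuesday a U.S. State Department report"

# A selection that accidentally includes the space after "China".
sloppy = (0, 6)
clean = trim_span(sentence, *sloppy)
print(repr(sentence[sloppy[0]:sloppy[1]]), "->", repr(sentence[clean[0]:clean[1]]))
```

If the trimmed offsets differ from the ones you selected, the selection included leading or trailing white space and should be redone.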

The Annotation Editor Dialog window will pop up. (It may take a few
seconds.) In that window, select the annotation type. In our example, we
want to select 'agent'. The Annotation Editor Dialog window will change
to show available features (attributes) for the 'agent' type.
Also, the new annotation will be listed in the Annotation List frame,
and the color of the highlighted span will change to the 'agent' color
and begin flashing.

Start filling in the attributes for the annotation frame
you're working on. For our example, type "china" in the id field
and "w,china" in the nested-source field.
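As a way of picturing the result, the finished annotation can be thought of as a typed span with a set of features. The sketch below is only an illustration and not GATE's own data model: the Annotation class and source_chain function are hypothetical, the offsets assume "China" sits at the very start of the text, and reading nested-source as a comma-separated chain that runs from the writer (w) to the agent itself is an assumption drawn from the example values above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    """Illustrative stand-in for an annotation: a typed span plus features."""
    type: str
    start: int  # character offset where the span begins
    end: int    # character offset just past the end of the span
    features: Dict[str, str] = field(default_factory=dict)


def source_chain(annotation: Annotation) -> List[str]:
    """Split the nested-source feature into its comma-separated parts.

    Assumption: the value is a chain of agent ids such as "w,china".
    """
    raw = annotation.features.get("nested-source", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


# The 'agent' annotation from the running example, covering "China".
china = Annotation("agent", 0, 5, {"id": "china", "nested-source": "w,china"})
print(source_chain(china))
```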

Saving a Document

Save your document reasonably often as you annotate. GATE has no
auto-save feature!

Right click on the document name under Language Resources
OR
right click on the appropriate tab in the list of open documents at the
top of the middle frame.

Select: Save As XML.

Type in the file name that you want to give the document. Example:
hr7-taw.xml. Make sure that the file is saved with an .xml extension.

When you are completely done with your annotations and have saved
the document for the last time, you may want to try closing the
document in GATE (right-click -> Close), and reopening
it to check that all of your annotations were saved properly.

Finally, click on the Messages tab to see if there was an error
saving the file.
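Outside GATE, a saved file can also be given a quick sanity check with a short script. The sketch below assumes the saved XML groups annotations in AnnotationSet elements (with a Name attribute) that contain Annotation elements (with a Type attribute); check this against one of your own saved files before relying on it. The function name count_annotations and the embedded sample are hypothetical. The script fails with a parse error if the XML is malformed and otherwise counts the annotations of each type in each set.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def count_annotations(xml_text: str) -> Dict[str, Dict[str, int]]:
    """Return {annotation set name: {annotation type: count}}.

    Raises xml.etree.ElementTree.ParseError if the XML is not well formed.
    A set without a Name attribute is reported as "(default)".
    """
    root = ET.fromstring(xml_text)
    counts: Dict[str, Dict[str, int]] = {}
    for aset in root.iter("AnnotationSet"):
        name = aset.get("Name") or "(default)"
        types = Counter(ann.get("Type") for ann in aset.iter("Annotation"))
        counts[name] = dict(types)
    return counts


# Tiny hand-written sample in the assumed layout, not real annotator output.
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>China<Node id="5"/> said</TextWithNodes>
<AnnotationSet Name="MPQA">
<Annotation Id="1" Type="agent" StartNode="0" EndNode="5">
<Feature><Name>id</Name><Value>china</Value></Feature>
</Annotation>
</AnnotationSet>
</GateDocument>"""

print(count_annotations(SAMPLE))
```

To check a real file, read its contents into a string and pass that in place of SAMPLE; an empty or unexpectedly small count for the MPQA set suggests the save did not capture your work.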

Other Recommendations

In the context of the Pitt group, one thing to be careful about when
annotating in GATE is that you might click into the text area and
accidentally introduce characters or white space.

This causes problems when other people annotate the same document
in parallel and the two sets of annotations are later compared
automatically. It can also cause a problem if the extra material is
introduced after the gate_default file, which stores the tokenization
for the XML document, has been created and that file is not updated.

The upshot is: be really, really careful not to modify the original
text!
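One way to catch an accidental change is to compare the text of your working copy against the original. The sketch below is a standalone illustration, not a GATE feature: the function name first_difference is hypothetical, and it assumes you have both versions of the document text available as plain strings.

```python
from typing import Optional


def first_difference(original: str, edited: str) -> Optional[int]:
    """Return the offset of the first character where the texts differ.

    Returns None when the two texts are identical. If one text is a
    prefix of the other, the offset is the length of the shorter text.
    """
    for offset, (a, b) in enumerate(zip(original, edited)):
        if a != b:
            return offset
    if len(original) != len(edited):
        return min(len(original), len(edited))
    return None


original = "China said on Tuesday"
edited = "China  said on Tuesday"  # an extra space slipped in after "China"

offset = first_difference(original, edited)
if offset is None:
    print("Text is unchanged.")
else:
    print("Text differs starting at offset", offset)
```

Any offset other than None means the text was modified, and every annotation at or after that point may no longer line up with another annotator's copy.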