then the output is this file
(rendered in your browser with XSLT) ... or was in July 2013.

The output isn't bad, but we might want to improve it. For example, we
might want to label university degrees as an entity. We can do that in
a simple rule-based manner with RegexNER. Here's a simple example.

The simplest rule file has two tab-separated fields on each line. (Note
that you must have a tab character between the text
and the category. Spaces will not do.) The first
field contains a sequence of one or more space-separated regular
expressions. If the regular expressions match a sequence of tokens, the tokens will be
relabeled as the category in the second column. So our first RegexNER
file is:

Bachelor of (Arts|Laws|Science|Engineering)	DEGREE

and we now run with this command, adding RegexNER to the list of annotators:

Also, if you look at the original output, you will see there are a
couple of mistakes. It misrecognizes Lalor as a PERSON, when it
is a LOCATION (an electoral seat). And it sometimes fails to
tag Labor as an ORGANIZATION when it is not followed
by Party. To fix the first error, you need one more concept:
RegexNER will not overwrite an existing entity assignment, unless you
give it permission in a third tab-separated column, which contains a
comma-separated list of entity types that can be overwritten. This
gives us this RegexNER file, which you can also
download.
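
To make the two- and three-column semantics concrete, here is a minimal sketch in plain Java. It is not the RegexNER implementation and does not use CoreNLP: the class `RegexNerSketch`, its `Rule` type, and the `apply` method are hypothetical names invented for illustration. It assumes each space-separated regular expression must match one whole token, and that a token may be relabeled only if it is currently unlabeled (`O`) or its label appears in the rule's third column. The rule lines for Lalor and Labor are reconstructed from the description above, not copied from the downloadable file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class RegexNerSketch {

    /** One rule line: token patterns, target label, labels it may overwrite. */
    public static final class Rule {
        final List<Pattern> tokenPatterns = new ArrayList<>();
        final String label;
        final Set<String> overwritable = new HashSet<>();

        public Rule(String line) {
            // Fields are separated by a tab; spaces only separate token patterns.
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Expected tab-separated fields: " + line);
            }
            for (String regex : fields[0].split(" ")) {
                tokenPatterns.add(Pattern.compile(regex));
            }
            label = fields[1];
            if (fields.length > 2) {
                // Optional third column: comma-separated labels that may be overwritten.
                overwritable.addAll(Arrays.asList(fields[2].split(",")));
            }
        }
    }

    /** Returns a relabeled copy of labels; the input arrays are not modified. */
    public static String[] apply(List<Rule> rules, String[] tokens, String[] labels) {
        String[] out = labels.clone();
        for (Rule rule : rules) {
            int n = rule.tokenPatterns.size();
            for (int start = 0; start + n <= tokens.length; start++) {
                if (matchesAt(rule, tokens, out, start)) {
                    Arrays.fill(out, start, start + n, rule.label);
                }
            }
        }
        return out;
    }

    private static boolean matchesAt(Rule rule, String[] tokens, String[] labels, int start) {
        for (int i = 0; i < rule.tokenPatterns.size(); i++) {
            boolean tokenMatches = rule.tokenPatterns.get(i).matcher(tokens[start + i]).matches();
            String current = labels[start + i];
            // Assumption: only unlabeled tokens or explicitly permitted labels may change.
            boolean mayRelabel = current.equals("O") || rule.overwritable.contains(current);
            if (!tokenMatches || !mayRelabel) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule("Bachelor of (Arts|Laws|Science|Engineering)\tDEGREE"),
            new Rule("Lalor\tLOCATION\tPERSON"),
            new Rule("Labor\tORGANIZATION"));
        String[] tokens = {"Bachelor", "of", "Laws", "Lalor", "Labor"};
        String[] labels = {"O", "O", "O", "PERSON", "O"};
        // Prints: DEGREE DEGREE DEGREE LOCATION ORGANIZATION
        System.out.println(String.join(" ", apply(rules, tokens, labels)));
    }
}
```

In this sketch the Lalor rule can replace a PERSON label because PERSON is listed in its third column, while the Labor rule, which has no third column, would leave a token untouched if it already carried some other label.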

Of course, these last two rules for relabeling tokens are rather dangerous.
If you ran this RegexNER file on the Wikipedia page for
Kieran Lalor,
then it would mess up the output badly. Similarly, the rule
for Labor would be a bad idea in an article that
discusses labor unions.

If you want more control in checking for words in the context, or
checking parts-of-speech, then you want to start looking into
TokensRegex,
which is a more powerful (but more complex) framework for
writing rules for token labeling. Among other things, it lets you have a
whole library of rule files. In general, though, writing rules
that cover all cases is a difficult enterprise! This is one reason why
statistical classifiers have become dominant: they are good at
integrating various sources of evidence. Nevertheless, a tool like
RegexNER can be very useful as an overlay that corrects or augments the
output of a statistical NLP system like
Stanford
NER.

This example showed doing things at the command line. But you can also
easily use RegexNER in code. The RegexNER rules can be in a
regular file, in a resource that is on your CLASSPATH, or even specified
by a URL. You then
tell CoreNLP to load RegexNER and where to find the RegexNER rules file by
providing an appropriate Properties object when creating Stanford CoreNLP:
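
A minimal sketch of such a Properties object follows. The `annotators` and `regexner.mapping` property names are the ones CoreNLP reads; the class name `RegexNerConfig`, the helper method, and the rules file path are hypothetical placeholders. The block uses only the Java standard library so that it runs on its own, and the pipeline construction is shown as a comment because it needs CoreNLP on the classpath.

```java
import java.util.Properties;

public class RegexNerConfig {

    /** Builds Properties asking CoreNLP to run RegexNER with the given rules file. */
    public static Properties regexNerProperties(String mappingLocation) {
        Properties props = new Properties();
        // regexner is listed after ner so that it overlays the statistical NER output.
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
        // Location of the rules: a file path, a CLASSPATH resource, or a URL.
        props.setProperty("regexner.mapping", mappingLocation);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical path; substitute the location of your own rules file.
        Properties props = regexNerProperties("path/to/my-regexner-rules.txt");
        // With CoreNLP on the classpath, the pipeline would be created like this:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println(props.getProperty("annotators"));
        System.out.println(props.getProperty("regexner.mapping"));
    }
}
```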