Package edu.stanford.nlp.pipeline Description

Linguistic Annotation Pipeline

The point of this package is to enable people to quickly and
painlessly get complete linguistic annotations of their text. It
is designed to be highly flexible and extensible. I will first discuss
the organization and functions of the classes, and then I will give some
sample code and a run-down of the implemented Annotators.

Annotation

An Annotation is the data structure which holds the results of Annotators.
An Annotation is basically a map from keys to pieces of annotation, such
as the parse, the part-of-speech tags, or the named entity tags. Annotations
are designed to operate at the sentence level; however, depending on the
Annotators you use, this may not be how you choose to use the package.
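Since an Annotation is essentially a typesafe map keyed by annotation classes, the pattern can be sketched in plain Java. This is an illustration only, with hypothetical keys; the real class implements edu.stanford.nlp.util.CoreMap and uses keys such as CoreAnnotations.TextAnnotation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the typesafe-map pattern behind Annotation (not the real CoreMap API):
// keys are Class objects whose type parameter fixes the type of the stored value.
class TypesafeMapSketch {
  // A key of type Key<V> stores a value of type V.
  interface Key<V> {}

  // Hypothetical example keys, analogous to CoreAnnotations.TextAnnotation etc.
  static class TextKey implements Key<String> {}
  static class PosTagsKey implements Key<List<String>> {}

  private final Map<Class<?>, Object> map = new HashMap<>();

  <V> void set(Class<? extends Key<V>> key, V value) {
    map.put(key, value);
  }

  @SuppressWarnings("unchecked")
  <V> V get(Class<? extends Key<V>> key) {
    // The cast is safe because set() only ever pairs a Key<V> class with a V.
    return (V) map.get(key);
  }
}
```

Because the key's type parameter determines the value type, callers get back a parse, a tag list, or raw text without casting at the call site.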

Annotators

The backbone of this package is the Annotators. Annotators are a lot like
functions, except that they operate over Annotations instead of Objects.
They do things like tokenize, parse, or NER-tag sentences. In the
javadocs of your Annotator you should specify what the Annotator
assumes already exists (for instance, the NERAnnotator assumes that the
sentence has been tokenized) and where to find those annotations (in
the tokenization example, under TextAnnotation.class). They should also
specify what they add to the annotation, and where.
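The requires/adds contract can be sketched with a simplified annotator interface. This is a toy illustration, not the real Annotator API; the string keys and class names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the Annotator contract: each annotator documents
// what it requires from the annotation and what it adds to it.
interface SketchAnnotator {
  void annotate(Map<String, Object> annotation);
}

// Requires: "text" (String). Adds: "tokens" (List<String>).
class WhitespaceTokenizer implements SketchAnnotator {
  public void annotate(Map<String, Object> annotation) {
    String text = (String) annotation.get("text");
    if (text == null) {
      throw new IllegalStateException("WhitespaceTokenizer requires a text annotation");
    }
    annotation.put("tokens", Arrays.asList(text.split("\\s+")));
  }
}

// Requires: "tokens" (List<String>). Adds: "ner" (one tag per token).
class ToyNerTagger implements SketchAnnotator {
  @SuppressWarnings("unchecked")
  public void annotate(Map<String, Object> annotation) {
    List<String> tokens = (List<String>) annotation.get("tokens");
    if (tokens == null) {
      throw new IllegalStateException("ToyNerTagger requires a tokens annotation");
    }
    List<String> tags = new ArrayList<>();
    for (String token : tokens) {
      // Toy rule: capitalized tokens are "entities"; real NER is a trained model.
      tags.add(Character.isUpperCase(token.charAt(0)) ? "ENTITY" : "O");
    }
    annotation.put("ner", tags);
  }
}
```

Failing fast when a required annotation is absent is what makes the documented contract enforceable: the tagger above cannot silently run on untokenized text.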

AnnotationPipeline

An AnnotationPipeline strings many Annotators together to form a
linguistic annotation pipeline. It is, itself, an Annotator.
AnnotationPipelines usually also keep track of how much time
they spend annotating and loading, to help users find where the
time sinks are.
However, the class AnnotationPipeline is not meant to be used as is;
it serves as an example of how to build your own pipeline.
If you just want to use a typical NLP pipeline, take a look at StanfordCoreNLP
(described later in this document).
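The chaining-plus-timing idea can be sketched as follows. This is a toy illustration of the design, not the AnnotationPipeline API; the Consumer-based stage type and stage names are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of the pipeline idea: a pipeline is itself just a composite annotator
// that runs its stages in order and records how long each stage spent annotating.
class PipelineSketch {
  static Map<String, Long> run(Map<String, Consumer<Map<String, Object>>> stages,
                               Map<String, Object> annotation) {
    Map<String, Long> timingMs = new LinkedHashMap<>();
    for (Map.Entry<String, Consumer<Map<String, Object>>> stage : stages.entrySet()) {
      long start = System.nanoTime();
      stage.getValue().accept(annotation);  // each stage mutates the shared annotation
      timingMs.put(stage.getKey(), (System.nanoTime() - start) / 1_000_000);
    }
    return timingMs;
  }
}
```

Because each stage only reads and writes the shared annotation, the composite has the same shape as a single annotator, which is why a pipeline can itself be used wherever an Annotator is expected.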

Sample Usage

Here is some sample code which illustrates the intended usage
of the package:
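A hand-assembled pipeline might look like the sketch below. The annotator class names are real, but their constructor arguments vary across CoreNLP versions, so treat the specific arguments as assumptions:

```java
import edu.stanford.nlp.pipeline.*;

public class PipelineExample {
  public static void main(String[] args) {
    // Build a pipeline by hand from individual annotators.
    // NOTE: constructor arguments here are version-dependent assumptions.
    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new TokenizerAnnotator(false, "en"));   // tokenize
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));  // sentence split
    pipeline.addAnnotator(new POSTaggerAnnotator(false));         // POS tags

    // Wrap the raw text in an Annotation and run every annotator over it.
    Annotation annotation = new Annotation("Stanford University is located in California.");
    pipeline.annotate(annotation);
  }
}
```

Note the ordering: each annotator is added after the ones whose output it requires.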

How Do I Use This?

You do not have to construct your pipeline from scratch! For typical NLP
needs, use StanfordCoreNLP. This pipeline implements the most commonly needed
functionality: tokenization, lemmatization, POS tagging, NER, parsing, and
coreference resolution. Read below for how to use this pipeline from the
command line, or directly in your Java code.
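In code, the usage follows the standard Properties-driven pattern. The sketch below assumes the CoreNLP jars and models are on your classpath; the annotator list can be trimmed to just what you need:

```java
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class CoreNlpExample {
  public static void main(String[] args) {
    // Choose the annotators to run; order matters, since later
    // annotators depend on the output of earlier ones.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Annotate a document and read back the sentence-level results.
    Annotation document = new Annotation("Stanford University is located in California.");
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    System.out.println(sentences.size() + " sentence(s) annotated");
  }
}
```

From the command line, the same pipeline can be invoked as
java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos -file input.txt,
with the annotators flag taking the same comma-separated list as the annotators property.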