Stanford Log-linear Part-Of-Speech Tagger

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads
text in some language and assigns parts of speech to each word (and
other token), such as noun, verb, adjective, etc., although generally
computational applications use more fine-grained POS tags like
'noun-plural'.
This software is a Java implementation of the log-linear part-of-speech
taggers described in these papers (if citing just one paper, cite the
2003 one):

The tagger was originally written by Kristina Toutanova. Since that
time, Dan Klein, Christopher Manning, William Morgan, Anna Rafferty,
Michel Galley, and John Bauer have improved its speed, performance, usability, and
support for other languages.

The system requires Java 1.8+ to be installed. Depending on whether
you're running 32- or 64-bit Java and on the complexity of the tagger
model, you'll need somewhere between 60 and 200 MB of memory to run a
trained tagger (i.e., you may need to give java an option like
java -mx200m). Training a tagger needs considerably more memory. The
amount again depends on the complexity of the model, but at least 1 GB
is usually needed, often more.
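
As a rough illustration of the memory guidance above, the following sketch reports the maximum heap of the running JVM and compares it with a caller-chosen threshold. The class name HeapCheck, its method names, and the 200 MB default are assumptions made for this example; they are not part of the tagger distribution. Note that the value reported by Runtime.maxMemory() can be somewhat below the -mx setting on some JVM configurations, so treat the check as approximate.

```java
/** Illustrative helper (hypothetical, not part of the tagger): inspects the JVM's maximum heap. */
final class HeapCheck {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Maximum amount of memory the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / BYTES_PER_MB;
    }

    /** Returns true if the maximum heap is at least requiredMb megabytes. */
    static boolean hasAtLeast(long requiredMb) {
        return maxHeapMb() >= requiredMb;
    }

    public static void main(String[] args) {
        // Threshold in MB; 200 is taken from the upper end of the range quoted above.
        long requiredMb = args.length > 0 ? Long.parseLong(args[0]) : 200L;
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        if (!hasAtLeast(requiredMb)) {
            System.out.println("Heap may be too small; consider an option like java -mx"
                    + requiredMb + "m");
        }
    }
}
```

Running it as java -mx200m HeapCheck shows what the JVM actually grants for that setting; passing 1024 as the argument checks against the training-time figure instead.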

Several downloads are available. The basic download contains two trained
tagger models for English. The full download contains three trained English
tagger models and trained models for Arabic, Chinese, French, Spanish, and
German. Both versions include the same source and other required files. The
tagger can be retrained on any language, given POS-annotated training text
for the language.

The tagger is licensed under the GNU General Public License (v2 or
later). Source is included. The package includes components for
command-line invocation, running as a server, and a Java API. The tagger
code is dual licensed (in a similar manner to MySQL, etc.). Open source
licensing is under the full GPL, which allows many free uses. For
distributors of proprietary software, commercial licensing is available.
If you don't need a commercial license, but would like to support
maintenance of these tools, we welcome gift funding.

We have 3 mailing lists for the Stanford POS Tagger, all of which are
shared with other JavaNLP tools (with the exclusion of the parser). Each
address is at @lists.stanford.edu:

java-nlp-user: This is the best list to post to in order to send feature
requests, make announcements, or for discussion among JavaNLP users.
(Please ask support questions on Stack Overflow using the stanford-nlp
tag.)

You have to subscribe to be able to use this list. Join the list via
this webpage or by emailing java-nlp-user-join@lists.stanford.edu.
(Leave the subject and message body empty.) You can also look at the
list archives.

java-nlp-announce: This list will be used only to announce new versions
of Stanford JavaNLP tools. So it will be very low volume (expect 1-3
messages a year). Join the list via this webpage or by emailing
java-nlp-announce-join@lists.stanford.edu. (Leave the subject and
message body empty.)

java-nlp-support: This list goes only to the software maintainers. It's
a good address for licensing questions, etc. For general use and support
questions, you're better off joining and using java-nlp-user. You cannot
join java-nlp-support, but you can mail questions to
java-nlp-support@lists.stanford.edu.

The basic download is a 24 MB zipped file with support for
tagging English. The full download is a 124 MB zipped
file, which includes additional English models and trained models
for Arabic, Chinese, French, Spanish, and German.
In both cases most of the file size is due to the trained model
files. The only difference between the two downloads is the number
of trained models included.
If you unpack the zip file, you should have everything
needed. This software provides a GUI demo, a command-line interface,
and an API. Simple scripts are included to invoke the tagger.
For more information on use, see the included README.txt.

Matlab: Utkarsh Upadhyay provides a Matlab function for accessing the
Stanford POS tagger. But note that it loads the tagger each time it is
called, and you don't want to do that! You should load the tagger only
once and then re-use it.
https://github.com/musically-ut/matlab-stanford-postagger
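
The load-once advice above applies to any caller, not just Matlab. The following is a minimal sketch of one way to do it in Java: a small holder that runs an expensive loader on first use and hands back the same instance afterwards. The names LoadOnce, LoadOnceDemo, and loadPlaceholderTagger are hypothetical, and the placeholder "tagger" merely appends _TAG to each token so that the example runs without the tagger library; it is not the real tagger and its output format is invented for the demo.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical holder: runs the loader on first use, then re-uses the result. */
final class LoadOnce<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private T instance; // guarded by the monitor of this object

    LoadOnce(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public synchronized T get() {
        if (instance == null) {
            instance = loader.get(); // the expensive step happens only here
        }
        return instance;
    }
}

final class LoadOnceDemo {
    /** Counts how many times the placeholder model has been loaded. */
    static final AtomicInteger LOADS = new AtomicInteger();

    /** Stand-in for an expensive model load; NOT the real tagger. */
    static Function<String, String> loadPlaceholderTagger() {
        LOADS.incrementAndGet();
        // Appends a dummy "_TAG" suffix to every whitespace-separated token.
        return text -> text.trim().replaceAll("\\s+", "_TAG ") + "_TAG";
    }

    public static void main(String[] args) {
        LoadOnce<Function<String, String>> tagger =
                new LoadOnce<>(LoadOnceDemo::loadPlaceholderTagger);
        System.out.println(tagger.get().apply("This is a test"));
        System.out.println(tagger.get().apply("Another sentence"));
        System.out.println("loads: " + LOADS.get());
    }
}
```

With the real library on the classpath, the same holder could presumably wrap the constructor mentioned in the 3.0 release note, for example new LoadOnce<>(() -> new MaxentTagger(path)), so that the model file is read once per process rather than once per call.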

Faster Arabic and German models. Compatible with other recent Stanford
releases.
Downloads: English / Full

3.0 (2010-05-21)
Tagger is now re-entrant. New tagger objects are loaded with
tagger = new MaxentTagger(path) and then used with tagger.tagMethod...
Downloads: English / Full

2.0 (2009-12-24)
An order of magnitude faster, slightly more accurate best model, more
options for training and deployment.
Downloads: English / Full

1.6 (2008-09-28)
A fraction better, a fraction faster, more flexible model specification,
and quite a few fewer bugs.
Downloads: English / Full

1.5.1 (2008-06-06)
Tagger properties are now saved with the tagger, making taggers more
portable; tagger can be trained off of treebank data or tagged text;
fixes classpath bugs in 2 June 2008 patch; new foreign language taggers
released on 7 July 2008 and packaged with 1.5.1.
Downloads: English / Full / Updated models

1.5 (2008-05-21)
Added taggers for several languages, support for reading from and
writing to XML, better support for changing the encoding, distributional
similarity options, and many more small changes; patched on 2 June 2008
to fix a bug with tagging pre-tokenized text.
Downloads: English / Full