Chapter 1: Introduction

Chapter 2: Regular Expressions and Automata

Chapter 3: Words and Transducers

This new version of the chapter still focuses on morphology and FSTs, but is
expanded in various ways. There are more details on the formal
description of finite-state transducers, many bugs are fixed, and two
new sections have been added relating to words and subwords. The first new
section covers word and sentence tokenization, including algorithms for
English as well as the MaxMatch algorithm for Chinese word
segmentation. The second new section covers spelling correction and
minimum edit distance; it is an extended version of the edit-distance
section from Chapter 5 of the first edition, with clearer figures, for
example for the minimum-edit-distance backtrace.
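As a rough sketch (not code from the book) of the dynamic program behind minimum edit distance, the following uses the Levenshtein variant with substitution cost 2 that the chapter's worked example follows; the function and parameter names are ours:

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Wagner-Fischer dynamic programming for minimum edit distance.

    With substitution cost 2, the classic example gives
    distance('intention', 'execution') == 8.
    """
    n, m = len(source), len(target)
    # d[i][j] = minimum cost of editing source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + del_cost,      # deletion
                          d[i][j - 1] + ins_cost,      # insertion
                          d[i - 1][j - 1] + sub)       # substitution/copy
    return d[n][m]
```

Recovering the actual alignment (the backtrace the chapter's figures illustrate) only requires remembering, at each cell, which of the three moves achieved the minimum.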

Chapter 4: N-grams (Formerly Chapter 6)

This language model chapter has had a complete overhaul. The current draft
includes more examples, a more complete description of Good-Turing,
expanded sections on practical issues like perplexity and evaluation,
coverage of language modeling toolkits (including the ARPA format), and an
overview of modern methods like interpolated Kneser-Ney.
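As a quick illustration of the perplexity measure the chapter discusses (the function name is ours): perplexity is the inverse probability of the test sequence, normalized by its length.

```python
import math

def perplexity(probs):
    """Perplexity of a test sequence given per-word model probabilities.

    probs lists P(w_i | history) assigned by the language model to each
    of the N test words; perplexity is P(w_1..w_N) ** (-1/N), computed
    in log space for numerical stability.
    """
    n = len(probs)
    log_prob = sum(math.log2(p) for p in probs)
    return 2 ** (-log_prob / n)
```

For example, a model that assigns every word probability 1/4 has perplexity 4, matching the intuition of a "branching factor" of four equally likely choices.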

Chapter 5: Part-of-Speech Tagging (Formerly Chapter 8)

The main change to this revised chapter is a greatly expanded, and
hence self-contained, description of bigram and trigram HMM
part-of-speech tagging, including Viterbi decoding and deleted
interpolation smoothing. Together with the new Chapter 6, this allows
a complete introduction to HMMs for courses that don't use the speech
recognition chapters. Other changes in this chapter include expanded
descriptions of unknown word modeling and part-of-speech tagging in
other languages, and many bug fixes. Finally, we've moved this chapter
earlier in the book.

Chapter 6: Hidden Markov and Maximum Entropy Models (New chapter)

This new chapter introduces two sequence models: HMMs and MEMMs.
It gives the details of Hidden Markov Models,
including Forward, Viterbi, and EM. It then introduces MaxEnt models,
beginning with linear regression, followed by logistic regression, then the extension
to MaxEnt, and finally the MEMM and the Viterbi intuition.
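For a feel of the Viterbi decoding covered here, a minimal sketch (our own illustrative code, with probabilities passed in as nested dictionaries):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding: most probable HMM state sequence for obs."""
    # v[t][s] = probability of the best path ending in state s at time t
    v = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda r: v[t - 1][r] * trans_p[r][s])
            v[t][s] = v[t - 1][best_prev] * trans_p[best_prev][s] * emit_p[s][obs[t]]
            back[t][s] = best_prev
    # follow backpointers from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

Replacing `max` with `sum` over previous states (and dropping the backpointers) turns this into the Forward algorithm; that close relationship is part of what makes the pair a good unified introduction.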

Chapter 7: Phonetics (Formerly parts of Chapters 4, 5, and 7)

This
chapter is an introduction to articulatory and acoustic phonetics for
speech processing, as well as foundational tools like the ARPAbet,
wavefile formats, phonetic dictionaries, and PRAAT.

Chapter 11: Computational Phonology

This chapter is a brief introduction to computational phonology,
including phonological and morphological learning, finite-state models,
OT, and Stochastic OT.

Chapter 12: Formal Grammars of English (Formerly 9)

This
chapter still focuses on CFGs for English and includes a revamped and
somewhat expanded grammar for the ATIS domain. New and expanded
sections cover: treebanks with a focus on the Penn Treebank, searching
treebanks with tgrep and tgrep2, heads and head-finding rules,
dependency grammars, Categorial grammar, and grammars for spoken
language processing.

Chapter 13: Syntactic Parsing (Formerly 10)

The
focus of this chapter is still on parsing with CFGs. It now includes
sections on CKY, Earley and agenda-based (chart) parsing. In addition,
there is a new section on partial parsing, with a focus on
machine-learning-based base-phrase chunking and the use of IOB tags.
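As an illustration of the CKY algorithm mentioned here, a minimal recognizer sketch of our own for a grammar in Chomsky normal form (names and data layout are illustrative):

```python
from itertools import product

def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognition for a CNF grammar.

    lexical maps a terminal to the set of nonterminals deriving it;
    binary maps a pair (B, C) to the set of nonterminals A with A -> B C.
    """
    n = len(words)
    # table[i][j] = set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):              # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # try every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]
```

Storing backpointers alongside each table entry turns the recognizer into a parser; Earley and chart parsing, also covered in the chapter, lift the CNF restriction.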

Chapter 14: Statistical Parsing (Formerly 12)

This statistical parsing chapter has been extensively revised. It now
covers PCFGs, probabilistic CKY parsing, parent annotations, the Collins parser,
and touches on advanced topics such as discriminative reranking and parsing for
language modeling.
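The PCFG idea at the heart of this chapter is simple enough to sketch: the probability of a parse tree is the product of the probabilities of the rules used to build it (the tree encoding below is our own illustrative choice):

```python
def tree_probability(tree, rule_probs):
    """Probability of a parse tree under a PCFG.

    A tree is a tuple (label, child, ...) with string leaves;
    rule_probs maps (lhs, rhs_tuple) to P(lhs -> rhs).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return rule_probs[(label, (children[0],))]   # lexical rule
    rhs = tuple(child[0] for child in children)
    p = rule_probs[(label, rhs)]                     # rule at this node
    for child in children:
        p *= tree_probability(child, rule_probs)     # recurse into subtrees
    return p
```

Probabilistic CKY, covered in the chapter, finds the tree maximizing exactly this product; refinements like parent annotation and the Collins parser address the conditional-independence assumptions it embodies.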

Chapter 15: Features and Unification (Formerly 11)

Chapter 16: Language and Complexity (Formerly 13)

Chapter 17: The Representation of Meaning (Formerly 14)

This
chapter still covers basic notions surrounding meaning representation
languages. It now has better coverage of model-theoretic semantics for
meaning representations, and a new section on Description Logics,
their role as a basis for OWL, and OWL's role in the Semantic Web.

Chapter 18: Computational Semantics (Formerly 15)

This chapter covers compositional approaches to semantic analysis at the sentence level. The primary focus is
on rule-to-rule approaches based on lambda-expressions. It also now has new coverage of unification-based
approaches to computational semantics. Coverage in the old chapter 15 on semantic grammars has been moved to the
discourse chapter; coverage of information extraction has been expanded and moved to the new chapter 22.

Chapter 19: Lexical Semantics (Formerly 16)

This chapter still covers the basics of lexical semantics, including
sense relations, semantic roles, and primitive decomposition. The
treatment of semantic roles has been updated, as has the coverage of
WordNet, and new sections have been added for PropBank and FrameNet.

Chapter 20: Computational Lexical Semantics

The focus of this new chapter is on computing with word meanings. The three
main topics are word sense disambiguation, computing relations between
words (similarity, hyponymy, etc.), and semantic role labeling. It
considerably expands the treatment of these topics.
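One standard way to compute word similarity, of the kind this chapter covers, is cosine similarity between distributional context vectors; a minimal sketch of our own, with vectors as context-to-count dictionaries:

```python
import math

def cosine(v, w):
    """Cosine similarity between two word vectors represented as
    {context: count} dictionaries."""
    dot = sum(v[c] * w.get(c, 0) for c in v)
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm_v * norm_w)
```

Identical vectors score 1.0 and vectors with no shared contexts score 0.0; raw counts are often replaced by association weights such as PMI before comparison.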

Chapter 21: Computational Discourse

This rewritten chapter includes a number of updates to the first edition.
The anaphora resolution section is updated to include modern log-linear
methods, and a section on the more general problem of coreference is also included.
The coherence section describes cue-based methods for rhetorical relation and coherence relation extraction.
Finally, there is a significant new section on discourse segmentation (including TextTiling).

Chapter 22: Information Extraction (New chapter: Parts of old 15)

This new chapter surveys current approaches to information extraction. The main topics are named entity recognition,
relation detection, temporal expression analysis, and template filling. The primary focus is on supervised machine learning approaches to these topics. The coverage of finite-state methods (FASTUS) has been moved here from the old Chapter 15.

Chapter 23: Question Answering and Summarization

This new chapter covers two applications, question answering and summarization.
A brief introduction to the necessary background material from information retrieval is also included.
The chapter covers factoid question answering, single-document summarization,
generic multi-document summarization, and query-focused summarization.

Chapter 24: Dialog and Conversational Agents (Formerly 19)

This
is a completely rewritten version of the dialogue chapter. It includes
much more information on modern dialogue systems, including VoiceXML,
confirmation and clarification dialogues, the information-state model,
Markov decision processes, and other current approaches to dialogue
agents.

Chapter 25: Machine Translation

The
MT chapter has been extensively rewritten and a significant new section
added covering statistical MT, including IBM Model 1, Model 3, and HMM
alignment. A new evaluation section covering human evaluation and BLEU
has also been added, as well as sections on SYSTRAN and more details on
cross-linguistic divergences.
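To give a flavor of the BLEU metric covered in the new evaluation section, here is a deliberately simplified sketch of our own: modified unigram precision with a brevity penalty, for a single candidate/reference pair (full BLEU combines clipped precisions for n-grams up to length 4 over a whole test set):

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Simplified BLEU: clipped unigram precision times brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # clip each candidate word's count by its count in the reference,
    # so "the the the" cannot score well against "the cat"
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

The clipping step is what makes the precision "modified": without it, repeating a common reference word would inflate the score.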