–HAL is an artificial agent capable of such advanced language processing behavior as speaking and understanding English and, at a crucial moment in the plot, even reading lips.

•The language-related parts of HAL

–Speech recognition

–Natural language understanding (and, of course, lip-reading),

–Natural language generation

–Speech synthesis

–Information retrieval

–Information extraction, and

–Inference


Background

•Solving these language-related problems, and others like them, is the main concern of the fields known as Natural Language Processing, Computational Linguistics, and Speech Recognition and Synthesis, which together we call Speech and Language Processing (SLP).

•Applications of language processing

–spelling correction,

–grammar checking,

–information retrieval, and

–machine translation.


1.1 Knowledge in Speech and Language Processing

•By SLP, we have in mind those computational techniques that process spoken and written human language, as language.

•What distinguishes these language processing applications from other data processing systems is their use of knowledge of language.

•Unix wc program

–When used to count bytes and lines, wc is an ordinary data processing application.

–However, when it is used to count the words in a file, it requires knowledge about what it means to be a word, and thus becomes a language processing system.
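As a minimal sketch (in Python; the whitespace definition of a word is a deliberate simplification), the difference shows up in a single line:

```python
# wc-style counter: byte and line counts are plain data processing;
# the word count is where linguistic knowledge sneaks in.
import sys

def wc(path):
    with open(path, "rb") as f:
        data = f.read()
    n_bytes = len(data)
    text = data.decode("utf-8", errors="replace")
    n_lines = text.count("\n")
    # The linguistic assumption: a "word" is whatever whitespace
    # separates. This miscounts contractions, hyphens, punctuation.
    n_words = len(text.split())
    return n_lines, n_words, n_bytes

if __name__ == "__main__":
    print(*wc(sys.argv[1]))
```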


•Both analyzing an incoming audio signal to recover the exact sequence of words and generating a spoken response require knowledge about phonetics and phonology, which can help model how words are pronounced in colloquial speech (Chapters 4 and 5).

•Producing and recognizing the variations of individual words (e.g., recognizing that doors is plural) requires knowledge about morphology, which captures information about the shape and behavior of words in context (Chapters 2 and 3).
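As a toy illustration (a naive, invented rule, not a real morphological analyzer), recognizing that doors is plural might look like:

```python
# Naive morphology sketch: recognize an English plural by its suffix.
# Real morphology needs much more (irregulars like "geese", spelling
# rules like "ponies" -> "pony"), which Chapters 2 and 3 cover.
def analyze(word):
    if len(word) > 2 and word.endswith("s"):
        return (word[:-1], "+N +plural")  # stem plus features
    return (word, "+N +singular")

print(analyze("doors"))  # -> ('door', '+N +plural')
```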


•Syntax: the knowledge needed to order and group words together

HAL, the pod bay door is open.

HAL, is the pod bay door open?

I’m I do, sorry that afraid Dave I’m can’t.

(Dave, I’m sorry I’m afraid I can’t do that.)


•Lexical semantics: knowledge of the meanings of the component words

•Compositional semantics: knowledge of how these components combine to form larger meanings

–To know that Dave’s command is actually about opening the pod bay door, rather than an inquiry about the day’s lunch menu.


•Pragmatics: the appropriate use of polite and indirect language

–HAL could have answered bluntly: No, or No, I won’t open the door, or I won’t.

–Instead, HAL answers politely and indirectly: I’m sorry, I’m afraid, I can’t.


•Discourse conventions: knowledge of how to correctly structure such conversations

–HAL chooses to engage in a structured conversation relevant to Dave’s initial request. HAL’s correct use of the word that in its answer to Dave’s request is a simple illustration of the kind of between-utterance device common in such conversations.

Dave, I’m sorry I’m afraid I can’t do that.


•Phonetics and Phonology: the study of linguistic sounds

•Morphology: the study of the meaningful components of words

•Syntax: the study of the structural relationships between words

•Semantics: the study of meaning

•Pragmatics: the study of how language is used to accomplish goals

•Discourse: the study of linguistic units larger than a single utterance


1.2 Ambiguity

•A perhaps surprising fact about the six categories of linguistic knowledge is that most or all tasks in speech and language processing can be viewed as resolving ambiguity at one of these levels.

•We say some input is ambiguous

–if there are multiple alternative linguistic structures that can be built for it.

•The spoken sentence, I made her duck, has five different meanings.

–(1.1) I cooked waterfowl for her.

–(1.2) I cooked waterfowl belonging to her.

–(1.3) I created the (plaster?) duck she owns.

–(1.4) I caused her to quickly lower her head or body.

–(1.5) I waved my magic wand and turned her into undifferentiated waterfowl.


•These different meanings are caused by a number of ambiguities.

–Duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun.

–The word make can mean create or cook.

–The verb make is also syntactically ambiguous: it can be transitive (1.2) or ditransitive (1.5).

–Finally, make can take a direct object and a verb (1.4), meaning that the object (her) was caused to perform the verbal action (duck).

–In a spoken sentence, there is an even deeper kind of ambiguity; the first word could have been eye, or the second word maid.


•Ways to resolve, or disambiguate, these ambiguities:

–Deciding whether duck is a verb or a noun can be solved by part-of-speech tagging.

–Deciding whether make means “create” or “cook” can be solved by word sense disambiguation.

–Part-of-speech tagging and word sense disambiguation are two important kinds of lexical disambiguation.

•A wide variety of tasks can be framed as lexical disambiguation problems.

–For example, a text-to-speech synthesis system reading the word lead needs to decide whether it should be pronounced as in lead pipe or as in lead me on.

•Deciding whether her and duck are part of the same entity (as in (1.1) or (1.4)) or are different entities (as in (1.2)) is an example of syntactic disambiguation and can be addressed by probabilistic parsing.

•Ambiguities that don’t arise in this particular example (like whether a given sentence is a statement or a question) will also be resolved, for example by speech act interpretation.
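As a minimal illustration of lexical disambiguation (a sketch assuming the NLTK library and its stock English tagger; the exact output is not guaranteed), an off-the-shelf part-of-speech tagger can be run over the ambiguous sentence:

```python
# Part-of-speech tagging sketch with NLTK (pip install nltk).
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("I made her duck")
print(nltk.pos_tag(tokens))
# A noun tag (NN) for 'duck' selects readings (1.1)-(1.3);
# a verb tag (VB) would select the 'lower your head' reading (1.4).
```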

1.3 Models and Algorithms

•Closely related to procedural models such as state machines are their declarative counterparts: formal rule systems.

–regular grammars and regular relations, context-free grammars, and feature-augmented grammars, as well as probabilistic variants of them all.
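To make the idea of a formal rule system concrete, here is a hedged sketch (assuming NLTK; the toy grammar is invented for illustration, not taken from the text) that defines a tiny context-free grammar and parses a sentence with it:

```python
# A toy context-free grammar parsed with NLTK's chart parser.
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | 'I'
VP -> V NP
Det -> 'the' | 'her'
N  -> 'duck' | 'door'
V  -> 'made' | 'open'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I made her duck".split()):
    tree.pretty_print()  # prints the parse with 'her duck' as one NP
```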

•State machines and formal rule systems are the main tools used when dealing with knowledge of phonology, morphology, and syntax.

•The algorithms associated with both state machines and formal rule systems typically involve a search through a space of states representing hypotheses about an input.

•Representative tasks include

–searching through a space of phonological sequences for a likely input word in speech recognition, or

–searching through a space of trees for the correct syntactic parse of an input sentence.

•Among the algorithms that are often used for these tasks are well-known graph algorithms such as depth-first search, as well as heuristic variants such as best-first and A* search.

•The dynamic programming paradigm is critical to the computational tractability of many of these approaches by ensuring that redundant computations are avoided.
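As a hedged sketch of such a search (the state space, costs, and heuristic are left abstract; nothing here is specific to any one speech or parsing system), a generic A* loop with dynamic-programming-style reuse of the cheapest known cost per state might look like:

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, heuristic):
    """Best-first (A*) search over a space of hypothesis states.

    neighbors(state) yields (next_state, step_cost) pairs;
    heuristic(state) must never overestimate the remaining cost,
    or the returned path may not be the cheapest one.
    """
    tie = count()  # tiebreaker so the heap never compares states
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best = {start: 0}  # dynamic programming: cheapest known cost per state
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, cost
        for nxt, step in neighbors(state):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):  # skip redundant work
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt),
                                          next(tie), new_cost, nxt, path + [nxt]))
    return None, float("inf")
```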


•The third model that plays a critical role in capturing knowledge of language is logic.

•We will discuss

–first order logic, also known as the predicate calculus, as well as

–such related formalisms as feature structures,

–semantic networks, and

–conceptual dependency.

•These logical representations have traditionally been the tool of choice when dealing with knowledge of semantics, pragmatics, and discourse (although, as we will see, applications in these areas are increasingly relying on the simpler mechanisms used in phonology, morphology, and syntax).


•Each of the other models (state machines, formal rule systems, and logic) can be augmented with probabilities.

•One major use of probability theory is to solve the many kinds of ambiguity problems that we discussed earlier;

–almost any speech and language processing problem can be recast as: “given N choices for some ambiguous input, choose the most probable one”.

•Another major advantage of probabilistic models is that

–they are one of a class of machine learning models.

•Machine learning research has focused on ways to automatically learn the various representations described above:

–automata, rule systems, search heuristics, classifiers.

•These systems can be trained on large corpora and can be used as a powerful modeling technique, especially in places where we don’t yet have good causal models.
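A minimal sketch of the “choose the most probable of N choices” recipe (the candidate readings and their probabilities are invented placeholders, not estimates from any model):

```python
# Resolving ambiguity as probabilistic choice. A real system would
# estimate these probabilities from a corpus-trained model.
candidates = {
    "I cooked waterfowl for her": 0.55,
    "I cooked waterfowl belonging to her": 0.25,
    "I caused her to quickly lower her head": 0.15,
    "I turned her into a duck": 0.05,
}

best = max(candidates, key=candidates.get)
print(best)  # -> "I cooked waterfowl for her"
```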


1.4 Language, Thought, and Understanding

•The effective use of language is intertwined with our general cognitive abilities.

•The Turing Test, proposed by Alan Turing (1950)

–He suggested an empirical test, a game, in which a computer’s use of language would form the basis for determining if it could think. If the machine could win the game, it would be judged intelligent.


•ELIZA program (Weizenbaum, 1966)

–ELIZA was an early natural language processing system capable of carrying on a limited form of conversation with a user.

•Consider the following session with a version of ELIZA that imitated the responses of a Rogerian psychotherapist.

User1: You are like my father in some ways.

ELIZA1: WHAT RESEMBLANCE DO YOU SEE

User2: You are not very aggressive but I think you don’t want me to notice that.

ELIZA2: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE

User3: You don’t argue with me.

ELIZA3: WHY DO YOU THINK I DON’T ARGUE WITH YOU

User4: You are afraid of me.

ELIZA4: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU


•ELIZA is a remarkably simple program that makes use of pattern-matching to process the input and translate it into suitable outputs.

•The success of this simple technique in this domain is due to the fact that ELIZA doesn’t actually need to know anything to mimic a Rogerian psychotherapist.
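A minimal sketch of ELIZA-style pattern matching (the two rules below are invented illustrations; Weizenbaum’s actual script was larger, keyword-ranked, and swapped pronouns):

```python
# ELIZA-style pattern matching with regular expressions.
import re

RULES = [
    (r".*\bI am (.*)", "HOW LONG HAVE YOU BEEN {0}"),
    (r".*\byou are (.*)", "WHAT MAKES YOU THINK I AM {0}"),
    (r".*", "PLEASE GO ON"),  # fallback when nothing else matches
]

def respond(utterance):
    for pattern, template in RULES:
        m = re.match(pattern, utterance, re.IGNORECASE)
        if m:
            return template.format(*[g.upper() for g in m.groups()])

print(respond("You are not very aggressive"))
# -> WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE
```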

•Eliza

•A. L. I. C. E. Artificial Intelligence Foundation

•Loebner Prize competition, held since 1991

–an event that has attempted to put various computer programs to the Turing test.


1.5 The State of the Art and the Near-term Future

•Some current applications and near-term possibilities

–A Canadian computer program accepts daily weather data and generates weather reports that are passed along unedited to the public in English and French (Chandioux, 1976).

–The Babel Fish translation system from Systran handles over 1,000,000 translation requests a day from the AltaVista search engine site.

–A visitor to Cambridge, Massachusetts, asks a computer about places to eat using only spoken language. The system returns relevant information from a database of facts about the local restaurant scene (Zue et al., 1991).


•Somewhat more speculative scenarios

–A computer reads hundreds of typed student essays and grades them in a manner that is indistinguishable from human graders (Landauer et al., 1997).

–An automated reading tutor helps improve literacy by having children read stories and using a speech recognizer to intervene when the reader asks for reading help or makes mistakes (Mostow and Aist, 1999).

–A computer equipped with a vision system watches a short video clip of a soccer match and provides an automated natural language report on the game (Wahlster, 1989).

1.6 Some Brief History

Foundational Insights: 1940s and 1950s

•Chomsky (1956), drawing the idea of a finite-state Markov process from Shannon’s work, first considered finite-state machines as a way to characterize a grammar, and defined a finite-state language as a language generated by a finite-state grammar.

•These early models led to the field of formal language theory, which used algebra and set theory to define formal languages as sequences of symbols.

–This includes the context-free grammar, first defined by Chomsky (1956) for natural languages but independently discovered by Backus (1959) and Naur et al. (1960) in their descriptions of the ALGOL programming language.


•The second foundational insight of this period was the development of probabilistic algorithms for speech and language processing, which dates to Shannon’s other contribution:

–the metaphor of the noisy channel and decoding for the transmission of language through media like communication channels and speech acoustics.

–Shannon also borrowed the concept of entropy from thermodynamics as a way of measuring the information capacity of a channel, or the information content of a language, and performed the first measure of the entropy of English using probabilistic techniques (the standard formulas are sketched after this list).

–It was also during this early period that the sound spectrograph was developed (Koenig et al., 1946), and foundational research was done in instrumental phonetics that laid the groundwork for later work in speech recognition.

•This led to the first machine speech recognizers in the early 1950s.
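For reference, the standard statements of these two ideas (textbook formulas, not taken from the slides): noisy-channel decoding picks the source sequence W that is most probable given the observation O, and entropy measures information content in bits:

```latex
\hat{W} = \operatorname*{argmax}_{W} P(W \mid O)
        = \operatorname*{argmax}_{W} P(O \mid W)\,P(W),
\qquad
H(X) = -\sum_{x} p(x) \log_2 p(x)
```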


The Two Camps: 1957–1970

•By the end of the 1950s and the early 1960s, SLP had split very cleanly into two paradigms: symbolic and stochastic.

•The symbolic paradigm took off from two lines of research.

–The first was the work of Chomsky and others on formal language theory and generative syntax throughout the late 1950s and early to mid 1960s, and the work of many linguists and computer scientists on parsing algorithms, initially top-down and bottom-up and then via dynamic programming.

–One of the earliest complete parsing systems was Zelig Harris’s Transformations and Discourse Analysis Project (TDAP), which was implemented between June 1958 and July 1959 at the University of Pennsylvania (Harris, 1962).


–The second line of research was the new field of artificial intelligence.

•In the summer of 1956 John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester brought together a group of researchers for a two-month workshop on what they decided to call artificial intelligence (AI).

•Although AI always included a minority of researchers focusing on stochastic and statistical algorithms (including probabilistic models and neural nets), the major focus of the new field was the work on reasoning and logic typified by Newell and Simon’s work on the Logic Theorist and the General Problem Solver.

–Some of the earliest natural language understanding systems were also built in this period; these were simple systems that worked in single domains, mainly by a combination of pattern matching and keyword search with simple heuristics for reasoning and question-answering.

–By the late 1960s more formal logical systems were developed.


•The stochastic paradigm took hold mainly in departments of statistics and of electrical engineering.

–By the late 1950s the Bayesian method was beginning to be applied to the problem of optical character recognition.

–Bledsoe and Browning (1959) built a Bayesian system for text recognition that used a large dictionary and computed the likelihood of each observed letter sequence given each word in the dictionary by multiplying the likelihoods for each letter.

–The 1960s also saw the rise of the first serious testable psychological models of human language processing based on transformational grammar, as well as the first on-line corpora: the Brown corpus of American English, a 1 million word collection of samples from 500 written texts from different genres (newspaper, novels, non-fiction, academic, etc.), which was assembled at Brown University in 1963–64 (Kučera and Francis, 1967; Francis, 1979; Francis and Kučera, 1982), and William S. Y. Wang’s 1967 DOC (Dictionary on Computer), an on-line Chinese dialect dictionary.
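A hedged sketch of the Bledsoe-and-Browning-style computation (the dictionary and per-letter confusion likelihoods below are invented placeholders, not their data): the likelihood of an observed letter sequence given a dictionary word is the product of per-letter likelihoods.

```python
# Bayesian word scoring by multiplying per-letter likelihoods,
# done in log space to avoid underflow. All numbers are toy values.
import math

DICTIONARY = ["duck", "dock", "deck"]

def letter_likelihood(observed, true):
    """P(observed letter | true letter): toy confusion model."""
    return 0.9 if observed == true else 0.1 / 25  # spread over other letters

def word_log_likelihood(observed, word):
    if len(observed) != len(word):
        return float("-inf")
    return sum(math.log(letter_likelihood(o, t))
               for o, t in zip(observed, word))

observed = "ducx"  # noisy character-recognizer output
best = max(DICTIONARY, key=lambda w: word_log_likelihood(observed, w))
print(best)  # -> 'duck'
```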


Four Paradigms: 1970–1983

•The next period saw an explosion of research in SLP and the development of a number of research paradigms that still dominate the field.

•The stochastic paradigm played a huge role in the development of speech recognition algorithms in this period,

–particularly the use of the Hidden Markov Model and the metaphors of the noisy channel and decoding, developed independently by Jelinek, Bahl, Mercer, and colleagues at IBM’s Thomas J. Watson Research Center, and by Baker at Carnegie Mellon University, who was influenced by the work of Baum and colleagues at the Institute for Defense Analyses in Princeton.

–AT&T’s Bell Laboratories was also a center for work on speech recognition and synthesis; see Rabiner and Juang (1993) for descriptions of the wide range of this work.


•The logic-based paradigm was begun by the work of Colmerauer and his colleagues on Q-systems and metamorphosis grammars (Colmerauer, 1970, 1975).

•The natural language understanding paradigm began with Terry Winograd’s SHRDLU system, which simulated a robot embedded in a world of toy blocks (Winograd, 1972). The program was able to accept natural language text commands (Move the red block on top of the smaller green one) of a hitherto unseen complexity and sophistication.

•His system was also the first to attempt to build an extensive (for the time) grammar of English, based on Halliday’s systemic grammar.

–Winograd’s model made it clear that the problem of parsing was well enough understood to begin to focus on semantics and discourse models.

–Roger Schank and his colleagues and students (in what was often referred to as the Yale School) built a series of language understanding programs that focused on human conceptual knowledge such as scripts, plans and goals, and human memory organization (Schank and Abelson, 1977; Schank and Riesbeck, 1981; Cullingford, 1981; Wilensky, 1983; Lehnert, 1977).

•The logic-based and natural language understanding paradigms were unified in systems that used predicate logic as a semantic representation, such as the LUNAR question-answering system (Woods, 1967, 1973).


•The discourse modeling paradigm focused on four key areas in discourse.

–Grosz and her colleagues introduced the study of substructure in discourse and of discourse focus (Grosz, 1977a; Sidner, 1983),

–a number of researchers began to work on automatic reference resolution (Hobbs, 1978),

–and the BDI (Belief-Desire-Intention) framework for logic-based work on speech acts was developed (Perrault and Allen, 1980; Cohen and Perrault, 1979).


Empiricism and Finite State Models Redux: 1983–1993

•This next decade saw the return of two classes of models which had lost popularity in the late 1950s and early 1960s, partially due to theoretical arguments against them such as Chomsky’s influential review of Skinner’s Verbal Behavior (Chomsky, 1959b).

–The first class was finite-state models, which began to receive attention again after work on finite-state phonology and morphology by Kaplan and Kay (1981) and finite-state models of syntax by Church (1980).

–The second trend in this period was what has been called the “return of empiricism”; most notable here was the rise of probabilistic models throughout speech and language processing, influenced strongly by the work at the IBM Thomas J. Watson Research Center on probabilistic models of speech recognition.

•These probabilistic methods and other such data-driven approaches spread into part-of-speech tagging, parsing and attachment ambiguities, and connectionist approaches from speech recognition to semantics.

•This period also saw considerable work on natural language generation.


The Field Comes Together: 1994–1999

•By the last five years of the millennium it was clear that the field was vastly changing.

•First, algorithms for parsing, part-of-speech tagging, reference resolution, and discourse processing all began to incorporate probabilities and to employ evaluation methodologies borrowed from speech recognition and information retrieval.

–Second, the increases in the speed and memory of computers had allowed commercial exploitation of a number of subareas of speech and language processing, in particular

•speech recognition and spelling and grammar checking.

•Speech and language processing algorithms also began to be applied to Augmentative and Alternative Communication (AAC).

–Finally, the rise of the Web emphasized the need for language-based information retrieval and information extraction.


1.7 Summary

•A good way to understand the concerns of speech and language processing research is to consider what it would take to create an intelligent agent like HAL from 2001: A Space Odyssey.