
0 Word classes and part of speech tagging
Reading: Chap. 5, Jurafsky & Martin
Instructor: Paul Tarau, based on Rada Mihalcea's original slides
Note: Some of the material in this slide set was adapted from Chris Brew's (OSU) slides on part of speech tagging

22 Rule-Based Tagging
Basic idea:
- Assign all possible tags to each word
- Remove tags according to a set of rules of the type: "if word+1 is an adjective, adverb, or quantifier, and the following token is a sentence boundary, and word-1 is not a verb like 'consider', then eliminate non-adverb tags; else eliminate the adverb tag"
- Typically more than 1000 hand-written rules, but rules may also be machine-learned
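The two steps above (assign all lexicon tags, then eliminate by rule) can be sketched as follows. The lexicon, tag names, and helper functions are invented for illustration; real constraint-based taggers use large lexicons and 1000+ rules. The rule implemented is the slide's example for the word "that":

```python
# Hypothetical sketch of rule-based tag elimination (toy lexicon and tags).
LEXICON = {
    "he": {"PRON"}, "is": {"VERB"},
    "that": {"DET", "ADV", "COMP"},   # "that" is ambiguous
    "tall": {"ADJ"}, ".": {"PUNCT"},
}
ADJ_ADV_QUANT = {"ADJ", "ADV", "QUANT"}
SVOC_A_VERBS = {"consider", "believe"}  # verbs taking adjective complements

def initial_tags(words):
    # Step 1: assign every tag the lexicon allows (unknowns default to NOUN).
    return [set(LEXICON.get(w, {"NOUN"})) for w in words]

def that_adv_rule(words, tags):
    # Step 2: if word+1 is an adj/adv/quantifier, word+2 is a sentence
    # boundary, and word-1 is not a verb like "consider", keep only ADV
    # for "that"; otherwise eliminate ADV.
    for i, w in enumerate(words):
        if w != "that" or "ADV" not in tags[i]:
            continue
        nxt = tags[i + 1] if i + 1 < len(words) else set()
        nxt2 = tags[i + 2] if i + 2 < len(words) else set()
        if (nxt & ADJ_ADV_QUANT and "PUNCT" in nxt2
                and (i == 0 or words[i - 1] not in SVOC_A_VERBS)):
            tags[i] = {"ADV"}                 # e.g. "He is that tall."
        else:
            tags[i] -= {"ADV"}
    return tags

words = ["he", "is", "that", "tall", "."]
tags = that_adv_rule(words, initial_tags(words))
```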

27 Stochastic Tagging
- Based on the probability of a certain tag occurring, given various possibilities
- Requires a training corpus
- No probabilities for words not in the corpus
- The training corpus may differ from the test corpus

28 Stochastic Tagging (cont.)
Simple method: choose the most frequent tag in the training text for each word!
- Result: about 90% accuracy
- This is the baseline; other methods will do better
- The HMM tagger is one example
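A minimal sketch of this most-frequent-tag baseline; the tiny tagged corpus and the noun fallback for unseen words are illustrative assumptions:

```python
from collections import Counter, defaultdict

# Toy tagged corpus: (word, tag) pairs. "can" is ambiguous NOUN/VERB.
corpus = [("the", "DET"), ("can", "NOUN"), ("can", "VERB"),
          ("can", "NOUN"), ("rusts", "VERB"), ("the", "DET")]

# Count how often each word occurs with each tag.
counts = defaultdict(Counter)
for word, tag in corpus:
    counts[word][tag] += 1

def baseline_tag(word, default="NOUN"):
    # Most frequent tag seen in training; unseen words fall back to NOUN.
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return default

assert baseline_tag("can") == "NOUN"   # 2 NOUN vs. 1 VERB in training
```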

30 Start with a Bigram-HMM Tagger
argmax_T P(T|W)
= argmax_T P(T) P(W|T)
= argmax_{t1…tn} P(t1…tn) P(w1…wn | t1…tn)
= argmax_{t1…tn} [P(t1) P(t2|t1) … P(tn|tn-1)] [P(w1|t1) P(w2|t2) … P(wn|tn)]
To tag a single word: ti = argmax_j P(tj|ti-1) P(wi|tj)
How do we compute P(ti|ti-1)? c(ti-1 ti) / c(ti-1)
How do we compute P(wi|ti)? c(wi, ti) / c(ti)
How do we compute the most probable tag sequence? Viterbi
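The Viterbi search over tag sequences can be sketched as below. The transition and emission probabilities are invented toy values; in practice they come from the relative-frequency estimates c(ti-1 ti)/c(ti-1) and c(wi, ti)/c(ti) above, and log probabilities are used to avoid underflow:

```python
import math

TAGS = ["DET", "NOUN", "VERB"]
trans = {  # P(tag | prev_tag), with "<s>" as the start state (toy values)
    ("<s>", "DET"): 0.8, ("<s>", "NOUN"): 0.1, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.2, ("NOUN", "DET"): 0.1,
    ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.1,
}
emit = {  # P(word | tag) (toy values; missing pairs get a tiny floor)
    ("DET", "the"): 0.6, ("NOUN", "dog"): 0.3, ("VERB", "barks"): 0.2,
    ("NOUN", "barks"): 0.01, ("VERB", "dog"): 0.001,
}
FLOOR = 1e-12

def viterbi(words):
    # delta[t] = best log-probability of any tag path ending in tag t
    delta = {t: math.log(trans.get(("<s>", t), FLOOR))
                + math.log(emit.get((t, words[0]), FLOOR)) for t in TAGS}
    back = []
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: delta[p]
                            + math.log(trans.get((p, t), FLOOR)))
            new_delta[t] = (delta[best_prev]
                            + math.log(trans.get((best_prev, t), FLOOR))
                            + math.log(emit.get((t, w), FLOOR)))
            ptr[t] = best_prev
        delta, back = new_delta, back + [ptr]
    # Follow back-pointers from the best final tag.
    path = [max(TAGS, key=delta.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```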

36 Transformation-Based Tagging (Brill Tagging)
- A combination of rule-based and stochastic tagging methodologies
- Like the rule-based approach, because rules are used to specify tags in a certain environment
- Like the stochastic approach, because machine learning is used, with a tagged corpus as input
- Input: a tagged corpus and a dictionary (with most frequent tags), usually constructed from the tagged corpus

37 Transformation-Based Tagging (cont.)
Basic idea:
- Set the most probable tag for each word as a start value
- Change tags according to rules of the type "if word-1 is a determiner and word is a verb, then change the tag to noun", applied in a specific order
Training is done on a tagged corpus:
1. Write a set of rule templates
2. Among the rules instantiated from the templates, find the one with the highest score
3. Continue from step 2 until the best score falls below a threshold
4. Keep the ordered set of rules
Rules make errors that are corrected by later rules.

40 TBL: The Algorithm
Step 1: Label every word with its most likely tag (from the dictionary)
Step 2: Check every possible transformation and select the one that most improves the tagging
Step 3: Re-tag the corpus by applying the selected rule
Repeat steps 2-3 until some criterion is reached, e.g., X% correct with respect to the training corpus
RESULT: an ordered sequence of transformation rules
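The loop above can be sketched as follows. The corpus, the toy dictionary, and the single rule template ("change tag A to B when the previous tag is Z") are invented for illustration; a real Brill tagger uses many templates and a large corpus:

```python
# Minimal sketch of the TBL learning loop (Steps 1-3).
gold = [("the", "DET"), ("can", "NOUN"), ("rusts", "VERB")]
most_likely = {"the": "DET", "can": "VERB", "rusts": "NOUN"}  # toy dictionary

words = [w for w, _ in gold]
tags = [most_likely[w] for w in words]           # Step 1: initial labelling

def candidate_rules(tags, gold):
    # Instantiate the template at every error site: (from_tag, to_tag, prev_tag).
    for i, (cur, (_, want)) in enumerate(zip(tags, gold)):
        if cur != want and i > 0:
            yield (cur, want, tags[i - 1])

def apply_rule(rule, tags):
    # Apply simultaneously: context checks read the old tag sequence.
    a, b, z = rule
    return [b if t == a and i > 0 and tags[i - 1] == z else t
            for i, t in enumerate(tags)]

def score(tags, gold):
    return sum(t == g for t, (_, g) in zip(tags, gold))

learned = []
while True:
    rules = set(candidate_rules(tags, gold))
    if not rules:
        break
    # Step 2: pick the transformation that most improves the tagging.
    best = max(rules, key=lambda r: score(apply_rule(r, tags), gold))
    if score(apply_rule(best, tags), gold) <= score(tags, gold):
        break                                    # no improvement: stop
    tags = apply_rule(best, tags)                # Step 3: re-tag the corpus
    learned.append(best)                         # RESULT: ordered rule list
```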

41 TBL: Rule Learning (cont'd)
- Problem: transformations could be applied ad infinitum!
- Constrain the set of transformations with "templates": replace tag X with tag Y, provided tag Z or word Z' appears in some position
- Rules are learned in an ordered sequence
- Rules may interact
- Rules are compact and can be inspected by humans

45 Tagging Unknown Words
- New words are added to (newspaper) language at 20+ per month, plus many proper names …
- Unknown words increase error rates by 1-2%
- Method 1: assume they are nouns
- Method 2: assume the unknown words have a probability distribution similar to words occurring only once in the training set
- Method 3: use morphological information, e.g., words ending in -ed tend to be tagged VBN
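Method 3 can be sketched as a simple suffix lookup, with Method 1 (assume noun) as the fallback. The suffix-to-tag table below is an illustrative assumption, not a complete model:

```python
# Guess a Penn Treebank tag for an unknown word from its suffix.
SUFFIX_TAGS = [
    ("ing", "VBG"),   # e.g. "rebooting" -> gerund/present participle
    ("ed",  "VBN"),   # e.g. "blogged"   -> past participle
    ("ly",  "RB"),    # e.g. "quickly"   -> adverb
    ("s",   "NNS"),   # e.g. "gadgets"   -> plural noun
]

def guess_unknown(word):
    for suffix, tag in SUFFIX_TAGS:
        if word.lower().endswith(suffix):
            return tag
    return "NN"       # Method 1 as the fallback: assume noun
```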

46 Evaluation
- The result is compared with a manually annotated "gold standard"
- Typically accuracy reaches 96-97%
- This may be compared with the result for a baseline tagger (one that uses no context)
- Important: 100% is impossible even for human annotators
Factors that affect performance:
- The amount of training data available
- The tag set
- The difference between training corpus and test corpus
- The dictionary
- Unknown words
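Accuracy against a gold standard is just the fraction of matching tags; a one-function sketch (the tag sequences below are illustrative):

```python
def accuracy(predicted, gold):
    # Fraction of positions where the tagger agrees with the gold standard.
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# One tag wrong out of four -> 75% accuracy.
acc = accuracy(["DET", "NOUN", "VERB", "NOUN"],
               ["DET", "NOUN", "NOUN", "NOUN"])
```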