1
PARSING WITH CONTEXT-FREE GRAMMARS cc437

2
PARSING
Parsing is the process of recognizing and assigning STRUCTURE.
Parsing a string with a CFG:
– Finding a derivation of the string consistent with the grammar
– The derivation gives us a PARSE TREE
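To make the definition concrete, here is a minimal Python sketch under our own assumptions: a hypothetical toy grammar and lexicon for "book that flight", with parse trees encoded as nested tuples (label, child1, child2, ...). The function names are hypothetical. `licensed` checks that every node is sanctioned by a grammar rule or lexicon entry, i.e. that the tree records a derivation consistent with the grammar, and `leaves` reads the derived string back off the tree.

```python
# Hypothetical toy grammar: each nonterminal maps to its allowed right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
# Hypothetical lexicon: the part-of-speech categories each word can have.
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def is_preterminal(tree: tuple) -> bool:
    """A node whose only child is a word, e.g. ("Noun", "flight")."""
    return len(tree) == 2 and isinstance(tree[1], str)


def licensed(tree: tuple) -> bool:
    """True if every node in the tree is sanctioned by a grammar rule or a lexicon entry."""
    if is_preterminal(tree):
        return tree[0] in LEXICON.get(tree[1], set())
    label, *children = tree
    child_labels = [child[0] for child in children]
    return child_labels in GRAMMAR.get(label, []) and all(licensed(c) for c in children)


def leaves(tree: tuple) -> list:
    """The words at the bottom of the tree, left to right."""
    if is_preterminal(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


tree = ("S", ("VP", ("Verb", "book"),
              ("NP", ("Det", "that"), ("Nominal", ("Noun", "flight")))))
print(licensed(tree), leaves(tree))  # True ['book', 'that', 'flight']
```

A tree whose root is S with NP as its only child would be rejected, since S → NP is not a rule of this toy grammar.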

3
EXAMPLE (CF. LAST WEEK)

4
PARSING AS SEARCH
Just as with non-deterministic regular expressions, the main problem in parsing is the existence of CHOICE POINTS.
A SEARCH STRATEGY is needed to determine the order in which alternatives are considered.

5
TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES
The search has to be guided by the INPUT and the GRAMMAR.
TOP-DOWN search: the parse tree has to be rooted in the start symbol S
– EXPECTATION-DRIVEN parsing
BOTTOM-UP search: the parse tree must be an analysis of the input
– DATA-DRIVEN parsing
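One way to realize data-driven search is a shift-reduce recognizer with backtracking. This is a hedged sketch, not the parser from the slides: the grammar, lexicon and the name `bottom_up_recognize` are hypothetical, and we assume no rule has an empty right-hand side. Starting from the words, it either reduces a rule's right-hand side found on top of a stack to its left-hand side, or shifts a category of the next word, and succeeds when the whole input has been reduced to S.

```python
# Hypothetical toy grammar as (lhs, rhs) pairs; no rule has an empty right-hand side.
RULES = [
    ("S", ("NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("Nominal", ("Noun",)),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
]
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def bottom_up_recognize(words: list, start: str = "S") -> bool:
    """Shift-reduce search with backtracking: build constituents upward from the
    words and succeed if the whole input reduces to the start symbol."""
    seen = set()

    def search(stack: tuple, pos: int) -> bool:
        if (stack, pos) in seen:          # this configuration was already explored
            return False
        seen.add((stack, pos))
        if pos == len(words) and stack == (start,):
            return True
        # REDUCE: if a rule's right-hand side is on top of the stack, replace it by the lhs.
        for lhs, rhs in RULES:
            k = len(rhs)
            if k <= len(stack) and stack[-k:] == rhs:
                if search(stack[:-k] + (lhs,), pos):
                    return True
        # SHIFT: push one possible category of the next word.
        if pos < len(words):
            for category in sorted(LEXICON.get(words[pos], ())):
                if search(stack + (category,), pos + 1):
                    return True
        return False

    return search((), 0)


print(bottom_up_recognize(["book", "that", "flight"]))  # True
```

Reading "book" as a Noun and reducing it to Nominal is consistent with the data but leads nowhere globally; the search only discovers this after exploring that branch.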

6
AN EXAMPLE OF TOP-DOWN SEARCH (IN PARALLEL)

7
AN EXAMPLE OF BOTTOM-UP SEARCH

8
NON-PARALLEL SEARCH
If it's not possible to examine all alternatives in parallel, it's necessary to make further decisions:
– Which node in the current search space to expand first (breadth-first or depth-first)
– Which of the applicable grammar rules to expand first
– Which leaf node in a parse tree to expand next (e.g., leftmost)

9
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT

10
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)

11
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)

12
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)

13
A T-D, D-F, L-R PARSER
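The top-down, depth-first, left-to-right strategy can be sketched as a backtracking recognizer. This is a minimal illustration, not the parser in the figure: the grammar is a hypothetical toy one written without left recursion, and `expand` / `top_down_recognize` are names we made up. The leftmost symbol is always expanded first, rules are tried in grammar order, and a failed branch simply yields nothing, so the generator backtracks to the next alternative.

```python
# Hypothetical toy grammar, written without left recursion.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def expand(symbols: list, words: list, pos: int):
    """Expand the leftmost symbol first, trying rules in grammar order;
    yield every input position reachable after matching all of `symbols`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                       # nonterminal: try each rule depth-first
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words, pos)
    elif pos < len(words) and first in LEXICON.get(words[pos], set()):
        yield from expand(rest, words, pos + 1)    # category matches the next word


def top_down_recognize(words: list, start: str = "S") -> bool:
    """Succeed if some expansion of the start symbol consumes the whole input."""
    return any(end == len(words) for end in expand([start], words, 0))


print(top_down_recognize(["book", "that", "flight"]))  # True
```

This only recognizes: it reports whether a derivation exists, without building the tree.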

14
TOP-DOWN vs BOTTOM-UP
TOP-DOWN:
– Only searches among grammatical answers
– BUT: suggests hypotheses that may not be consistent with the data
– Problem: left recursion
BOTTOM-UP:
– Only forms hypotheses consistent with the data
– BUT: may suggest hypotheses that make no sense globally
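Why left recursion is a problem: with a rule such as NP → NP PP, a top-down parser that expands the leftmost symbol first rewrites NP as NP PP, then the new NP as NP PP, and so on, without ever consuming a word. As a hedged sketch (hypothetical grammar, our own function name, and assuming no empty right-hand sides), left-recursive nonterminals can be detected before parsing by following the "first symbol of a right-hand side" relation:

```python
# Hypothetical grammar with a left-recursive rule NP -> NP PP.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"], ["NP", "PP"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}


def left_recursive(grammar: dict) -> set:
    """Nonterminals A that can derive A ... as their leftmost symbol
    (assumes no rule has an empty right-hand side)."""
    # Direct left corners: A -> B ... means B can begin an A.
    corners = {a: {rhs[0] for rhs in rules if rhs[0] in grammar}
               for a, rules in grammar.items()}
    result = set()
    for a in grammar:
        # Depth-first search over the left-corner relation, looking for a path back to A.
        stack, seen = list(corners[a]), set()
        while stack:
            b = stack.pop()
            if b == a:
                result.add(a)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(corners[b])
    return result


print(left_recursive(GRAMMAR))  # {'NP'}
```

Indirect cases such as A → B x, B → A y are caught too, since the search follows chains of left corners.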

26
DYNAMIC PROGRAMMING
A standard top-down parser would reanalyze A FLIGHT 4 times, always in the same way.
A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work.
The Earley algorithm also:
– Does not suffer from the left-recursion problem
– Parses in O(n^3) time (n = length of the input), where backtracking search can take exponential time
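The table idea can be illustrated with plain memoization. This is a minimal sketch assuming a hypothetical toy grammar: for each (symbol, start position) pair it records the set of positions where that symbol can end, so a constituent starting at a given word is analyzed once and afterwards looked up. It is a memoized top-down recognizer, not the Earley algorithm, and unlike Earley it still recurses forever on a left-recursive grammar.

```python
from functools import lru_cache

# Hypothetical toy grammar (no left recursion) and lexicon.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def memo_recognize(words: list, start: str = "S") -> bool:
    @lru_cache(maxsize=None)
    def ends(symbol: str, pos: int) -> frozenset:
        """Positions j such that `symbol` can span words[pos:j]; computed at most
        once per (symbol, pos) pair and afterwards read from the cache (the table)."""
        if symbol not in GRAMMAR:                       # part-of-speech category
            if pos < len(words) and symbol in LEXICON.get(words[pos], set()):
                return frozenset({pos + 1})
            return frozenset()
        result = set()
        for rhs in GRAMMAR[symbol]:
            frontier = {pos}                            # where the next child may start
            for child in rhs:
                frontier = {j for i in frontier for j in ends(child, i)}
            result |= frontier
        return frozenset(result)

    return len(words) in ends(start, 0)


print(memo_recognize(["book", "that", "flight"]))  # True
```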

27
THE CHART
The Earley algorithm uses a table (the CHART) of size N+1, where N is the length of the input.
– Table entries sit in the ‘gaps’ between words
Each entry in the chart is a list of:
– Completed constituents
– In-progress constituents
– Predicted constituents
All three types of objects are represented in the same way, as STATES.
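A minimal Python sketch of the chart itself, under our own assumptions: states are plain tuples (lhs, rhs, dot, start, end) for now, and `enqueue` is a hypothetical helper name. Entry i sits in the gap just before word i, so an input of N words has gap positions 0..N. Refusing to add a state that is already present is what keeps the work from being repeated.

```python
words = ["book", "that", "flight"]
N = len(words)

# One entry per gap: entry i sits just before words[i]; entry N sits after the last word.
chart = [[] for _ in range(N + 1)]


def enqueue(state: tuple, i: int) -> None:
    """Add a state to chart entry i unless an identical state is already there."""
    if state not in chart[i]:
        chart[i].append(state)


# A state as a plain tuple (lhs, rhs, dot, start, end): here S → • VP, [0,0].
enqueue(("S", ("VP",), 0, 0, 0), 0)
enqueue(("S", ("VP",), 0, 0, 0), 0)     # ignored: same state, same entry

print(" ".join(f"{i} {w}" for i, w in enumerate(words)) + f" {N}")  # 0 book 1 that 2 flight 3
print(len(chart), len(chart[0]))                                      # 4 1
```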

28
THE CHART: GRAPHICAL REPRESENTATION

29
STATES
A state encodes two types of information:
– How much of a certain rule has been encountered in the input
– Which positions are covered
– A → α • β, [X,Y]
DOTTED RULES
– VP → V NP •
– NP → Det • Nominal
– S → • VP
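A state can be sketched as a small immutable record holding the dotted rule and its span [X,Y]. The field names, and the example spans for the input "book that flight", are illustrative choices of ours rather than anything fixed by the slides:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    """A dotted rule A → α • β together with the span [X,Y] it covers."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int      # number of right-hand-side symbols already found in the input
    start: int    # X: position where the constituent begins
    end: int      # Y: position up to which the input has been matched

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """The symbol right after the dot, or None for a completed state."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


# The three dotted rules from the slide, with illustrative spans for "book that flight".
print(State("S", ("VP",), 0, 0, 0))               # S → • VP, [0,0]
print(State("NP", ("Det", "Nominal"), 1, 1, 2))   # NP → Det • Nominal, [1,2]
print(State("VP", ("V", "NP"), 2, 0, 3))          # VP → V NP •, [0,3]
```

A dot at the far left marks a predicted constituent, a dot in the middle an in-progress one, and a dot at the far right a completed one, which is how one representation covers all three kinds of chart entry.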

30
EXAMPLES

31
SUCCESS
The parser has succeeded if entry N+1 of the chart (the last entry) contains the state
– S → α •, [0,N]
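The success test translates directly into code. In this sketch the state layout and the function name are our own, and chart entries are indexed from 0, so entry N+1 in the slide's counting is chart[n] here:

```python
from typing import List, NamedTuple, Tuple


class State(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int


def succeeded(chart: List[List[State]], n: int, start_symbol: str = "S") -> bool:
    """True if the last chart entry (chart[n], entry N+1 counting from 1) holds a
    completed start-symbol state S → α • spanning [0, n]."""
    return any(
        s.lhs == start_symbol and s.dot == len(s.rhs) and s.start == 0 and s.end == n
        for s in chart[n]
    )


chart = [[] for _ in range(4)]                  # n = 3 words, 4 entries
chart[3].append(State("S", ("VP",), 1, 0, 3))   # S → VP •, [0,3]
print(succeeded(chart, 3))                      # True
```

All three conditions matter: an S state that is still in progress (dot not at the end) or that starts later than position 0 does not count as a parse of the whole input.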

32
THE ALGORITHM
The algorithm loops through the input without backtracking, at each step performing three operations:
– PREDICTOR: add predictions to the chart
– COMPLETER: move the dot to the right when the looked-for constituent is found
– SCANNER: read in the next input word
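Putting the three operations together gives a compact Earley recognizer. This is a hedged sketch rather than the course's reference implementation: the toy grammar, lexicon and names are hypothetical, the grammar has no empty rules, and SCANNER assigns parts of speech by a lexicon lookup. The rule Nominal → Nominal Noun is deliberately left-recursive; PREDICTOR would predict Nominal again at the same position, but the duplicate check in `enqueue` stops the loop.

```python
from typing import NamedTuple, Optional, Tuple


class State(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int

    def next_symbol(self) -> Optional[str]:
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


# Hypothetical toy grammar; Nominal -> Nominal Noun is left-recursive on purpose.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
    "VP": [("Verb",), ("Verb", "NP")],
}
PARTS_OF_SPEECH = {"Det", "Noun", "Verb"}          # handled by SCANNER
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def earley_recognize(words: list, start: str = "S") -> bool:
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def enqueue(state: State, i: int) -> None:
        if state not in chart[i]:                  # never add the same state twice
            chart[i].append(state)

    for rhs in GRAMMAR[start]:                     # initial predictions at position 0
        enqueue(State(start, rhs, 0, 0, 0), 0)

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):                   # chart[i] grows while we process it
            state = chart[i][j]
            j += 1
            nxt = state.next_symbol()
            if nxt is None:
                # COMPLETER: advance states in the origin entry that were waiting for state.lhs
                for waiting in list(chart[state.start]):
                    if waiting.next_symbol() == state.lhs:
                        enqueue(waiting._replace(dot=waiting.dot + 1, end=i), i)
            elif nxt in PARTS_OF_SPEECH:
                # SCANNER: if the next word can have category nxt, record it in entry i+1
                if i < n and nxt in LEXICON.get(words[i], set()):
                    enqueue(State(nxt, (words[i],), 1, i, i + 1), i + 1)
            else:
                # PREDICTOR: expect every rule for nxt to start at position i
                for rhs in GRAMMAR[nxt]:
                    enqueue(State(nxt, rhs, 0, i, i), i)

    # SUCCESS: a completed start-symbol state spanning [0, n] in the last entry
    return any(s.lhs == start and s.next_symbol() is None and s.start == 0
               for s in chart[n])


print(earley_recognize(["book", "that", "flight"]))  # True
```

Each chart entry is processed once, left to right, and grows while it is being processed, which is why the inner loop walks an index instead of iterating over the list directly.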