Note that the parser starts with a string of tokens.
21 March 2013, OSU CSE

Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
• Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

Parsing
• A CFG can be used to generate strings in its language
  – “Given the CFG, construct a string that is in the language”
• A CFG can also be used to recognize strings in its language
  – “Given a string, decide whether it is in the language”
  – And, if it is, construct a derivation tree (or AST)

Parsing generally refers to this last step, i.e., going from a string (in the language) to its derivation tree or, for a programming language, perhaps to an AST for the program.

A Recursive-Descent Parser
• One parse method per non-terminal symbol
• A non-terminal symbol on the right-hand side of a rewrite rule leads to a call to the parse method for that non-terminal
• A terminal symbol on the right-hand side of a rewrite rule leads to “consuming” that token from the input token string
• | in the CFG leads to “if-else” in the parser
• {...} in the CFG leads to “while” in the parser
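
The correspondence between grammar constructs and parser code can be sketched on a tiny example. The grammar below (`list`, `item`) is hypothetical and chosen only for illustration; it is not the BL grammar, and the class and method names are assumptions. The sketch only recognizes (returns true or false) and does not build a tree.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

/**
 * Recognizer for a hypothetical grammar (not from the slides):
 *   list -> item { , item }
 *   item -> x | [ list ]
 */
final class ListRecognizer {

    /** Returns true iff the whole token string can be derived from list. */
    static boolean recognize(String... tokens) {
        Queue<String> q = new ArrayDeque<>(Arrays.asList(tokens));
        return parseList(q) && q.isEmpty();
    }

    // list -> item { , item }
    private static boolean parseList(Queue<String> q) {
        if (!parseItem(q)) {            // non-terminal: call its parse method
            return false;
        }
        while (",".equals(q.peek())) {  // {...}: loop while another repetition starts
            q.remove();                 // terminal: consume ","
            if (!parseItem(q)) {
                return false;
            }
        }
        return true;
    }

    // item -> x | [ list ]
    private static boolean parseItem(Queue<String> q) {
        if ("x".equals(q.peek())) {         // first alternative of |
            q.remove();                     // terminal: consume "x"
            return true;
        } else if ("[".equals(q.peek())) {  // second alternative of |
            q.remove();                     // terminal: consume "["
            if (!parseList(q)) {
                return false;
            }
            if (!"]".equals(q.peek())) {
                return false;
            }
            q.remove();                     // terminal: consume "]"
            return true;
        } else {
            return false;                   // no alternative matches
        }
    }
}
```

Each parse method looks only at the next token (`q.peek()`) to choose an alternative or to decide whether to continue a loop, which is why the grammar has to be designed so that one token of lookahead is enough.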

More Improvements
If we treat every number as a token, then things get simpler for the parser: now there are only 5 non-terminals to worry about.

expr → term { add-op term }
term → factor { mult-op factor }
factor → ( expr ) | number
add-op → + | -
mult-op → * | DIV | REM
number → 0 | nz-digit { 0 | nz-digit }
nz-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Evaluating Arithmetic Expressions
• For this problem, parsing an arithmetic expression means evaluating it
• The parser goes from a string of tokens in the language of the CFG on the previous slide, to the value of that expression as an int
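
A minimal sketch of such an evaluator follows, assuming the tokens arrive as strings in a `java.util.Queue` and that every number is already a single token. The class name `ExprEvaluator`, the method names, and the error handling are assumptions made for illustration, not the course's actual code. For brevity, the `add-op` and `mult-op` rules are checked inline in the loops rather than given their own parse methods.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

/** Recursive-descent evaluator for the expr/term/factor grammar (illustrative sketch). */
final class ExprEvaluator {

    /** Evaluates a complete token string and returns its int value. */
    static int evaluate(String... tokens) {
        Queue<String> q = new ArrayDeque<>(Arrays.asList(tokens));
        int value = parseExpr(q);
        if (!q.isEmpty()) {
            throw new IllegalArgumentException("unexpected token: " + q.peek());
        }
        return value;
    }

    // expr -> term { add-op term }
    private static int parseExpr(Queue<String> q) {
        int value = parseTerm(q);
        while ("+".equals(q.peek()) || "-".equals(q.peek())) {
            String op = q.remove();              // consume the add-op
            int next = parseTerm(q);
            value = op.equals("+") ? value + next : value - next;
        }
        return value;
    }

    // term -> factor { mult-op factor }
    private static int parseTerm(Queue<String> q) {
        int value = parseFactor(q);
        while ("*".equals(q.peek()) || "DIV".equals(q.peek())
                || "REM".equals(q.peek())) {
            String op = q.remove();              // consume the mult-op
            int next = parseFactor(q);
            switch (op) {
                case "*":   value = value * next; break;
                case "DIV": value = value / next; break; // ArithmeticException if next == 0
                default:    value = value % next; break; // "REM"
            }
        }
        return value;
    }

    // factor -> ( expr ) | number
    private static int parseFactor(Queue<String> q) {
        String token = q.poll();                 // consume one token (null if none left)
        if (token == null) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        if (token.equals("(")) {
            int value = parseExpr(q);
            if (!")".equals(q.poll())) {         // consume the matching ")"
                throw new IllegalArgumentException("expected )");
            }
            return value;
        }
        // Assumption: a number token has the form 0 | nz-digit { 0 | nz-digit }.
        if (!token.matches("0|[1-9][0-9]*")) {
            throw new IllegalArgumentException("not a number: " + token);
        }
        return Integer.parseInt(token);
    }
}
```

For example, `ExprEvaluator.evaluate("(", "2", "+", "3", ")", "*", "4")` returns 20, while `ExprEvaluator.evaluate("2", "+", "3", "*", "4")` returns 14: because `term` sits below `expr` in the grammar, multiplication binds more tightly than addition without any extra precedence logic.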

Observations
• This is so formulaic that tools are available that can generate recursive-descent parsers (RDPs) from CFGs
• In the lab, you will write an RDP for a language similar to the one illustrated here
  – The CFG will be a bit different
  – There will be no tokenizer, so you will parse a string of characters in a Java StringBuilder
    • See methods charAt and deleteCharAt
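
Without a tokenizer, "consuming" a terminal means looking at the first character with `charAt(0)` and removing it with `deleteCharAt(0)`. The sketch below applies this to the rule number → 0 | nz-digit { 0 | nz-digit } only. The lab's grammar is not given here, so the class name `CharNumberParser` and the error handling are assumptions, and int overflow is not checked.

```java
/** Character-level parse method for the number rule (illustrative sketch). */
final class CharNumberParser {

    private static boolean isDigit(char c) {
        return '0' <= c && c <= '9';
    }

    /**
     * Parses number -> 0 | nz-digit { 0 | nz-digit } from the front of source,
     * deleting the characters it consumes and returning the value.
     */
    static int parseNumber(StringBuilder source) {
        if (source.length() == 0 || !isDigit(source.charAt(0))) {
            throw new IllegalArgumentException("digit expected");
        }
        char first = source.charAt(0);
        source.deleteCharAt(0);                  // consume the first digit
        int value = first - '0';
        if (first != '0') {                      // second alternative of |
            // { 0 | nz-digit }: keep consuming while the next character is a digit
            while (source.length() > 0 && isDigit(source.charAt(0))) {
                value = value * 10 + (source.charAt(0) - '0');
                source.deleteCharAt(0);
            }
        }
        return value;
    }
}
```

After `parseNumber` runs on a `StringBuilder` holding `123+4`, it returns 123 and the builder holds `+4`, so the next parse method starts exactly where this one stopped.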