The output of Ripper.lex is an array of arrays, one per token. In each inner array, the first element is the position where the token was found, as a [line, column] pair. The second element is the classification of the token: string content, integer, newline, whitespace, identifier, keyword, and so on. The third element is the text of the token as it appears in the source, and recent Ruby versions append the lexer state as a fourth element.
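As a small illustration of that structure, here is a sketch that tokenizes a one-line assignment. The snippet `x = 10` is a stand-in chosen for this example, not the contents of sample_code.rb.

```ruby
require 'ripper'

# Tokenize a tiny, made-up snippet (not the article's sample_code.rb).
tokens = Ripper.lex('x = 10')

# Each token is [[line, column], type, text, ...]; print the first three parts.
tokens.each do |position, type, text|
  puts "#{position.inspect} #{type} #{text.inspect}"
end

# Only the classifications, in source order.
p tokens.map { |token| token[1] }
# => [:on_ident, :on_sp, :on_op, :on_sp, :on_int]
```

Note that whitespace shows up as its own :on_sp token, so joining the text of every token gives back the original source.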

The tokenizer doesn’t check syntax: even if you give it syntactically invalid code, it will still convert that code into tokens. Let’s modify sample_code.rb to have incorrect syntax:
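A minimal sketch of that experiment follows. The broken snippet is hypothetical (a block opened with `do` and never closed), since the contents of sample_code.rb are not shown here. It also calls Ripper.sexp on the same input to contrast the tokenizer with the parser, which gives up on invalid code.

```ruby
require 'ripper'

# Hypothetical broken code: `do` is never matched by an `end`.
broken = "10.times do |n|\n  puts n\n"

# The tokenizer still produces tokens for the invalid source.
tokens = Ripper.lex(broken)
tokens.each do |position, type, text|
  puts "#{position.inspect} #{type} #{text.inspect}"
end

# The parser, by contrast, rejects it: Ripper.sexp returns nil on a syntax error.
puts Ripper.sexp(broken).inspect
```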

Parsing

While parsing, Ruby groups the tokens into phrases that make sense to it. Ruby has traditionally used a parser generator called Bison to create the parser. The input to Bison is a set of grammar rules (the parse.y file in the Ruby source), and the parser is generated while Ruby itself is being built. The generated parser uses the LALR algorithm (Look-Ahead LR, where LR means the input is read Left to right and a Rightmost derivation is produced in reverse). It reads the tokens from left to right, trying to match the sequence to one or more grammar rules. The parser also looks at the next token in the stream when deciding which rule to match.
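To make the idea of shifting, reducing, and peeking at the next token more concrete, here is a hand-written toy. It is not how Ruby's parser is implemented: the real parser is table-driven and generated from grammar rules, while this sketch hard-codes a two-operator arithmetic grammar. The names `toy_parse` and `PRECEDENCE` are invented for the example.

```ruby
# Toy shift-reduce parser for: expr := expr '+' expr | expr '*' expr | INTEGER
# Illustration only; a real LALR parser drives these decisions from generated tables.
PRECEDENCE = { '+' => 1, '*' => 2 }.freeze

def toy_parse(tokens)
  stack = []
  input = tokens.dup

  loop do
    lookahead = input.first # peek at the next token without consuming it

    # Reduce when the top of the stack is "operand operator operand" and the
    # lookahead does not bind tighter than the operator already on the stack.
    can_reduce = stack.size >= 3 && PRECEDENCE.key?(stack[-2]) &&
                 (lookahead.nil? || PRECEDENCE.fetch(lookahead) <= PRECEDENCE.fetch(stack[-2]))

    if can_reduce
      right = stack.pop
      operator = stack.pop
      left = stack.pop
      stack.push([:binary, operator, left, right])
    elsif lookahead
      stack.push(input.shift) # shift: move the next token onto the stack
    else
      break # nothing left to shift or reduce
    end
  end

  stack.first
end

p toy_parse([1, '+', 2, '*', 3])
# => [:binary, "+", 1, [:binary, "*", 2, 3]]
```

When the toy sees `1 + 2` on the stack with `*` as the lookahead, it shifts instead of reducing, which is why the multiplication ends up nested inside the addition.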

To see the output of the parsing stage, pass the code to Ripper.sexp. Here is a sample:
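A minimal sketch, assuming the sample is a single puts call with the string “Hello Aliens” (inferred from the description of the tree later in this section):

```ruby
require 'ripper'
require 'pp'

# Assumed sample code: one method call with a string argument.
code = 'puts "Hello Aliens"'

# Ripper.sexp parses the source and returns the tree as nested arrays
# (an S-expression), or nil if the source has a syntax error.
sexp = Ripper.sexp(code)
pp sexp
```

The result is a nested array whose outermost node is :program, with one entry per top-level statement underneath it.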

This output is a data structure called an Abstract Syntax Tree (AST). It records the structure and meaning of the Ruby code. The graphical representation of the above output is:

[Figure: Abstract Syntax Tree]

The command node represents a method call written without parentheses. It is followed by the identifier of the method being called, here puts. The args_add_block node holds the list of arguments and any block passed to the method. Here a string literal with the content “Hello Aliens” is passed to the puts method.
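The same nodes can be picked out of the nested array in code. This is a rough sketch that assumes the input is `puts "Hello Aliens"`. The exact nesting below the argument node can differ between Ruby versions, so the string is found by flattening rather than by indexing.

```ruby
require 'ripper'

sexp = Ripper.sexp('puts "Hello Aliens"')

# The root is [:program, [statement, ...]]; take the only statement.
_program, statements = sexp
node_type, identifier, arguments = statements.first

puts node_type.inspect       # the call node, :command for a call without parentheses
puts identifier[1]           # the method name stored in the identifier node
puts arguments.first.inspect # the argument wrapper node described above

# Collect every string that appears anywhere under the arguments node.
p arguments.flatten.grep(String)
```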

Compilation

Ruby 1.9 introduced a compiler that compiles Ruby code before executing it. The compiler translates your code into another language that Ruby’s virtual machine understands, and it runs in the background without any interaction from us. Ruby compiles the AST into low-level bytecode that YARV (Yet Another Ruby Virtual Machine) can understand. YARV is the interpreter that executes this bytecode.
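The bytecode can be inspected from Ruby itself. The sketch below uses RubyVM::InstructionSequence, which is specific to CRuby (MRI), and reuses the assumed `puts "Hello Aliens"` sample. The exact instruction names in the listing vary between Ruby versions, so no output is shown.

```ruby
# RubyVM::InstructionSequence is only available on CRuby (MRI).
iseq = RubyVM::InstructionSequence.compile('puts "Hello Aliens"')

# disasm returns a human-readable listing of the YARV instructions.
puts iseq.disasm
```

The listing shows the string being pushed onto YARV's stack, a send of the puts method, and a final leave instruction that returns from the compiled sequence.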