SpiderMonkey Parser API: A Standard For Structured JS Representations

Description
-----------

The representation of JavaScript programs that Mozilla chose when it exposed the SpiderMonkey reflection API isn't perfect; in fact, it has a good number of flaws. But a rich ecosystem of tools has formed around this particular structured representation of JavaScript programs, most notably the popular Esprima parser.

The reusability and composability of these tools have made this format the de facto standard for projects that transform, generate, analyse, or otherwise work with JavaScript programs. We will explore this burgeoning format, evaluate its design with the benefit of hindsight, and showcase some of the more useful and prominent projects that have adopted it.

Speaker Notes
-------------

=== Slide 1 ===

=== Slide 2 ===

* this is a JavaScript program
* uses the new operator on a constructor "big C"
* passes result of `1 + a`
* not very useful in this format; just a series of characters
* meaningful static analysis requires a more structured representation

=== Slide 3 ===

* in creating this structure, we usually start with lexical analysis (tokenisation)
* makes for a much simpler parser
* character stream turned into stream of more meaningful tokens
* tokens tagged with type
* whitespace characters do not create tokens
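
The tokenisation step can be sketched as follows. This is a minimal illustration, not a real lexer: it assumes the sample program is `new C(1 + a);` (inferred from the Slide 2 notes), handles only the characters that input needs, and uses token type names chosen for illustration.

```javascript
// Minimal tokeniser sketch. The rule set covers only the sample input;
// the token type names are illustrative assumptions.
function tokenise(source) {
  const rules = [
    ["Whitespace", /^\s+/],
    ["Keyword", /^(?:new)\b/],
    ["Identifier", /^[A-Za-z_$][A-Za-z0-9_$]*/],
    ["Numeric", /^\d+/],
    ["Punctuator", /^[()+;]/],
  ];
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    // Pick the first rule whose pattern matches at the current position.
    const rule = rules.find(([, pattern]) => pattern.test(rest));
    if (!rule) throw new SyntaxError("Unexpected character: " + rest[0]);
    const [type, pattern] = rule;
    const value = pattern.exec(rest)[0];
    // Whitespace is consumed but produces no token.
    if (type !== "Whitespace") tokens.push({ type, value });
    rest = rest.slice(value.length);
  }
  return tokens;
}

const tokens = tokenise("new C(1 + a);");
```

Each entry in `tokens` is a `{ type, value }` pair, so the parser never has to look at individual characters again.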

=== Slide 4 ===

* parsers are magic
* they turn the token stream into a tree

=== Slide 5 ===

* this is a representation of an abstract syntax tree (AST)
* formatted the way the SpiderMonkey interpreter represents it internally

=== Slide 6 ===

* same AST as a JavaScript object
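
For reference, here is a sketch of what that object looks like, assuming the program from Slide 2 is `new C(1 + a);` (the exact source is not reproduced in these notes). Location information is omitted for brevity.

```javascript
// SpiderMonkey-format AST for the assumed program `new C(1 + a);`.
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "NewExpression",
        callee: { type: "Identifier", name: "C" },
        arguments: [
          {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "Literal", value: 1 },
            right: { type: "Identifier", name: "a" },
          },
        ],
      },
    },
  ],
};
```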

=== Slide 7 ===

* mid-2010, Dave Herman announced on his Mozilla blog
** new public API in SpiderMonkey
** exposes its JavaScript parser

* vast majority of document specifies AST format
* let's take a closer look at it

=== Slide 10 ===

* the Node interface
* all nodes have a "type" member
* each node may have source tracking information; line/column of start/end parse position
* used to preserve location information through transformations to track original source
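
A sketch of a node carrying location data, again assuming the input `new C(1 + a);`. In this format lines count from 1 and columns from 0. The `describeLocation` helper is hypothetical, included only to show how a tool might consume `loc`.

```javascript
// The Identifier `a` in the assumed input `new C(1 + a);`.
const node = {
  type: "Identifier",
  name: "a",
  loc: {
    source: null, // no file name is known for this input
    start: { line: 1, column: 10 },
    end: { line: 1, column: 11 },
  },
};

// Hypothetical helper: format a node's start position for a diagnostic.
function describeLocation(n) {
  if (!n.loc) return n.type + " (no location)";
  return n.type + " at " + n.loc.start.line + ":" + n.loc.start.column;
}
```

A transformation that copies `loc` from the node it replaces onto the node it produces lets later tools, such as source map generators, point back at the original source.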

=== Slide 11 ===

* Program is a Node
* any successful parse will have a top-level Program node

* Function interface shows support for ES6 features
* because SpiderMonkey’s implementation parses JavaScript, not ECMAScript

=== Slide 12 ===

* Statement interface extends Node interface
* EmptyStatement is the simplest Statement node
* BlockStatement contains a list of statements and executes them in sequence
* ExpressionStatement allows an Expression to be used in Statement position

=== Slide 13 ===

* before we continue, let’s consider what we want from a good AST format

* These nodes are combinations of similarly structured nodes... kind of
* BinaryExpression not split up into PlusExpression, MultiplicationExpression, etc.
* But for some reason, split up AssignmentExpression (assignment ops), LogicalExpression (&&, ||), and BinaryExpression (all other binary operators)
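
The three-way split can be summarised as a small lookup. This is only a sketch covering the ES5 operators; `nodeTypeFor` is a hypothetical helper, not part of any parser's API.

```javascript
const ASSIGNMENT_OPERATORS = new Set([
  "=", "+=", "-=", "*=", "/=", "%=", "<<=", ">>=", ">>>=", "|=", "^=", "&=",
]);
const LOGICAL_OPERATORS = new Set(["||", "&&"]);

// Hypothetical helper: which node type holds a given infix operator.
function nodeTypeFor(operator) {
  if (ASSIGNMENT_OPERATORS.has(operator)) return "AssignmentExpression";
  if (LOGICAL_OPERATORS.has(operator)) return "LogicalExpression";
  return "BinaryExpression"; // +, *, ===, instanceof, in, ...
}
```

A tool that wants to treat all infix operations uniformly has to check for all three node types.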

=== Slide 15 ===

* Same thing for UpdateExpression (increments, decrements) and UnaryExpression

* these AST problems are directly derived from problems with the language

=== Slide 22 ===

* two different programs, create same AST, have different behaviour
* lack of a DirectiveStatement node
* for now, parsers treat directives as strings
* we’re working on fixing this one
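
To make the problem concrete, assume the two programs are along the lines of `"use strict";` and `("use strict");` (the slide's actual examples are not reproduced here). The first is a directive that switches on strict mode; the second is an ordinary expression statement with no effect. Without a directive node, both come out as the same tree:

```javascript
// AST for `"use strict";` (a directive: enables strict mode).
const directiveProgram = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: { type: "Literal", value: "use strict" },
    },
  ],
};

// AST for `("use strict");` (not a directive: parentheses leave no trace).
const parenthesisedProgram = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: { type: "Literal", value: "use strict" },
    },
  ],
};

// Structural comparison: the two trees cannot be told apart.
const indistinguishable =
  JSON.stringify(directiveProgram) === JSON.stringify(parenthesisedProgram);
```

A code generator given either tree has no way to know whether it must avoid wrapping the string in parentheses.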

=== Slide 23 ===

* SpiderMonkey AST is definitely not perfect -- why would we want to use it?
* Reflect.js introduced about a year after Reflect.parse
* JavaScript parser written in ES3-compatible JavaScript
* made a bit of noise, but nothing came of it

=== Slide 24 ===

* today, Esprima is the most popular JavaScript Reflect.parse implementation
* heavily tested, very true to spec
* even has a harmony branch that follows ES6 development

=== Slide 25 ===

* created a fuzzer for generative testing of Reflect.parse implementations

=== Slide 26 ===

* found 11 bugs in 4 implementations in the first 2 or 3 weeks

=== Slide 27 ===

=== Slide 28 ===

* implements visitor pattern for SpiderMonkey ASTs
* doesn't do much on its own
* useful for building other tools that operate on SpiderMonkey ASTs
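
The idea behind such a traversal can be sketched in a few lines. This is not the library's actual API, just a minimal depth-first walker with `enter` and `leave` callbacks; it treats any object with a string `type` property as a child node, which is an assumption that holds for plain SpiderMonkey-format trees.

```javascript
// Minimal depth-first walker (illustrative; not a real library's API).
function traverse(node, visitor) {
  if (visitor.enter) visitor.enter(node);
  for (const key of Object.keys(node)) {
    if (key === "loc") continue; // location data contains no child nodes
    const value = node[key];
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child.type === "string") traverse(child, visitor);
    }
  }
  if (visitor.leave) visitor.leave(node);
}

// Usage: record node types in visiting order for `new C(1 + a);`.
const program = {
  type: "Program",
  body: [{
    type: "ExpressionStatement",
    expression: {
      type: "NewExpression",
      callee: { type: "Identifier", name: "C" },
      arguments: [{
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Literal", value: 1 },
        right: { type: "Identifier", name: "a" },
      }],
    },
  }],
};

const visited = [];
traverse(program, { enter: (node) => visited.push(node.type) });
```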

* example visualisation of a very small program
* notice that any nonlinear control flow causes branching/joining

=== Slide 48 ===

* web demo available

=== Slide 49 ===

* computes complexity metrics

=== Slide 50 ===

* on a single module, get
** cyclomatic complexity
** source lines of code
** maintainability index
** more

* per function and for whole program
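
As a rough sketch of how one of these metrics falls out of the AST: cyclomatic complexity can be approximated as one plus the number of decision points. The set of node types counted below is an assumption for illustration; real tools differ in exactly what they count.

```javascript
// Node types counted as decision points (an assumed set for illustration).
const DECISION_NODES = new Set([
  "IfStatement", "ConditionalExpression", "ForStatement", "ForInStatement",
  "WhileStatement", "DoWhileStatement", "CatchClause", "LogicalExpression",
]);

function cyclomaticComplexity(root) {
  let count = 1; // one path through straight-line code
  (function walk(n) {
    if (Array.isArray(n)) return n.forEach((child) => walk(child));
    if (!n || typeof n.type !== "string") return;
    if (DECISION_NODES.has(n.type)) count += 1;
    // A `case` with a test adds a path; `default` (test === null) does not.
    if (n.type === "SwitchCase" && n.test !== null) count += 1;
    Object.keys(n).forEach((key) => walk(n[key]));
  })(root);
  return count;
}

// AST for `if (a && b) f();`
const sample = {
  type: "Program",
  body: [{
    type: "IfStatement",
    test: {
      type: "LogicalExpression",
      operator: "&&",
      left: { type: "Identifier", name: "a" },
      right: { type: "Identifier", name: "b" },
    },
    consequent: {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "f" },
        arguments: [],
      },
    },
    alternate: null,
  }],
};

const complexity = cyclomaticComplexity(sample);
```

Here the `if` and the `&&` each add one decision point, so `complexity` is 3.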

=== Slide 51 ===

* across multiple modules, get coupling and maintainability metrics

=== Slide 52 ===

* Plato visualises these metrics

=== Slide 53 ===

* fully pluggable linter

* alerts about potential bugs

* consistent code style
** not just formatting, structural too

=== Slide 54 ===

* example ESLint rule
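
The rule from the slide is not reproduced in these notes. As a stand-in, here is a sketch of a rule in ESLint's currently documented object format (`meta` plus `create`), which may differ from the format shown in the talk. The rule itself, its name, and its message are hypothetical.

```javascript
// Hypothetical rule: flag direct calls to eval().
const noEvalRule = {
  meta: {
    type: "problem",
    docs: { description: "disallow direct calls to eval (illustrative)" },
    schema: [],
  },
  create(context) {
    // The returned object maps node types to handlers; the linter calls
    // each handler while it traverses the SpiderMonkey-format AST.
    return {
      CallExpression(node) {
        if (node.callee.type === "Identifier" && node.callee.name === "eval") {
          context.report({ node, message: "Avoid eval()." });
        }
      },
    };
  },
};
```

Because handlers are keyed by node type, a rule only needs to know the AST format, not how the source was parsed.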

=== Slide 55 ===

* tracks line, function, and branch coverage
* uses instrumentation

=== Slide 56 ===

* standard LCOV report
* visualised using an LCOV visualiser

=== Slide 57 ===

=== Slide 58 ===

* partially evaluates JS programs
* generates own control flow graph, does own scope analysis
* not very good at either; should use escope/esgraph
* replaces AST nodes that can be statically computed
* unrolls loops
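
The "replaces AST nodes that can be statically computed" step can be illustrated with a tiny constant folder. This is a sketch under narrow assumptions, not the tool's implementation: it handles only `+`, `-`, and `*` on numeric literals and returns a new tree rather than mutating the input.

```javascript
const OPERATIONS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
};

function isNumberLiteral(node) {
  return node.type === "Literal" && typeof node.value === "number";
}

// Returns a copy of the tree with foldable BinaryExpressions replaced.
function fold(node) {
  if (Array.isArray(node)) return node.map((child) => fold(child));
  if (!node || typeof node.type !== "string") return node; // primitives, loc
  const copy = {};
  for (const key of Object.keys(node)) copy[key] = fold(node[key]);
  if (copy.type === "BinaryExpression" &&
      isNumberLiteral(copy.left) && isNumberLiteral(copy.right)) {
    const operation = OPERATIONS[copy.operator];
    if (operation) {
      const value = operation(copy.left.value, copy.right.value);
      // A negative or non-finite number is not a valid Literal value,
      // so leave those expressions unfolded.
      if (Number.isFinite(value) && value >= 0) return { type: "Literal", value };
    }
  }
  return copy;
}

// AST for the expression `2 * 3 + a`
const expression = {
  type: "BinaryExpression",
  operator: "+",
  left: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Literal", value: 2 },
    right: { type: "Literal", value: 3 },
  },
  right: { type: "Identifier", name: "a" },
};

const folded = fold(expression); // represents `6 + a`
```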

=== Slide 59 ===

* Jez went one step further
* metacircular interpreter

=== Slide 60 ===

* step through evaluation, generate environment state at any point
* still doesn't use escope or estraverse
* so it's not always correct

=== Slide 61 ===

=== Slide 62 ===

* performs tail call elimination
* uses estraverse and escope

=== Slide 63 ===

* transforms to iterative loops

=== Slide 64 ===

* compiles ES6 generators to ES5

=== Slide 65 ===

* generates semantically equivalent, syntactically minimal AST
* uses fixed point evaluation strategy: repeatedly applies a set of rules to an AST (using estraverse) until it reaches a fixed point
* 2 phases: simplification then expansion
* simplification generates smaller AST; expansion generates larger AST
* also does name mangling, but that should probably be separated out
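
The fixed-point strategy can be sketched as a loop that keeps applying a rule set until a pass changes nothing. The two rules and the helper names below are hypothetical, chosen only to show why more than one pass can be needed; the JSON comparison assumes the tree is plain data.

```javascript
// Each rule returns a replacement node, or the same node if it does not apply.
const rules = [
  // !true -> false, !false -> true
  (node) =>
    node.type === "UnaryExpression" && node.operator === "!" &&
    node.argument.type === "Literal" && typeof node.argument.value === "boolean"
      ? { type: "Literal", value: !node.argument.value }
      : node,
  // <literal> ? a : b -> a or b, depending on the literal's truthiness
  (node) =>
    node.type === "ConditionalExpression" && node.test.type === "Literal"
      ? (node.test.value ? node.consequent : node.alternate)
      : node,
];

// One top-down pass: rewrite a node, then descend into the result.
function applyOnce(node) {
  if (Array.isArray(node)) return node.map((child) => applyOnce(child));
  if (!node || typeof node.type !== "string") return node;
  const rewritten = rules.reduce((current, rule) => rule(current), node);
  const copy = {};
  for (const key of Object.keys(rewritten)) copy[key] = applyOnce(rewritten[key]);
  return copy;
}

// Repeat passes until the tree stops changing.
function toFixedPoint(ast) {
  let current = ast;
  for (;;) {
    const next = applyOnce(current);
    if (JSON.stringify(next) === JSON.stringify(current)) return next;
    current = next;
  }
}

// AST for the expression `!true ? a : b`
const input = {
  type: "ConditionalExpression",
  test: {
    type: "UnaryExpression",
    operator: "!",
    prefix: true,
    argument: { type: "Literal", value: true },
  },
  consequent: { type: "Identifier", name: "a" },
  alternate: { type: "Identifier", name: "b" },
};

const simplified = toFixedPoint(input);
```

The first pass rewrites `!true` to `false`; only on the next pass does the conditional rule see a literal test and reduce the whole expression to `b`. Termination here relies on every rule shrinking the tree.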

=== Slide 66 ===

* 1st phase reduces AST to simpler AST

=== Slide 67 ===

* 1st phase reduces AST to simpler AST

=== Slide 68 ===

* 2nd phase creates AST that has more compact syntax

=== Slide 69 ===

* 2nd phase creates AST that has more compact syntax

=== Slide 70 ===

[[ NOTE: read slide aloud ]]

=== Slide 71 ===

* grepping with esquery style selectors

=== Slide 72 ===

* grep with placeholders

=== Slide 73 ===

* replacement

=== Slide 74 ===

* another 2012 summer internship from Mozilla
* write JS with hygienic macros
* basically modifies token stream before it's sent to the parser
* this is a very difficult problem; much harder than it sounds: no parsing context

* In summary:
* use the SpiderMonkey AST
** it's not perfect
** unfortunately, ASTs are not guaranteed to represent valid JS
** will be expanded for ES6 and beyond
** the tooling is awesome
** JS tooling is now comparable to that of mature languages

* don't make your own AST format
** you'll probably get it wrong
** you don't want to recreate all these tools

* don't ever manipulate strings of code: EVER
** not in any programming language
** especially not in JavaScript