Context free grammars are usually described in Backus Naur Form (BNF)
and are a common part of computer programming language specifications.
Visual representations of CFG productions are sometimes used in
documentation to give a high-level overview of the different parts of a
language that is easy to understand, yet comprehensive, in that it
describes the full power of the language. There is a standard flow
graph notation for visualizing grammar productions. This flow graph
notation is very similar to a finite state machine graph with one start
and one final state. Examples of the use of this type of visualization
include the Oracle SQL language documentation [19] and the ANTLRworks
IDE for the ANTLR compiler compiler language [5]. The World Wide Web
consortium (W3C) uses a standard notation for the BNF grammar included
in all of its language specifications documents.

This paper describes an application for CFG visualization SVG.
Grammar productions are parsed using ANTLR and an SVG graph is
generated using Java and the Velocity template language. The grammar
visualization tool is used to present an augmented view of W3C
specifications. This augmented view displays a W3C specification
side-by-side with a visual representation of each grammar production
of the specification's language. This application works within a web
browser and uses Javascript to bring SVG to life by adding
interactivity and animations. The user can move between grammar
productions by clicking on the production references that appear in
the current flow graph, or by selecting an item from a full list of
grammar productions. The W3C specification document is kept in sync
with the currently displayed graph, so that the user can read the
textual description and related documentation from the specficiation
document while looking at the production graph.

Introduction

Background Information

Context Free Grammars

Context free grammars are used to describe formal languages in computer
science. The most common format for Context Free Grammar representation is the
Backus-Naur Form or BNF for short. A BNF consists of a set of production rules
- each rule describes a single part of the language. Productions usually follow
a hierarchical structure with a single root production at the top, which
captures the entire language. A hypothetical BNF description of a natural
language might consist of production rules for paragraphs, sentences and
various phrases. A more practical example of a BNF production for a street
address might look like this:

address ::= street_name number city state zip_code

In this example the pφroduction's rule name is address, and the part on the
right side of the ::= symbol is the production's value, which in turn, in this
case is a sequence of other rules, which are defined separately. Each of those
rules are non-terminals. A production can refer to other non-terminals and
terminals. Terminals represent a character, a sequence of characters or a
simple character pattern, which can be represented as a regular expression.

Visualizing Production Rules

Grammar productions are an essential part of the description of any
programming language and are usually included in programming language
specifications. Productions can be also represented in a more visual way in the
form of a diagram, and such diagrams are often included in programming language
documentation. The online docuemntation for the Oracle database product for
example includes such visual diagrams for each element of the Oracle SQL
programming language.

Oracle documentation website screenshot.

Figure 1. Oracle Documentation Website Screenshot

Documentation

Grammar production graphs are a very useful documentation aid as
demonstrated in the above example. They represent and intuitive,
easy-to-understand view of what is otherwise very dry, terse, technical
material.

The XML Language Specification, issued by the World Wide Web Consortium
(W3C) represents a complete description of the XML language. A core part of the
specification is the BNF grammar definition of the XMl language. The entire
specifications document is structured around the various productions of the BNF
grammar, with productions separated into groups, which appear in sections
alongside detailed descriptions of the meaning of each element represented by
each production rule. While the rules are listed in their BNF form, the
graphical form of productions is not included. This paper describes a project,
whose aim is to provide an enhanced view of the XML (and other) language
specification documents from the W3C. This enhanced view inlcudes a navigable
graphical representation of the BNF productions and appears alongisde the
original specifications document. The graphs of all the rules from an
individual section of the document are displayed alongisde that section of the
original document and the user is able to navigate through the specification by
clicking on references to other production rules, and observing a synced-up
view of the main specifications document and the current production graph.

Grammar visualization application screenshot.

Figure 2. Grammar Visualization Application Screenshot

The screenshot above shows a full view of the application window, which
includes the annotated version of the W3C XML Language Specification displayed
in a browser window. The graphical represnetation of the rules from the current
section of the specifications document appears in the top part of the window
with the original specifications document below it. The top section of the
window includes a drop-down box, which can be used to navigate between the
sections of document and display a particular section of the specification
alongisde its productions' graphs.

Grammar Production Visualization Implementation

The production graphs are visualized using SVG - an open web standard for
vector graphics on the web. SVG is an excellent choice for a web application
such as this, due to its good integration with HTML and JavaScript. SVG code
can be directly embedded inside HTML and its structure is fully exposed for
programmatic manipulation using JavaScript - the standard language for web
programming inside the browser.

Grammar Production Language

In order to produce a graphical representation of a BNF production, the frst
step was to obtain a structured representation of a production. The format of
BNF productions as used in the XML Language Specification is fully described in
Section 6 of the XML Language Specification document itself. Using this
information as reference a BNF grammar was written for XML symbol productions
using ANTLR. ANTLR is a Java compiler compiler tool, which is used for
generating language parsers based on a BNF language description. The parsers
are generated in the Java language and ANTLR itself is written completely in
Java. ANTLR also provides an IDE called ANTLRWorks, for developing programs in
the ANTLR language. ANTLRWorks greatly simplifies and speeds up the development
of ANTLR grammar, by providing a syntax-hightlighted editor with Ctrl-Click
style navigation between rules, a step-by-step parser debugger and a visual
representation of the tokenized input and the output Abstract Syntax Tree
(AST). AST is the structured representation of the input to a grammar, which in
this case is a production rule. This structured tree can be used to generate
the SVG graphical representation of the production. An example of an AST as
displayed in ANTLRWorks is included in the screenshot below:

AST of BNF recognition grammar.

Figure 3. AST of BNF recognition grammar

SVG Code Generation

Once the grammar for XML production rules was developed, the next step was
to use the AST produced by parsing a rule using ANTLR to produce the
appropriate SVG image. The production visualization format is standard and
well-established with numerous example implementations such as in the Oracle
documentation example shown above. The ANTLRWorks IDE itself includes a feature
to display the graphical representation of grammar production rules. The
ANTLRWorks format of visual representation was used as a reference point in the
development of the current application. All of our visualization primitives
were based entirely on the ANTLRWorks grammar visualization model.

ANTLRworks screenshot.

Figure 4. ANTLRworks Screenshot

Visualization Templates

Grammar productions consist of only a handful of basic operations, each of
which requires a distinct form of visual representation. All grammar
productions are built using combiantions of these basic operators. As a first
step a set of templates, one for each basic operator, were produced. These
templates consisted of static SVG code and are shown and described here.

Production

Production SVG image.

Figure 5. Production SVG Image

This is the core of the root of each production - it includes the production
name and the right arrow which indicates the the end of the production.

Sequence

Sequence SVG image.

Figure 6. Sequence SVG Image

The most basic building block, a sequence of sub-expressions is represented
as a concatenated list of boxes. There are no connecting arrows here - all the
necessary connectors are drawn as part of the leaf unit expressions (see
below).

Zero or One

Zero-or-one SVG image.

Figure 7. Zero-or-one SVG Image

The sub-expression (represented as a dotted outline box) is optional and can
be skipped (can occur zero or one times) in the input.

One or More

One-or-more SVG image.

Figure 8. One-or-more SVG Image

The sub-expression (represented as a dotted outline box) can occur one or
more times in the input.

Zero or More

Zero-or-more SVG image.

Figure 9. Zero-or-more SVG Image

The sub-expression (represented as a dotted outline box) can occur zero or
more times in the input.

Choice

Choice SVG image.

Figure 10. Choice SVG Image

Either one of the subexpression can appear and will be matched. All the
sub-expressions are mutually exclusive. This template is designed to be able to
seamlessly handle a choice with a single option. This special case occurs in
almost every production due to the way the AST is constructed by ANTLR.

Reference

Reference SVG image.

Figure 11. Reference SVG Image

This is the first of the two leaf unit expressions. This tempalte represents
a reference to another production. The given production (by its name) is
matched here.

Literal

Literal SVG image.

Figure 12. Literal SVG Image

This is the second leaf unit expression. This template represents a
character sequence or pattern (terminal in BNF terminology). This appears the
same as a reference with the color and style of the font being the
differentiating factor.

Velocity Template Language for SVG Generation

Once the basic building units of a grammar production graph were designed, a
simple tree walker was implemented using the Velocity Template Language (VTL).
VTL is a template language processor implemented in Java. It was originally
designed with the primary goal of web application development, but is also
available as a standalone library for use in any Java application. Velocity is
a simple but powerful scripting language, which is very useful for the purpose
of SVG generation because it eliminates the need to embed SVG code inside Java,
which would be a cumbersome effort and result in hard to understand and
unmaintainable code. Instead, VTL allows code to be embedded directly inside
the SVG files described above, thus bringing the static SVG files to life in a
way. Besides one velocity template for each of the above SVG templates, only
one additional velocity template is required, called eval_node. This template
acts as a dispatcher of sorts, delegating the processing of each node of the
AST to the correct template based on the AST node's type. Two sample SVG
production gpraphs from the XML Specification is included below:

SDDecl element SVG image.

Figure 13. SDDecl Element SVG Image

Mixed element SVG image.

Figure 14. Mixed Element SVG Image

Web Page Parsing with Java and Regular Expressions

The next step in the implementation of the application was the extraction of
all of the grammar production rules from the XML Specification web page. Upon
examination it was discovered that all of the rules of the XML Specification
follow an format with a common structure and appear inside HTML tables with a
common structure. This alllowed an easy implementation of the extraction
process by the use of a small set of simple regular expressions. In addtion to
the grammar productions themselves, grouping information was also extracted and
the expressions were stored inside an in-mmeroy list of javabeans, including
all the necessary information (HTML anchor names) to locate each production
rule inside the HTML document at runtime by JavaScript code running inside the
web browser.

The final step of the project was the implementation of an integrated XML
Specifications viewer which would present the user with a synchronized view of
the SVG production graphs and the source XML Specification document from the
W3C. This augmented view of the online specification is served from a standard
Jav Enterprise Web Server where our application is deployed as a WAR file. The
web application consists of an index page, which sets the layout of the
augmented XML specification view and includes both the production graph in the
top pane and the XML Specifications document loaded directly from the W3C web
site. The individual production graphs are loaded dynamically inside the
browser using AJAX technology without refreshing the web page inside the
browser.

Development Environment

ANTLRWorks was used for the development and debugging of the ANTLR grammar.
For the rest of the application, the Eclipse IDE was used, with the Veloeclipse
plugin for editing Velocity Template files and the Java EE tools for the
development of the web application servlet, HTML and JavaScript content. The
application was deployed and tested on version 6 of the Apache Tomcat Web
Server. The Eclipse Java EE tools provide an easy-to-use environment which
greatly simplifies the development of Java EE applications, with one-click
packaging and deployment of an application to the web container with full
debugging support.