Let Your Parser Go for the GOLD

A Parser Generator that Speaks YOUR Language

In a previous installment, you visited the inner workings of the Yet Another Compiler-Compiler (YACC) parsing system. This month, you go for the gold: the Grammar Oriented Language Developer (GOLD) Parser. Like most parsing systems, GOLD uses the LALR(1) state machine (algorithm) to analyze syntax and a Deterministic Finite Automaton (DFA) to identify different lexical units (tokenizer). Practically all common parser generators, such as YACC/Bison, use these algorithms. However, GOLD takes a different approach than common compiler-compilers. GOLD is freeware and uses the "zlib" style license so you can integrate it with your own apps without worry.

In GOLD, the LALR and DFA algorithms are merely simple automata, using lookup tables to determine both state transition and parsing actions. As a result, most of the computational work is performed before any input text is parsed. It is the computation of these tables that is both time-consuming and complex. Once created, these parsing tables are essentially independent of any programming language binding. As if to prove this point, the GOLD developers have implemented a dozen language bindings as of this writing (as either linkable .LIB import libraries or plug-ins), including:

ActiveX DLL

Assembly - Intel x86

C/C++

C#

D

Delphi

Java

Pascal

Python

Visual Basic .NET

wxWidgets

GOLD takes a unique approach of logically separating the development of the LALR and DFA parse tables from the algorithms that use them. Unlike what we saw with YACC/Bison, GOLD does not require you to embed your source code directly into the grammar. Grammars are completely independent of any implementation language or platform. Instead, the application analyzes the grammar and then saves the parse tables to a separate file. This file can be subsequently loaded by the actual parser engine and used.

How Does It Work?

The GOLD Parsing System is designed to aid in the development of compilers, interpreters, and translators while supporting multiple programming languages' backends.

Because the LALR and DFA algorithms can be implemented with simple state transition graphs, the parsing can be easily hosted by any programming language because the logic simply looks up a value in a table and then acts accordingly. The creation of these lookup tables is where all the hard work takes place. It's not surprising, then, that GOLD is composed of three pieces: Builder, parser Engine, and an output file that contains table information. In a nutshell, the Builder constructs the parse tables and the Engine executes the tables.

The Builder is a Win32 app that reads a source grammar written in the GOLD Meta-Language, produces the parse tables, and then writes them to the Compiled Grammar Table file. Builder does a couple unrelated things, such as creating enum lists and allowing you to interactively test a grammar. The Builder could theoretically be ported to other OSes, but is only available for Win32 at this time.

The Engine component performs the actual parsing and can be developed in any programming language necessary. Because all the complex work was already performed by the Builder, the Engine simply needs to read the Compiled Grammar Table (CGT) file and implement the LALR and DFA automata. Although Engine implementations vary in design, they all behave identically because they completely are table-driven with respect to the input.

Mechanically speaking, this means each time you modify the grammar file, you'll run the Builder to regenerate the CGT file. Once the grammar is completely refined, the bulk of your work shifts to the implementation of semantics and your focus changes to the Engine.

The Calculator Revisited

For comparison's sake, you will now reverse-engineer the simple calculator you developed in GNU Bison. Fortunately, the GOLD Builder includes an import tool for legacy YACC/Bison grammars to get you started. For those of you starting from scratch, the New Grammar Wizard will give you a leg up on some standard types of grammars by quizzing you on some basic language characteristics: Do newlines terminate statements or not? Does it have typical programming-language type identifiers? Does it contain string constants? And so on. The output from the File | Import A YACC/Bison Grammar menu selection is shown in Figure 1. I modified it slightly to follow the suggested practice of defining the terminal symbols using the "=" operator. Although you literally could write in '+' for the plus operator, I feel that spelling out the terminals ("PLUS") both provides abstraction while supporting maintenance and readability. The GOLD Meta-Language syntax is as close as anyone will ever get to notating in pure Backus-Naur Form (BNF), so if you have any prior experience in formal grammars it will be a cakewalk.

Schooling Your Grammar

Now, it's time to test the grammar! Rather than write a lot of code to do this, you simply can choose Window | Test Grammar on the Builder menu. You'll test the same expression you used in the previous article on YACC: 3 * ( 4 + 5 ). The results are shown in Figure 2:

Figure 2: Parse Actions Display

You also can have a look at the Parse Tree view that shows a nice treeview control so you can drill down to exactly the right rules in your grammar (see Figure 3).

Figure 3: Parse Tree Display

Let Your Parser Go for the GOLD

Building the GOLD Engine

If there is any downside to the GOLD parser, it is that the language-specific bindings for the GOLD Engine are neither documented online nor offline. Instead, you have to use Doxygen to generate the HTML, which I found a silly chore—especially considering that C++ is probably one of the most popular platforms and documentation for its GOLD Engine should have been readily available. In the end, I felt like I was more reverse-engineering the demo programs rather than actually learning something from the docs. This is not quite the same as a step-by-step tutorial.

I just improvised by compiling all the code in the SRC directory and shoving it into a library, like this:

However, there are Project files available for a variety of Visual Studio versions if you want to do that, even one that builds the Engine as a DLL. Again, I don't see a reason why precompiled binaries weren't provided for these common platforms. The final link step, then, looks like this:

The program starts in main() on Line 23 where you load the externally generated Compiled Grammar Table (CGT), which is the binary output of the GOLD Builder. Thus, you can tweak the grammar as much as you like without necessarily needing a recompile.

On Line 38, you do the actual scanning of the source text (hardcoded for this simple example) using a DFA built from the stored image in the CGT file. The next steps are to get the tokens out of the DFA results and stuff them into the parser.

The two arguments at the end of Line 60 designate trim reduction and whether to stop parsing after the first error is encountered. Trim reduction means to simplify rules of the form: NonTerminal1 := Nonterminal2 (intermediate productions). Last, to prove that your program did something, you print the tokens (see Line #53) and print the reduction tree (see Line #71). The printing of the tokens is mainly to show that you recognized each of tokens correctly and should read left-to-right along the input. The reduction tree shows a tree you would use to control the expression evaluation so that, in fact, it does "4 + 5" and then multiplies that result by 3. Below is the output from the demo program:

Conclusion

I've barely scratched the surface of what you can do with the GOLD parser system. In this article really only took you through the steps up to a complete parse tree. Adding in the semantics is the next step for you to take. There are lots of applications from compilers to XML translators that you can implement with this flexible tool. The added advantage of being able to test your grammar inside the Builder can save you time wasted recompiling and debugging your code just to test the quality of the grammar. This alone is a big step up from the traditional YACC/Bison solutions. Add in the incredible cross-platform support for everything from .NET to Java and you'll be sure to be finding new ways to use GOLD every day.

About the Author

Victor Volkman has been writing for C/C++ Users Journal and other programming journals since the late 1980s. He is a graduate of Michigan Tech and a faculty advisor board member for Washtenaw Community College CIS department. Volkman is the editor of numerous books, including C/C++ Treasure Chest and is the owner of Loving Healing Press. He can help you in your quest for open source tools and libraries; just drop an e-mail to sysop@HAL9K.com.

About the Author

Victor Volkman

Victor Volkman has been writing for C/C++ Users Journal and other programming journals since the late 1980s. He is a graduate of Michigan Tech and a faculty advisor board member for Washtenaw Community College CIS department. Volkman is the editor of numerous books, including C/C++ Treasure Chest and is the owner of Loving Healing Press. He can help you in your quest for open source tools and libraries, just drop an e-mail to sysop@HAL9K.com.

Comments

There are no comments yet. Be the first to comment!

You must have javascript enabled in order to post comments.

Leave a Comment

Your email address will not be published. All fields are required.

Name

Email

Title

Comment

Top White Papers and Webcasts

Live Event Date: March 19, 2015 @ 1:00 p.m. ET / 10:00 a.m. PT
The 2015 Enterprise Mobile Application Survey asked 250 mobility professionals what their biggest mobile challenges are, how many employees they are equipping with mobile apps, and their methods for driving value with mobility.
Join Dan Woods, Editor and CTO of CITO Research, and Alan Murray, SVP of Products at Apperian, as they break down the results of this survey and discuss how enterprises are using mobile application management and private …

Today's agile organizations pose operations teams with a tremendous challenge: to deploy new releases to production immediately after development and testing is completed. To ensure that applications are deployed successfully, an automatic and transparent process is required. We refer to this process as Zero Touch Deployment™. This white paper reviews two approaches to Zero Touch Deployment--a script-based solution and a release automation platform. The article discusses how each can solve the key …