Lexing and parsing in Go (Part 1)

About a month ago, I watched a video of Rob Pike
giving a talk about lexing for the text/template/parse package. In it he described the simplicity
and elegance of turning a traditionally sequential problem into a concurrent solution. This
motivated me to try writing a lexer for an as-yet-undefined language of my own design. But first,
as I did in my implementation, let's start with the lexer.

Lexical Analysis

My quest started simply, with a lexer. The idea behind the lexer is actually quite simple: one
goroutine runs the lexer while another collects the tokens it sends. This setup allows the lexer to
be run in a very simple manner. Instead of having mutually recursive functions, we have a for loop
that runs state functions set by the lexer. The for loop is as follows:

// stateFn represents the state of the scanner as a function
// that returns the next state.
type stateFn func(*Lexer) stateFn

// run runs the state machine for the lexer.
func (l *Lexer) run() {
	for l.state = lexStart; l.state != nil; {
		l.state = l.state(l)
	}
}

As you will notice, l is a pointer to the Lexer struct and lexStart is the first function passed to
the loop. The state field of the lexer holds a state function, that is, a function with the above
type signature. Each state function returns another state function, which will take care of the
next state, and so on. The for loop then runs the current state until a state function returns nil.
The run() function is run in a goroutine and communicates with the main goroutine through a channel:

// Lex creates a new scanner for the input string.
func Lex(name, input string) *Lexer {
	l := &Lexer{
		name:  name,
		input: input,
		items: make(chan Token),
	}
	go l.run()
	return l
}
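The Lexer struct itself never appears above; inferring from the calls, a minimal version might look like the sketch below. The name, input, items, and state fields come from the code shown here; start and pos are my assumptions, since any implementation needs to track its place in the input.

```go
package main

// Token and stateFn are as defined elsewhere in this post.
type TokenType int
type Pos int

type Token struct {
	Typ TokenType
	Pos Pos
	Val string
}

type stateFn func(*Lexer) stateFn

// Lexer holds the state of the scanner. This is a sketch: the field
// names follow text/template/parse, but start and pos are assumed.
type Lexer struct {
	name  string     // name of the input; used only in error reports
	input string     // the string being scanned
	start Pos        // start position of the current token
	pos   Pos        // current position in the input
	state stateFn    // the next lexing function to enter
	items chan Token // channel of scanned tokens
}
```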

In the other goroutine, users of the lexer can call NextItem(), which returns the next
Token from the channel.
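NextItem itself is not shown here; given the channel-based design, it is plausibly a one-line receive, sketched below (the Lexer is reduced to the one field NextItem needs):

```go
package main

type TokenType int
type Pos int

type Token struct {
	Typ TokenType
	Pos Pos
	Val string
}

// Lexer is stripped down to the token channel for this sketch.
type Lexer struct {
	items chan Token
}

// NextItem receives the next Token from the lexing goroutine,
// blocking until one is available. Because items is unbuffered in
// Lex, the lexer only advances as fast as tokens are consumed.
func (l *Lexer) NextItem() Token {
	return <-l.items
}
```

This back-pressure is a nice side effect of the unbuffered channel: the lexer and its consumer stay in lockstep without any explicit synchronization.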

Tokens

At this point, you would be right to wonder what exactly we are lexing, since we cannot lex tokens
we don't know. I started with the code from the
text/template/parse package
and modified it to suit my needs. Tokens are defined as a struct with fields for the type, position,
and value of the token.

// Token represents a token or text string returned from the scanner.
type Token struct {
	Typ TokenType // The type of this token.
	Pos Pos       // The starting position, in bytes, of this item in the input string.
	Val string    // The value of this item.
}
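Putting stateFn, run, and Token together, here is a toy end-to-end sketch that lexes runs of digits out of a string. The token names, the emit helper, and the digit-scanning logic are mine rather than part of the post's language, but the control flow is exactly the state-function loop described above:

```go
package main

type TokenType int

const (
	TokenEOF TokenType = iota
	TokenNumber
)

type Pos int

type Token struct {
	Typ TokenType
	Pos Pos
	Val string
}

type stateFn func(*Lexer) stateFn

type Lexer struct {
	input string
	start int
	pos   int
	state stateFn
	items chan Token
}

// emit sends the text scanned since the last emit as a Token.
func (l *Lexer) emit(t TokenType) {
	l.items <- Token{t, Pos(l.start), l.input[l.start:l.pos]}
	l.start = l.pos
}

// lexStart emits a TokenNumber for each run of digits, skips
// everything else, and ends with a TokenEOF.
func lexStart(l *Lexer) stateFn {
	for l.pos < len(l.input) {
		if c := l.input[l.pos]; c >= '0' && c <= '9' {
			for l.pos < len(l.input) && l.input[l.pos] >= '0' && l.input[l.pos] <= '9' {
				l.pos++
			}
			l.emit(TokenNumber)
		} else {
			l.pos++
			l.start = l.pos // discard non-digit characters
		}
	}
	l.emit(TokenEOF)
	return nil // stop the run loop
}

// run drives the state machine exactly as in the loop shown earlier.
func (l *Lexer) run() {
	for l.state = lexStart; l.state != nil; {
		l.state = l.state(l)
	}
	close(l.items) // no more tokens will be sent
}

// Lex starts the lexing goroutine and returns the Lexer.
func Lex(input string) *Lexer {
	l := &Lexer{input: input, items: make(chan Token)}
	go l.run()
	return l
}
```

Ranging over Lex("12 345 6").items yields number tokens "12", "345", and "6", followed by an empty EOF token.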