Introduction

Sometimes you may want to have your own parser. This is what
pyPEG is for. pyPEG also supports Unicode.

pyPEG is a plain and simple intrinsic parser interpreter framework for
Python versions 2.7 and 3.x. It is based on Parsing Expression Grammars
(PEG). With pyPEG you can parse many formal languages in a very easy
way. How does that work?

Installation

Parsing text with pyPEG

PEG is something like Regular Expressions with recursion. The
grammars are like templates. Let's look at an example. Say you
want to parse a function declaration in a C-like language. Such a
function declaration consists of:

type declaration

name

parameters

block with instructions

int f(int a, long b)
{
    do_this;
    do_that;
}

With pyPEG you declare a Python class for each object type you want
to parse. This class is then instantiated for each parsed object. The class
gets an attribute named grammar, which describes what should be parsed and
how. In our simple example, we support two different types, declared
as keywords in our language: int and long. So we write a class
declaration for the typing, which has an Enum of the two possible
keywords as its grammar:

class Type(Keyword):
    grammar = Enum( K("int"), K("long") )

Common parsing tasks are included in the pyPEG framework. In this
example, we're using the Keyword class because the result will be a
keyword, and we're using Keyword objects (abbreviated as K)
because what we parse will be one of the listed keywords.

The total result will be a Function. So we're declaring a Function
class:

class Function:
    grammar = Type, …

The next thing will be the name of the Function to parse. Names are
somewhat special in pyPEG. But they're easy to handle: to parse a
name, there is a ready-made name() function you can call in your grammar to
generate a .name Attribute:

class Function:
    grammar = Type, name(), …

Now for the Parameters part. First let's declare a class for the parameters.
Parameters has to be a collection, because there may be many of
them. pyPEG has some ready-made collections. For Parameters, the
Namespace collection fits: it provides indexed access by name, and
parameters have names (in our example: a and b). We write it like this:

class Parameters(Namespace):
    grammar = …

A single Parameter has a structure itself. It has a Type and a name().
So let's define:

pyPEG will instantiate the Parameter class for each parsed parameter.
Where will the Type go? The name() function will generate a
.name Attribute, but what about the Type object? Well, let's move it to an
Attribute, too, named .typing. To generate an Attribute, pyPEG
offers the attr() function:

By the way: name() is just a shortcut for attr("name", Symbol). It generates
a Symbol.

How can we fill our Namespace collection named Parameters? Well, we have
to declare what a list of Parameter objects looks like in our source text.
pyPEG offers an easy way with the cardinality functions. In this case
we can use maybe_some(). This function represents the asterisk cardinality (*).

Maybe a function has no parameters. This is a case we have to consider.
What should happen then? In our example, the Parameters Namespace should
then be empty. We're using another cardinality function for that case,
optional(). It represents the question mark cardinality (?).

We can continue with our Function class. The Parameters will be
in parentheses; we just put that into the grammar:

class Function:
    grammar = Type, name(), "(", Parameters, ")", …

Now for the block of instructions. We could declare another collection for the
Instructions. But the function itself can be seen as a list of instructions. So
let us declare it this way. First we make the Function class itself a List:

class Function(List):
    grammar = Type, name(), "(", Parameters, ")", …

If a class is a List, pyPEG will put everything that is parsed and
does not generate an Attribute into this list. So with that
modification, our Parameters will now be put into that List, too. And
so will the Type. This is an option, but in our example it is not
what we want. So let's move them to an Attribute .typing and an
Attribute .parms respectively:

Now we can define what a block will look like, and append it to
the grammar of a Function. We keep the Instruction class plain and simple.
Of course, in a real world example, it can be pretty complex ;-) Here we just
have it as a word. A word is a predefined RegEx; it is re.compile(r"\w+").

Composing text

pyPEG can do more. It is not only a framework for parsing text, it can
compose source code, too. A pyPEG grammar is not only “just like” a
template, it can actually be used as a template for composing text.
Just call the compose() function:

>>> compose(f, autoblank=False)
'intf(inta, longb){do_this;do_that;}'

As you can see, the composed text lacks whitespace at first. This
is because we used the automated whitespace removing functionality of
pyPEG while parsing (which is enabled by default), but we disabled the
automated adding of blanks where the syntax would otherwise be violated.
To improve on that we have to extend our grammar templates a little bit.
For that case, there are callback function objects in pyPEG. They're only
executed by compose() and ignored by parse(). And as usual, there
are predefined ones for the common cases. Let's try that out. First
let's add blank between things which should be separated:

The blank after the comma in int a, long b was
generated by the csl() function; csl(Parameter) generates:

Parameter, maybe_some(",", blank, Parameter)

Indenting text

In C-like languages (like our example) we like to indent blocks.
Indention is relative to the current position. If something is
already inside a block and should be indented, it has to
be indented twice (and so on). For that case pyPEG has an indention
system.

The indention system basically uses the generating function indent()
and the callback function object endl. With indent() we mark what should
be indented; endl marks where the next line of the output source code
should start. We can use this for our block:

User defined Callback Functions

With User defined Callback Functions pyPEG offers the needed flexibility
to be useful as a general purpose template system for code generation. In
our simple example let's say we want to have processing information in
comments in the Function declaration, i.e. the indention level in a comment
before each Instruction. For that we can define our own Callback Function:
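A sketch of such a function in plain Python. It assumes that the parser object exposes the current level as an attribute named indention_level; the stand-in parser object below is hypothetical and only serves to try the function out without running a real composition:

```python
from types import SimpleNamespace


def heading(thing, parser):
    # thing is the object being output, parser carries the composing state.
    # Assumption: the current indention level is in parser.indention_level.
    return "/* on level " + str(parser.indention_level) + " */"


# Hypothetical stand-in for the parser object, for illustration only.
fake_parser = SimpleNamespace(indention_level=1)
comment = heading("do_this", fake_parser)
print(comment)
```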

Such a Callback Function is called with two arguments. The first
argument is the object to output. The second argument is the parser
object, which gives access to state information of the composing process.
Because this fits the convention for Python methods, you can write it as
a method of the class it belongs to.

The return value of such a Callback Function must be the resulting text.
In our example, a C comment shall be generated with notes. We can now
put this into the grammar.

XML output

Sometimes you want to process what you parsed with the XML toolchain,
or with the YML toolchain. Because of that, pyPEG has an XML backend.
Just call the thing2xml() function to get bytes with encoded XML: