Generating Web content with Cocoon (1/2) - exploring XML

The Apache project is best known for the Web server that carries its name.
Over the years it has also launched many other interesting software projects,
mainly in the Java and XML space. Cocoon is one of them.

Cocoon is a Java Web application for generating dynamic content using XML. It
can be installed on any Java Servlet Engine and comes with a wide variety of
components for generating, transforming and outputting data with XML. Cocoon 2
was recently released as a complete rewrite of its predecessor, with improved
flexibility and scalability.

The central concept in Cocoon is the pipeline: a number of components plugged
together in series, each processing the data it receives and passing the
result on to the next. Unix users know this concept well, since Unix comes
with many small utilities that can be linked with the famous pipe symbol "|"
to manipulate character data:

ls -1 | grep "\.bak" | wc -l

The ls program understands file systems, grep finds patterns in arbitrary
text, and wc counts lines of text. Together this little series of programs
counts the number of backup files in the current directory. The glue between
these tools is character data that the operating system transparently passes
from one to the next.

The Cocoon developers set out to create a similar system for generating
content on the Web by piping XML through a configurable set of tools. The first
version of the software passed full DOM documents from component to component,
which limited scalability in two ways: the size of the documents that could be
processed, and the amount of parallelism in the pipeline. Furthermore, the
pipeline was defined through processing instructions within the documents
themselves, making it difficult to reuse a document in different contexts.

Version 2 eliminates these problems by using SAX instead of DOM, and connecting
the processing components through SAX events. This way XML documents of
arbitrary size can be processed, and the components can work in parallel on the
same document. The configuration of the pipeline has moved out of the data
documents and into a separate sitemap file.
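
To make the idea of components connected through SAX events more concrete, here is a minimal sketch that uses only the SAX classes shipped with the JDK. It is not Cocoon code and does not use Cocoon's component interfaces; the class names SaxPipelineSketch, RenameFilter and TextSink, and the para-to-p renaming, are assumptions made up for illustration. A parser, a filter and a final handler are chained so that each event flows through all stages as soon as it is produced, without a full document tree ever being built.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxPipelineSketch {

    /** Middle stage (hypothetical): renames <para> to <p>, forwards everything else unchanged. */
    static class RenameFilter extends XMLFilterImpl {
        RenameFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, rename(localName), rename(qName), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(uri, rename(localName), rename(qName));
        }

        private static String rename(String name) {
            return "para".equals(name) ? "p" : name;
        }
    }

    /** Last stage (hypothetical): turns the incoming events back into text. */
    static class TextSink extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Wires parser -> filter -> sink and pushes one document through the chain. */
    static String run(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RenameFilter filter = new RenameFilter(parser);
        TextSink sink = new TextSink();
        filter.setContentHandler(sink);
        // Parsing through the filter makes the parser report its events to the filter,
        // which in turn forwards them to the sink.
        filter.parse(new InputSource(new StringReader(xml)));
        return sink.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><para>Hello</para></doc>"));
    }
}
```

Because every stage only sees one event at a time, memory use does not grow with the size of the document, which is the property the move from DOM to SAX was meant to deliver.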

Pipeline components

Which components can a pipeline be built from? Cocoon comes with many
configurable components for generating, transforming and serializing data with
XML. Some generators are:

A generator creates a series of SAX events that can be processed in subsequent
stages of the pipeline.
Readers are a special case of generators, in that they return non-XML data
and are usually used as a one-stage pipeline, like the FileReader for returning
static data to the Web client.
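
What "creating a series of SAX events" means can be sketched with the plain JDK APIs. The example below is an illustration under stated assumptions, not Cocoon's actual generator interface: the class GeneratorSketch, its generate and serialize methods, and the files/file element names are all hypothetical. The generator never parses any XML text; it simply calls the event methods of whatever handler comes next. Here that next stage is the JDK's identity TransformerHandler, which plays the role of a serializer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class GeneratorSketch {

    /** Hypothetical generator: emits one <file> element per name to the next stage. */
    static void generate(String[] names, ContentHandler next) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        next.startDocument();
        next.startElement("", "files", "files", noAttributes);
        for (String name : names) {
            next.startElement("", "file", "file", noAttributes);
            next.characters(name.toCharArray(), 0, name.length());
            next.endElement("", "file", "file");
        }
        next.endElement("", "files", "files");
        next.endDocument();
    }

    /** Connects the generator to a serializing handler and returns the resulting XML text. */
    static String serialize(String[] names) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        // Set output properties before attaching the result.
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));
        generate(names, serializer);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(new String[] {"notes.bak", "draft.bak"}));
    }
}
```

Since the generator depends only on the ContentHandler interface, any other stage, such as a filter that transforms the events, could be placed between it and the serializer without changing the generator itself.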