XML::Comma is an information management platform. It was designed to be used as a core tool for developing Very Large Websites. Comma specifies an XML-based document definition format, encourages Perl code to be embedded in these definitions, and specifies an API for manipulating documents and document collections. Comma includes functionality to store sets of documents in ordered, extensible ways, and integrates with a relational database to index, sort and retrieve collections of documents.

Comma relies on MySQL's "LOAD DATA LOCAL" statement, which has been disabled by default in MySQL versions 3.23.48 and greater. This change to MySQL breaks a few of the indexing tests. To turn the LOCAL functionality back on, either recompile MySQL with the option "--enable-local-infile" or start mysqld with the argument "--local-infile=1". See the following piece of MySQL documentation:

If a pre-compiled version of the SimpleC parser is not found by Inline, Inline tries (using the FindBin module) to figure out its "current working directory." FindBin croaks when $0 is set to /dev/null, which causes Apache to abort the startup process.

You can get a compiled version of the parser in the right spot by simply invoking

Because the elements() method always tries to return a "list" of elements -- which means that it returns an array in list context and a reference to an array in scalar context -- you have to do a bit of extra work to determine how many instances of a plural element exist. The most concise (and the recommended) way to do this is:
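The context-dependent return described above can be tried out without Comma installed. The following is only a sketch: Demo::Doc is a hypothetical stand-in class, not Comma's Doc, and it merely imitates the list/scalar behavior of elements(). Counting then comes down to dereferencing the array reference returned in scalar context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Doc. Its elements() returns a list in list
# context and an array reference in scalar context, as described above.
package Demo::Doc;

sub new {
    my ( $class, %elements ) = @_;
    return bless( { elements => {%elements} }, $class );
}

sub elements {
    my ( $self, $name ) = @_;
    my @found = @{ $self->{elements}{$name} || [] };
    return wantarray ? @found : \@found;
}

package main;

my $doc = Demo::Doc->new( paragraph => [ 'one', 'two', 'three' ] );

# Inside @{ ... } the call is made in scalar context, so an array
# reference comes back; scalar() then counts the dereferenced entries.
my $count = scalar @{ $doc->elements('paragraph') };

# A name with no instances yields an empty array, so the count is zero.
my $none = scalar @{ $doc->elements('footnote') };

print "paragraph: $count, footnote: $none\n";
```

Calling elements() directly in scalar context, without the dereference, would hand back the reference itself rather than a count, which is why the extra step is needed.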

Comma elements automatically get created when you ask for them. So the following code first makes a new Doc, then (behind the scenes) makes a new Element object for you:

XML::Comma::Doc->new ( type=>'Some_Def' )->element ( "foo" );

This automatic creation (auto-vivification, in Perl lingo) is almost always what you want. But there are cases where you need to check whether an Element already exists before you try to manipulate it. For example, you might have a nested element that has some required children -- and you probably only want to create that element when you're sure you're ready to populate it fully.

The idiom to do this turns out to be almost the same as the idiom to check how many instances of a plural element exist, given above:
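As an illustration of the difference between asking for an element and checking for one, here is a sketch built on a hypothetical stand-in class (Demo::Doc is not Comma's Doc). It assumes, as the text implies, that element() creates on demand while elements() only reports what already exists.

```perl
use strict;
use warnings;

package Demo::Doc;

sub new {
    my ($class) = @_;
    return bless( { elements => {} }, $class );
}

# element() creates the element on first access (auto-vivification).
sub element {
    my ( $self, $name ) = @_;
    my $list = $self->{elements}{$name} ||= [];
    push @$list, {} unless @$list;
    return $list->[0];
}

# elements() only reports what already exists; it never creates anything.
sub elements {
    my ( $self, $name ) = @_;
    my @found = @{ $self->{elements}{$name} || [] };
    return wantarray ? @found : \@found;
}

package main;

my $doc = Demo::Doc->new;

# An empty array is false, so this test does not create the element.
my $before = @{ $doc->elements('address') } ? 1 : 0;

$doc->element('address');    # the element is auto-created here

my $after = @{ $doc->elements('address') } ? 1 : 0;

print "before: $before, after: $after\n";
```

The existence test is the counting expression used as a boolean: zero instances means the element has not been created yet.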

Element content must be legal XML -- so no <, >, or & characters are allowed. These special characters must be "escaped" by replacing them with their entity codes (respectively &lt;, &gt;, or &amp;). The Comma::Util::XML_basic_escape() and Comma::Util::XML_basic_unescape() methods are available, as are shortcut flags for the element set() and get() methods:
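To make the escaping rule concrete, here is a small pure-Perl sketch of the three substitutions. The basic_escape() and basic_unescape() subs are hypothetical stand-ins written for this example; they are not the Comma::Util routines, whose exact behavior may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in: replace the three special characters with entities.
sub basic_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # ampersand first, so the entities added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical stand-in: turn the three entities back into characters.
sub basic_unescape {
    my ($text) = @_;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&amp;/&/g;    # ampersand last, so "&amp;lt;" becomes "&lt;" and not "<"
    return $text;
}

my $raw     = 'if (a < b && b > c)';
my $escaped = basic_escape($raw);
my $back    = basic_unescape($escaped);

print "$escaped\n";
```

The order of the substitutions matters in both directions; handling the ampersand at the wrong point would either double-escape the text or unescape it one level too far.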

There is a bug in Perl (both versions 5.6.1 and 5.8.0) that leads to a memory leak in Comma code like this:

my $iterator = $index->iterator();
while ( $iterator++ ) {
...
}

or:

if ( $iterator++ ) {
}

If you use 'while($iterator++){}' or 'if($iterator++){}', then your Iterator objects won't ever get garbage-collected. This is very often not a problem: any stand-alone script will be fine, because the Iterators will get properly DESTROYed when the script exits. But code like the above running inside, for example, a web application, can be a problem.

This works fine:

my $iterator = $index->iterator();
while ( ++$iterator ) {
...
}
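For readers wondering how a pre-increment loop can still visit the first item, here is a self-contained sketch using Perl's overload pragma. Demo::Iterator is a hypothetical stand-in and not Comma's Iterator class; starting the cursor one step before the first item is just one possible way to get this behavior.

```perl
use strict;
use warnings;

package Demo::Iterator;

# '++' advances the iterator in place; 'bool' reports whether the
# cursor currently points at an item.
use overload
    '++'   => \&advance,
    'bool' => \&has_current;

sub new {
    my ( $class, @items ) = @_;
    # Start one step before the first item so the first ++ lands on it.
    return bless( { items => [@items], cursor => -1 }, $class );
}

sub advance {
    my ($self) = @_;
    $self->{cursor}++;
    return $self;
}

sub has_current {
    my ($self) = @_;
    return $self->{cursor} >= 0 && $self->{cursor} < @{ $self->{items} };
}

sub current {
    my ($self) = @_;
    return $self->{items}[ $self->{cursor} ];
}

package main;

my $iterator = Demo::Iterator->new(qw( a b c ));
my @seen;

# Each pass advances first, then tests the iterator in boolean context.
while ( ++$iterator ) {
    push @seen, $iterator->current;
}

print "@seen\n";
```
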

The pre-increment may seem a little counter-intuitive, but the Iterator class is written to Do The Right Thing for this very common case. And the pre-increment doesn't trigger the memory leak. And this is fine, too:

Comma writes a line about every uncaught error to a log file. The location of the log file is controlled by a setting in Comma.pm -- the default is /tmp/log.comma/. This file probably needs to be writable by any process that uses the Comma framework. In most installations, the file is made world-writable (which should tell you that the Comma log system isn't intended to be used as part of any security auditing or similar framework -- you should write additional code to handle any secure reporting that an application might need).

Sure. We don't know of any faster way to develop (or to add new features to) large-scale applications that manipulate collections of hundreds of thousands of pieces of messy-but-structured information. We use it every day, and so do many, many people who access the web sites we build.

Oh, wait: you meant, "does it run fast?" Well, that's in the eye of the beholder. Comma's bottleneck is the parsing and object-ifying of XML files. The power and flexibility that the API gives you comes at some cost -- a hand-coded, special-purpose implementation could well be faster for any single usage.

However, we've worked hard to make Comma fast enough to be really, really useful. For example, Comma's "Inline" parser is about twice as fast as the general-use XML parsers against which we've benchmarked it (because Comma documents aren't allowed to make use of all parts of the XML specification). An experienced designer of large-scale internet systems will easily be able to structure and tune a Comma-based system to serve hundreds of thousands of dynamic pages a day on mid-range x86 boxes.