XML-Tiny reviews

Most of the time, when presented with so-called XML, I'm parsing "tag-soup"... not XML. This module is better than what I usually do: /<tag[^<>]+>/. Java programmers probably won't like it, because they like BFs, ... but this *is* CPAN.

I'm no wizard when it comes to XML, and I just needed to deal with some simplistic data (specifically, the REST API for Facebook - see developers.facebook.com/documentation... for an example). XML::Tiny gave me exactly what I needed to deal with it with the absolute minimum of effort. Thanks Dave!

The documentation is pretty explicit that it supports a subset of XML, and it's pretty explicit as to what that subset is. Presumably for cases where you are in control of the XML, this will do fine.

That said, I usually seem to be working on systems that already have some other XML parsing module installed, so I've rarely had need for a lightweight alternative, and I've never had the chance to use it for anything serious.

It's useful for what I'm using it for! I don't need a full-on XML parser to parse a few tiny XML files, and from what I understand, that is the point of the whole subset of Tiny modules anyway: give a small subset of functionality for the small jobs. These jobs are ones where I will have control over the input, so most of Aristotle's concerns aren't applicable. The one concern that remains is the unescaping-all-entities problem, and that can probably be fixed by me submitting patches at a later date if I need it.

This is silly. The module ignores important aspects of XML; if you use it, and you do not control the code on the other end of the wire that generates the “XML” which you want to process with XML::Tiny, you are on your own. It may work now, but you have no idea when the other party might change their code to use legal aspects of XML that this module will choke on.

Furthermore, it will fail to reject many documents that are not valid XML, so the language it implements is not a proper subset of XML.

Additionally, the author’s design choice to unescape a select few entities (most importantly &amp;) but no others means that any text content you retrieve from this module is mangled so that it is impossible to recover the original text faithfully.
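To make the mangling concrete, here is a minimal sketch in Python. It does not call XML::Tiny; it assumes a hypothetical parser that decodes only the five predefined XML entities and passes every other reference through untouched. `partial_unescape` is an illustrative name, and `html.unescape` is only a stand-in for a decoder that handles character references too.

```python
import html
import re

# Assumption: the parser decodes only these five entities and nothing else.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

def partial_unescape(text):
    """Decode only the predefined entities, in a single pass."""
    return re.sub(r"&(amp|lt|gt|quot|apos);",
                  lambda m: PREDEFINED[m.group(1)], text)

literal = "caf&amp;#233;"   # the writer meant the literal characters caf&#233;
reference = "caf&#233;"     # the writer meant the word café

# Partial decoding maps two different inputs to the same output.
partial_literal = partial_unescape(literal)
partial_reference = partial_unescape(reference)
print(partial_literal == partial_reference)

# A decoder that also handles character references keeps them apart.
full_literal = html.unescape(literal)
full_reference = html.unescape(reference)
print(full_literal, full_reference)
```

Once both inputs have collapsed to the same string, no post-processing step can tell which one the document contained.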

If you want to process XML, but you can only install pure-Perl modules and you want to avoid the hassle of a large dependency tree, then I recommend that you use XML::Parser::Lite instead. That is a single Perl-only module whose author elected to invest his time in the implementation of a complete XML parser rather than into docs justifying his code’s brokenness and complaining about the XML police’s detachment from reality.

Since this module already has users, the best strategy for the future would probably be to re-implement the XML::Tiny interface in terms of XML::Parser::Lite as far as that is possible (breaking compatibility where necessary for correctness), and then make XML::Tiny a stub module in the XML::Parser::Lite distribution.

I'm not sure I like this module. The XML folks spent a lot of time worrying about things like character encodings (that's why <?xml has to be at the beginning of the file; the parser uses it to determine enough charset information to be able to parse the encoding= part), escaping, etc. Is it really useful to parse an RSS feed when all the entries containing '&' are going to read '&amp;' instead? Is ignoring CDATA really going to work on any real RSS feeds?
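For comparison, here is a small sketch of what a conforming parser hands back for that kind of content. It uses Python's standard `xml.etree.ElementTree` rather than any Perl module, and the feed item is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS-style item: an escaped ampersand and a CDATA section.
feed_item = (
    "<item>"
    "<title>Q&amp;A</title>"
    "<description><![CDATA[<b>Tips</b> & tricks]]></description>"
    "</item>"
)

item = ET.fromstring(feed_item)
title = item.findtext("title")              # entity decoded by the parser
description = item.findtext("description")  # CDATA delivered as plain text

print(title)
print(description)
```

The title comes back as `Q&A` and the description as `<b>Tips</b> & tricks`, which is the behaviour feed consumers generally rely on.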