Ruby and XML not-so-simple?

Man, I think I’ve been reading too much Sam Ruby lately (OK, that was a year ago, but not much has changed). You have to admit, though, that XML handling in Ruby is one of those things that just doesn’t feel quite right. REXML is pretty much the standard XML API for Ruby, yet in my opinion it suffers from two showstoppers:

In Ruby 1.8.4 it still has the glaring hole Sam mentioned last year with well-formedness. (No exception raised below!)

The REXML::Text#to_s method violates the principle of least surprise. In just about every other XML parser out there, when you ask a text node for its contents, it returns the value with entities resolved. Not so with Text#to_s, which hands back the escaped form; you have to call Text#value instead. Unfortunately, this would be difficult to reverse in future versions of REXML without breaking existing apps.
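
To make the difference concrete, here is a minimal sketch. The sample document is made up for illustration, and I'm assuming a reasonably current REXML; treat it as a demonstration of the behavior rather than a definitive reference.

```ruby
require 'rexml/document'

# Hypothetical sample document containing one escaped ampersand.
doc  = REXML::Document.new('<company>AT&amp;T</company>')
node = doc.root.texts.first   # the REXML::Text child of <company>

escaped  = node.to_s    # entity left exactly as written in the source
resolved = node.value   # entity resolved to the literal character

puts escaped    # AT&amp;T
puts resolved   # AT&T
```

If you are handing text to anything other than an XML serializer, value is almost certainly the one you want.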

This second problem manifests itself in subtle ways. If you’re calling Element#text (which is probably the most common way), you’re fine, because it implicitly does self.texts.first.value under the hood. But if you want to make sure you’re grabbing all the text content, not just the first text node, you might be inclined to write element.texts.join('') to concatenate the nodes. The catch is that join never touches the value method; it converts each node with to_s, leaving you with unresolved entities.
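
A small sketch of the trap and one possible workaround follows. The markup and variable names are hypothetical, and mapping each node through value before joining is just my suggestion, not something REXML prescribes.

```ruby
require 'rexml/document'

# Hypothetical sample: two text nodes separated by a child element.
doc  = REXML::Document.new('<p>AT&amp;T<br/>R&amp;D</p>')
para = doc.root

first_only = para.text                                 # value of the first text node only
joined_raw = para.texts.join('')                       # join converts each node with to_s
joined_ok  = para.texts.map { |t| t.value }.join('')   # resolve each node, then join

puts first_only   # AT&T
puts joined_raw   # AT&amp;TR&amp;D
puts joined_ok    # AT&TR&D
```

Note that Element#text silently drops everything after the first child element, which is exactly why the join approach is tempting in the first place.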

It turns out this problem is exhibited in the version of XmlSimple now included with Edge Rails as of rev 4453. So if you’re living on the edge, using the newly minted ActiveResource to fetch XML from remote resources like a champion, you get benched as soon as you try to fetch XML that has normalized entities inside.