Adventures with lightweight and minimalist software for Linux

html-xml-utils: A sweet suite

I’m in favor of any tool that can strip away the manure that masquerades as XML files. I have no earthly idea why anyone would use that style or arrangement voluntarily, especially when simpler and cleaner arrangements are so much … cleaner and simpler to work with.

So if you hand me a suite of 10 or 12 tools that scrape away at XML and HTML files, I’m like a kid on Christmas Day. Here’s html-xml-utils, which is just a toy box full of goodies. Which unfortunately means I can only show one or two.

hxnormalize, I imagine, improves readability for pages with frequent links. Go from this:

Of course, pipe hxnormalize into hxprintlinks, and some of that will be cleaned up a little.😉

If you remember xidel or xmlstarlet, you might remember how it’s possible to pull single elements out of an XML file, for further editing. hxextract can do that, and here are the results of hxextract command .config/openbox/rc.xml on my system:

Not pretty, but a step forward in terms of finding miscreant keyboard commands in my rc.xml file.😐
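
If you want a feel for what "pull single elements out of an XML file" means without installing anything, here is a rough Python sketch of the idea. It is not how hxextract is implemented, and it only returns the text inside each matching element rather than the elements themselves. The sample data is a made-up, heavily trimmed stand-in for an Openbox rc.xml, and the names extract_elements and local_name are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for an Openbox rc.xml file.
SAMPLE_RC = """<openbox_config xmlns="http://openbox.org/3.4/rc">
  <keyboard>
    <keybind key="W-t">
      <action name="Execute">
        <command>xterm</command>
      </action>
    </keybind>
    <keybind key="W-e">
      <action name="Execute">
        <command>thunar</command>
      </action>
    </keybind>
  </keyboard>
</openbox_config>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_elements(xml_text, name):
    """Return the text of every element whose local name matches `name`."""
    root = ET.fromstring(xml_text)
    return [
        (el.text or "").strip()
        for el in root.iter()
        if local_name(el.tag) == name
    ]


if __name__ == "__main__":
    # Print one command per line, in document order.
    for command in extract_elements(SAMPLE_RC, "command"):
        print(command)
```

Matching on the local name sidesteps the namespace that rc.xml declares, which is usually the first thing that trips up a quick ElementTree search.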

There is a lot more (a lot more) available in html-xml-utils that I just don’t have the time and resources to touch on. Look for xml2asc, which converts UTF-8 text to plain ASCII with numeric character entities, tools that will build tables of contents and bibliographies for entire trees of files, and even a few that transpose tables or just pull out links. That last one, hxwls, is mighty clever. …
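
For the curious, the "just pull out links" idea can be sketched in a few lines of Python with the standard library. This is only an approximation of what a link lister does, not hxwls itself: it looks at the href of anchor tags and nothing else, and the LinkLister class, the list_links function and the sample page are all made up for the example.

```python
from html.parser import HTMLParser


class LinkLister(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for attr_name, value in attrs:
                if attr_name == "href" and value:
                    self.links.append(value)


def list_links(html_text):
    """Return every anchor href found in `html_text`."""
    parser = LinkLister()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample page: two real links and one bare named anchor.
SAMPLE_HTML = """<html><body>
<p>See <a href="https://example.org/one">one</a> and
<a href="/two">two</a>, but not <a name="anchor">this</a>.</p>
</body></html>"""

if __name__ == "__main__":
    for link in list_links(SAMPLE_HTML):
        print(link)
```

One link per line is the handy part, since that is exactly the shape that pipes well into sort, grep or whatever else you have lying around.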

I leave it to you to explore the rest of that suite. If you’re like me and can only scratch your head at the ascent of XML as a data format, this will be fun for you to play with.