Oozing Caribou

Meet Oozie’s Workflows

Oozie is a workflow scheduler for Hadoop, but that’s not terribly important right now. What is important is that it defines its workflows using an XML dialect. And, as is the way with all things XML, the result is… shall we say, less than easy on the eyes and the typing fingers. As evidence, I bring you the simple example workflow that ships with the Oozie distribution:

Not the worst tag soup ever, I’ll admit. But still, that’s a lot to take in.

For the Love of the FSM, DSL That ML

At its core, the XML representation of the workflow is a fine thing. It’s well-defined and trivially machine-parsable. It’s just not very friendly to us humans, and it’s one of those cases where I think a DSL can do wonders to abstract away most of the tedium and verbosity of the job.
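To make the DSL argument a bit more concrete, here is a minimal sketch of how a handful of functions can hide the repetitive plumbing of an Oozie workflow. It is written in Python rather than Perl, and the helper names (`workflow`, `action`, `map_reduce`, `kill`, `end`) are entirely hypothetical. This is neither Template::Caribou’s API nor the Hive tag library. The `uri:oozie:workflow:0.2` namespace and the single map-reduce node are assumptions chosen for illustration, not a copy of the distribution’s example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-DSL: each helper hides the XML plumbing of one kind of
# Oozie workflow node. Helper names are illustrative, not a real library.

WF_NS = "uri:oozie:workflow:0.2"  # assumed schema version


def workflow(name, start, *nodes):
    """Build a <workflow-app> whose <start> transitions to `start`."""
    app = ET.Element("workflow-app", {"xmlns": WF_NS, "name": name})
    ET.SubElement(app, "start", {"to": start})
    for node in nodes:
        app.append(node)
    return app


def action(name, body, ok="end", error="fail"):
    """Wrap an action payload with its <ok>/<error> transitions."""
    node = ET.Element("action", {"name": name})
    node.append(body)
    ET.SubElement(node, "ok", {"to": ok})
    ET.SubElement(node, "error", {"to": error})
    return node


def map_reduce(**props):
    """A <map-reduce> payload; keyword args become child elements,
    with underscores turned into dashes (job_tracker -> <job-tracker>)."""
    mr = ET.Element("map-reduce")
    for key, value in props.items():
        ET.SubElement(mr, key.replace("_", "-")).text = value
    return mr


def kill(name, message):
    """A <kill> node carrying an error <message>."""
    node = ET.Element("kill", {"name": name})
    ET.SubElement(node, "message").text = message
    return node


def end(name="end"):
    """The terminal <end> node."""
    return ET.Element("end", {"name": name})


wf = workflow(
    "map-reduce-wf", "mr-node",
    action("mr-node", map_reduce(job_tracker="${jobTracker}",
                                 name_node="${nameNode}")),
    kill("fail", "Map/Reduce failed"),
    end(),
)

print(ET.tostring(wf, encoding="unicode"))
```

Each node only states what is specific to it (its name, its transitions, its payload), and the helpers take care of spelling out the tags and the `<ok>`/`<error>` boilerplate.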

Enter Template::Caribou, that toy templating system of mine. While its primary raison d’être is HTML templating, it has been designed to be friendly to any XML dialect. Indeed, with the help of a Hive tag library (currently available on the ‘hive’ branch of Caribou’s GitHub repo), here is how the workflow above could look.
