Introducing Unixish

Unixish is a simple framework for creating data transformation routines that can be applied to arrays or streams. The routines can also be run from the command line, much like your usual Unix utilities. In fact, the Data::Unixish distribution comes with clones of several Unix utilities, such as cat, shuf, head, tail, wc, sort, yes, and rev.

Creating a data transformation routine, called a dux function (where dux is short for Data::Unixish), is easy enough. Let's say you want a function called revline that reverses the characters of each line of text. You write a function called revline and place it in the Data::Unixish::revline package.
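A minimal sketch of such a function, assuming the calling convention described in this post (a hash of arguments carrying in and out array references). The newline handling and the small driver at the bottom are additions for illustration only; in real use the framework supplies in and out.

```perl
package Data::Unixish::revline;

use 5.012;      # each() on arrays needs Perl 5.12 or later
use strict;
use warnings;

sub revline {
    my %args = @_;
    my ($in, $out) = ($args{in}, $args{out});

    while (my ($index, $item) = each @$in) {
        # Strip a trailing newline (if any) so it stays at the end of the line
        my $nl = $item =~ s/(\r?\n)\z// ? $1 : "";
        push @$out, scalar(reverse $item) . $nl;
    }

    [200, "OK"];
}

package main;

# Illustration only: call the function directly with plain arrays
my @out;
my $res = Data::Unixish::revline::revline(in => ["abc\n", "xyz"], out => \@out);
print @out, "\n";
```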

Several things to note here. First, you accept arguments via a hash. The input and output are located in $args{in} and $args{out}, respectively. Command-line options are also passed as function arguments, hence the hash: a rather complex utility can potentially accept many command-line options.

Second, you must always use each() to iterate over the array elements, instead of using for(), grep(), or map(). Through the magic of tie(), the array can actually be a stream (a long or even infinite one), and using Perl 5's for/grep/map will slurp all of its elements into memory. Using each() iterates over the elements one at a time without slurping.
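To illustrate the point, here is a small sketch using a hypothetical tied class, LazySquares, that computes an element only when it is fetched. It is not part of Data::Unixish; it merely stands in for a stream-backed array.

```perl
use 5.012;
use strict;
use warnings;

package LazySquares;   # hypothetical; stands in for a stream-backed array

sub TIEARRAY  { my ($class, $size) = @_; bless { size => $size, fetched => 0 }, $class }
sub FETCHSIZE { $_[0]{size} }
sub FETCH     { my ($self, $i) = @_; $self->{fetched}++; $i * $i }

package main;

tie my @squares, 'LazySquares', 1_000_000;

# each() asks the tied array for one element per iteration,
# so we can stop early without touching the rest.
my @first;
while (my ($index, $item) = each @squares) {
    push @first, $item;
    last if @first == 3;
}

printf "%s (elements fetched: %d)\n", join(",", @first), tied(@squares)->{fetched};
```

By contrast, something like map { ... } @squares would have to fetch every element before producing its result.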

Third, you must add to the output using push() instead of assigning to it directly, e.g. $out = [1, 2, 3]. Since the output array might actually be a stream, push() is what you need.
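A quick sketch of the difference, using two hypothetical functions and plain arrays. Assignment only rebinds the function's local variable, so the caller's array (or stream) never sees the data.

```perl
use strict;
use warnings;

# Wrong: rebinding the local $out leaves the caller's array untouched.
sub fill_by_assignment {
    my %args = @_;
    my $out = $args{out};
    $out = [1, 2, 3];
    [200, "OK"];
}

# Right: push() appends to the array the caller handed in.
sub fill_by_push {
    my %args = @_;
    my $out = $args{out};
    push @$out, 1, 2, 3;
    [200, "OK"];
}

my (@a, @b);
fill_by_assignment(out => \@a);   # @a is still empty
fill_by_push(out => \@b);         # @b now holds 1, 2, 3
```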

Lastly, you return [200, "OK"] to signify success, like in HTTP.

After you write the function, you write metadata for it, giving a summary and describing its arguments (if any). The metadata is written according to the Rinci specification.
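A sketch of what such metadata might look like for revline. The overall shape (a %SPEC package variable keyed by function name, with v, summary, and args) follows Rinci; the summaries and schema values given for in and out are assumptions, and the distribution may provide shared definitions for these common arguments.

```perl
package Data::Unixish::revline;

use strict;
use warnings;

our %SPEC;

$SPEC{revline} = {
    v       => 1.1,   # Rinci metadata version
    summary => 'Reverse the characters of each line of text',
    args    => {
        # Assumed schemas; both arguments are supplied by the framework
        in  => { summary => 'Input stream',  schema => 'any' },
        out => { summary => 'Output stream', schema => 'any' },
    },
};

1;
```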

For convenience, Data::Unixish provides a set of XduxY routines that apply a dux function to some form of input and return some form of output. The X prefix is one of a/f/l and determines whether the routine accepts an arrayref, a file(handle), or a list as input. The Y suffix is one of a/f/l/c and determines whether the routine returns an array, a filehandle, or a list, or calls a callback.
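For example, a short sketch using the sort clone that ships with the distribution. It assumes Data::Unixish is installed from CPAN and that the routines take the dux function name as their first argument, followed by the input; the sample data is made up.

```perl
use strict;
use warnings;
use Data::Unixish qw(lduxl lduxa aduxl);

my @sorted = lduxl('sort', 'pear', 'apple', 'fig');     # list in, list out
my $aref   = lduxa('sort', 'pear', 'apple', 'fig');     # list in, array out
my @again  = aduxl('sort', ['pear', 'apple', 'fig']);   # arrayref in, list out

print "$_\n" for @sorted;
```

The same dux function serves all three calls; only the wrapper changes.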

You can also run your dux function from the command line, using the dux program provided by the App::dux distribution.

% ls -l | dux wc -l
12

So in short, the advantage of using this framework is reusability: you only have to write one routine that can be applied to various forms of input and produces various types of output.

The current drawback is speed: the framework is not very fast because of all that tie() abstraction. If you want to process millions of items or more, you might want to write the routine in more direct, low-level Perl. I first created the framework for Text::ANSITable to format table rows/columns, which typically won't number in the thousands/millions.