Now, imagine you want to parse this data so that your application can read it. Here’s a regular expression I nabbed from a Stack Overflow answer which grabs information about the individual ping responses:
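The regex itself didn't survive in this copy of the post, so here is a representative sketch (my own, not the original Stack Overflow answer) of matching a Linux ping reply line in node.js:

```javascript
// Hypothetical regex for a Linux ping reply line such as:
// "64 bytes from 172.217.4.46: icmp_seq=1 ttl=57 time=12.3 ms"
const replyPattern = /^(\d+) bytes from ([^:]+): icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms$/;

const line = "64 bytes from 172.217.4.46: icmp_seq=1 ttl=57 time=12.3 ms";
const match = line.match(replyPattern);
if (match) {
  // Capture groups: bytes, host, sequence number, TTL, round-trip time
  const [, bytes, host, seq, ttl, time] = match;
  console.log({ bytes, host, seq, ttl, time });
}
```

Note how brittle this is: a single wording change in ping's output (or a locale difference) and the pattern silently stops matching.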

On the surface this seems like a perfectly acceptable solution. Until, that is, the maintainer of the app changes the formatting or wording of the output. There is also the slight overhead of running a regex, not to mention the wasted effort of the app taking its data, formatting it to be pretty for a human, only to have us convert it back into a format for machine consumption. But the biggest overhead, in my opinion, is the time it takes to build a script, regex, or grep command to parse such output in the first place.

JSON Output

Now, imagine a world where a large subset of Linux command line utilities have a magical --json (or --xml, --yaml) option, which takes that normally human-readable output displayed via stdout and instead renders it as the specified data transmission language. Here’s an example of our previous ping command converted to JSON:
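The JSON sample is missing from this copy; the sketch below is my own guess at what such output could look like (every field name here is invented, since ping has no such flag):

```json
{
  "destination": "google.com",
  "transmitted": 4,
  "received": 4,
  "packet_loss_percent": 0,
  "responses": [
    { "seq": 1, "ttl": 57, "time_ms": 12.3 },
    { "seq": 2, "ttl": 57, "time_ms": 11.8 },
    { "seq": 3, "ttl": 57, "time_ms": 12.0 },
    { "seq": 4, "ttl": 57, "time_ms": 11.9 }
  ]
}
```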

Suddenly, we have an amazingly simple output format for parsing! Simply fire up your language of choice’s JSON library and parse the output. We could even take the contents of stderr and include it in the same output, under an error node.
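As a sketch, a few lines of node.js could digest that kind of output (the JSON shape and field names here are hypothetical, since no real --json flag exists):

```javascript
// Hypothetical JSON output from something like `ping -c 4 google.com --json`;
// the field names are invented for illustration.
const raw = `{
  "destination": "google.com",
  "responses": [
    { "seq": 1, "ttl": 57, "time_ms": 12.3 },
    { "seq": 2, "ttl": 57, "time_ms": 11.8 }
  ]
}`;

const result = JSON.parse(raw);
const times = result.responses.map((r) => r.time_ms);
const average = times.reduce((a, b) => a + b, 0) / times.length;
console.log(`${result.destination}: avg ${average.toFixed(2)} ms`);
```

No regex, no guessing at whitespace: grabbing a field is just property access.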

Pitfalls

I’m not sure of a good solution for handling ‘streaming’ output, e.g. a ping command without the -c option. Perhaps we could emit several complete JSON documents, with a delimiter of some sort between them, but at that point we are no longer dealing with pure, clean JSON and are instead starting to develop a JSON superset.
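One way to sketch that superset is one complete JSON document per line (a convention sometimes called newline-delimited JSON). A consumer in node.js, again with invented field names:

```javascript
// Sketch of consuming a stream of newline-delimited JSON documents,
// e.g. one document per ping reply. Field names are hypothetical.
const stream = [
  '{"seq": 1, "time_ms": 12.3}',
  '{"seq": 2, "time_ms": 11.8}',
].join("\n");

const docs = stream
  .split("\n")
  .filter((line) => line.trim().length > 0) // tolerate blank lines
  .map((line) => JSON.parse(line));         // each line is a full document

for (const doc of docs) {
  console.log(`reply ${doc.seq}: ${doc.time_ms} ms`);
}
```

The delimiter is just a newline, so a consumer can parse each reply as it arrives instead of waiting for the command to exit.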

This also wouldn’t make a lot of sense for interactive commands, or really anything that needs a Ctrl+C when complete. So, for this reason, --json would probably be limited to use in conjunction with other arguments, and could not be used with every CLI app.

Arguments Against

What if the author of the app changes the structure of the JSON document? The applications digesting the output would still need to be rewritten, but grabbing a different data node is usually easier than getting familiar with the new output and writing a whole new parser.

Why use JSON? Well, JSON is a pretty commonly used data format (at least on the web; it isn’t too popular among Linux utilities). Importantly, JSON can be easily converted into different formats such as XML or YAML. One could write a utility and pipe the output through it, such as “ping -c 4 google.com --json | json2xml”. Or, even better, the author of the app could provide switches for the different types of data.

What about the Linux philosophy of having simple, human readable output? We would still keep the default, human readable output. The --json flag would simply be an option that the maintainer of the app would add as an enhancement.

My app already has a --json switch for doing something else. The --json flag is just an example, really. There doesn’t need to be a universal switch for enabling this output (although it would be nice). The overall goal of these switches would be to make parsing of the output of the command built into the actual app instead of being external.

Inspiration

Of course, I wasn’t inspired by the ping command to write this (with its simple output), but instead by the output of the command iwlist wlan0 scan. I’ve been writing a parser for it using node.js, and the code is a bit scary looking (as I feel most parsers must be). Here’s an example of the output being parsed (the Cell blocks repeat once for each network):
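The sample didn’t survive in this copy of the post; for reference, iwlist wlan0 scan output generally looks something like this (abbreviated, with made-up networks):

```text
wlan0     Scan completed :
          Cell 01 - Address: 00:11:22:33:44:55
                    ESSID:"HomeNetwork"
                    Quality=70/70  Signal level=-40 dBm
                    Encryption key:on
          Cell 02 - Address: 66:77:88:99:AA:BB
                    ESSID:"CoffeeShop"
                    Quality=43/70  Signal level=-67 dBm
                    Encryption key:off
```

Indentation, repeated blocks, and mixed `key:value` / `key=value` syntax all have to be handled by hand.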

Thanks

Thanks for giving this a read! If you know of any possible solutions to the ‘streaming’ output, if you can think of a better output format, if you know of a similar initiative, or even if you think I’m an idiot, please let me know in the comments below.

Update

Thanks for all of the great feedback on Reddit (and even a little bit on Hacker News). It is interesting to see what a controversial subject this is. On one side, there are the *nix purists who have been writing programs for years and have always understood the plain-text Unix philosophy. On the other side, there is the (probably younger) half who love the thought of being able to easily read the output of commands without the need for complex parsing instructions.

I think the biggest issue is the purists hating the thought of JSON, the newcomer, web-centric data format I mentioned in the title and the ping output example. Again, that was just an example language. Most of us just want a standardized method for parsing command output, and surprisingly, we really don’t care what that format is, as long as it is easy to work with.

The best example someone had of this type of thing already being in the wild is the emacs formatted output for the ls command, as pointed out by ISV_Damocles on Reddit. If there is an output format for the most commonly used Unix command just to be consumed by one program, why not have a standardized output option easily digested by any program?

Thomas is the author of Advanced Microservices and a prolific public speaker with a passion for reducing complex problems into simple language and diagrams. His career includes working at Fortune 50s in the Midwest, co-founding a successful startup, and everything in between.

YAML! YAML was designed to be streamed. It’s also more compact than JSON, equally machine-readable, can easily be converted into JSON/XML and, unlike JSON, cannot contain arbitrary JavaScript!
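For illustration of the streaming point: YAML natively supports multiple documents in one stream, separated by `---`, so hypothetical ping replies (invented fields again) could be emitted one document at a time:

```yaml
---
seq: 1
ttl: 57
time_ms: 12.3
---
seq: 2
ttl: 57
time_ms: 11.8
```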

Jeffrey Gipson

I kinda vote for idiot, on this one. It’s easy when one is working entirely in the web world, or a high-level scripting world, to get a little tunnel vision. JSON is for JavaScript. Text output is for humans. Most of the tools you are describing are for humans. If you are on a Linux system, the /sys and /proc file systems exist to provide highly structured, predictably formatted information about things like network interfaces for the consumption of programs.

Honestly, I have no idea what dbus is or what it does, but it does sound like it is exactly the sort of thing I should be doing. Thanks for the information, I’m going to look into it.

Richard

I get where you’re coming from, as I work with databases and web APIs. But I cannot think of a single case where this would be needed (and made sense). In this output, for example, the hostname you put in before is simply echoed back.

stdout is made for simplicity, not for bunches of new (and different) layers of complexity to work with data. There is no fun in piping output through different XML parsers and getting headaches from escaping lots of special characters. You use bash commands for bash scripts, not to store your streamripper output in a database. If you want the functionality, get the source; if you want data, use the often-provided API, or simply parse, but often there is a better way to get where you want to go.

Ask yourself what you really want to do, and don’t do it along the lines of something like PHP’s exec() and storing parsed output in databases. You’re going the wrong way about solving your problem, and asking others to cover for your 0.001% use case of their product. Any program for which it makes sense to store its data in a database has an API, or you can query the DB yourself.

Don’t get me wrong, I dreamt the dream of structured data myself, but often there simply is no need, and you narrow your mind if you only think in nodes.

Although this doesn’t yet exist in the *nix world, check out the strides that Microsoft has made with PowerShell. I think that the ideas therein tackle many of the motivations discussed in your post. Commands (cmdlets) output objects. Objects can be pipelined into other cmdlets. The console output will be human-readable, or you can use a method against the final object to format it as you like. Want to output as CSV? As HTML? Simple. Many new enterprise applications (Exchange, SQL, VMware, etc.) have PowerShell interfaces.

Instead, how about a “jsonator” utility that could exist at the end of the pipe chain and understand the output of common unix utils:

ifconfig eth0 | jsonator -f ifconfig

There might be other flags to identify how the piped input should be handled and different utils could register their human-readable formats through a plugin system rather than having to reinvent every single unix util itself.
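As a sketch of that idea, here is a minimal, hypothetical jsonator format handler for old-style Linux ifconfig output in node.js (the tool, its plugin interface, and the fields it extracts are all invented for illustration):

```javascript
// Hypothetical "ifconfig" format handler for a jsonator-style filter:
// turn old-style Linux ifconfig text into a JSON array of interfaces.
function parseIfconfig(text) {
  const interfaces = [];
  // Each interface block starts at column 0; continuation lines are indented.
  for (const block of text.split(/\n(?=\S)/)) {
    const name = block.match(/^(\S+)/);
    const mac = block.match(/HWaddr ([0-9A-Fa-f:]+)/);
    const ip = block.match(/inet addr:([\d.]+)/);
    if (name) {
      interfaces.push({
        name: name[1],
        mac: mac ? mac[1] : null,
        ip: ip ? ip[1] : null,
      });
    }
  }
  return interfaces;
}

const sample = [
  "eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55",
  "          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0",
].join("\n");

console.log(JSON.stringify(parseIfconfig(sample), null, 2));
```

Of course, this just moves the brittle screen-scraping into a separate tool rather than eliminating it, which is the commenter's trade-off: the utils themselves stay untouched.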