While Git and GitHub have brought tremendous improvements to how people manage changes and collaborate on software and simple textual content, most people still work with various binary formats and could benefit greatly from a similar boost.

This effort would not only make GitHub much more useful by making binary file comparison possible, but would also contribute hugely to efforts like the semantic web by compiling a library of the best open source parsers (similar to Linguist for code), which anyone could use to analyse binary files.

Of course, crunching through massive files would be very costly, but each format could be processed to a different degree depending on its complexity and usefulness.

1. Gather metadata

As a first step, only metadata would be extracted from the file. This would give a high-level idea of what changed, making it possible for people to do quick sanity checks on whether the changes look right.
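To make the idea concrete, here is a minimal sketch of what a metadata-only comparison could look like, using PNG as the example format because its header is simple and well documented. The function names (`png_metadata`, `diff_metadata`, `fake_png_header`) are made up for illustration and are not part of GitHub, Linguist, or any existing tool; a real parser library would cover far more formats and fields.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_metadata(data: bytes) -> dict:
    """Read basic image properties from the IHDR chunk of a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk follows the signature: 4-byte length, 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    # IHDR payload: width, height, bit depth, colour type,
    # compression method, filter method, interlace method.
    width, height, bit_depth, color_type, _, _, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "interlaced": bool(interlace),
    }


def diff_metadata(old: dict, new: dict) -> dict:
    """Return {key: (before, after)} for every field that differs."""
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def fake_png_header(width: int, height: int) -> bytes:
    """Hypothetical helper: build just enough of a PNG to exercise the parser."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I4s", len(ihdr), b"IHDR") + ihdr


if __name__ == "__main__":
    old = png_metadata(fake_png_header(800, 600))
    new = png_metadata(fake_png_header(1024, 600))
    print(diff_metadata(old, new))  # {'width': (800, 1024)}
```

Even a summary this small ("the image got wider, nothing else changed") is enough for the kind of quick sanity check described above, and it only requires reading the first few dozen bytes of each file.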

This might also mean improving the diff UI to accommodate these richer comparisons, which could also benefit code, but I’ll leave that for another post. Suffice it to say that pull requests should be much better suited to (near) real-time collaboration, and not just on code, and that visualisation should use semantic understanding to go beyond a simple text diff (the tools are actually already available for code via the syntax definitions in Linguist, which are currently used only for code colouring).

It’s difficult to foresee what kind of possibilities these would bring, but I think they could have the potential to bring open, global collaboration to a lot of new fields like design, engineering, music, etc.

It also feels like the time is right for this, with technologies like containers making it possible to run the (often compiled) tools that process these binary files with much less hassle when it comes to setting up environments with all their dependencies installed. If you thought Docker was only for deploying web services, think again.