Kolmogorov Complexity is fucking brilliant

by Sebastian Benthall

But first: if you think about it, we are all basically becoming part of one big text, right? Like, that is what we mean when we say “Big Data”. It’s that we are rapidly turning everything into a string of letters or numbers (they are roughly the same) which we then reinterpret.

Sometimes those strings are people’s creation/expression. Some of them are sensed from nature.

We interpret them by processing them through a series of filters, or lenses. Or, algorithms. Or what some might call “perceptual concepts.” They result in taking action. You could call it cybernetic action. Just as the perception of pain causes through the nerves a reflexive leap away from the stimulus, the perception of spam through an algorithmic classifier can trigger an algorithmic parry.

Turns out, if everything is text that we process with computers, there’s a really amazing metric of complexity that applies to strings of characters. That is super. Because, let’s face it, most of the time when people talk about “complexity” they aren’t using the term very precisely at all. That is too bad, because everything is really complicated, but some things are more complicated than others. It could be useful–maybe absurdly useful–to have a good way to talk about how differently complicated things are.

There is a good way. A good way called Kolmogorov Complexity. This is the definition of it provided by Wolfram Mathworld:

The complexity of a pattern parameterized as the shortest algorithm required to reproduce it. Also known as bit complexity.

I am–and this is totally a distraction but whatever–totally flummoxed as to how this definition could ever have been written. Because half of it, the first sentence, is the best humanese definition I’ve ever heard, maybe, until I reread it and realized I don’t really get the sense in which they are using “parameterized.” Do you?

But essentially, a pattern’s (or text’s) Kolmogorov complexity is the length of the shortest algorithm that produces it. Length, here, as in length of that algorithm represented as another text–i.e., in software code. (Of course you need a reader/observer/language/interpreter that can understand the code and react by writing/showing/speaking the pattern, which is where it gets complicated, but…) Where was I? (Ever totally space out come back to wondering whether or not you closed a parenthesis?)

No seriously, where was I?

Ok, so here’s the thing–a pattern’s Kolmogorov complexity is always less than or equal to its length plus a small constant. So for example, this string:

If you know anything about programming at all (and I’m not assuming you do) you should know that the program print("Hello World!") returns as an output the string “Hello World!”. This is traditionally the first program you learn to write in any language. If you have never written a “Hello World!” program, I strongly recommend that you do. It should probably be considered a requirement for digital literacy. Here is a tutorial on how to do this for JavaScript, the language that runs in your web browser.

Now take for example this totally different string of the same length as the first one:

You might notice something about this string of numbers. Can you guess what it is?

If you guessed, “It’s really simple,” you’re right! That is the correct intuition.

A way to formalize this intuition is to talk about the shortest length of its description. You can describe it in a lot fewer than 100 characters. In English, you could write “A hundred zeros”, which is 15 characters. In the pseudocode language I’m making up for this blog post, you could write:

print(repeat("0",100))

Which is pretty short.

It turns out, understanding Kolmogorov complexity is like understanding one of the fundamental secrets of the universe. If you don’t believe me, ask the guy who invented BitCoin, whose website links to the author of the best textbook on the subject. Side note: Isn’t it incredible that the person who doxedo Satoshi Nakamoto did it through automated text analysis? Yes.

If you are unconvinced that Kolmogorov complexity is one of the most amazing ideas ever, I encourage you look further into how it plays into computational learning theory through the Minimum Description Length principle, or read this helpful introduction to Solomonoff Induction a mathematical theory of inductive inference that is as powerful as Bayesian updating.

Like this:

Related

One Comment to “Kolmogorov Complexity is fucking brilliant”

Don’t get too wrapped up in string-binary-textuality: Just because it is a predominant model of literacy doesn’t mean the semiotic (il)literacy issue should be ignored — especially since it is a second tier telematics big data commerce field with especially significant social computing effects for distributive finance and systems constraints requirements where predication of orders can be either disastrous or self-stabilizing in the malleable rapidities of data machine signatures and the problem of labor telematics transparencies especially regarding those who do not want to confront the effects of technological totalizations as a measured lever of powers on the treads left by skull-cracking jackbootism.