Thursday, September 25, 2008

Donald Knuth And The Complexity Of Songs

Donald Knuth is a living legend among computer scientists. His monumental work-of-life "The Art Of Computer Programming" (I was never able to fully follow it) is a standard reference/textbook/work of art for computer science. Mr. Knuth took a new, immature geek-only hobby and transformed it into a solid and complete scientific discipline.

Among his contributions, the systematic analysis of the complexity of algorithms, really stands out. As a concept complexity is very familiar to developers. It refers to how a specific algorithm or algorithmic solution to a problem scale to the 'problem size'.

If his work is a reason to admire him, his humor is a reason to love him. Recently, I had the most splendid time reading his hilarious paper titled "The Complexity Of Songs" ! Here is the story.

A song has some lyrics which we have to remember in order to sing it. Humans are trying to learn lyrics of length s (space complexity) to sing a song of length n. You would expect mathematically that s ~ n, but Knuth investigates all human 'inventions' to reduce the song space complexity!!

After a series of 'theorems', Knuth proves the existence of songs with s= a.n, a<1, s="O(sqrt(n))," s="O(logn)">and finally.....s=O(1)!! Yes, you read well.

Knuth argues the first step was the invention of the refrain, the repeated part of a song. If the song has m verses of length V and the refrain has length R, then the song length is roughly n = V.m+R.m and the complexity s = m.V+R. Hence we have a reduction of V/V+R in song complexity. So, a refrain is also a tool to save some memory space to remember the song!

Remember the old song "Old Mc'Donald had a farm, Ei-gh,Ei-gh,Oh!" ? Well this has a complexity s = O(sqrt(n)). It is much like the Greek song "Οταν θα πας κυρα μου στο παζαρι...". In this pattern, each verse includes all previous verses. For m verses n = o(m^2) and s = O(m), so s = O(sqrt(n)). This means that it is much easier to remember! That's why they are most suitable for children! A same pattern can yield a log complexity.

Now to the best part, Theorem 2 is the real killer! The introductory text goes as follows:"However, the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced: