2011/11/07

I was intrigued by the various Twitter feeds allegedly owned by factions of the Anonymous group. Intrigued because it looked like the messages were encrypted. I asked the cryptographer Endre Bangerter, FH Bern/Biel, to help me out and he referred me to one of his reverse engineering wizards, David Gullasch (https://twitter.com/#!/x0n0x), who found out that it was not what I thought it was. Here's David's analysis:

j0i8Ct Xl

The most obvious observation is that it consists of alphanumeric characters (a-z, A-Z, 0-9) only. Therefore it can't be base64 encoded; more probably it is some sort of base62 encoding. Because log2(62^6) = 35.725 is a somewhat odd value, the six-character blocking does not make much sense if one assumes a binary encoding below the base62 layer.
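The arithmetic behind this observation can be checked with a short Python sketch. The helper names `looks_alphanumeric` and `bits_per_block` are made up for illustration and are not part of David's analysis; the only inputs are the truncated tweet quoted above and the alphabet size 62.

```python
import math
import string

# Letters and digits only: base64 would additionally use '+', '/' and '='.
ALPHANUMERIC = set(string.ascii_letters + string.digits)


def looks_alphanumeric(text):
    """Return True if every non-space character is a letter or a digit."""
    return all(ch in ALPHANUMERIC for ch in text if not ch.isspace())


def bits_per_block(num_chars, alphabet_size=62):
    """Information capacity, in bits, of a block of num_chars symbols."""
    return num_chars * math.log2(alphabet_size)


print(looks_alphanumeric("j0i8Ct Xl"))  # the truncated tweet from the post
print(round(bits_per_block(6), 3))      # capacity of a six-character group
```

A six-character group carries roughly 35.7 bits, which is not a whole number of bytes, so the visible grouping is unlikely to reflect the underlying binary layout.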

The next observation comes from the character frequencies: the character 'A' is much more likely to be encountered than any other character. This statistical anomaly does not stem from long runs of "AAA...A"; these runs are only present in the first and last messages shown above. The many 'A's sprinkled all over the place turn out to be periodic and suggest a different blocking scheme, as follows:

AAAAAAAAAAA4NcPhvjVqKmBOlrbGYFWFvtYc9FeFPlX
AHsv8cp7dLGVwJMhtsz7tNaOCDebL3XyHL94NrD6bxC
ALJvRUoSl9jpywkA9JJg5YcQSHamT4ACuGMJGojDuar
...
AEcMt5AkEka055azHoxuRhPlEXh5PCm28LjtLo5bzoe
AAAAAGMt1IvWbjfNp1d6lLyZiyJAKMquAT8wSuxpOji
AAAAAAAAAAAAABlta7WXyEOism4GD7zKKwtj0i8CtXl

Now the truncated final tweet "j0i8Ct Xl" also makes sense, because it exactly completes a 43-character block. Also, log2(62^43) = 256.03 is much, much nicer and suggests a base62 encoding of 256-bit blocks.
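The blocking argument can be reproduced with a small Python sketch. It uses only the 43-character blocks quoted above as sample data; joining them into one stream is an assumption made for illustration, since the real feed contains more blocks where the "..." is. The names `BLOCKS`, `stream` and `leading_a_fraction` are hypothetical.

```python
import math
from collections import Counter

# The blocks quoted in the post. Runs of leading 'A's are written as
# multiplications so that their length is easy to verify.
BLOCKS = [
    "A" * 11 + "4NcPhvjVqKmBOlrbGYFWFvtYc9FeFPlX",
    "AHsv8cp7dLGVwJMhtsz7tNaOCDebL3XyHL94NrD6bxC",
    "ALJvRUoSl9jpywkA9JJg5YcQSHamT4ACuGMJGojDuar",
    "AEcMt5AkEka055azHoxuRhPlEXh5PCm28LjtLo5bzoe",
    "A" * 5 + "GMt1IvWbjfNp1d6lLyZiyJAKMquAT8wSuxpOji",
    "A" * 13 + "Blta7WXyEOism4GD7zKKwtj0i8CtXl",
]

# Assumption: treat the quoted blocks as one contiguous character stream.
stream = "".join(BLOCKS)


def leading_a_fraction(text, block_len):
    """Fraction of blocks of length block_len that start with 'A'."""
    starts = text[::block_len]
    return starts.count("A") / len(starts)


# Character frequencies: 'A' should dominate.
most_common_char, _ = Counter(stream).most_common(1)[0]

# Try every candidate block length and keep the one whose blocks
# most consistently begin with 'A'.
best_len = max(range(2, 61), key=lambda n: leading_a_fraction(stream, n))

print(most_common_char, best_len)
print(round(math.log2(62 ** 43), 2))  # bits carried by one 43-character block
```

On this sample, 43 is the only block length between 2 and 60 for which every block starts with 'A', and a block of that length carries just over 256 bits.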

The third hint comes from the statistics of the second character in each block: it is always in the range A-O. Factoring in this fact and that the first character is always 'A', we get a block entropy of log2(15*62^41) = 248.03, suggesting that one byte in an underlying binary 32-byte block must be fixed.
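A short Python sketch makes the connection between the A-O range and a fixed byte explicit. It assumes that 'A' is the digit 0 and that the most significant digit comes first, which at this point in the analysis is still a guess; the variable names are made up for illustration.

```python
import math

# Entropy of a block whose first character is fixed and whose second
# character takes one of 15 values ('A'..'O'), with 41 free characters.
entropy_bits = math.log2(15 * 62 ** 41)

# If the top byte of a 32-byte block is zero, the value is below 2**248.
# Look at the two leading base62 digits of the largest such value.
largest_248_bit = 2 ** 248 - 1
first_digit, rest = divmod(largest_248_bit, 62 ** 42)
second_digit = rest // 62 ** 41

# Assumption: digit 0 is written as 'A', digit 1 as 'B', and so on.
print(round(entropy_bits, 2))
print(chr(ord("A") + first_digit), chr(ord("A") + second_digit))
```

Under these assumptions, every value below 2^248 encodes to a 43-character block that starts with 'A' and has a second character no higher than 'O', which matches the observed statistics.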

The next step is to find the correct base62 decoding. In the spirit of Benford's Law (integers are more likely to start with lower digits), we guess that 'A'..'Z' map to the values 0..25. Also, we guess that '0'..'9' and 'a'..'z' map to contiguous ranges. With these assumptions, the final parameters for the decoding can be found by trial and error: 'A'..'Z', '0'..'9', 'a'..'z' map to 0..61 and the blocks are big-endian integers, which can be decoded as in the following example: