They Cracked This 250-Year-Old Code, and Found a Secret Society Inside | Danger Room | Wired.com

We know the rules and statistics of English: which words go together, which sounds the language employs, and which pairs of letters appear most often. (Q is usually followed by a u, for example, and “quiet” is rarely followed by “bulldozer.”) There are only so many translation schemes that will work with these grammatical parameters. That narrows the number of possible keys from billions to merely millions.
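To make the letter-pair idea concrete, here is a minimal sketch, not Knight's actual code. It counts adjacent letter pairs in a tiny sample and uses those counts to score two candidate decipherments. The function names (`bigram_model`, `log_score`), the sample sentence, and the add-one smoothing are all assumptions chosen for illustration; a real system would learn its statistics from a far larger body of English.

```python
import math
from collections import Counter

def bigram_model(sample):
    """Count adjacent symbol pairs (letters and spaces) in a sample of English."""
    letters = [c for c in sample.lower() if c.isalpha() or c == " "]
    pairs = Counter(zip(letters, letters[1:]))   # how often b follows a
    firsts = Counter(letters[:-1])               # how often a is followed by anything
    return pairs, firsts

def log_score(text, pairs, firsts, alphabet_size=27):
    """Add-one-smoothed log-probability of `text` under the pair counts."""
    letters = [c for c in text.lower() if c.isalpha() or c == " "]
    total = 0.0
    for a, b in zip(letters, letters[1:]):
        total += math.log((pairs[(a, b)] + 1) / (firsts[a] + alphabet_size))
    return total

# Hypothetical toy sample; far too small to stand in for real English statistics.
SAMPLE = "the quick queen quietly questioned the quality of the quiet square"
pairs, firsts = bigram_model(SAMPLE)

# Two candidate outputs from two different keys: one reads like English, one does not.
plausible = log_score("the quiet queen", pairs, firsts)
scrambled = log_score("teh qiuet qeuen", pairs, firsts)
print(plausible > scrambled)
```

In this sample every q is followed by a u, so a key that produces "qi" or "qe" is penalized. Discarding keys that score badly is one simple way to picture how the statistics shrink the search space.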

The next step is to take a whole lot of educated guesses about what the key might be. Knight uses what’s called an expectation-maximization algorithm to do that. Instead of relying on a predefined dictionary, it runs through every possible English translation of those Russian words, no matter how ridiculous; it’ll interpret a single Russian word as “yes,” “horse,” “to break dance,” and “quiet!” Then, for each one of those possible interpretations, the algorithm invents a key for transforming an entire document into English: what would the text look like if that word meant “break dancing”?

The algorithm’s first few thousand attempts are always way, way off. But with every pass, it figures out a few words. And those isolated answers inch the algorithm closer and closer to the correct key. Eventually the computer finds the most statistically likely set of translation rules, the one that properly interprets one Russian word as “yes” and another as “quiet.”
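The guess-and-refine loop described above can be sketched for the simpler case of a letter-substitution cipher. This is an illustration under stated assumptions, not Knight's implementation. It treats the plaintext as hidden, keeps a fixed letter-pair model of English, and uses expectation-maximization (the forward-backward procedure for hidden Markov models) to re-estimate a table `emit[p][c]`, the probability that plaintext letter `p` is written as cipher symbol `c`. The names `train_bigram_lm` and `em_step`, the toy texts, the shift cipher, and the random starting table are all hypothetical.

```python
import math
import random

def train_bigram_lm(sample, alphabet):
    """Add-one-smoothed start and transition probabilities over `alphabet`."""
    idx = {ch: i for i, ch in enumerate(alphabet)}
    seq = [idx[ch] for ch in sample if ch in idx]
    k = len(alphabet)
    start = [(seq.count(i) + 1) / (len(seq) + k) for i in range(k)]
    counts = [[1.0] * k for _ in range(k)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    trans = [[v / sum(row) for v in row] for row in counts]
    return start, trans

def em_step(cipher, start, trans, emit):
    """One EM pass. Returns (new emit table, log-likelihood under the old table)."""
    k, n = len(start), len(cipher)
    # E-step, forward pass, rescaled at every position to avoid underflow.
    alpha, scale = [], []
    for t, c in enumerate(cipher):
        if t == 0:
            row = [start[p] * emit[p][c] for p in range(k)]
        else:
            prev = alpha[-1]
            row = [sum(prev[q] * trans[q][p] for q in range(k)) * emit[p][c]
                   for p in range(k)]
        s = sum(row)
        alpha.append([v / s for v in row])
        scale.append(s)
    # E-step, backward pass, reusing the same scaling factors.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        c_next = cipher[t + 1]
        beta[t] = [sum(trans[p][q] * emit[q][c_next] * beta[t + 1][q]
                       for q in range(k)) / scale[t + 1] for p in range(k)]
    # Expected counts: how strongly each plaintext letter explains each cipher symbol.
    counts = [[0.0] * len(emit[0]) for _ in range(k)]
    for t, c in enumerate(cipher):
        for p in range(k):
            counts[p][c] += alpha[t][p] * beta[t][p]
    # M-step: renormalize the expected counts into a new table.
    new_emit = [[v / (sum(row) or 1.0) for v in row] for row in counts]
    return new_emit, sum(math.log(s) for s in scale)

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
# Hypothetical toy data; a real run needs far more text on both sides.
SAMPLE = "the quiet queen asked the quick horse to be quiet and the horse said yes"
SECRET = "the horse said yes to the quiet queen"
# Toy cipher (an assumption): shift each symbol three places along ALPHABET.
cipher = [(ALPHABET.index(ch) + 3) % len(ALPHABET) for ch in SECRET]

rng = random.Random(0)  # random start; a perfectly uniform table can stall EM
k = len(ALPHABET)
emit = []
for _ in range(k):
    row = [rng.random() + 0.5 for _ in range(k)]
    emit.append([v / sum(row) for v in row])

start, trans = train_bigram_lm(SAMPLE, ALPHABET)
history = []
for _ in range(10):
    emit, loglik = em_step(cipher, start, trans, emit)
    history.append(loglik)
print(history[0], history[-1])
```

Each pass can only raise, never lower, the likelihood of the cipher text under the current table, which is the formal counterpart of the algorithm inching closer with every pass. On a text this short the table will not necessarily settle on the true key, which fits Knight's remark that longer ciphers work better.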

The algorithm can also help break codes, Knight told the Uppsala conference; generally, the longer the cipher, the better it performs. So he casually told the audience, “If you’ve got a long coded text to share, let me know.”

Funny, Schaefer said to Knight at a reception afterward. I have just the thing.

About

Dropsafe is the personal blog of Alec Muffett with occasional contributions from friends & occasional guest bloggers; it is therefore a blog populated entirely by the personal opinions of the author/s.
All original content hosted on crypticide.com - except where plagiarised from elsewhere quoted or reused - is licensed under CC BY-SA terms.