All you need now is a list of words to act as possible keywords. If you have a dictionary file on your computer (perhaps in /usr/share/dict/), you can use that. If not, Peter Norvig links to several word lists, most of them originally compiled for word-game players.
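As a rough sketch of turning such a file into a list of candidate keywords, something like the following would do. The function names clean_keywords and load_keywords are made up for this illustration, and the filtering rules (lowercase, letters a to z only, no duplicates) are an assumption about what makes a usable keyword, not a requirement of any particular word list.

```python
import string


def clean_keywords(lines):
    """Return sorted, de-duplicated lowercase words containing only a-z.

    Hypothetical helper: the filtering rules are an assumption for this sketch.
    """
    letters = set(string.ascii_lowercase)
    keywords = set()
    for line in lines:
        word = line.strip().lower()
        # Skip blank lines and anything with apostrophes, hyphens, accents, etc.
        if word and set(word) <= letters:
            keywords.add(word)
    return sorted(keywords)


def load_keywords(path):
    """Read one candidate keyword per line from the file at `path`."""
    with open(path, encoding="utf-8") as word_file:
        return clean_keywords(word_file)


print(clean_keywords(["Apple\n", "banana\n", "it's\n", "\n", "apple\n"]))
# ['apple', 'banana']
```

You would then call load_keywords with the path to whichever word list you have. Many Unix-like systems keep one in a file such as /usr/share/dict/words, but check what is actually on your machine.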

And that works, but it takes a long time. This is to some extent expected, as there are a lot of words to try. However, if you look at a system monitor while this code is executing, you see that all the work is being done by one CPU core. In a world of CPUs coming with four, six, or more cores, it's inefficient to leave three-quarters or five-sixths of the computer idle while looking for keywords.

Python (like most other programming languages) is single-threaded by default, and you have to use some kind of additional feature to write multi-threaded programs. This isn't helped by the fact that, in the general case, you have to write programs that account for other threads modifying the values stored in variables while this thread is running.

Luckily for us, this cipher-breaking example doesn't require much in the way of interacting threads. But to understand how we're going to use multiple CPU cores to break this cipher efficiently, let's step back a bit.

General pattern: map

Conceptually, we have a long list of possible keys to try. We also have a "scoring function" which takes a key and a ciphertext, and returns that key and the score of that key (for how much the proposed plaintext looks like English). Once we have our list of (key, score) pairs, we can find the element of that list with the highest score, and that gives us the best key.
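To make that concrete, here is a minimal single-process sketch of the idea. Everything in it is a stand-in chosen for illustration: cipher_alphabet, keyword_decipher and score_key are hypothetical names, the cipher is a simplified keyword substitution, and the "looks like English" score is a toy that just counts a handful of common words. A real scoring function would be more robust, for instance one based on letter frequencies.

```python
import string

ALPHABET = string.ascii_lowercase
# Toy stand-in for a real "how English is this?" measure (an assumption).
COMMON_WORDS = {"the", "and", "on", "of", "to", "in", "is", "it"}


def cipher_alphabet(keyword):
    """Keyword letters (first occurrence only), then the unused letters in order."""
    letters = dict.fromkeys(c for c in keyword.lower() if c in ALPHABET)
    letters.update(dict.fromkeys(ALPHABET))  # existing keys keep their position
    return "".join(letters)


def keyword_decipher(ciphertext, keyword):
    """Map each cipher-alphabet letter back to the plain alphabet."""
    table = str.maketrans(cipher_alphabet(keyword), ALPHABET)
    return ciphertext.lower().translate(table)


def score_key(keyword, ciphertext):
    """The scoring function: return the key together with its score."""
    plaintext = keyword_decipher(ciphertext, keyword)
    score = sum(word in COMMON_WORDS for word in plaintext.split())
    return keyword, score


# "the cat sat on the mat" enciphered with the keyword "zebra"
ciphertext = "sfa bzs qzs ml sfa kzs"
candidates = ["apple", "cipher", "zebra"]

# One (key, score) pair per candidate, then pick the pair with the highest score.
scored = [score_key(key, ciphertext) for key in candidates]
best_key, best_score = max(scored, key=lambda pair: pair[1])

print(scored)    # [('apple', 0), ('cipher', 1), ('zebra', 3)]
print(best_key)  # zebra
```

Note that the wrong key "cipher" still scores a point, because it happens to decipher "ml" as "on". A scoring function only has to rank the right key highest, not give every wrong key zero.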

This general pattern of applying a function to each element of a list, and generating a list of results, is so common it has its own name: map. Most programming languages have a function called map which takes a list of items and a function to apply to them, and returns the list of results. (This makes map a higher-order function, because it's a function that takes a function as an argument.)
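In Python, a small example of map might look like this. The word list and the my_map function are made up for the illustration. One wrinkle is that Python 3's built-in map returns a lazy iterator rather than a list, so the sketch wraps it in list() to see the results.

```python
words = ["zebra", "apple", "cipher"]

# The built-in map takes the function first, then the items.
# In Python 3 it returns an iterator, so list() collects the results.
lengths = list(map(len, words))
print(lengths)  # [5, 5, 6]


def my_map(func, items):
    """A hand-rolled equivalent: apply func to every item and collect the results."""
    return [func(item) for item in items]


print(my_map(str.upper, words))  # ['ZEBRA', 'APPLE', 'CIPHER']
```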

The important thing for us is that each of these scoring operations is entirely independent of the others: if we had a thousand keys to try and a thousand CPU cores to use, we could get each core to work on one key, and the different processes wouldn't need to communicate with each other at all.

But the star operator unpacks the tuple into the individual arguments for the function[^1]:

[^1] You can also use the star operator on function definitions to allow functions to take arbitrary numbers of arguments. When called, the arguments are packaged up into a tuple for use inside the function body.

>>> add3(*triple)
12
>>> add3(1, *pair)
16

It even gives the right answers!
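For a version you can run from a file, here is a short sketch of both uses of the star operator. The values of triple and pair are assumptions, chosen only so that the calls give the same results as the session above, and add_all is a hypothetical function added to illustrate the footnote.

```python
def add3(a, b, c):
    """Add exactly three numbers."""
    return a + b + c


# Illustrative values (assumed), picked to reproduce the results shown above.
triple = (3, 4, 5)
pair = (7, 8)

# In a call, * unpacks the tuple into separate positional arguments.
print(add3(*triple))   # 12, same as add3(3, 4, 5)
print(add3(1, *pair))  # 16, same as add3(1, 7, 8)


def add_all(*numbers):
    """In a definition, * collects any extra positional arguments into a tuple."""
    return sum(numbers)


print(add_all(1, 2, 3, 4))  # 10
print(add_all(*triple))     # 12
```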

Putting it all together

We can now put all this together with the starmap method of a Pool from the multiprocessing library.

We define a helper function that takes all the parameters we want, scoring the effectiveness of this key on this ciphertext:

This is called by the keyword_break_mp function (mp for "multiprocessing"). This function just assembles all the arguments for each helper into a list of tuples called helper_args, then uses starmap to do the work.
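A minimal sketch of how these two pieces could fit together is below. It is not a definitive implementation: the helper's name keyword_break_worker, the simplified keyword_decipher, and the word-counting toy_fitness are all assumptions made so the example is self-contained. Only keyword_break_mp, helper_args and starmap come from the description above.

```python
import multiprocessing
import string

ALPHABET = string.ascii_lowercase
# Toy stand-in for a real "how English is this?" measure (an assumption).
COMMON_WORDS = {"the", "and", "on", "of", "to", "in", "is", "it"}


def keyword_decipher(ciphertext, keyword):
    """Simplified keyword substitution: keyword letters first, then the rest."""
    letters = dict.fromkeys(c for c in keyword.lower() if c in ALPHABET)
    letters.update(dict.fromkeys(ALPHABET))
    table = str.maketrans("".join(letters), ALPHABET)
    return ciphertext.lower().translate(table)


def toy_fitness(plaintext):
    """Count how many words of the plaintext are common English words."""
    return sum(word in COMMON_WORDS for word in plaintext.split())


def keyword_break_worker(ciphertext, keyword, fitness):
    """Helper (hypothetical name): score one keyword against the ciphertext."""
    plaintext = keyword_decipher(ciphertext, keyword)
    return keyword, fitness(plaintext)


def keyword_break_mp(ciphertext, wordlist, fitness=toy_fitness):
    """Try every keyword in wordlist, spreading the work over a process pool."""
    # One tuple of arguments per call to the helper.
    helper_args = [(ciphertext, word, fitness) for word in wordlist]
    with multiprocessing.Pool() as pool:
        # starmap unpacks each tuple into the helper's positional arguments.
        breaks = pool.starmap(keyword_break_worker, helper_args)
    return max(breaks, key=lambda pair: pair[1])


if __name__ == "__main__":
    # "the cat sat on the mat" enciphered with the keyword "zebra"
    print(keyword_break_mp("sfa bzs qzs ml sfa kzs", ["apple", "cipher", "zebra"]))
    # ('zebra', 3)
```

The helper and the fitness function are defined at the top level of the module so that they can be pickled and sent to the worker processes. The `if __name__ == "__main__":` guard matters on platforms that start workers by importing the main module afresh, where unguarded pool creation would recurse.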

And that's it! The multiprocessing.Pool takes care of spreading the work across the different processor cores, and the max() function with the key parameter picks out the cipher key with the highest score.