25 Best Hangman Words

A simple question from a six-year-old about hangman turned into another analysis obsession that made me play 15 million games of hangman recently.

Back in 2007, I wrote a game of hangman for a human guesser on the train journey from Oxford to London. I spent the time on the London Underground thinking about optimal strategies for playing it, and wrote the version for the computer doing the guessing on the return journey. It successfully guessed my test words and I was satisfied, so I submitted both to the Wolfram Demonstrations Project. Now, three years later, my daughter is old enough to play, but the Demonstration annoys her, as it can always guess her words. She asked the obvious question that never occurred to me at the time: “What are the hardest words I can choose, so that I can beat it?”

In case you don’t know, the idea of hangman is that one player thinks of a word and tells the other player how many letters it has. The second player repeatedly guesses letters. If a guessed letter is in the word, the word chooser must reveal the position of every occurrence of the letter in the word. If it is not, then the chooser takes great pleasure in drawing a component of a gallows with a man hanging from it. If the gallows and man are complete before the word is fully guessed, the second player has been hanged and loses. There are various designs of gallows and man; I learned on the one above, which has 13 elements, but I have seen many possibilities between 10 and 13, and there are probably others. I’ll call these the 10-game and 13-game. My design, the 13-game, is easier for the guesser, as he or she is allowed more mistakes before losing.

Why a hangman? I don’t know. It is claimed that the game dates back to Victorian England, when hanging was probably an acceptable punishment for poor spelling!

Here’s how I created these games. First, let me describe the algorithm that we are attacking. My hangman algorithm uses all available information to produce a list of candidate words. At first, the available information is only the length of the word, but later we will know some of the letters and their positions and also some of the letters that are not in the word. All three bits of information can reduce the dictionary very quickly. Next, the game does a frequency analysis of the letters in all of the candidate words (how many of the candidate words contain at least one “a”, at least one “b”, and so on). Our best chance of avoiding a wrong guess (if we assume that the word has been chosen randomly from the dictionary) is to pick a letter that occurs frequently.
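The two steps just described, sifting the dictionary and counting letters, can be sketched in a few lines. This is an illustrative Python version, not the Mathematica code from the Demonstration; the function names, the "_" placeholder for unknown positions, and the tiny word list are all assumptions made for the example.

```python
from collections import Counter

def candidates(words, pattern, wrong):
    """Keep words consistent with the revealed pattern and the known misses.

    pattern uses '_' for unknown positions, e.g. '_a__'.
    wrong is the set of letters already guessed and found absent.
    """
    revealed = set(pattern) - {"_"}
    result = []
    for word in words:
        if len(word) != len(pattern):
            continue
        if any(ch in wrong for ch in word):
            continue
        ok = True
        for p, ch in zip(pattern, word):
            if p == "_":
                # Every occurrence of a guessed letter is shown, so an
                # unrevealed slot cannot hold an already revealed letter.
                if ch in revealed:
                    ok = False
                    break
            elif p != ch:
                ok = False
                break
        if ok:
            result.append(word)
    return result

def letter_frequencies(words, guessed):
    """Count how many candidate words contain each not-yet-guessed letter."""
    counts = Counter()
    for word in words:
        counts.update(set(word) - guessed)
    return counts

# Toy dictionary, for illustration only.
WORDS = ["jazz", "jars", "cats", "bats", "fizz", "cat"]
left = candidates(WORDS, "_a__", wrong={"t"})
freq = letter_frequencies(left, guessed={"a", "t"})
print(left, dict(freq))
```

Each word is counted once per letter, however many times the letter repeats, which matches the "at least one" counting described above.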

At this point, it is worth introducing the Nash equilibrium from game theory. This is when opposing strategies are found for which neither player can unilaterally improve his or her outcome, even if the opponent’s strategy is known. Partly with this in mind, the algorithm doesn’t choose the most popular letter, but chooses any one of the possible letters weighted according to the frequency (e.g., if 1,000 candidate words contain “e” and 13 contain “x”, then “e” will be picked more often than “x”, at a ratio of 1000:13). This is a first iteration toward a Nash equilibrium point; without it, our algorithm is entirely deterministic, so that any word that defeats it will defeat it every time. The opponent would optimize his or her strategy by choosing that word every time. The randomization also makes the game more fun. My daughter’s question can be thought of as the next iteration toward the Nash equilibrium. Knowing the guesser’s algorithm, we are asked to optimize the weighting of how we choose words from the dictionary instead of the equal weighting that I had assumed.
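The frequency-weighted pick can be illustrated with the 1000:13 example from the paragraph above. This is a Python sketch under the assumption that the counts are already available as a mapping from letter to count; `pick_letter` is a name invented for the example.

```python
import random
from collections import Counter

def pick_letter(freq, rng=random):
    """Pick a letter with probability proportional to its count."""
    letters = sorted(freq)
    weights = [freq[ch] for ch in letters]
    return rng.choices(letters, weights=weights, k=1)[0]

# The example from the text: 1,000 candidate words contain "e", 13 contain "x".
freq = Counter({"e": 1000, "x": 13})
rng = random.Random(0)  # fixed seed so the run is repeatable
picks = Counter(pick_letter(freq, rng) for _ in range(10_000))
print(picks)
```

Over many draws, "x" should come up roughly 13 times for every 1,000 times "e" does, so the guesser is mostly safe but never fully predictable.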

(A little digression: I had the pleasure of listening to John Nash, inventor of the Nash equilibrium, Nobel Prize winner, and subject of the film A Beautiful Mind, talk about his Mathematica use at the fifth International Mathematica Symposium in London a few years ago. Every year, there is usually at least one Mathematica user in the Nobel Prize list, though sadly, few Nobel Prize winners are in Hollywood films.)

The easiest way conceptually to answer my daughter’s question is with a brute-force Monte Carlo analysis of every possible word. The first thing I did was to re-factor the code from the Demonstration to make it faster. Sifting a 90,000 word dictionary and doing the frequency analysis takes about 0.2 seconds in my Demonstration—instantaneous in an interactive game. But simulating an entire game can require up to 26 such choices, and since I want to simulate 15 million games, I spent a few minutes using the Profiler in Wolfram Workbench to understand where the time goes and was rewarded with a version that was about 10 times faster. This implementation is at the bottom of the post, if you want to repeat or improve on my analysis.

Then I ran it in parallel using gridMathematica. If I had been able to use the Wolfram|Alpha hardware, I would have been done in a few minutes, but I just have a couple of idle office PCs, so I left it to run over the weekend.

I did an initial run of 50 games for each word in the dictionary. Enough to converge to within 10% of the true outcome and enough for a rough ordering. Then I ran further trials on the more promising words, rising to a total of 3,000 games on the shortlist of 1,000 best words. Enough to be pretty certain of their ordering.
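For readers who want to see the shape of such an experiment, here is a small Python sketch of the Monte Carlo loop. It is not the implementation used for the post: the function names, the "_" placeholder, and the eight-word dictionary are assumptions, and the guesser is the frequency-weighted one described earlier.

```python
import random
from collections import Counter

def consistent(word, pattern, wrong):
    """True if word fits the revealed pattern and avoids known misses."""
    if len(word) != len(pattern):
        return False
    revealed = set(pattern) - {"_"}
    for p, ch in zip(pattern, word):
        if ch in wrong:
            return False
        if p == "_" and ch in revealed:
            return False
        if p != "_" and p != ch:
            return False
    return True

def play_game(secret, words, rng):
    """Play one game to completion and return the number of wrong guesses."""
    if secret not in words:
        raise ValueError("the secret word must be in the guesser's dictionary")
    pattern = "_" * len(secret)
    wrong, guessed = set(), set()
    while "_" in pattern:
        pool = [w for w in words if consistent(w, pattern, wrong)]
        freq = Counter()
        for w in pool:
            freq.update(set(w) - guessed)
        letters = sorted(freq)
        letter = rng.choices(letters, weights=[freq[c] for c in letters])[0]
        guessed.add(letter)
        if letter in secret:
            # Reveal every occurrence of a correct letter.
            pattern = "".join(s if s == letter else p
                              for s, p in zip(secret, pattern))
        else:
            wrong.add(letter)
    return len(wrong)

def simulate(secret, words, games, seed=0):
    """Return the list of wrong-guess counts over repeated games."""
    rng = random.Random(seed)
    return [play_game(secret, words, rng) for _ in range(games)]

# Toy dictionary, for illustration only.
WORDS = ["cat", "cot", "cut", "bat", "bit", "jab", "zip", "fox"]
misses = simulate("fox", WORDS, games=50)
print(sum(misses) / len(misses))
```

The game is played to the end regardless of gallows size, so one list of miss counts can be scored afterwards against any game size.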

To save others from having to burn the CPU cycles, I have included the 50 MB of generated data here.

Now that we have this data, we can start analyzing it:

Here is the result that I get for the word “difficult”:

The data shows the number of wrong guesses in each of the 50 games. We can see that the word “difficult” is not very difficult, taking on average 3.3 wrong guesses—not enough to start drawing the man in my design. Out of 50 games, the algorithm never fails on a 10-game or even comes close to losing a 13-game. Though if it had played an 8-game, it would have lost once.

Let’s look at the overall performance of the algorithm on a word chosen randomly from the dictionary (the original assumption). We can’t look at average miss rates, since a game with 13 wrong guesses is equally a loser in a 13-game as a game with 20 wrong guesses. What we care about are win ratios, and those depend on the game size.
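Turning a list of miss counts into a win ratio is a one-line calculation. The sketch below is in Python, with a made-up list of miss counts rather than the real data; it assumes the guesser loses as soon as the number of wrong guesses reaches the game size.

```python
def chooser_win_ratio(misses, game_size):
    """Fraction of games in which the guesser ran out of lives."""
    if not misses:
        raise ValueError("need at least one simulated game")
    lost = sum(1 for m in misses if m >= game_size)
    return lost / len(misses)

# Made-up miss counts for illustration only, not the post's data.
sample = [3, 2, 8, 4, 13, 20, 1, 10]
for size in (8, 10, 13):
    print(size, chooser_win_ratio(sample, size))
```

The games with 13 and 20 misses count the same in a 13-game, which is why averages of the miss counts are not the right measure.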

For example, if we choose “cat” in a 13-game, then we will beat the algorithm 23% of the time.

In a 10-game, we will beat it 50% of the time.

It turns out that for a 13-game, we will beat the algorithm only 1% of the time for randomly selected words. I can see why my daughter was frustrated.

Rising to 5% for the 10-game:

If the algorithm didn’t use frequency analysis at all, then the win ratios would be 10% for the 13-game and 25% for the 10-game (as a careless coding error taught me in the first run of the experiment).

Here is the distribution of game outcomes. Half of the time it makes 4 or fewer wrong guesses.

Which are better, long words or short? When I played my daughter, I used short words, as I had assumed they were easier (they are certainly easier for her to spell), but I was surprised to discover that the average mistake rate is highest for short words. The reason seems to be simply that the more distinct letters a word contains, the fewer wrong letters are left for the guesser to pick. In the extreme, a word with 14 different letters cannot win a 13-game: only 26 - 14 = 12 wrong letters exist.
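The counting argument at the end of that paragraph can be checked directly. This Python sketch assumes a 26-letter alphabet and uses "ambidextrously" as an example of a word with 14 different letters; both function names are invented here.

```python
import string

def max_possible_misses(word):
    """Number of alphabet letters that do not appear in the word."""
    distinct = set(word.lower()) & set(string.ascii_lowercase)
    return len(string.ascii_lowercase) - len(distinct)

def can_beat(word, game_size):
    """True if the guesser could, even in principle, miss game_size times."""
    return max_possible_misses(word) >= game_size

print(max_possible_misses("ambidextrously"), can_beat("ambidextrously", 13))
print(max_possible_misses("jazz"), can_beat("jazz", 13))
```

This only gives an upper bound on the misses. Whether a word actually wins depends on how the guesser plays.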

So if we only remember one rule, it is to use 3-letter words. And the more pieces in the gallows’ design, the more this is the case.

But we are interested in the very best words, so here is the score for the best word of each word length:

With careful choice, the very best words of each word length are more evenly matched.

And interestingly, if we sort the words by win ratio, the very best words have dramatically better scores than those only a few places back down in the rankings. Each line here is a different game size from 9 to 13. The jaggedness in the lower ranking words is due to insufficient simulation data and not a real phenomenon in the algorithm.

OK, enough about the trends, here are the best words:

As you might expect, low frequency letters like “x” and “z” are a big factor, but letter repetitions are also useful, since they make longer words have a similar number of different letters as shorter words.

So there we have it: “jazz” wins most for all game sizes. Though we can see odd variance by game size. “Jazzed” does progressively worse as the game size goes up, but “faffed” does progressively better. Understanding that is another project!

We can now improve our word selection algorithm. Instead of choosing a word randomly, we should weight our choice toward words with high win ratios.
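One simple way to do that weighting is to pick words with probability proportional to their measured win ratio. The Python sketch below assumes the ratios are held in a dictionary; the 0.23 for "cat" is the 13-game figure quoted earlier, "difficult" never won in its 50 games, and the value for "jazz" is a made-up placeholder.

```python
import random

def choose_word(win_ratio, rng=random):
    """Pick a word with probability proportional to its win ratio."""
    words = sorted(win_ratio)
    weights = [win_ratio[w] for w in words]
    if sum(weights) <= 0:
        # No word has ever won: fall back to a uniform choice.
        return rng.choice(words)
    return rng.choices(words, weights=weights, k=1)[0]

# "jazz" ratio is a placeholder, not a measured result.
ratios = {"jazz": 0.9, "cat": 0.23, "difficult": 0.0}
rng = random.Random(1)
picks = [choose_word(ratios, rng) for _ in range(1000)]
print({w: picks.count(w) for w in ratios})
```

Proportional weighting is only one choice. Always playing the single best word would score higher against this guesser, but it is exactly the predictable strategy that an updated guesser would punish.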

Of course, this is only one more step toward the Nash equilibrium point. If the guesser updates the algorithm to take into account that strategy, we will have to repeat this entire experiment, to get an even better strategy. Eventually the two algorithms would likely converge on a point where every word has the same win ratio, and we will know the optimal game outcome.

I suspect that the 13-game is essentially solvable. There are enough words that are easily guessed that taking more risks with those, to test the harder words, will improve the guessing algorithm from a 99% success rate to 100%. At that point, we are at equilibrium—in the words of WOPR, “A strange game. The only winning move is not to play.” (The WarGames reference is particularly relevant, since the Nash equilibrium was used as the theoretical basis for the Cold War nuclear strategy of mutually assured destruction, and the climax of the film was essentially this kind of simulation—with added computer self-awareness.)

For the 10-game, I learned only enough to see that the ultimate algorithm may be quite complicated and that there is more richness in this simple game than I had expected.

If you are more intent on fun, then pick the best of the long words. Here is a table of the best words of each length for the 10-game. They don’t do as well as the 3–5 letter words, but you can’t beat “powwowing”, “bowwowing”, and “huzzahing” for entertainment!

This is all based on the 90,000 word English dictionary built into Mathematica. Results may be very different for larger dictionaries or other languages.

Wouldn’t it be better to guess based on minimizing the number of remaining candidate words which you won’t be able to eliminate, rather than guessing based on trying to avoid wrong guesses? For example, if there are 1000 candidates left and 900 of them have an “a”, you might be inclined to guess it because you aren’t likely to be wrong. But it also isn’t very informative. If you know you have 8 wrong guesses remaining, and you can plan a set of 7 or fewer letters such that knowing whether each letter in the set is in the word would uniquely identify a remaining candidate, then guessing that set would seem to be the best strategy from that point forward. It may be for some reason (but it’s not immediately clear to me why, if so) that a greedy strategy is just as good (and it has the advantage that correct guesses are free).

The reason picking a likely letter is best here is that you don’t just get a “yes or no” answer back from the word chooser; you get the location of every occurrence of that letter. You can use this information to eliminate quite a number of options. So maybe, building on your idea, a “most successful” deterministic algorithm might be to choose a letter that is fairly common, yet also tends to appear uniformly throughout each of the words it appears in, so that it’s pretty much guaranteed to eliminate at least 75% of the words of that length.

Very interesting… however I wonder if the computer guesser could do better. I think choosing based on letter frequency isn’t really ideal. It would be better to choose based on the amount of *information* that it estimates will be revealed by each choice.
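One way to make "information revealed" concrete is to group the candidate words by what the chooser would show for a given letter (a miss, or a particular set of positions) and take the Shannon entropy of that grouping. The Python sketch below is an illustration of the idea only, not the C++ implementation mentioned in the next comment; the names and word list are assumptions.

```python
import math
from collections import Counter

def reveal_pattern(word, letter):
    """Positions the chooser would reveal; an empty tuple means a miss."""
    return tuple(i for i, ch in enumerate(word) if ch == letter)

def expected_information(words, letter):
    """Shannon entropy, in bits, of the answer to guessing this letter."""
    groups = Counter(reveal_pattern(w, letter) for w in words)
    total = len(words)
    return -sum((n / total) * math.log2(n / total) for n in groups.values())

def most_informative_letter(words, guessed=frozenset()):
    """Letter whose answer splits the candidates most evenly."""
    letters = sorted({ch for w in words for ch in w} - set(guessed))
    return max(letters, key=lambda ch: expected_information(words, ch))

# Toy candidate list, for illustration only.
WORDS = ["bunk", "dunk", "funk", "junk"]
print(expected_information(WORDS, "u"))  # every word answers the same way
print(expected_information(WORDS, "b"))  # splits one word from the other three
print(most_informative_letter(WORDS))
```

Here "u" is a guaranteed hit but tells the guesser nothing new, while "b" is risky and informative. A pure entropy rule ignores the cost of a miss, so it would need to be balanced against the lives remaining.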

I just threw together a quick implementation of this in C++ (not the best prototyping language, but at least it runs quickly). I describe it (and also discuss other possible improvements) if you’re interested in taking a look: http://bodyfour.livejournal.com/54013.html

@Douglas McClean: The problem with this logic is that Hangman is an asymmetric game. If you guess wrong, you only get the information that all of the words containing your guess are wrong–but if you guess right, not only do you eliminate all of the words that don’t contain your guess, you’re also given information about where in the word your guess belongs, which allows you to eliminate many more possibilities. So in the example where 90% of words contain ‘a’, if you guess ‘a’ and get it wrong, you just reduced the dictionary size by 90%, but if you guess ‘a’ and get it right, you know where in the word the ‘a’ falls, which allows you to eliminate much more than 10% of the dictionary (and you don’t lose any guesses to boot). Hangman rewards “playing it safe” pretty heavily.

Very interesting. I remember in school that we learned the trick of using “lynx” to catch people out – but then everyone caught on. I also remember using “onyx” on an unsuspecting friend who was annoyed because they didn’t know the word.

When I want to play hangman to win, I tell the guesser “4 letters”, with the word “junk” in mind. But then I cheat. If the guesser goes for “j”, I mentally change the word to one of “bunk, dunk, funk, gunk, hunk, lunk, punk, sunk”, depending on which letters they’ve already guessed. I let them guess the “u” in the second position. If they guess the “n”, I mentally switch to something like “bump, dump, hump, jump, lump, rump, sump”, assuming “m”, “p” and the other letter haven’t been guessed. And so on. There are many, many words with “u” in the second position (grep '^.u..$' /usr/share/dict/words), and their remaining letters are mostly among the lower-frequency set.
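This cheat can be mechanized by keeping a pool of words that are still consistent with every answer given so far. The Python sketch below uses a simple rule that is an assumption on my part: say "no" whenever some word in the pool lacks the guessed letter, and otherwise reveal the positions shared by the largest group. The word pool is the one listed above.

```python
from collections import defaultdict

def cheat_answer(pool, letter):
    """Return (revealed positions, surviving pool) for a guessed letter."""
    without = [w for w in pool if letter not in w]
    if without:
        # Some word avoids the letter, so call it a miss and keep those words.
        return (), without
    # Forced hit: reveal the positions that keep the most words alive.
    families = defaultdict(list)
    for w in pool:
        positions = tuple(i for i, ch in enumerate(w) if ch == letter)
        families[positions].append(w)
    best = max(families, key=lambda pos: len(families[pos]))
    return best, families[best]

POOL = ["junk", "bunk", "dunk", "funk", "gunk", "hunk", "lunk", "punk", "sunk",
        "bump", "dump", "hump", "jump", "lump", "rump", "sump"]

pos_j, pool = cheat_answer(POOL, "j")   # miss: drop "junk" and "jump"
pos_u, pool = cheat_answer(pool, "u")   # forced hit in the second position
pos_n, pool = cheat_answer(pool, "n")   # miss: switch to the "-ump" words
print(pos_j, pos_u, pos_n, pool)
```

Always preferring a miss is not necessarily the strongest cheat, since it can shrink the pool quickly, but it reproduces the sequence of switches described in the comment.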

So an interesting problem would be: for words of N letters, what is the sequence of guesses that most quickly forces the full word to be complete? Every letter guessed eliminates words containing that letter, until toward the end any letter will be included in some of the words, so the guesser wants to choose the letter that eliminates the most words.

@Joel
The algorithm implicitly does address common letter groups, because they skew the frequencies. eg if you look at a standard ending like “ing” in the case of 7 letter words… there are 363 words ending in “ng” out of which 346 end in “ing”, giving “i” a huge boost in the frequency count.

@Douglas & Mitch
I gave quite a lot of thought to the issue of expected dictionary reduction and I am sure that it is important in the “ultimate” algorithm, but as Mitch’s response blog points out, a perfect algorithm will require a full tree search lookahead which will be very expensive (26! branches though many can be discarded).

In the extreme case, e.g. the 1-game or the last remaining move, an entropy-based algorithm is clearly the wrong thing. It doesn’t matter how much you learn from your go; if you don’t stay alive, you lose. For the 26-game it is obviously the right way: you don’t have to care about lives, and eliminating words will get you there sooner. Where the break-points or balance are, I don’t know.

The better algorithm will trade off some of your spare “lives” by taking riskier entropy-based guesses in return for a better overall average. This is what I hinted at in the “solvability of the 13-game”, where there is, on average, plenty of spare life to risk.

What I couldn’t resolve was how to calibrate that trade-off without lots of simulation or implementing a dynamic pruned-search look-ahead. All too much to write in a train-journey!

If you run the analysis on your algorithm, Mitch, I will be fascinated to hear the results.

I think that ‘faffed’ improves over time because of the double-f in the middle. Once your algorithm correctly guesses the letter ‘a’ in the word, the use of a double-z is certainly more likely than the use of a double-f. Even though the word has three ‘f’s in it as opposed to two ‘z’s, only one f is actually likely to be present while both ‘z’s are. Therefore z would be more likely to occur in the word with at least one vowel guessed.

Instead of searching just 1 move ahead, I got slightly better results looking several moves ahead. With my own word list, instead of narrowing down to 8 words with 13 guesses, I got down to 6 words. See here.

Is Mathematica’s word list, or a rough equivalent, available anywhere?

Thanks for the list. When I ran through it, the most difficult part was that there are 13 four-letter-words that end in “ine”, and 13 four-letter-words that end with “ays”. If we want to find all of them, we can afford to waste a bad guess on any other of those letters until we know we aren’t in those paths.

It also means that 12-hangman is easily proven unsolvable.

I did trial-and-error and found out if I lead with S, N, L, and A, the default algorithm of guessing the most common letter works from there on out.

An interesting way in which this computer simulation diverges from having a human guesser is that the computer is equally aware of all words in its dictionary. A human guesser will be hard-pressed to come up with “syzygy” if that word isn’t in their natural vocabulary, but the computer guesser will have no harder time with that than with any other word with similar letter frequency.

Of course, coming up with a list of the best words to use against human guessers would require playing thousands of games against a human guesser – something that would take considerably more time than a computer guesser on a distributed system.

An intriguing project, then, would be to set up the testing system on a website that human guessers can log into and help run the test games. A little demographic information could even be collected so that, after gathering enough data, it could even provide a list of “the best words to use against a female player between the ages of 21 and 30 living in the Pacific Northwest” or other such granular silliness.

It would be interesting to find out that certain words are guessed more easily by people from certain socioeconomic or geographic groups. And the knowledge could be used in real life to earn free drinks in bar bets.

your program can’t actually think like a human, that’s the problem. words like “lynx” or even “sphinx” are much harder than half of those. compare lynx to jinx, which one do you really think is harder?

In my area, we always played Hangman with only six guesses. We start with the gallows complete, and draw only the head, body, arms, and legs. Ten guesses sounds luxurious. I guess having a smaller vocabulary makes it harder to pick words people don’t know.

I think a big part of the fun of playing Hangman is trying to pick a word you don’t think your audience is familiar with. This sounds a bit too difficult to model, though, since “weighted averages based on personal background”.

The bit about shorter words being better is great, though. I’ll remember that for when I want to beat some smart people sometime.

I’m slightly impressed by this blog, but not very. TBH, it doesn’t model anything I regard as real hangman. Real players recognize letter patterns. Real players don’t pick three- and four-letter words except to be obnoxious. Real players don’t get 14 guesses. Real players get mad at you if you pick strange slang words they’ve never heard of like ‘faff’. Real players will be taken in by vowel-loaded words. As an exercise in analyzing what words a brute force A.I. will have trouble with, this is a rousing success, but in terms of real gameplay, it falls flat considerably.

I was thinking that myself. While “jazz” would still be hard for a human player to get (he’ll probably never get around to guessing z), it’s much more doable than a word that the player just doesn’t know.

I wonder, though, if it would be possible to simulate this, rather than playing thousands of games against real humans. The guesser (or both the guesser and chooser) could be given a reasonable human vocabulary. You could even weight each known word with the probability that the player thinks of it–even if I know the word “polydactyly,” it might not come to mind while guessing at Hangman.

The question then would become where to get the vocabulary list and probabilities. Maybe you could feed it a bunch of human-written texts, and it could extract word usage stats to make up the list?
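As a rough illustration of that idea, word-usage statistics can be pulled from plain text with a counter. This Python sketch assumes the texts are ordinary English strings and that relative frequency is a reasonable stand-in for how readily a word comes to mind; the function name and the two sample sentences are invented for the example.

```python
import re
from collections import Counter

def vocabulary_weights(texts):
    """Map each word seen in the texts to its relative frequency."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Two invented sentences stand in for a real corpus.
texts = ["The cat sat on the mat.", "The lynx watched the cat."]
weights = vocabulary_weights(texts)
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

A simulated human guesser could then sample candidate words using these weights, so that rare words like "lynx" are considered less often than common ones.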

On a sidenote, my favorite Hangman word has always been “cwm.” “Phlegm” is a good one too. Of course, once you use an unusual word on someone, they’ll remember it and you can’t use it again on them.

Even though I’m ordinarily quite competitive when it comes to games, my favorite thing about hangman is the opportunity to extend the drawing long after the “man” is complete, by adding more elaborate and silly details to the scene — sort of equivalent to saying “you’re almost in trouble, mister! Eight, nine… nine-and-a-half… nine and two-thirds…” But with less stress. Also, I guess it’s a way to both win and pridefully demonstrate graciousness.

It strikes me that Hangman is a codebreaking exercise, and I wonder if the name relates to this. While figuring out ways to solve hangman may not be good for breaking codes, looking at words that win hangman could help create robust language for transmission of encrypted codes.

There’s only one problem; after getting enough misses, people often try using rare and/or early letters — and for much of your list that’s a disaster.

As a result, if I want a hard word I generally use “HIGH”; the standard technique (vowels, then common letters) gets enough misses that people usually try switching techniques before hitting H or G, only for those to fail as well. It also prevents people complaining that a rare/foreign word was used. ;)

@Collin
I think the fact that a very large number of longer words end in “ing” or “ness” means that discovering those letters does little to reduce the list of candidate words. On the whole, words with unusual structure are easier to get. E.g., words like syzygy are found easily once you know that you have six letters with no vowels, because there are hardly any words like that. (It works well on humans, because it is obscure enough that we forget to include it in our mental list of candidate words.)

Thank you for a very interesting article. I came across it while reminiscing about a project I did in college; I wrote a similar hangman guessing game back in the late 1970s or early ’80s as a computer science project.

My algorithm went like this:
1. Guess the most common letter from the list of n-letter words in the dictionary (where n is the length of the hidden word). If more than one letter has the same frequency, guess the most common in dictionary frequency.
2. Filter out non-matching words as letters are guessed.
3. When the list of matching words is exhausted, start doing pattern matches against the entire dictionary, beginning with a substring of n-1, then n-2, etc. until some matches are found, then guess the most common letter.
4. If no matches are found, guess the next most common letter by dictionary frequency.

The interesting thing was the program started with no words, and added new words to the dictionary as it won or lost. Over time, the program gets better and better at guessing and some of the guesses appeared “insightful”.

As an example, the computer had “fixture” and “mixer” in its dictionary. I wanted to try the word “mixture”. The computer quickly guessed “-i-ture”, then tried ‘f’. At this point, it doesn’t know the word, so starts pattern matching. It next guessed “x” based on the longest match “-ture”. Finally, it guessed ‘m’ because it had -ix. I thought this was impressive with it getting only one incorrect guess.

At the time, computers were not powerful enough to run through thousands of words, so I don’t think I had more than a few hundred words, but the concept was interesting and the learning aspect gave it an illusion of intelligence.

my favorite all-time word was “powwow.” of course all those difficult four-letter words were favorites, too. and non-vowel words work well, e.g., nth, YHWH, etc. we played with the rule that, if the guesser had never heard of the word itself (such as axolotl, syzygy, siamang) it didn’t count as a loss. And no proper nouns such as La Jolla. cwm? never heard of it.

Use different languages, or even use words that are used less often, such as oodles, extreme, or rather. Use brands such as reddit, bing, google, or twitter, or food ingredients such as rosemary, chili-powder, or thyme. Use less common names such as Pippen or Joris. Use states and countries: Mississippi, Pennsylvania, and stuff. If you are studying something like chemistry or engineering, use a harder, little-used word. Look up the most uncommon animals, plants, or places and use those. They all work; you just have to be creative. And I have to go because I was taking a break from a book report. Hope this helps! cya! ~Hanger