The Longest Word Ladder Puzzle Ever

UPDATE: The solution to the puzzle and more comments from Jon have been added at the bottom of the post.

On the long flight to the recent Wolfram Technology Conference, I ended up on the puzzle page of a newspaper. My attention was drawn to a word ladder puzzle, where you must fill in a sequence of words from clues, but each word differs from the previous by only a single letter. Here, for example, is a simple puzzle already solved:

best

from a position of superiority or authority

bast

strong woody fibers obtained especially from the phloem of from various plants

bash

a vigorous blow

bath

a vessel containing liquid in which something is immersed (as to process it or to maintain it at a constant temperature or to lubricate it)

math

a science (or group of related sciences) dealing with the logic of quantity and shape and arrangement

I wasn’t going to do a blog entry on this, as it is a very similar task to my “Exploring Synonym Chains” post that I wrote some time ago, but that changed with a chance conversation at the (excellent) Technology Conference. Proving that one never stops learning, Charles Pooh, one of our graph theory developers, pointed out to me that my synonyms item could have been done much better. I had broken one of the very rules that I wrote about in my “10 Tips for Fast Mathematica Code” entry—”Use built-in functions.” I had effectively re-implemented the built-in Mathematica commands GraphPeriphery and GraphDiameter.

So, armed with these two new functions, let’s find the longest word ladder puzzle that can be made using Mathematica‘s English dictionary.

Some word ladder puzzles allow you to add or remove letters, but I am going to look only at the version where all words are the same length. Knowing that there can be no connections between different lengths means that we can be more efficient by considering each word length separately. So I start by generating dictionaries of words of specific lengths. Because I intend to use the words’ definitions as my clues, I will only look at words where I know at least one definition for the word, and I will also exclude words that contain non-letters or capitals (e.g. hyphenated words and proper names). The function caches its result, as we only want to do this once:

Next we construct the graph of words that have an edit distance of one, that is, only a single letter difference. The conceptually simple way is…

…but we are unnecessarily checking the edit distance in both directions, when we know it is symmetric. And we are also including “aloof” words (completely unconnected—so named by Donald Knuth because “aloof” is one of them). So this version, while more complicated, is faster and yields simpler graphs:

Knuth studied word ladders with five letters and observed that most five-letter words can be connected. We can easily replicate this observation:

We can count the five-letter aloof words:

Apparently, Knuth couldn’t tackle the six-letter words at the time, as it was too difficult. But we will analyze all the way up to 23-letter words. It turns out that the disconnectedness increases with word length. Here, for example, is the graph for six letters:

The number of aloof words also goes up to 2,756 in the graph for six letters.

You can make incredibly long ladders, but ones that behave like bat → cat → fat → hat are rather annoying, because you can get directly from “bat” to “hat” without going through the other steps. So I am going to assert that the only good word ladders follow the shortest path of valid words. This view is supported, I think, by Charles Dodgson (Lewis Carroll), who claimed to have invented word ladders. He did work on the shortest paths in word ladders, including the result that the shortest evolution from “ape” to “man” was six steps. Either because I have a more modern set of words, or because Dodgson needed Mathematica, I make it five steps:

Using the looser definition of “no two consecutive steps may change the same letter,” Games Magazine held a competition in 1993 to find the longest word ladders, and its readers managed 26 steps. But we can do better than that.

We need to find which connected subgraph of which word length has the longest shortest distance between any two of its words. That’s a tough sentence to parse, but the concept is summed up in one of the two commands that Charles Pooh told me about: GraphDiameter.

There is quite a lot of searching to do, but parallelizing it makes it take under 15 minutes on my laptop. We discover that the longest shortest word ladder is 49 words long:

Because I kept the structure of the data as I generated it, the position of that value will tell me the word length and subgraph number that contains our prize.

This turns out to be the main cluster of connected six-letter words.

Now we use the other command that Charles introduced me to. GraphPeriphery finds those vertices that are at the maximal distance from each other.

Luckily, we get two results. This means that they must be at either end of the word ladder. If we had more, we would have to figure out which element paired with which to produce the maximal path lengths. The next step is to generate the path:

I am not going to show you the result as that would ruin the puzzle, but here it is highlighted on the graph:

We now have to make the clues. I will use the built-in WordData definitions, picking one of them at random. Since lots of words have multiple meanings, you get a slightly different puzzle each time.

Before the result, an admission of cheating and a warning: unfortunately, the solution that this code finds turns out to contain a rather adult-themed word and clue. In an effort to keep the Wolfram Blog family-friendly, I manually removed the word from the dictionary (the redacted word in the first line of code). Run the code for yourself if you want the adult version. Of course, the whole result is dependent on the dictionary you use anyway, though larger dictionaries do not necessarily lead to longer word ladders, since they also increase connectedness.

Here is the family-friendly version; have fun:

charge

a quantity of explosive to be set off at one time

exchange or replace with another, usually of the same kind or category

occurring or appearing or singled out by chance

of uncertain outcome; especially fraught with risk

a rhythmical work song originally sung by sailors

small crude shelter used as a dwelling

European scaleless blenny

a simple version of hockey played by children on the streets (or on ice or on a field) using a ball or can as the puck

the characteristic sounds made by a horse

habitually complaining

a person given to excessive complaints and crying and whining

any of numerous small silvery North American cyprinid fishes especially of the genus Notropis

shake, as from cold

a razor powered by an electric motor

someone who has or gives or receives a part or a share

an effigy in the shape of a man to frighten birds away from seeds

an electronic pulse counter used to count pulses that occur too rapidly to be recorded individually

an official who affixes a seal to a document

a person skilled in a particular type of therapy

a machine that cuts the heads off grain and moves them into a wagon

someone who reads proof in order to find errors and mark corrections

give an interpretation or rendition of

an owner of property who receives payment for its use by another person

someone who rants and raves; speaks in a violent or loud manner

an enlisted soldier who serves in the ranks of the armed forces

desire strongly or persistently

a programmer for whom computing is its own reward; may enjoy the challenge of breaking into other computers but does no harm

small striped semiterrestrial eastern American squirrel with cheek pouches

long slender feather on the necks of e.g. turkeys and pheasants

challenge aggressively

(paper making) a frame used to form paper pulp into sheets

(statistics) any of nine points that divided a distribution of ranked scores into equal intervals where each interval contains one-tenth of the scores

make dirty or spotty, as by exposure to air; also used metaphorically

give a definition for the meaning of a word

make more complex, intricate, or richer

express discontent

the act of despoiling a country in warfare

a deep narrow steep-sided valley (especially one formed by running water)

in a raving manner

migratory

capturing cattle or horses with a lasso

brick that is laid sideways at the top of a wall

of the relatively near future

orienting or directing homeward or to a destination

hulled corn with the bran and germ removed

a sermon on a moral or religious topic

having a feeling of home; cozy and comfortable

according with custom or propriety

light and humorous drama with a happy ending

comedo

a black-tipped plug clogging a pore of the skin

If you want printable versions, here are CDF and PDF versions of the table.

I will post the solution to the puzzle in the comments section of the Wolfram Blog in a couple of weeks.

However, as some of the comments pointed out, the dictionary used for this lacked some obvious modified words such as plurals and verb conjugations. Dropping the requirement that we know definitions for the generated puzzle, and using a much larger dictionary, I have a revised result.

The larger dictionary provides greater connectivity, so the largest minimal word ladder is a little shorter at 46 words, and interestingly occurs in the seven-letter words.

24 Comments

The puzzle site Sporcle.com has a daily “Word Ladder”. When the solution is released, can I get your permission to reproduce this list in puzzle form (with credit to you & a link to this article)? If it becomes a featured quiz, it could be exposed to tens of thousands of players. (See the music-related quiz I created in the link.)

Two further notes: 1) I’ve solved it. As RJK infers, the clues made several steps pretty easy, although I went off the rails for a while when I put “TACKLE” for “challenge aggressively”. 2) You should correct the headline. This is not the “world’s longest word ladder”, but rather the world’s longest SHORTEST word ladder (or longest minimal word ladder).

I’m a bit puzzled as there seems to be a shorter path between “capturing cattle or horses with a lasso” and “orienting or directing homeward or to a destination”: just change the R to an H, then the P to an M. I’m having a tough time seeing how it’s even possible to use two words in between them without changing the same letter twice in a row. Anyone with some insight?

@Alan: Possibly (maybe even probably) the dictionary word list used didn’t include verb forms such as “hoping” that didn’t have a separate definition. Note that the other -ING words are not defined relative to their roots, and remarkably for a ladder of this length there are no words ending in either -ED or -S.

Patrick: did the title change before I came here? This isn’t claimed to be the longest word ladder, but the longest word ladder PUZZLE. That means the idea is to find the shortest ladder between two words, and this pair is purported to have the longest solution. That said…
Alan: I suppose that word is missing from the list used. But in this version, the intermediaries start with C.

I think the previous attempt at the ladder was in a version of mathematica without GraphPeriphery and GraphDiameter. With that in mind, it may be a good idea to include the mathematica version, that a blog has been written about. That will also help posterity keep things in perspective. The following should do:
Title
Date
Author
Mathematica #

Patrick Kelly’s comments are correct. The dictionary does not contain many verb cojugations and word variations. In particular “hoping”. Had it been present in the source dictionary, the solution would have short-cut through it and yielded a ladder with one fewer steps.

@Isaac, I wish I could say that my previous items pre-date the functionality, but unfortunately not. I just didn’t know enough about them then.

Now, can we please have a few more dictionaries into M here? I’d like to suggest important languages such as German and French. Also Russian and Hindi would be interesting, as they are related to the original Indo-Germanic languages of past millenia. Also Latin and (ancient! not modern!) Greek as well as Hebrew would be great to make statistical as well as graph-theoretic analyses of diction/vocabulary/etymology/morphology, and their migrations. How much Russian came from German? How much of English came from German and how much came from French, and how much French came from Latin? And what influence did Hebrew have on Greek? And what were the drivers of these influences? War, money, peace, love, food, domination, religion, crime, etc. Please, more languages! From a conceptual viewpoint in M, another dic should just be data!

@Jon – Perhaps a better way to obfuscate the ‘adult’ content would be to use the Overlay function, which is new in version 8. I’ve also wondered if Overlay could be used to ‘skin’ a Dynamic interface—maybe that would be a good blog post example?? (If so, feel free to contact me via email and I can point you toward some graphics to use!)