Humans best crossword-puzzling computer

Officially, humans are the only ones who can enter the American Crossword Puzzle Tournament, which was held in Brooklyn, New York over the weekend. But this time artificial intelligence expert Matt Ginsberg of On Time Systems has put his Dr Fill crossword solver to the test. As the results show, you don't have to throw out your pencil just yet.

This week, the puzzle masters at the ACPT put the hurt on Dr Fill, which, according to a report in the New York Times, did terribly on two out of the five crossword puzzles and would have ranked only 141st among the 600 people who took the test. The program has done better in the past: in simulated runs of fifteen former tournaments, Dr Fill came out on top three times.

"I'll be back next year," Ginsberg told the Times.

Ginsberg's company, located in Eugene, Oregon, is a niche player in industrial optimization. Among other things, On Time Systems created the algorithms behind the Green Driver application, which is used in the cities of Eugene and Portland to anticipate red lights and route around them, and which can cut commute times by 5 per cent.

The company has also created the algorithms for routing the US Air Force's fleet of cargo aircraft around weather while still obeying air traffic rules. The Worldwide Aeronautical Route Planner was created to run on mainframes and is being modernized to run on laptops. The optimized routing saves the Air Force something on the order of 1 to 2 per cent on its fuel bill each year, which works out to $35m to $90m.

In his spare time, in addition to actually creating crossword puzzles for the Times, Ginsberg has been working on the Dr Fill program. His paper describing Dr Fill was published (PDF) in the Journal of Artificial Intelligence Research in December 2011. Based on its past performance, Ginsberg reckons that Dr Fill ranks among the top 50 puzzle solvers in the world, even though it did not have a very good day on Sunday.

The Watson Jeopardy! question-answer system built by IBM that took on the two human champs (perhaps chimps or chumps might be more appropriate, given how badly Watson spanked Ken Jennings and Brad Rutter) did terribly at first, too, but with tweaks and tuning and algorithm changes, it got better.

So there's no question that Ginsberg will be back, and it might even turn out that he gets some help. The Gray Lady is Big Blue's hometown rag, so some collaboration might be in order. The puzzles that threw Dr Fill for a loop included one with words spelled backwards and another with words running diagonally as well as horizontally and vertically.

Like the Watson QA machine, Dr Fill does not actually understand what it is doing, but uses statistical probabilities to calculate its answers. Technically, what Dr Fill does is convert a crossword puzzle into a singly weighted constraint satisfaction problem. The constraints, of course, are the clues. Otherwise, you could just use a vast dictionary and try to cram every word of every length into every appropriate spot and then create a cross index of where letters match up where words cross.

Such an unconstrained problem is very tough to solve, so if you can "understand" the clues, then you can limit the possible answers and therefore the size of the dictionary you need to rifle through. The clues also carry all kinds of other cues that people process; for example, a clue that ends in a question mark is usually tricky in some fashion.
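To make the idea concrete, here is a minimal sketch in Python of a crossword treated as a weighted constraint satisfaction problem. Everything in it is invented for illustration: the 2x2 grid, the slot names, the candidate words and their costs are all hypothetical, and the search is a plain branch-and-bound, not Ginsberg's algorithm. It also assumes that "singly weighted" means each candidate word carries its own cost (lower meaning a better fit for the clue), while the rule that crossing letters must agree is a hard constraint.

```python
# Hypothetical 2x2 grid: two across slots and two down slots.
# Each slot lists the (row, col) cells it covers, in reading order.
SLOTS = {
    "1A": [(0, 0), (0, 1)],
    "3A": [(1, 0), (1, 1)],
    "1D": [(0, 0), (1, 0)],
    "2D": [(0, 1), (1, 1)],
}

# Made-up candidate words per slot, each with a made-up cost.
# A lower cost stands in for "fits the clue better".
CANDIDATES = {
    "1A": {"AT": 0.2, "IT": 0.5, "ON": 0.9},
    "3A": {"ME": 0.3, "NO": 0.4, "SO": 0.8},
    "1D": {"AM": 0.6, "IN": 0.3, "AS": 0.7},
    "2D": {"TO": 0.2, "TE": 1.5, "NO": 0.9},
}


def solve(slots, candidates):
    """Return (fill, cost) for the cheapest consistent fill, or (None, inf)."""
    order = list(slots)
    best = {"cost": float("inf"), "fill": None}

    def search(i, grid, fill, cost):
        if cost >= best["cost"]:
            return  # prune: this branch cannot beat the best fill found so far
        if i == len(order):
            best["cost"], best["fill"] = cost, dict(fill)
            return
        name = order[i]
        cells = slots[name]
        # Try the cheapest candidates first so good fills are found early.
        for word, weight in sorted(candidates[name].items(), key=lambda kv: kv[1]):
            if len(word) != len(cells):
                continue
            # Hard constraint: letters must agree wherever words cross.
            if any(grid.get(cell, ch) != ch for cell, ch in zip(cells, word)):
                continue
            newly_filled = [cell for cell in cells if cell not in grid]
            for cell, ch in zip(cells, word):
                grid[cell] = ch
            fill[name] = word
            search(i + 1, grid, fill, cost + weight)
            # Backtrack: undo only what this choice added.
            del fill[name]
            for cell in newly_filled:
                del grid[cell]

    search(0, {}, {}, 0.0)
    return best["fill"], best["cost"]


if __name__ == "__main__":
    fill, cost = solve(SLOTS, CANDIDATES)
    print(fill, round(cost, 2))
```

With these made-up numbers the cheapest consistent fill is IT and NO across with IN and TO down, even though AT is the best-scoring guess for 1-Across taken on its own. That is the point of weighing the whole grid at once rather than answering each clue in isolation.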

Behind the Dr Fill program, Ginsberg has amassed a database of over 47,000 crossword puzzles and their solutions, with almost 1.9 million unique clues. The system also makes use of a dictionary with over six million words and a smaller dictionary with 8,542 common words.

The answers to all of these puzzles were then rated in a number of ways, including hand scoring by 100 volunteers and cross-checking for the number of Google hits, their Scrabble score if you were playing that word game, their length, and other criteria. Based on the 50,000 words, scoring was done for the remaining clues in the 47,000 puzzles.

Dr Fill's data also includes a database of Wikipedia titles, grammatical information about 154,000 words that tells Dr Fill their part of speech, and 1.2 million synonyms.

Armed with all this data and super-secret algorithms, Dr Fill solves crossword puzzles. Watson needed a rack of IBM Power 750 servers with 2,880 cores and 16TB of memory in a high-speed cluster, but Ginsberg says you can run Dr Fill on a notebook with two x86 cores and 8GB of memory with a compressed database that weighs in at 300MB.

Like many others, Ginsberg thinks that Watson cheated a bit. Humans did beat Watson in a dry run, and then IBM sped up the button pushing a bit for the real showdown.

"Watson, too, outperforms humans easily in terms of speed; its much-ballyhooed victory against human Jeopardy! competitors was probably due far more to Watson's mastery of button pushing than to its question-answering ability," writes Ginsberg in his JAIR paper. "In terms of the underlying cognitive task, Watson appears to not yet be a match for the best Jeopardy players, who are in general capable of answering virtually all of the questions without error."

Ginsberg says that the task for Dr Fill is a lot harder, because you can't decline to answer or bet to boost your earnings as you can on Jeopardy!. You have to find the right answers, period. And perhaps more significantly, as Dr Fill gets better and better, Ginsberg is having a harder time figuring out why the program is not working.