The GADDAG

Wouter M. Koolen

2016-02-01

Introduction

This post is about a data structure called a GADDAG. Steven de Rooij and I rediscovered this data structure in a programming project relating to Scrabble (and Wordfeud). The problem we were trying to solve is this: given a Scrabble board configuration and a rack of tiles, generate a list of all possible moves. The main constraint is that each move has to result in legal words both along the move and across it. Scrabble is a serious sport, and so what constitutes a legal word is strictly specified, for example by the English SOWPODS word list.

GADDAG

So how does one find those moves fast? That is where the GADDAG comes in. One may think of a GADDAG as a handy representation of a list of words, such that the following two questions can be answered quickly.

Given a substring of a word in the dictionary, which letters can be added to the left of it to stay a substring of a word in the dictionary?

Given a prefix of a word in the dictionary, which letters can be added to the right of it to stay a prefix of a word in the dictionary?

As an example, consider a dictionary containing the words cat, car, art and rat. If our current string is ar, the GADDAG can tell us that the left-extensions are the letter c (giving car) and the end-of-word (ar is a prefix of art). Moreover, it can tell us that the only right-extension is the letter t (giving art); the end-of-word is not an option, because ar is not itself a word.

How does the GADDAG do that? Well, it is a DAG (directed acyclic graph) in which we insert each word in a split, two-directional way. We start at any position in the word (including either end), and read the letters right-to-left from there. Once we get to the left end, we switch directions and read the remaining letters left-to-right until we hit the right end. For example, for cat we insert >cat<, c>at<, ac>t< and tac><. Here > denotes the left end-of-word marker, and < denotes the right end-of-word marker.
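The insertion rule can be sketched in a few lines of Python. This is only an illustration of the description above; the function name gaddag_paths is made up for this post and is not taken from our actual project.

```python
def gaddag_paths(word):
    """Return the symbol sequences inserted for `word`, one per start position.

    Hypothetical helper: '>' is the left end-of-word marker and
    '<' is the right end-of-word marker, as in the text.
    """
    paths = []
    for i in range(len(word) + 1):
        leftward = word[:i][::-1]  # letters before the start, read right-to-left
        rightward = word[i:]       # remaining letters, read left-to-right
        paths.append(leftward + ">" + rightward + "<")
    return paths


print(gaddag_paths("cat"))  # ['>cat<', 'c>at<', 'ac>t<', 'tac><']
```

A word of n letters therefore contributes n + 1 sequences, one for each position at which we can start reading.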

The GADDAG of the example is displayed in the figure below.

GADDAG of cat, car, art, rat. The GADDAG is a graph consisting of three kinds of nodes: red leftward nodes, blue rightward nodes, and a single green accepting node. Between these nodes there are four kinds of (labelled) transitions: left-extension by a letter (red), left termination (grey), right-extension by a letter (blue) and right termination (green). In a red leftward node we may extend the word with one letter to the left by following a red labelled arrow. We may also sometimes opt to end the word on the left by taking a grey arrow, which leads us to a blue rightward node. In a blue rightward node we may extend the word by one letter to the right by following a blue labelled arrow. And we may sometimes opt to end the word completely by taking a green transition to the single accepting state.
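To make the two queries concrete, here is a minimal Python sketch that stores the inserted sequences in a plain trie of nested dictionaries. This is a simplification: a real GADDAG also merges equivalent nodes, which the sketch does not attempt, and all names in it (build_gaddag_trie, walk, left_extensions, right_extensions) are hypothetical.

```python
END_LEFT, END_RIGHT = ">", "<"  # end-of-word markers, as in the text


def build_gaddag_trie(words):
    """Insert every GADDAG sequence of every word into a nested-dict trie."""
    root = {}
    for word in words:
        for i in range(len(word) + 1):
            path = word[:i][::-1] + END_LEFT + word[i:] + END_RIGHT
            node = root
            for symbol in path:
                node = node.setdefault(symbol, {})
    return root


def walk(root, path):
    """Follow `path` from the root; return the node reached, or None."""
    node = root
    for symbol in path:
        node = node.get(symbol)
        if node is None:
            return None
    return node


def left_extensions(root, substring):
    """Symbols that may be added to the left ('>' means: end the word here)."""
    node = walk(root, substring[::-1])
    return set(node) if node else set()


def right_extensions(root, prefix):
    """Symbols that may be added to the right ('<' means: end the word here)."""
    node = walk(root, prefix[::-1] + END_LEFT)
    return set(node) if node else set()


trie = build_gaddag_trie(["cat", "car", "art", "rat"])
print(left_extensions(trie, "ar"))   # the letter c and the left end marker '>'
print(right_extensions(trie, "ar"))  # only the letter t
```

Each query costs one step per letter of the current string, independent of the dictionary size. The price of skipping the node merging is memory, since every word is stored once per start position.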

Greedy vs Greedy

To get a feeling for whether two-player Scrabble is balanced, I looked at greedy vs greedy games. The greedy player always plays the highest-scoring move. This maximises the immediate score but ignores long-term strategy. Simulating 10,000 games takes 44 seconds. The results are as follows.

10,000 games of greedy vs greedy

Outcome                  Games
Win for first player     4934
Win for second player    5030
Draw                     36

So among greedy players, there is no significant advantage to playing first.
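One way to check this claim is a two-sided sign test on the decisive games, using the normal approximation to the binomial. The sketch below is just one reasonable choice of test; leaving the draws out is an assumption made here for simplicity.

```python
import math

first_wins, second_wins, draws = 4934, 5030, 36

# Assumption: draws are ignored, leaving only the games with a winner.
decisive = first_wins + second_wins

# Null hypothesis: each decisive game is a fair coin flip.
expected = decisive / 2
std_dev = math.sqrt(decisive) / 2
z = (first_wins - expected) / std_dev

# Two-sided p-value from the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p = {p_value:.2f}")
```

The first player's win count sits about one standard deviation below an even split, which is well within what chance alone would produce over 10,000 games.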