I am creating a game similar to Tetris, with two main differences: the screen already begins filled with tiles (like in Puzzle Quest for Nintendo DS and PC) and each individual tile has a letter in it. The player's objective is to eliminate tiles by forming valid words with them. Words are formed by placing letters next to each other, in any direction, except diagonally.

The player can move an entire row of tiles to the left or to the right or an entire column of tiles up or down, for as many spaces as he desires (if the movement of a row/column surpasses the limits of the board, the letter that crosses the limit will "cycle", appearing at the other end of the row/column). After the player's action, the game should check the entire board to look for valid words and remove the letters that form those words from the board. The letters above those that were removed will fall down in the place of those letters that were removed and new letters will drop from the top of the screen until the board is filled up again.

I have already written a linear algorithm that, given a sequence of characters, determines if it is a valid English word. The problem I am having is: how can I check for valid words on the board? Is brute force the only way? Testing all possible combinations from the board to see if they are valid is very slow, even for a small (5x5) board. Any help would be much appreciated, thanks!

Unfortunately, you're right. It is very slow, because of the numbers (all combinations of 3, 4, 5, ... 25 letters). Maybe restrict it to "words must be horizontally or vertically lined up" to improve performance (and not get random words made that the player didn't see)?
– ashes999, May 13 '14 at 1:41

I think you need to look again at your algorithm that matches a sequence of characters to words. By my count a 5x5 grid would have 2700 potential words, which your algorithm should blow through; see e.g. Josh's answer.
– Taemyr, May 13 '14 at 10:31

I arrive at the 2700 words in the following manner: start with left-to-right words on the first row. There is 1 position for a 5-letter word, 2 for 4-letter words, 3 for 3-letter words, 4 for 2-letter words and 5 for 1-letter words. We can exchange one of the letters in the word for a letter from another column. We can, without loss of generality, assume that no letters are exchanged for the 1-letter words, and that the first letter is not exchanged for 2-letter words. This gives 5*5*1+4*5*2+3*5*3+1*5*4+5=135. Multiply by the number of rows and directions: 135*5*4=2700.
– Taemyr, May 13 '14 at 10:34

I think I didn't make this clear, but words can be formed in any direction, except diagonally, and even make corners (for instance, first tile from first row, then second tile to the right on the first row, followed by the tile from below on the second row).
– Tavio, May 13 '14 at 14:47

@Tavio Some thoughts: checking should go longer words first (if I make "aside" I don't want "as"). Also, single-letter words might be better off ignored, otherwise you will never be able to use any a's. When finished, I would like to know the name you give this game so I can check it out.
– David Starkey, May 13 '14 at 19:56

4 Answers

Solving your game board is similar to solving a Boggle board, except simpler. You want to check every tile in the board, looking to see if there are any words that can be made along the appropriate directions.

You'd still like to refine your search space further so you don't bother searching along a direction if you know you can't make a word. For example, if you find two qs in a row, you should abort. To that end, you'll want some kind of data structure that allows you to tell if a given sequence of characters is a prefix of a valid word. For this, you can use a trie, or prefix tree, which is a useful data structure when solving problems like this.

A prefix tree is a hierarchical node-based structure, where every node represents some prefix of its children, and the leaf nodes (generally) represent the final values. For example, if your dictionary of valid words contains "cat," "car," and "cell," a trie might look like:

        0        The root of the trie is the empty string here, although you
        |        can implement the structure differently if you want.
        c
       / \
      a   e
     / \   \
    t   r   l
             \
              l

Thus, begin by filling a prefix tree with every valid word in your game.
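A minimal sketch of that step in Python, using the three example words above. The names `TrieNode`, `build_trie` and `lookup`, and the `is_word` flag, are illustrative assumptions rather than part of any particular library:

```python
class TrieNode:
    """One node of a prefix tree; `is_word` marks the end of a dictionary word."""

    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if the path from the root spells a word


def build_trie(words):
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True
    return root


def lookup(root, text):
    """Return the node reached by following `text`, or None if no word starts with it."""
    node = root
    for letter in text:
        node = node.children.get(letter)
        if node is None:
            return None
    return node


root = build_trie(["cat", "car", "cell"])
print(lookup(root, "ca") is not None)   # "ca" is a prefix of "cat" and "car"
print(lookup(root, "qq") is not None)   # no word starts with "qq"
print(lookup(root, "car").is_word)      # "car" is a complete word
```

Storing an explicit end-of-word flag, instead of relying on leaf nodes alone, is a design choice that keeps the structure correct when one dictionary word is a prefix of another.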

The actual process of finding valid words on the board at any given time will involve starting a recursive search from each tile on the board. Because each search through the board space starting at some given tile is independent, these can be parallelized if needed. As you search, you "follow" the prefix tree based on the value of the letter in the direction you are searching.

You will eventually reach a point where none of the surrounding letters are children of your current prefix tree node. When you reach that point, if the current node is also a leaf (or, more generally, is marked as the end of a word, since a word such as "car" can itself be a prefix of a longer word such as "cart"), you have found a valid word. Otherwise, you have not found a valid word and you may abort the search.
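The search described above might be sketched in Python as follows. This is an illustration under several assumptions, not the code from the linked blog: the trie is a plain nested dict with a hypothetical `"$"` end-of-word key, the board holds only lowercase letters, a tile may not be reused within one word, and the names `find_words` and `min_length` are made up for the example. As a small variation, it records a word every time it passes an end-of-word node instead of only when the search gets stuck, so shorter words along the same path are reported too.

```python
END = "$"  # hypothetical marker key meaning "a word ends here"


def build_trie(words):
    """Build a nested-dict trie: each key is a letter, END marks a full word."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[END] = True
    return root


def find_words(board, trie, min_length=2):
    """Return {word: path} for every dictionary word traceable on the board.

    A path is a list of (row, col) tiles; steps go up, down, left or right,
    and a tile is never used twice within the same word.
    """
    rows, cols = len(board), len(board[0])
    found = {}

    def search(r, c, node, path):
        node = node.get(board[r][c])
        if node is None:          # no dictionary word has this prefix: prune
            return
        path.append((r, c))
        if END in node and len(path) >= min_length:
            word = "".join(board[y][x] for y, x in path)
            found.setdefault(word, list(path))
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in path:
                search(nr, nc, node, path)
        path.pop()                # backtrack so other branches can use this tile

    for r in range(rows):
        for c in range(cols):
            search(r, c, trie, [])
    return found


board = ["cat",
         "exq",
         "llz"]
words = find_words(board, build_trie(["cat", "car", "cell"]))
print(sorted(words))  # ['cat', 'cell']
```

Here "cell" is found around a corner (down the first column, then right along the bottom row), while "car" is pruned as soon as no adjacent tile continues the prefix "ca" with an "r".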

Example code and a discussion of this technique (and others, such as a dynamic programming solution that can be even faster by "inverting" the search space after a fashion) can be found on this fellow's blog here; he discusses solving Boggle, but adapting the solutions to your game is more or less a matter of changing which directions you allow searching to occur in.

Brute force isn't the only way, like you explained yourself. :) There are a lot of prefixes that hint there is no point in continuing to look. (Most [random] strings aren't words.) +1
– zehelvion, May 13 '14 at 3:42

Great answer. A "word" is anything in the game's dictionary, full stop.
– Adam Eberbach, May 13 '14 at 5:33

OP states that he has an algorithm to match the word to a character string. So I don't think this answers the question.
– Taemyr, May 13 '14 at 10:18

OTOH I think OP will want a more efficient string matching algorithm than what he currently has.
– Taemyr, May 13 '14 at 10:35


@Taemyr Using a plain trie, yes. But one could use the Aho-Corasick algorithm, which uses a slightly modified trie and is much more efficient (linear). With the Aho-Corasick algorithm one can find all valid words in an nxn matrix in O(n^2) time.
– el.pescado, May 13 '14 at 11:42

You may have already tried or implemented this, and it may work best alongside another answer, but I didn't see it mentioned yet, so here it is:

You can discard a lot of the checks by keeping track of what changed and what didn't. For example:

On a 5x5 field, a vertical word is found at the base of the third column. All the rows change. However, the first, second, fourth, and fifth columns do not change, so you don't need to worry about them (the third did change).

On a 5x5 field, a 3-letter word is found horizontally on row 2, from column 3 to column 5. So you need to check rows 1 and 2 (row 1 because the letters on that one fell down and were replaced), as well as columns 3, 4, and 5.

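or, in pseudocode:

    // update the board, then re-check only what changed
    if (vertical_word)
    {
        // one column changed, from the base of the word up to the top
        check_column(updated_column)
        for (i in range 0 to updated_row_base)
            check_row(i)
    }
    else // horizontal word
    {
        // every row from the top down to the word's row changed
        for (i in range 0 to updated_row)
            check_row(i)
        // only the columns under the removed word changed
        for (i in range updated_column_start to updated_column_end)
            check_column(i)
    }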

And the trivial question:

Do you have the compiler's speed optimizations turned on (if you're using one)?

Except that players are allowed to rotate rows, so finding a word in the third column will affect the other columns.
– Taemyr, May 13 '14 at 10:40

@Taemyr IF(rowMoved){ checkColumns(); checkMovedRow(); } IF(columnMoved){ checkRows() checkMovedColumn();} If a user can only move one at a time, then on the ending of that move, no parallel letters have moved and therefore no need to recheck those.
– David Starkey, May 13 '14 at 20:03

Remember that every character is a value, so use that to your advantage. There are some hash functions that can be computed quickly when iterating over substrings.
For instance, let's say we give every letter a 5-bit code (just do c - 'a' + 1 in C):
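A minimal sketch of such a packing, written in Python for brevity (the C expression `c - 'a' + 1` becomes `ord(c) - ord("a") + 1`). The names `letter_code` and `pack_word`, the constants, and the three-word dictionary are assumptions made for illustration; lowercase a to z input is assumed:

```python
BITS_PER_LETTER = 5
MAX_LETTERS = 12          # 12 * 5 = 60 bits, which fits in a 64-bit integer


def letter_code(c):
    """Map 'a'..'z' to 1..26 (the C expression c - 'a' + 1)."""
    return ord(c) - ord("a") + 1


def pack_word(word):
    """Pack up to MAX_LETTERS lowercase letters into one integer key."""
    if len(word) > MAX_LETTERS:
        raise ValueError("word too long for a single 64-bit key")
    key = 0
    for c in word:
        key = (key << BITS_PER_LETTER) | letter_code(c)
    return key


dictionary = {pack_word(w): w for w in ["cat", "car", "cell"]}

# Extending a substring by one letter reuses the previous key.
key = 0
for c in "cat":
    key = (key << BITS_PER_LETTER) | letter_code(c)
print(key in dictionary)              # True: "cat" is in the dictionary
print(pack_word("ca") in dictionary)  # False: "ca" is only a prefix
```

Because every letter code is non-zero, two different words of at most 12 letters never produce the same key.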

Since 12 × 5 = 60 bits fit in a 64-bit integer, we can check substrings of up to 12 letters this way on most common architectures today.

If a hash code exists in your dictionary, you can pull the word from there quickly, because hash codes like this are unique. When you reach the maximum of 12 letters, you may wish to add another data structure for words starting with those 12 letters. If you find a word that starts with a specific 12 letters, then simply create a list or another tiny hash table for the suffixes of each word that starts with that prefix.

Storing a dictionary of all existing word codes should not take more than a few megabytes of memory.

Are you limited only to the classical Tetris shapes when forming words, or will any formation do? Can words bend indefinitely or only once? Can a word be as long as it wants? This gets quite complex if you can do as many bends as you like, effectively making the longest possible words 25 characters long. I would assume you have a list of accepted words. On that assumption I suggest you try something like this:

At the start of the game:
    Iterate tiles:
        Use tile as starting letter
        Store previous tile
        Check the four adjacent tiles
            If a tile can continue a word started by the previous tile, carry on
                Store the next tile
                Move check to next tile

This will create a map on each tile with information about how this tile is connected to words around it in the grid. When a column or row is moved, check all tiles that have been moved, or are adjacent to the movement, and recalculate the information. When you find a word and no more tiles can be added to that word, remove it. I'm not sure if this will be faster; it really boils down to how many words are half created. The benefit of this is that the user is most likely trying to create a word from a half-complete word on the board. By keeping all these words stored, it is easy to check if a word has been completed.