From a performance point of view, though, I think all the string manipulation might hurt. .NET strings are immutable, so chopping off the first letter can't reuse the memory of the previous string; it will make a new object every time. How about this?

Untested since I don't have your other classes, but it compiles OK with a stub TrieNode.

If that's any better, I can see a few ways to improve it further. Its cost still grows massively with the length of the original letter string and I don't see a way around that, but we could reduce the constant factor: rather than removing the letter from the letters array, swap in the one at the back of the array and reduce the maximum index of the letters we scan to. Of course, we'd then need to pass that maximum index around (or compute it every time as letters.Length - nDepth). That would also avoid a complete scan of the letters array at the final depth, when it's known to be empty.
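A rough sketch of that swap-to-the-back trick (the names here are mine, not from the posted code): instead of shifting the array to delete an entry, swap the used letter with the last active one and recurse with a smaller maximum index, then swap back on return so the caller sees the array unchanged.

```java
// Sketch of the swap-instead-of-remove idea. The trie descent is elided;
// this only demonstrates the in-place bookkeeping on the letters array.
public class SwapTrick {
    public static void recurse(char[] letters, int maxIndex) {
        for (int i = 0; i < maxIndex; i++) {
            char c = letters[i];
            // "consume" letters[i]: move the last active letter into its slot
            letters[i] = letters[maxIndex - 1];
            letters[maxIndex - 1] = c;
            // ... descend into the trie with letter c here ...
            recurse(letters, maxIndex - 1);
            // undo the swap so the array is restored for the next iteration
            letters[maxIndex - 1] = letters[i];
            letters[i] = c;
        }
    }

    public static void main(String[] args) {
        char[] letters = "abc".toCharArray();
        recurse(letters, letters.length);
        System.out.println(new String(letters)); // prints "abc": fully restored
    }
}
```

Because every level undoes its own swap, no copy of the letters array is ever made.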

Also, rather than passing the three ref arguments, you could create a 'ScrabbleMatchInternal' class containing those three as members, and put all that code in a Recurse method on the class with a single parameter for depth.
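Something along these lines (all names are invented; since I don't have the original classes, the body just enumerates permutations where the real version would also walk the trie):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggested refactor: the three ref'd pieces of state become
// members, so the recursive method only needs a depth parameter.
public class ScrabbleMatchInternal {
    private final char[] letters;          // remaining rack letters
    private final StringBuilder current;   // word built so far
    private final List<String> results;    // completed candidates

    public ScrabbleMatchInternal(String rack) {
        letters = rack.toCharArray();
        current = new StringBuilder();
        results = new ArrayList<>();
    }

    // Without a trie to prune against, this just collects every permutation;
    // the real Recurse would consult a TrieNode before descending.
    public List<String> recurse(int depth) {
        if (depth == letters.length) {
            results.add(current.toString());
            return results;
        }
        for (int i = depth; i < letters.length; i++) {
            char tmp = letters[depth]; letters[depth] = letters[i]; letters[i] = tmp;
            current.append(letters[depth]);
            recurse(depth + 1);
            current.deleteCharAt(current.length() - 1);
            tmp = letters[depth]; letters[depth] = letters[i]; letters[i] = tmp;
        }
        return results;
    }
}
```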

One more idea: if you sort the letters in p_strValidLetters ahead of time and remember the last letter you recursed with at each depth, you can trivially skip duplicate letters at that depth and avoid the word deduplication afterwards. I do also like Weeble's bit mask idea, but I can't see a good way to work that in here too :-/
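The duplicate-skipping part would look something like this (my own illustration, not the posted code): with the rack sorted, each recursion level remembers the last letter it branched on and skips repeats, so a rack like "aab" yields each distinct word exactly once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of sort-once-then-skip-duplicates permutation generation.
public class DedupPermutations {
    public static List<String> permute(String rack) {
        char[] letters = rack.toCharArray();
        Arrays.sort(letters);                 // sort ahead of time
        List<String> out = new ArrayList<>();
        recurse(letters, new boolean[letters.length], new StringBuilder(), out);
        return out;
    }

    private static void recurse(char[] letters, boolean[] used,
                                StringBuilder current, List<String> out) {
        if (current.length() == letters.length) {
            out.add(current.toString());
            return;
        }
        char last = 0;                        // last letter tried at this depth
        for (int i = 0; i < letters.length; i++) {
            if (used[i] || letters[i] == last) continue; // skip used and duplicate letters
            last = letters[i];
            used[i] = true;
            current.append(letters[i]);
            recurse(letters, used, current, out);
            current.deleteCharAt(current.length() - 1);
            used[i] = false;
        }
    }
}
```

"aab" produces exactly three results (aab, aba, baa) with no deduplication pass afterwards.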

I should probably try implementing some of these other ideas too but it's late and I can't test / time the complete code anyway. Hope some of this helps!

Edit: I'm thinking in C++, aren't I? You can likely strip all of the 'ref's from my program and it will still work. They're all passed as references anyway.

I'm not sure about your ref'd parameters: these essentially translate to pointers to pointers to the objects, which is fine when you want a method to update your parameter with a reference to some new object. The multiple string construction is intentional; it is meant to generate all the letter permutations. From what I understand of your implementation, however, you're keeping track of the changes to the valid letters buffer at every recursion point, so it should be correct. Very clever, in fact.

There's lots that can be done to improve the code, and most of the performance may be achieved by minimising the object construction. I merely explored the concept of combining letter permutation with simultaneous trie navigation.

So, adopting the convenient textbook approach, I leave these optimisations as an exercise for the reader.

For the letters "eettaoinshrd" the query took 131ms and generated precisely 2633 words from "ad" to "tsar".

You do have a point that the chosen set was not a good representative. Would you say that the one you gave me is an average set or is it at the other extreme?

Maybe the chosen set is a bit of an extreme: it contains the most frequent letters of the English language (see Wikipedia).
You can test an average set of letters yourself by creating several sets of 10-17 random letters and running your algorithm against them.

The biggest problem with your algorithm is still the number of permutations you have to do. For 17 letters there are 17! = 355,687,428,096,000 possible permutations. No matter how many duplicate permutations you are able to eliminate -- which of course costs time as well -- the number of permutations you have to check against the trie is too high.
For smaller sets of letters your algorithm is good.
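The quoted figure is just 17 factorial; easy to verify:

```java
// Quick check of the permutation count for 17 distinct letters.
public class PermutationCount {
    public static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    public static void main(String[] args) {
        System.out.println(factorial(17)); // 355687428096000
    }
}
```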

colinvella wrote:

The scrabble search is more naturally coded recursively due to the ongoing combo construction. The potential combinatorial explosion of this process is pruned extensively by navigating down the trie tree.

But you still have to do lots and lots of checks, even if you can discard some thousand or million permutations at once.

But that's not his algorithm. He starts at the top of the trie, and at each node he reaches he explores all the child trie nodes he can go to given the set of letters he has left at that point. Ignoring multiple recursion for duplicate letters, the most he'll ever do is a single pass through every node in the trie.
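A minimal sketch of the traversal being described (my own stub, not colinvella's actual code): from each trie node, only follow children whose letter is still available on the rack. Tracking the rack as letter counts also means duplicate letters don't trigger repeated recursion, so the worst case really is one pass over the trie.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Trie traversal pruned by the set of letters left on the rack.
public class TrieSearch {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    public static Node build(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray())
                n = n.children.computeIfAbsent(c, k -> new Node());
            n.isWord = true;
        }
        return root;
    }

    static void search(Node node, int[] rack, StringBuilder prefix, Set<String> out) {
        if (node.isWord) out.add(prefix.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            int idx = e.getKey() - 'a';
            if (rack[idx] == 0) continue;   // letter not available: prune branch
            rack[idx]--;
            prefix.append(e.getKey());
            search(e.getValue(), rack, prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);
            rack[idx]++;                    // restore the rack on backtrack
        }
    }

    public static Set<String> find(Node root, String letters) {
        int[] rack = new int[26];
        for (char c : letters.toCharArray()) rack[c - 'a']++;
        Set<String> out = new TreeSet<>();
        search(root, rack, new StringBuilder(), out);
        return out;
    }
}
```

For example, `find(build("cat", "at", "act", "dog"), "tac")` yields `act`, `at` and `cat`; the `dog` branch is pruned at its first letter.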

Aptly put, Rup.
What I was trying to say is that, without the trie navigation, the algorithm would simply degenerate into a combinatorial explosion; but since I only traverse down the trie tree of valid sub-words, this explosion never occurs and the worst case is simply a complete traversal of the tree (with possible repetitions due to repeated letters).

Anyone willing to suggest a set of 17 letters? I can give it a try; the proof of the pudding is in the eating, as they say.

Slyh, yours seems to scale much better for large sets but seems a little slower for the smaller ones.

I did find a set that really killed mine: "eettaaooiinnsshhrrdd"
The algorithm didn't terminate even after a minute! My algorithm suffers when there are repeated letters, as above. However, I did manage to improve it by cleaning up my intermediary lists of subwords, after which the search terminated in about 5000ms. Still slow.

Another disadvantage of my algorithm is that it doesn't start giving you matches immediately: it builds sub-words as substrings from the right until it recurses back to the highest level. So there's no benefit in firing it off in a separate thread and letting it trickle words to the main thread.

edit: feeding "abcdefghijklmnopqrstuvwxyz" to the algorithm with subword list cleanup: 1388ms. A bit better, but not much.

The way my (original) algorithm works ensures that in the (unrealistic) worst case the number of operations is at most "only" the total number of letters of all words in the word list. For our Scrabble list that is 2.7 million operations (2.7 million actual letters).
One operation consists of 3 array accesses, 1 comparison, 1 subtraction and 1 decrement (not including the for-loop). This is not much for a modern machine.
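My reconstruction of that per-word check (not Slyh's actual code): one pass over the candidate word, decrementing a working copy of the rack's letter counts, so the total cost is linear in the number of letters across the whole word list.

```java
// Counting check: can 'word' be assembled from the rack's letters?
public class CountCheck {
    public static boolean canForm(String word, int[] rackCounts) {
        int[] counts = new int[26];
        System.arraycopy(rackCounts, 0, counts, 0, 26); // working copy per word
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (counts[idx] == 0) return false;  // this letter is exhausted
            counts[idx]--;
        }
        return true;
    }

    public static int[] toCounts(String letters) {
        int[] counts = new int[26];
        for (char c : letters.toCharArray()) counts[c - 'a']++;
        return counts;
    }
}
```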

The actual number of operations needed for "abcdefghijklmnopqrstuvwxyz" is 1.6 million. For "xylophone" it's only 381,105 operations.

The most time is spent setting up the working copy of the comparison array for each word in the list, anyway. (System.arraycopy() in the code.)

I optimized my code so this array is no longer needed: every word in the valid word list now has pre-calculated information associated with it, describing how many times each letter occurs in the word.
This uses a lot more memory but only improved performance by around 5% (warmed-up) to 25% (interpreted).
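Presumably that precomputation looks something like this (names invented): each dictionary word stores its letter counts once, so a check is 26 comparisons against the rack's counts and no working copy is needed.

```java
// Per-word precomputed letter counts, checked against the rack without copying.
public class PrecomputedCheck {
    public static int[] countLetters(String word) {
        int[] counts = new int[26];
        for (char c : word.toCharArray()) counts[c - 'a']++;
        return counts;
    }

    // wordCounts is the pre-calculated array stored with one dictionary word.
    public static boolean canForm(int[] wordCounts, int[] rackCounts) {
        for (int i = 0; i < 26; i++)
            if (wordCounts[i] > rackCounts[i]) return false; // word needs more of letter i
        return true;
    }
}
```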

Using the original Scrabble list, a single search currently completes in 20-30ms (for warmed-up code). This holds for huge sets of valid letters as well.

My algorithm suffers as the number of valid words increases: adding each word 8 times to the list increases the run time by a factor of about 7.

Your algorithm, on the other hand, is really fast for small sets of letters and works better with bigger lists of valid words. (At least I would expect so.)

I'm still trying to come up with an algorithm that combines both ideas. Some kind of hash value for each valid word and a tree/graph containing the words, where only 26 (or so) checks have to be done for the current set of letters. Or something like that.

That's a very good analysis; I didn't give it that much thought. Now that I've had a chance to compare, your version is very effective and has the benefit of generating words immediately. You could, for example, sort the dictionary by word length and then alphabetically, so that the first results give you the shorter words, like a human would normally do. I'm afraid I have to concede that your algorithm is superior, at least for the given application.

For a single-word search, I don't think there's anything faster than a trie. You need no more than min(K, D) iterations, where K is the length of the word to search and D is the depth of the tree, corresponding to the longest word in the dictionary. It is hence independent of the dictionary size. For a hash map solution, the hashing function alone would be O(K), but with a trie you can exit early, either because it becomes clear the word cannot exist, or thanks to the use of collapsed branches, if implemented.
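A minimal sketch of that lookup (my own stub): the walk fails as soon as a letter has no matching child, so it never takes more than min(K, D) steps regardless of how many words the dictionary holds.

```java
import java.util.HashMap;
import java.util.Map;

// Single-word membership test against a trie, with early exit.
public class TrieLookup {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    public static Node build(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray())
                n = n.children.computeIfAbsent(c, k -> new Node());
            n.isWord = true;
        }
        return root;
    }

    public static boolean contains(Node root, String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return false;  // early exit: no such branch
        }
        return n.isWord;                  // reached a node; is it a complete word?
    }
}
```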

I'm pretty sure you can avoid the explosion associated with the trie solution when you have duplicate letters in your input. The trick is to generate the permutations in a way that never creates duplicates: sort your input letters, and then any time you backtrack past a letter and look to add the next one, skip over all of its identical neighbours. Combined with the pruning, that should make a big difference. Still, your worst case is going to be worse than the raw brute-force one (where you have enough letters to make anything in the dictionary), because there's always going to be more overhead.