Most people like to use a word for their combination rather than an arbitrary string of letters. (Less secure, of course, but easier to remember.) So when manufacturing the lock, it would be good to build it to have a combination of letters which can be used to create as many 5-letter English words as possible.

Your task, should you choose to accept it, is to find an assignment of letters to reels which will allow as many words as possible to be created. For example, your solution might be

Any 5-letter word in that list is OK, including proper names. Ignore Sino- and L'vov and any other words in the list which contain a non a-z character.

The winning program is the one which produces the largest set of words. In the event that multiple programs find the same result, the first one to be posted wins. The program should run in under 5 minutes.
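The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the helper names `candidates` and `count_words` are hypothetical, and the toy words and reels are made up rather than taken from the challenge word list.

```python
import re

def candidates(lines):
    """Keep 5-letter entries made only of a-z (case-folded, so proper names count)."""
    cleaned = (line.strip().lower() for line in lines)
    return {w for w in cleaned if re.fullmatch(r"[a-z]{5}", w)}

def count_words(reels, words):
    """Count the words that can be dialled: letter i must appear on reel i."""
    reel_sets = [set(r) for r in reels]
    return sum(
        len(w) == len(reel_sets) and all(ch in s for ch, s in zip(w, reel_sets))
        for w in set(words)
    )

# Toy data (3 reels of 2 letters), not the real word list.
toy_words = ["cat", "cot", "dog", "dot", "cab"]
toy_reels = ["cd", "ao", "tg"]
print(count_words(toy_reels, toy_words))
```

A real submission would pass `candidates(open(...))` and five reels to `count_words`; the file name is left out here because it depends on the local copy of the list.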

Edit: since activity has died down, and no better solutions have come out, I declare Peter Taylor the winner! Thanks everyone for your inventive solutions.

This is tagged as code-challenge: what's the challenge? All you've asked for is the value which maximises a function whose domain's size is about 110.3 bits. So it's not feasible to brute-force the problem, but it should be feasible to get the exact answer, and maybe even to prove it correct. Bearing all that in mind, what are the prerequisites for an answer to be considered, and what criteria are you going to use to select a winner?
– Peter Taylor, Jan 21 '13 at 13:39

I was just about to edit my answer with this exact solution, but you beat me to it.
– cardboard_box, Jan 21 '13 at 19:38

When I run the same hill-climbing search from 1000 random starting combinations and select the best of the 1000 local optima found, it seems to always produce the same solution, so it seems likely to be the global optimum.
– Peter Taylor, Jan 22 '13 at 13:01
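For readers who want to try the restart idea, here is a minimal sketch of hill climbing with random restarts. It is not the C# code discussed in this thread: the single-letter-swap neighbourhood, the function names, and the toy alphabet and word list are all assumptions made for illustration.

```python
import random

def score(reels, words):
    """Number of words whose i-th letter is on the i-th reel."""
    return sum(all(ch in reel for ch, reel in zip(w, reels)) for w in words)

def hill_climb(words, size, alphabet, rng):
    """Climb from a random start by single-letter swaps until none improves."""
    reels = [set(rng.sample(alphabet, size)) for _ in range(len(words[0]))]
    best = score(reels, words)
    improved = True
    while improved:
        improved = False
        for reel in reels:
            for out in sorted(reel):          # snapshot, so the reel can be mutated
                for new in alphabet:
                    if new in reel:
                        continue
                    reel.remove(out)
                    reel.add(new)
                    trial = score(reels, words)
                    if trial > best:          # keep the swap, move to the next letter
                        best, improved = trial, True
                        break
                    reel.remove(new)          # undo the swap
                    reel.add(out)
    return best, reels

def best_of_restarts(words, size, alphabet, restarts, seed=0):
    """Run several climbs from random starts and keep the best local optimum."""
    rng = random.Random(seed)
    return max((hill_climb(words, size, alphabet, rng) for _ in range(restarts)),
               key=lambda result: result[0])

# Toy data (assumed): 3 reels of 2 letters over a 7-letter alphabet.
toy_words = ["cat", "cot", "dog", "dot", "cab"]
best, reels = best_of_restarts(toy_words, size=2, alphabet="abcdgot", restarts=20)
print(best, ["".join(sorted(r)) for r in reels])
```

Agreement between many restarts is evidence for a global optimum, not a proof; the climb only guarantees that no single swap improves the result.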

That depends on your definition of likely ;-) But it is further "confirmed" by other approaches which yield 1275 as maximum. (And where did the Quantum-Tic-Tac-Toe go?)
– Howard, Jan 22 '13 at 13:30

@Howard, that was just an artifact of .Net not supporting multiple entry points in a single project. I have one "sandbox" project which I use for stuff like this, and I usually change the Main method to call different _Main methods.
– Peter Taylor, Jan 22 '13 at 13:42

I tried a genetic algorithm and got the same result in a few minutes, and then nothing in the next hour, so I wouldn't be surprised if it's the optimum.
– cardboard_box, Jan 22 '13 at 14:52

Python (3), 1273 ≈ 30.5%

This is a really naïve approach: keep a tally of the frequency of each letter in each position, then eliminate the "worst" letter until the remaining letters will fit on the reels. I'm surprised it seems to do so well.
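The elimination idea can be sketched roughly as follows. The original program is not reproduced here, so the details are assumptions: "worst" is taken to mean the letter (on a reel that is still too full) used by the fewest surviving words, and the tally is recomputed after every elimination.

```python
from collections import Counter

def eliminate(words, size):
    """Drop the least-used (position, letter) pair until every reel fits."""
    n = len(words[0])
    live = set(words)
    reels = [{w[i] for w in live} for i in range(n)]
    while any(len(r) > size for r in reels):
        # Per-position letter frequencies over the words still reachable.
        tally = [Counter(w[i] for w in live) for i in range(n)]
        pos, letter = min(
            ((i, ch) for i, r in enumerate(reels) if len(r) > size for ch in r),
            key=lambda p: (tally[p[0]][p[1]], p),   # ties broken alphabetically
        )
        reels[pos].discard(letter)
        live = {w for w in live if w[pos] != letter}
    return reels, live

# Toy data (assumed): 3 reels of 2 letters.
toy_words = ["cat", "cot", "dog", "dot", "cab"]
reels, live = eliminate(toy_words, size=2)
print(["".join(sorted(r)) for r in reels], len(live))
```

Because every removed letter also removes the words that needed it, `len(live)` at the end is the number of words the final reels can spell.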

What's most interesting is that I have almost exactly the same output as the C# 1275 solution, except I have an N on my last reel instead of A. That A was my 11th-to-last elimination, too, even before throwing away a V and a G.

The word count quickly (less than 10 seconds) evolves to 1275 on most runs but never gets beyond that. I tried perturbing the letters by more than one at a time in an attempt to get out of a theoretical local maximum but it never helped. I strongly suspect that 1275 is the limit for the given word list. Here is a complete run:

Python, 1210 words (~ 29%)

Assuming I counted the words correctly this time, this is slightly better than FakeRainBrigand's solution. The only difference is that I add each reel in order and then remove all words from the list that don't match the reel, so I get a slightly better distribution for the next reels. Because of this, it gives the exact same first reel.
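A minimal sketch of that reel-by-reel idea, assuming ties are broken alphabetically (the original code is not shown here, and the function name and toy data are made up):

```python
from collections import Counter

def sequential_reels(words, size):
    """Pick each reel from the words that survived the previous reels."""
    live = list(words)
    reels = []
    for i in range(len(words[0])):
        tally = Counter(w[i] for w in live)
        # Most frequent letters first; alphabetical order breaks ties.
        reel = "".join(sorted(tally, key=lambda ch: (-tally[ch], ch))[:size])
        reels.append(reel)
        # Words that cannot be dialled on this reel no longer influence later reels.
        live = [w for w in live if w[i] in reel]
    return reels, live

# Toy data (assumed): 3 reels of 2 letters.
toy_words = ["cat", "cot", "dog", "dot", "cab"]
reels, live = sequential_reels(toy_words, size=2)
print(reels, len(live))
```

The first reel sees the whole word list, which is why it matches the plain per-position frequency count; only the later reels differ.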

iPython (273 210 Bytes, 1115 words)

1115/4176* ~ 27%

I calculated these in iPython, but my history (trimmed to remove debugging) looked like this.

with open("linuxwords") as fin: d = fin.readlines()
# Each line still ends with its newline, so len(w) == 6 selects 5-character entries.
x = [w.lower().strip() for w in d if len(w) == 6]
# Saving for later use:
# with open("5letter", "w") as fout: fout.write("\n".join(x))
from string import ascii_lowercase
# Apostrophe and hyphen are included so entries like l'vov and sino- don't raise KeyError.
low = ascii_lowercase + "'-"
c = [{a:0 for a in low} for q in range(5)]
for w in x:
    for i, ch in enumerate(w):
        c[i][ch] += 1
[''.join(sorted(q, key=q.get, reverse=True)[:10]) for q in c]

If we're going for short, I could trim it to this:

x = [w.lower().strip() for w in open("l") if len(w)==6]
c=[{a:0 for a in"abcdefghijklmnopqrstuvwxyz'-"}for q in range(5)]
for w in[w.lower().strip()for w in open("l") if len(w)==6]:
 for i in range(5):c[i][w[i]]+=1
[''.join(sorted(q,key=q.get,reverse=True)[:10])for q in c]

Shortened:

c=[{a:0 for a in"abcdefghijklmnopqrstuvwxyz'-"}for q in range(5)]
for w in[w.lower() for w in open("l")if len(w)==6]:
 for i in range(5):c[i][w[i]]+=1
[''.join(sorted(q,key=q.get,reverse=True)[:10])for q in c]

While this solution is a good heuristic and will likely return a good solution, I do not believe it is guaranteed to return the optimal solution. The reason is that you are not capturing the constraints between the reels: You are treating each reel as an independent variable when in fact they are dependent. For example, it might be the case that the words that share the most common first letter have a large variance in the distribution of their second letter. If such is the case, then your solution might produce combinations of reels that in fact do not allow any words at all.
– ESultanik, Jan 21 '13 at 14:56
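The dependence between reels is easy to reproduce on a contrived example. The word list below is invented for the purpose, with two reels of one letter each; it only shows that the failure is possible, not that it happens on the real list.

```python
from collections import Counter

def top_letters(words, size):
    """Per-position frequency heuristic: most common letters, reels chosen independently."""
    reels = []
    for i in range(len(words[0])):
        tally = Counter(w[i] for w in words)
        reels.append("".join(sorted(tally, key=lambda ch: (-tally[ch], ch))[:size]))
    return reels

def count_words(reels, words):
    """Count words whose i-th letter is on the i-th reel."""
    return sum(all(ch in reel for ch, reel in zip(w, reels)) for w in words)

# Contrived list: 'a' dominates position 0 and 'e' dominates position 1,
# but no word combines the two.
toy_words = ["ab", "ac", "ad", "be", "ce", "de"]
greedy = top_letters(toy_words, size=1)
print(greedy, count_words(greedy, toy_words))
print(["a", "b"], count_words(["a", "b"], toy_words))
```

Here the independent choice picks `a` and `e` and spells nothing, while the less "popular" pair `a`, `b` spells one word.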

Q

Runs in about 170 ms on my i7. It analyses the wordlist, looking for the most common letter in each position (obviously filtering out any non-candidates). It's a lazy naive solution but produces a reasonably good result with minimal code.

I'd love to see a C or ASM version of your code so that it could actually finish this year :-) Or at least run it until it gets to 1116. Could you write it without itertools, so I can run it on jython? (faster than regular python, but easier than cython.)
– FakeRainBrigand, Jan 21 '13 at 15:13

Nevermind about the jython thing. I needed to grab the alpha. It still crashed (too much memory) but that appears unavoidable.
– FakeRainBrigand, Jan 21 '13 at 15:36

I'm pretty sure that even if this were implemented in assembly it would take longer than my lifetime to complete on current hardware :-P
– ESultanik, Jan 21 '13 at 16:53

The issue is that I am iterating over (26 choose 10)^5 ≈ 4.23*10^33 possibilities. Even if we could test one possibility per nanosecond, it would take about 10^7 times the current age of the universe to finish.
– ESultanik, Jan 21 '13 at 16:59

There are two characters which don't appear in the 5th position in any word in the given word list, so you can reduce the number of possibilities by a factor of about 2.7 (C(26,10)/C(24,10) for that reel). That's how I got "about 110.3 bits" in my comment on the question.
– Peter Taylor, Jan 21 '13 at 17:29
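The search-space figures quoted in these comments can be checked with a short calculation. The age of the universe (about 13.8 billion years) is an assumed input that does not come from the thread, and `math.comb` needs Python 3.8 or later.

```python
from math import comb, log2

per_reel = comb(26, 10)                 # ways to choose 10 of 26 letters for one reel
full = per_reel ** 5                    # five independent reels
reduced = per_reel ** 4 * comb(24, 10)  # last reel drawn from only 24 usable letters

print(f"full search space: {float(full):.3g} ({log2(full):.1f} bits)")
print(f"reduced space:     {float(reduced):.3g} ({log2(reduced):.1f} bits)")
print(f"saving factor:     {per_reel / comb(24, 10):.2f}")

# Brute force at one candidate per nanosecond, in units of the universe's age.
seconds = full * 1e-9
universe_age_s = 13.8e9 * 365.25 * 86400   # assumed: ~13.8 billion years
ratio = seconds / universe_age_s
print(f"brute force: {ratio:.2g} universe ages")
```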