Rael G.C.

The main objective was to create the smallest possible Java version, without focusing on performance.

In Norvig’s article, you can find the explanation and other implementations (your favorite language may be there!).

As you can count, this implementation has 42 lines (or 35, if you follow Norvig’s counting and exclude the blank lines). It is not as beautiful as Norvig’s Python version (as with any statically typed language), but not as huge as the C version.

You can compile it using an online Java compiler, like JavaXXX Compiler Service, selecting version 1.5 or 1.6. No errors or warnings should be generated.

This implementation also uses only the default JRE libraries and doesn’t require external or custom libraries.

This code is licensed under the LGPL, but if you want to use it, be a nice guy and send me an email (rael.gc@gmail.com). I’ll be happy!
If you find a bug, or a way to improve the performance, please send me the code, and I’ll put the credits here.

History:
2007/08/13 - Original version
2007/08/15 - Removed a useless method (remained in code from the draft version)
2007/08/16 - Changed the static initializer to a constructor, removed the temp buffer in the file loading
2007/08/16 - Removed similar methods
2007/08/17 - Replaced the tailored max method with java.util.Collections.max()
2007/09/03 - Additional notes added
2007/12/11 - Some tricks and one more line removed
2007/12/12 - Added a Groovy version
2008/12/07 - Some tricks and one more line removed (thanks to Anil Madamala, “Passionate about programming”)
2009/02/09 - Useless line removed (thanks to Sergey Mikhanov)

In real usage, you should not read the dictionary file (big.txt) every time you correct a word.
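
One way to do that is to build the word-frequency map once and keep the object around for every later lookup. The sketch below is only an illustration: the class name WordCounts, its methods, and the [a-z]+ tokenization are assumptions of mine, not part of the 42-line version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the dictionary text once, in the constructor,
// so that later lookups never touch the file again.
final class WordCounts {
    // Assumed tokenization: runs of lowercase ASCII letters.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    private final Map<String, Integer> nWords = new HashMap<String, Integer>();

    WordCounts(Reader source) {
        BufferedReader in = new BufferedReader(source);
        try {
            try {
                for (String line = in.readLine(); line != null; line = in.readLine()) {
                    Matcher m = WORD.matcher(line.toLowerCase());
                    while (m.find()) {
                        Integer old = nWords.get(m.group());
                        nWords.put(m.group(), old == null ? 1 : old + 1);
                    }
                }
            } finally {
                in.close();
            }
        } catch (IOException e) {
            // Wrapped so that callers of this sketch need no checked exception.
            throw new IllegalStateException("could not read dictionary", e);
        }
    }

    boolean contains(String word) {
        return nWords.containsKey(word);
    }

    int count(String word) {
        Integer c = nWords.get(word);
        return c == null ? 0 : c;
    }

    public static void main(String[] args) {
        // In real usage the source would be something like new FileReader("big.txt").
        WordCounts counts = new WordCounts(new StringReader("The cat saw the dog"));
        System.out.println(counts.count("the"));     // prints 2
        System.out.println(counts.contains("bird")); // prints false
    }
}
```

Create the object once (at application start, for example) and pass it to whatever code needs to check or rank candidate words.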

Furthermore, to gain performance, the nWords object can be initialized with a preallocated size: the number of words in the dictionary file.
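
As a rough sketch of that idea, assuming nWords is a java.util.HashMap: the constructor argument is a capacity, not a number of entries, so with the default load factor of 0.75 the capacity has to be somewhat larger than the expected word count to avoid rehashing. The class and method names below are hypothetical, and the word count in main is a placeholder, not a measured value for big.txt.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for creating a HashMap that should not need to
// grow while the expected number of words is inserted.
final class PresizedMap {
    // Default HashMap load factor, as documented in java.util.HashMap.
    private static final float LOAD_FACTOR = 0.75f;

    // Capacity large enough that expectedEntries stays below the resize threshold.
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / LOAD_FACTOR) + 1;
    }

    static Map<String, Integer> create(int expectedWords) {
        return new HashMap<String, Integer>(capacityFor(expectedWords), LOAD_FACTOR);
    }

    public static void main(String[] args) {
        int expectedWords = 30000; // placeholder value, not the real size of big.txt
        Map<String, Integer> nWords = create(expectedWords);
        nWords.put("the", 1);
        System.out.println(capacityFor(expectedWords)); // prints 40001
    }
}
```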

Another performance tip is to use a sorted Collection class as the nWords type. This can improve the containsKey method call, while adding some penalty to the insertion time. But remember, you should read the dictionary file just once.
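
A minimal sketch of that swap, assuming java.util.TreeMap as the sorted class (the text above does not name one). The class name SortedWordCounts and its count method are hypothetical. TreeMap documents a log(n) cost for containsKey, get and put, so it is worth measuring against the hash-based map with your own dictionary before committing to it.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical example: the same counting logic, but backed by a sorted map.
final class SortedWordCounts {
    static SortedMap<String, Integer> count(String[] words) {
        SortedMap<String, Integer> nWords = new TreeMap<String, Integer>();
        for (String w : words) {
            Integer old = nWords.get(w);
            nWords.put(w, old == null ? 1 : old + 1);
        }
        return nWords;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> nWords = count(new String[] {"the", "cat", "the"});
        System.out.println(nWords.containsKey("cat")); // prints true
        System.out.println(nWords.get("the"));         // prints 2
        System.out.println(nWords.firstKey());         // prints cat (keys are kept in sorted order)
    }
}
```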

Facts:

I have already written a version that is (at least) 50% faster than the one currently displayed here.