Doug Cutting, the founder of Lucene, the text search library that powers TSS and hundreds (if not thousands) of other sites, is interviewed by TSS in our latest video Tech Talk. Doug covers a lot of behind-the-scenes history, challenges, and implementation details of Lucene and search in general, and also discusses Nutch, a complete open source search engine he is working on.

A lot of developers don't realize the kind of genius and time it takes to build a tool like Lucene. I think we are very lucky to have someone of Doug Cutting's talents spending all of his time working on this stuff. How many people could do this:

Question (from the interview): How do you make something like Lucene as fast as it is?

Answer: I do it a few times. I have written a few search engines and done a lot of benchmarking and looked where they spent their time and then rethought it, and I think it helped a lot that it wasn’t the first search engine I had written. I think at Xerox I did a few iterations of very different architectures, then did so again at Apple and then again at Excite, and so I have been through it a few times and knew what needed to be quick and what did not.

We worked with Doug and Oregon State University to deploy Nutch (I think it was one of the first large-scale deployments of Nutch), replacing a commercial, licensed search engine. The project saved OSU over $470,000. Really great toolset.

There is one feature we have not fully figured out yet, though. Doug mentioned it in his interview, too. In search terminology it is usually known as "fuzzy search": searching for mistyped words by guessing what the user might have meant.

The impression I got from Doug's interview is that Lucene does not currently support it. However, there seems to be some code in Lucene for this, but I have not been able to make sense of it yet.

TSS is using Lucene, but from a user's perspective it seems fuzzy search support is not there.
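
To make the "guessing what the user might have meant" idea concrete, here is a minimal sketch of one common way to do it: compare the typed word against a list of known words using edit distance (Levenshtein distance) and keep the close ones. This is not Lucene's code and not necessarily how Lucene approaches it. The class name `FuzzySketch`, the method names, the small vocabulary, and the one-edit threshold are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
public class FuzzySketch {

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty string into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i characters of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns every vocabulary word within maxEdits of the typed word.
    public static List<String> suggest(String typed, List<String> vocabulary, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String word : vocabulary) {
            if (editDistance(typed, word) <= maxEdits) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed vocabulary; a real engine would draw on the terms in its index.
        List<String> vocabulary = Arrays.asList("lucene", "nutch", "search", "index");
        System.out.println(suggest("lucine", vocabulary, 1)); // prints [lucene]
    }
}
```

Scanning every known word like this is fine for a small list but would be slow against a large index, so treat it only as a picture of the matching rule, not of a production design.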
