Searching

Over the weekend, I spent some time pondering the best way to add a search facility to the Handbook. Geek that I am, I thought about rolling my own, but also explored some open source options (such as Apache Lucene), free services (such as Google Free, Picosearch, and Atomz) and turnkey solutions (such as the Google Mini).

As I've often said, the best way to reduce the risk of any software-intensive system is to not write any software at all. This rule of thumb first led me to consider the free search services, which are perfect for simple sites with lots of public content; I also have to believe that these services are very self-serving for their providers, for they put otherwise obscure sites on their radar. However, my situation is a bit more complex, because only a tiny amount of my site is public (the opening page and the blog), while the rest lives behind a password-protected point of entry. Further complicating matters, each user may have a different degree of visibility into the system. A naive search strategy would return hits for parts of the site that the user could not otherwise access (and a clever person with lots of time on their hands could reproduce these hidden parts from the hits alone, in the same manner as happened to the Dead Sea Scrolls, much to the annoyance of the researchers who had kept translations private for some years).
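To make the visibility problem concrete, here is a minimal sketch of a search that filters hits by what the user is allowed to see. It is not how the Handbook or any of the products above work; the `Page` class, the `allowed_groups` field, the group names, and the URLs are all hypothetical, and the index is a toy in-memory one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    text: str
    allowed_groups: frozenset  # hypothetical: groups permitted to view this page


def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = {}
    for page in pages:
        for word in set(page.text.lower().split()):
            index.setdefault(word, set()).add(page.url)
    return index


def search(index, pages_by_url, query, user_groups):
    """Return sorted URLs that match every query word AND are visible to the user."""
    words = query.lower().split()
    if not words:
        return []
    # Pages containing all of the query words.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    # Drop any hit the user could not open directly, so that neither the hit
    # nor its existence leaks through the search results.
    return sorted(
        url for url in hits
        if pages_by_url[url].allowed_groups & user_groups
    )


# Made-up sample content: one public page, one restricted page.
pages = [
    Page("/blog", "searching the handbook", frozenset({"public"})),
    Page("/arch/secret", "handbook architecture notes", frozenset({"reviewers"})),
]
index = build_index(pages)
pages_by_url = {p.url: p for p in pages}

public_hits = search(index, pages_by_url, "handbook", frozenset({"public"}))
reviewer_hits = search(index, pages_by_url, "handbook", frozenset({"public", "reviewers"}))
```

The naive strategy corresponds to returning `hits` unfiltered. The important design choice in this sketch is that the filter runs before anything about a hit (title, snippet, or even a count) reaches the user.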

So, I have a classic time/money tradeoff here. I'm leaning toward the Mini for its simplicity and power, plus the fact that rolling my own would take more time than I can dedicate to such a feature (as much as I like to cut code).

In the meantime, though, I've learned a lot about how contemporary search engines and web crawlers are constructed...