Doing search

Some time ago I wrote a post about refactoring a library. That post was about ICT books and how you can limit the number of physical books you own. At the end I discussed a way to refactor your library by losing books. It turns out I was (and still am) in need of a good search engine, so this post is about search. I will not get very technical, but I hope I'll make it interesting enough.

Search comes in many different flavors. You can search a stack of paper hidden in a drawer for all your bills. You can search your house for books. You can search your computer for a file, or search a file for a word.

One problem with search is that you cannot always find what you are looking for. How often have you tried to find a file on your computer that you are sure is there? Maybe you have forgotten the title, or you are not sure whether it was a PDF or a web page.

For every search problem there is a different solution, sometimes in the form of a specialized search tool. When I have lost something in my house, I always ask my wife. She has a much better search algorithm than I have. Too bad that does not work on the laptop. Luckily I have some special search tools available on my Mac. Spotlight helps me find files, applications and web pages that I have visited. It is a very nifty tool, and it works very well. Still, I have trouble finding documents whose title differs from the one I have in my head. Many applications also offer more specific searches: Finder can do a lot, and so can Mail and other applications. Let's not forget my Integrated Development Environment. IntelliJ has some very advanced searches; I can look for classes that use another class in a certain way. Refactoring is to a large extent search as well: it is all about finding the classes that use the piece of code we are about to refactor.

All of this is search by a single user on a single computer. What if you are looking for something and have no idea whether it exists at all, or in what format? One of the best-known search tools nowadays is Google. Its whole existence is based on search. Google did what AltaVista could not: create a search engine that presents search results without all the distracting clutter. By the way, the results are funny when you search for "search" in Google. See the next image for the top results.

Google also offers many more specific searches, depending on what you are looking for: Code, Groups, Images and others. On top of that, every decent site has a search function that lets you search only that site. So for searching the web there are plenty of options as well.

We can conclude that search comes in many forms. But what makes search work? To do a search, there must be something that keeps track of what is available and lets the user enter criteria to base the search on. Let's call this the engine. Most engines use some form of index. People familiar with Linux will know commands like locate and which. Locate searches a prebuilt database of file names; that database is refreshed by updatedb, and you can influence pretty well what goes into it. Which does not keep an index of its own; it walks the directories in the user's PATH to find executables. Google also has an index, although it is a little bit bigger. Creating the index is a very important step in configuring the engine.
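To make the idea of an index a bit more concrete, here is a minimal sketch of an inverted index in Python. Instead of scanning every document for each query, it maps each word to the set of documents that contain it, so a query only has to look up its words. This is a toy example under simple assumptions (lower-cased alphanumeric words, all query words must match), not how locate, Google or Lucene actually store their indexes; the document ids and texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical "library": document id -> text to index.
docs = {
    "shelf": "a book about refactoring java code",
    "attic": "a book about search engines and indexing",
    "desk": "notes on java search tools",
}
index = build_index(docs)

print(search(index, "java search"))  # only "desk" contains both words
print(search(index, "christmas"))    # not indexed anywhere, so an empty set
```

Building the index costs one pass over all documents, but afterwards each query word is a single dictionary lookup. A document that was never added to `docs` simply cannot be found, no matter how good the query is.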

At JTeam we have a lot of experience with enterprise search frameworks like Lucene, Compass and Solr. Configuring what needs to be indexed, when to rebuild the index, how to merge indexes and similar concerns are very important in an enterprise search project. Once you have a smart engine and a good index, you must present the results in the right way. That is what attracts users; giving back the best results is what makes them come back.
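Lucene, for example, writes new documents into separate index segments and merges those segments over time. The sketch below only illustrates the general idea of keeping a small index for recently added documents and merging it into the main index. It is a toy Python example with hypothetical data and function names, not the actual implementation used by Lucene, Compass or Solr.

```python
import re


def index_documents(documents):
    """Build a tiny inverted index: word -> set of document ids."""
    index = {}
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index


def merge_indexes(*indexes):
    """Combine several indexes into a new one, uniting the id sets per word.

    The input indexes are left untouched.
    """
    merged = {}
    for index in indexes:
        for word, doc_ids in index.items():
            merged.setdefault(word, set()).update(doc_ids)
    return merged


# Hypothetical data: a main index built earlier, and a small index for a
# document that arrived after the last rebuild.
main_index = index_documents({"doc-1": "enterprise search with solr"})
new_index = index_documents({"doc-2": "search the attic for books"})

print("books" in main_index)     # False: the main index has not seen doc-2 yet
merged = merge_indexes(main_index, new_index)
print(sorted(merged["search"]))  # ['doc-1', 'doc-2']
```

Until the small index is merged in (or searched alongside the main one), the new document is invisible to queries against the main index, which is why deciding when to merge or rebuild matters in practice.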

So what went wrong with my search for the book? I think my index was wrong. I had not configured my engine to index the pile of books that (for some unknown reason) lay on the floor in the attic, behind a small chair we no longer use. What made the engine find the book in the end? First of all, asynchronous search: the results came long after I initiated the search. It was the Christmas spirit that did the trick. I was putting away the decorations for the Christmas tree, and on my way back from the attic storage my eyes fell on a pile of books. It turns out I had not refactored the book out of my library. I had just commented it out, where the engine could not find it. Usually that is bad practice, but this time I am very glad I found the book again.

If you want more information about search, drop me an email with your details and I'll search for someone within JTeam who is able to help you.