Things We Like

While our main blog is all about Searchdaimon, here we also share our rough takes, thoughts and raves about the search industry, computer programming, tips and tricks, and the world in general. From time to time we also come across companies and websites that make our lives easier or are just freaking awesome.

As part of Google's spring cleaning 2012, the Google Mini will be discontinued beginning July 31, 2012. The Mini was Google's entry-level search appliance. Now the only options will be the full Google Search Appliance, which starts at approximately $30,000, or Google Site Search (pricing from $100), which is a low-end hosted search with much less functionality.

This also raises the question of what Google will be doing in enterprise search in the future. There used to be a virtual edition of the Google Search Appliance, but that was discontinued for unknown reasons in 2008.

Now only the full Google Search Appliance is left, and the full version in many ways lacks the features and flexibility that competitors are offering at much lower prices.

Here at Searchdaimon we do a lot of math. Unfortunately it doesn't look much like the math we learned at school.

The math we do isn't done by hand, with a pencil on a piece of paper. The questions aren't simple problems where the only skill needed is to break the question down into a format that can be inserted into one of the formulas in the formula book.

Conrad Wolfram, director at Wolfram Research and brother of Wolfram Alpha founder Stephen Wolfram, has some great points on why we should reform the teaching of mathematics.

A customer of ours recently complained that he couldn't find a specific file, even when searching for words that he knew were in it. This isn't a totally uncommon question. Sometimes the user doesn't have permission to the file, or the location isn't indexed yet, or there is some other problem. So our CTO Runar Buvik asked what the name of the file was, so he could take a look.

Yes, the name was in fact "Protocol_Amending_the_Agreements_Conventions_and_Protocols_on_Narcotic_Drugs_concluded_at_The_Hague_on_23_January_1912_at_Geneva_on_11_February_1925_and_19_February_1925_and_13_July_1931_at_Bangkok_on_27_November_1931_and_at_Geneva_on_26_June_1936.doc". That is 251 characters long! After some investigation it turned out that the underlying filesystem, NTFS, allows filenames as long as 255 characters, but Windows refused to serve this file over SMB. Instead we got a "No such file or directory" error, even when opening the folder as a network share in Windows Explorer and clicking on the file.

There actually is a treaty with this name according to Wikipedia, but that doesn't mean that the file needs to be named the same. Please keep your filenames below 128 characters, people, or you will be in trouble sooner or later!
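If you want to check your own shares before they bite you, something like the following Python sketch could do it. It is only an illustration and not part of Searchdaimon: the function names and the `MAX_NAME_LENGTH` constant are made up for this example, and the 128 character limit is simply the rule of thumb from above, not a limit imposed by NTFS or SMB.

```python
import os

# Rule-of-thumb limit from the text above; an assumption, not an NTFS or SMB constant.
MAX_NAME_LENGTH = 128


def is_too_long(name, limit=MAX_NAME_LENGTH):
    """Return True if a single file name is not strictly below `limit` characters."""
    return len(name) >= limit


def find_long_filenames(root, limit=MAX_NAME_LENGTH):
    """Yield (path, name_length) for every file under `root` with a too-long name."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_too_long(name, limit):
                yield os.path.join(dirpath, name), len(name)


if __name__ == "__main__":
    # Scan the current directory and print the offenders, longest name length first.
    hits = sorted(find_long_filenames("."), key=lambda hit: hit[1], reverse=True)
    for path, length in hits:
        print(f"{length:4d}  {path}")
```

Note that `len()` counts Python characters, which may differ from the units a filesystem or protocol counts in, so treat the result as a warning sign rather than an exact measurement.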

Tired of creating threads and writing code to manage deadlocks and work queues? Search is CPU intensive, and we use a lot of threads. For example, indexes are sorted in parallel, and the pages that go on the result page are fetched from disk and processed in parallel.
We started out creating threads manually, but that is slow going in C. We have now almost entirely changed to OpenMP, and haven't looked back since.

We use a lot of regular expressions here at Searchdaimon. Regexes are used through Lex and Yacc to parse queries, parse HTML and make the snippets on the result page. They are also heavily used to extract and validate data, tags and entities in the crawlers.

Here I am testing out a regex to extract email addresses and names from documents. The names and email addresses could then be added as attributes to the document, to enable filtering in the search results. Constructing regexes like this using only a text editor and relying on trial and error won't be easy.
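To give an idea of what such an extraction might look like, here is a minimal Python sketch. It is not the regex we run in production: the patterns, the `extract_contacts` helper and the sample text are all assumptions made up for this example. The email pattern is deliberately simple, and the name pattern only recognises capitalised ASCII names written directly in front of an address in angle brackets.

```python
import re

# Deliberately simple email pattern; real-world addresses can be far messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Two or more capitalised words followed by an address in angle brackets,
# for example: Jane Doe <jane.doe@example.com>
NAMED_RE = re.compile(
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)+)\s*<(" + EMAIL_RE.pattern + r")>"
)


def extract_contacts(text):
    """Return a dict mapping lower-cased email addresses to a name, or None."""
    contacts = {}
    # First pass: addresses that come with a name attached.
    for name, email in NAMED_RE.findall(text):
        contacts[email.lower()] = name
    # Second pass: any remaining bare addresses get no name.
    for email in EMAIL_RE.findall(text):
        contacts.setdefault(email.lower(), None)
    return contacts


if __name__ == "__main__":
    sample = "Please contact Jane Doe <jane.doe@example.com> or sales@example.org."
    for email, name in extract_contacts(sample).items():
        print(email, "->", name)
```

The resulting pairs are the kind of values that could be stored as document attributes. Even a toy pattern like this has surprising edge cases, such as a capitalised word at the start of a sentence being swallowed into the name, which is exactly why a proper regex testing tool helps.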

Start-up company Blekko has made some revolutionary innovations in the field of internet search. Using their invention “slashtags” you can easily filter and sort your results. For example, a search for “Apple Computers” gives you the results you would get in Google. But you can also add slashtags to filter the results: