FAQ: Adding Search Engines

Can I configure Sawmill to recognize search engines other than the ones it knows already?

Short Answer

Yes -- just edit the search_engines.cfg file in the LogAnalysisInfo directory with a text editor.

Long Answer

Yes; Sawmill's search engine recognition mechanism is easily extensible.
All the search engines Sawmill knows are described in a text file
called search_engines.cfg, which is found in the LogAnalysisInfo directory of
your Sawmill installation. Sawmill puts several dozen search engines in there
to begin with (the big, well-known ones), but you can add as many more as you like,
by editing the file with a text editor. Just add a new line for each new search engine,
and the next time Sawmill processes log data, it will recognize those search engines,
and it will include them in the database.

The "name" value for a search engine is the name of the search engine;
put whatever you want the search engine to be called there. That's
what will appear in the statistics. The "substring" value is a "quick check"
that Sawmill uses to see whether a URL might come from that search engine.
If the URL contains the "quick check" string, Sawmill then does a slower check
using the "regexp" column, which is a regular expression. If the regular expression matches,
Sawmill uses the text matched by the parenthesized section of the regular expression as the search terms
(it should be a series of search terms, separated by plus signs (+)).
That text is used to compute the search terms and search phrases statistics.
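
The two-step check described above can be sketched in Python. This is only an illustration of the idea, not Sawmill's own code and not the syntax of search_engines.cfg: the engine "Example Search", its host name, its regular expression, and the function match_search_engine are all made up for the example.

```python
import re
from urllib.parse import unquote

# Hypothetical entry. The keys mirror the "name", "substring" and "regexp"
# values described above, but this is NOT the search_engines.cfg syntax.
ENGINES = [
    {
        "name": "Example Search",
        "substring": "search.example.com",
        "regexp": re.compile(r"search\.example\.com/.*[?&]q=([^&]*)"),
    },
]


def match_search_engine(url):
    """Return (engine name, search phrase, search terms), or None."""
    for engine in ENGINES:
        # Step 1: the cheap "quick check" rules out most URLs.
        if engine["substring"] not in url:
            continue
        # Step 2: the slower regular expression. The parenthesized
        # section (group 1) holds the terms, separated by plus signs.
        m = engine["regexp"].search(url)
        if m:
            terms = [unquote(t) for t in m.group(1).split("+") if t]
            return engine["name"], " ".join(terms), terms
    return None
```

With this made-up entry, the URL http://search.example.com/find?q=log+analysis&page=2 yields the phrase "log analysis" and the terms "log" and "analysis", while a URL that lacks the substring is rejected without the regular expression ever being run.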

You might notice that the "substring" column is redundant -- Sawmill doesn't really need it at all,
since it could just check every URL with the regular expression. That second
column is there because regular expressions are relatively slow -- Sawmill can process
log data much faster if it doesn't have to check every URL in the log data against
dozens of regular expressions. This way, it only has to use the regular expressions
on the small fraction of URLs that contain one of the "quick check" strings.
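
To make the saving concrete, here is a small Python sketch that counts how many regular expression checks are performed with and without the substring test. It does not measure Sawmill itself; the two engines, the URLs, and the function count_regex_checks are invented for the illustration.

```python
import re

# Hypothetical engines: (name, substring, regexp). Not real Sawmill entries.
ENGINES = [
    ("Example Search", "search.example.com", re.compile(r"[?&]q=([^&]*)")),
    ("Sample Find", "find.sample.net", re.compile(r"[?&]query=([^&]*)")),
]

# 98 ordinary page URLs plus 2 search engine URLs, all made up.
URLS = ["http://www.example.org/page%d.html" % i for i in range(98)] + [
    "http://search.example.com/?q=log+analysis",
    "http://find.sample.net/?query=sawmill",
]


def count_regex_checks(urls, use_substring):
    """Count how many times a regular expression has to be evaluated."""
    checks = 0
    for url in urls:
        for _name, substring, regexp in ENGINES:
            # With the quick check enabled, skip engines whose
            # substring does not appear in the URL at all.
            if use_substring and substring not in url:
                continue
            checks += 1
            regexp.search(url)
    return checks
```

For these 100 URLs and 2 engines, count_regex_checks(URLS, False) evaluates a regular expression 200 times, while count_regex_checks(URLS, True) evaluates one only 2 times.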