Search Engines Corner: Meta-search Engines

Tracey Stanley discusses the next level up from conventional search engines in the 'information food chain', which provide a sophisticated approach to searching across a number of databases.

When I first started using search engines on the Web a few years ago, I had a choice between a fairly small number of available tools. In those early days, choosing an engine was largely a matter of finding one whose host was available when you needed it, and which would process your query fairly quickly. The searching facilities on offer were all quite unsophisticated, and the databases behind the engines were generally small compared to the huge databases available from services like Alta Vista and Excite today.

These days, however, the whole process of selecting an appropriate search engine to use has become far more complex. For a start, there are now an estimated 2000 different search engines available on the Web, whereas in 1995 there were perhaps a dozen. Each of these search engines has different features, searching facilities and interfaces for the user to assimilate, resulting in a separate learning curve to climb each time you want to try a new service. Furthermore, none of them can offer comprehensive searching of the Web, and although many of them may have overlaps in their database coverage, it is also likely that potentially useful resources may be indexed on one search engine and not another. If you want to avoid missing potentially useful information sources on the Web, you will probably need to use more than one search engine to get the results you need.

Web searching is a time-consuming process at the best of times, so imagine how much more time-consuming it is going to be if you have to search a number of engines instead of just one each time you need to find some information. Meta-search engines have been developed in order to streamline and simplify this whole process.

Meta-search engines have been described as the next level up from conventional search engines in the 'information food chain', in that they provide a sophisticated approach to searching across a number of databases. A meta-search engine lets you search a number of different search engine databases at the same time. This is in contrast to the conventional approach of having to search each database separately by calling up the search engine home page and inputting keywords. Meta-search engines don't collect or maintain their own individual databases; instead they utilise the databases of other available search engines. This can cut down significantly on hardware costs for the meta-search engine provider, and also on utilisation of the Internet itself, as robots don't have to be used to gather and index a database.
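The idea of sending one query to several databases can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the engines are toy in-memory indexes rather than real services, and the names `make_engine`, `meta_search`, `engine_a` and `engine_b` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote search engine: maps a query to a list of URLs.
Engine = Callable[[str], List[str]]


def make_engine(index: Dict[str, List[str]]) -> Engine:
    """Build a toy engine backed by a small in-memory keyword index."""
    def search(query: str) -> List[str]:
        hits: List[str] = []
        for word in query.lower().split():
            for url in index.get(word, []):
                if url not in hits:  # avoid listing the same page twice
                    hits.append(url)
        return hits
    return search


def meta_search(query: str, engines: Dict[str, Engine]) -> Dict[str, List[str]]:
    """Send the same query to every engine and collect each engine's hits by name."""
    return {name: engine(query) for name, engine in engines.items()}


# Two toy engines with overlapping but different coverage (made-up data).
engines = {
    "engine_a": make_engine({"brit": ["http://example.org/brits"]}),
    "engine_b": make_engine({
        "brit": ["http://example.org/brits"],
        "awards": ["http://example.org/awards"],
    }),
}
results = meta_search("Brit Awards", engines)
```

The query is typed once, and the differing coverage of the two toy engines shows why a page may turn up from one engine and not another.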

The advantages of using a meta-search engine include:

you need only access a single Web page in order to perform your search

you need only learn one interface for searching

you need only type your search query once

you can perform more thorough searching across a wider number of search engines

you can obtain an integrated set of results, with (in many cases) the duplicate results stripped out.

Individual meta-search engines offer searching across a wide range of different search engines. Some may provide for searching across a handful of the best-known search engines, whereas others may offer searching across a much wider range of engines. Metacrawler, for example, utilises Alta Vista, Excite, Infoseek, Lycos, Webcrawler and Yahoo!

In order to take a closer look at how meta-search engines function and operate, I've chosen to investigate two examples. These are Metacrawler and WebFerret.

Metacrawler

Metacrawler was originally developed as a prototype search service in 1994 at the University of Washington. It is now operated by go2net Inc., an Internet content company based in Seattle.

Metacrawler can be used to search Alta Vista, Excite, Infoseek, Lycos, Webcrawler and Yahoo! Two options for searching are available: simple search and power search. The simple search consists of a single search form, with options for searching on ALL keywords (Boolean AND), ANY keywords (Boolean OR) or on phrases. The power search option provides a number of additional features, such as restricting your search to documents from a particular region (North America, Australia, Europe, Asia and so on) and controlling the maximum number of results to be retrieved from each search engine. Power searching options can be set and then reused the next time the service is used.
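The difference between the ALL, ANY and phrase options can be shown with a small sketch. This is only an illustration of the three matching rules applied to a piece of text; it is not how Metacrawler or any of the underlying engines is implemented, and the function name `matches` is hypothetical.

```python
import re


def matches(text: str, query: str, mode: str) -> bool:
    """Return True if text satisfies query under mode 'all', 'any' or 'phrase'."""
    words = re.findall(r"\w+", text.lower())
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return False
    if mode == "all":      # Boolean AND: every keyword must appear
        return all(t in words for t in terms)
    if mode == "any":      # Boolean OR: at least one keyword must appear
        return any(t in words for t in terms)
    if mode == "phrase":   # keywords must appear together, in order
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    raise ValueError(f"unknown mode: {mode}")


page = "Film awards for 1998 were announced"
any_hit = matches(page, "Brit Awards 1998", "any")        # two of three keywords present
all_hit = matches(page, "Brit Awards 1998", "all")        # 'brit' is missing
phrase_hit = matches("The Brit Awards 1998 nominations", "Brit Awards 1998", "phrase")
```

A page that merely contains "awards" and "1998" passes the ANY test but fails ALL and phrase, which is the kind of off-topic match a looser search tends to let through.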

Metacrawler performs relevancy ranking on your search results, arranging them in terms of a score for 'best match'. This ranking is presented next to each result on screen. Results from the search engines are collated, and the duplicates are stripped out.
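Collating, de-duplicating and ranking can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Metacrawler's actual method: the scores are made up, duplicates are detected only by comparing URLs (ignoring a trailing slash), and the names `Hit` and `collate` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    url: str
    title: str
    score: int   # hypothetical relevancy score, higher is better
    engine: str  # which engine supplied the hit


def collate(hits: List[Hit]) -> List[Dict]:
    """Merge hits that share a URL, keep the best score, and sort by score."""
    merged: Dict[str, Dict] = {}
    for hit in hits:
        key = hit.url.rstrip("/")  # assumption: a trailing slash is not significant
        entry = merged.get(key)
        if entry is None:
            merged[key] = {"url": hit.url, "title": hit.title,
                           "score": hit.score, "engines": [hit.engine]}
        else:
            entry["score"] = max(entry["score"], hit.score)
            if hit.engine not in entry["engines"]:
                entry["engines"].append(hit.engine)
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)


# Made-up hits from three hypothetical engines; two point at the same page.
hits = [
    Hit("http://example.org/brits", "Brit Awards", 850, "engine_a"),
    Hit("http://example.org/film", "Film Awards", 400, "engine_b"),
    Hit("http://example.org/brits/", "Brit Awards", 1000, "engine_c"),
]
ranked = collate(hits)
```

Each merged entry keeps a list of the engines that returned it, mirroring the way the source engine is shown next to each result.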

In order to explore the performance of Metacrawler, I submitted a search on the topic "Brit Awards 1998" (with the option to search for this 'as a phrase' selected).

The first point to note from this search is that Metacrawler was able to complete it extremely quickly. It searched across all of the search engines and returned my ranked results in less than five seconds, and discovered 14 references. The search engine which provided each of these references is listed next to the result.

The following table lists the top five references which Metacrawler retrieved for me:

1000  Brits Nominees For 1998
BRIT AWARDS 1998 The nominations for the 1998 Awards were announced Monday 12 January at the Café de Paris in London by Paul Conroy, President of Virgin Records and Chairman of the
http://www.bpi.co.uk/britsall.htm (Infoseek)

The number next to the title of each resource indicates the relevancy score it has received.

On closer inspection, however, the most accurate of these results appears to be the second, which came from Infoseek, as this actually mentions the Brit Awards in the title and first paragraph. Of the other sites, those retrieved from Webcrawler appear to be the least relevant to my topic. The first result contained the date 1998 and the word 'awards' many times in the text of the page, but made no mention of my other keyword. This is possibly because Webcrawler doesn't support phrase searching, so documents which contain any of my keywords are being retrieved from that search engine.

Unfortunately there don't appear to be options available for restricting the choice of search engines which Metacrawler utilises. It would be extremely useful to be able to exclude a particular search engine from the list according to your needs, especially when, as shown above, one seems to produce results that are off-track.

I tried the search again, and this time enclosed it in quotation marks, as I am aware that many search engines use this notation to signify a phrase. This time I received nine results, only one of which was from Webcrawler.

Given that Infoseek seemed to have provided my most accurate reference, I tried the search again on Infoseek to see if Metacrawler had missed anything from that database. Infoseek retrieved 5 results, all of which I had already seen from the Metacrawler search. Three of these results came from the same Web site, which Metacrawler had stripped out as duplicates. Metacrawler had also found some of these sites on other search engines for me, and presented them, once again stripping out the duplicates.
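Collapsing several results from the same Web site into one can be sketched as below. How Metacrawler actually decides that two results are duplicates is not documented here, so this is an assumption: the sketch simply keeps the first URL seen for each host name. The function name `one_per_site` and the URLs are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def one_per_site(urls: List[str]) -> List[str]:
    """Keep only the first URL seen for each host, preserving the original order."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).netloc.lower()  # host names are case-insensitive
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


# Made-up result list: three pages from one site, one from another.
urls = [
    "http://www.example.org/a",
    "http://www.example.org/b",
    "http://WWW.example.org/c",
    "http://other.example.net/x",
]
collapsed = one_per_site(urls)
```

Five raw results can shrink considerably once pages from the same site are folded together in this way.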

WebFerret

WebFerret is a free meta-searching client for Windows 95 and NT. The great advantage of having a separate Windows client rather than a Web page is that you can leave the search running in the background whilst you do other things. Potentially you could even be looking at other pages on the Web whilst the client works away quietly behind the scenes.

WebFerret has an extremely simple interface: a single search form, with options for searching for all or any of the words in the search string.

WebFerret utilises the following search engines: Alta Vista, Excite, Hotbot, Yahoo, AOL Netfind, Euroseek, Galaxy, Infoseek, LookSmart, Lycos, Search.com, Webcrawler, and one Veronica engine. It is possible to select from this list so that you can choose to exclude certain search engines. Each request to each search engine is sent off simultaneously, and the results are then listed in the order in which they are received. The results are listed in a window, and the user can then automatically look at any result by double-clicking it to launch a Web browser of their choice. Options also exist to save and re-run searches using WebFerret.
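Sending every request at once and listing results as they arrive can be sketched with threads. This is an illustration only and says nothing about how WebFerret is built: the engines are hypothetical stand-ins that sleep for a fixed delay, and `slow_engine` and `search_all` are made-up names.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

Engine = Callable[[str], List[str]]


def slow_engine(delay: float, hits: List[str]) -> Engine:
    """Hypothetical engine that answers after `delay` seconds."""
    def search(query: str) -> List[str]:
        time.sleep(delay)  # simulate network and processing time
        return hits
    return search


def search_all(query: str, engines: Dict[str, Engine]) -> List[Tuple[str, str]]:
    """Query every engine at once; return (engine, url) pairs in order of arrival."""
    arrived: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in engines.items()}
        for future in as_completed(futures):  # yields each request as it finishes
            name = futures[future]
            for url in future.result():
                arrived.append((name, url))
    return arrived


engines = {
    "slow": slow_engine(0.3, ["http://example.org/slow"]),
    "fast": slow_engine(0.05, ["http://example.org/fast"]),
}
arrived = search_all('"Brit Awards 1998"', engines)
```

Because results are listed in arrival order, the quickest engine's hits appear first regardless of how relevant they are, which is a different trade-off from ranking by score.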

To test WebFerret against Metacrawler, I ran a search on the topic "Brit Awards 1998". I selected Alta Vista, Excite, Hotbot, Infoseek, Lycos and Yahoo from the list of available search engines, and chose the option to search for all the words in my query. WebFerret began to run the query, and I stopped it when it reached 78 results. On looking at the results I discovered that many of them appeared to be irrelevant to the topic as WebFerret appeared to be retrieving documents which contained any of my keywords, so I decided to revise my search strategy. I tried again using the same phrase enclosed in quotation marks in order to utilise phrase searching on the search engines which support this feature. This time I received four results.

Interestingly, WebFerret didn't find the Infoseek result which Metacrawler had already supplied for me. All of the results which it retrieved came from Alta Vista or Yahoo! On viewing the documents, three of them contained the exact phrase "Brit Awards 1998", and the fourth contained simply the keyword 1998. Again, WebFerret was extremely fast in both processing and displaying my results.

Hints and Tips for Using Meta-Search Engines

It really does help if you already have a good understanding of the search engines which the meta-search tool is utilising. For example, if you know how to perform phrase searching in Alta Vista, you can get much better results from a meta-search engine by using the phrase search notation when keying in your search. The meta-search engine will then utilise this effectively when searching on your behalf.

Many of the big search engines share similar search query languages these days (Alta Vista, Excite, Yahoo, Infoseek and Hotbot all use quotation marks to signify a phrase, for example), so you can get quite effective results from a meta-search engine which utilises these tools if you pay attention to these sorts of features.
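As a small illustration of the quotation-mark convention, the sketch below wraps a query in double quotes before it is handed to the engines. The helper name `as_phrase` is hypothetical, and the sketch assumes the engine in question treats a double-quoted string as a phrase.

```python
def as_phrase(query: str) -> str:
    """Wrap a query in double quotes unless it is already quoted."""
    q = query.strip()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return q  # already in phrase notation
    # Drop any stray quotes inside the query so the result stays well-formed.
    return '"' + q.replace('"', "") + '"'


quoted = as_phrase("Brit Awards 1998")
```

An engine that does not support phrase searching may ignore the quotation marks or treat the words as separate keywords, so it is worth checking the results from each source.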

It's also a good idea to be quite selective about the search engines you choose to utilise when using a meta-search engine. Many will have an option whereby you can select from a range of search engines. If you choose the ones which your past experience has shown to be generally reliable, fast, accessible and powerful, you will usually get quicker and more accurate results.

Although it may be tempting to search across a hundred search engines at the same time, you need to think about the likely number of results you are going to get, and the time it is going to take to produce these results. Nobody wants to have to sift through millions of documents retrieved from a search, and nobody wants to wait around for an hour to see their search completed. This would also be wasteful in terms of use of limited network resources.

As ever, try to do your Web searching at times of the day when the Internet is less likely to be very busy. Almost all of the major search engines are based in the US, and meta-search engines don't usually give you the option of switching to European mirror sites where these are available. If you search at less busy times of the day you'll generally find that your results are retrieved more quickly and with less frustration from the search engines.

Overall, meta-searching tools do represent an extremely useful method of retrieving information from a number of search engines without having to visit each engine individually. However, they do tend to work a lot more effectively if you already have a good understanding of Web search engines in general, and if, in particular, you are very familiar with ways of getting the best out of the particular search engines that they are utilising.