Specify a query (and, optionally, options) for the current search object. Any previous query and its cached results are discarded. The query and the option values must already be escaped; call WWW::Search::escape_query() to escape a string. The search does not actually begin until results() or next_result() is called (it is lazy), so native_query does not return anything.

The hash of options following the query string is optional. The query string is backend-specific. There are two kinds of options: options specific to the backend, and generic options applicable to multiple backends.

Generic options all begin with 'search_'. Currently a few are supported:

Call this method (anytime before asking for results) if you want to communicate cookie data with the search engine. Takes one argument, either a filename or an HTTP::Cookies object. If you give a filename, WWW::Search will attempt to read/store cookies there (by passing the filename to HTTP::Cookies::new).

$oSearch->cookie_jar('/tmp/my_cookies');

If you give an HTTP::Cookies object, it is up to you to save the cookies if/when you wish.

Enable loading proxy settings from environment variables. The proxy URL will be read from $ENV{http_proxy}. The username for authentication will be read from $ENV{http_proxy_user}. The password for authentication will be read from $ENV{http_proxy_pwd}.

If you don't want to put passwords in the environment, one solution would be to subclass LWP::UserAgent and use $ENV{WWW_SEARCH_USERAGENT} instead (see user_agent below).

Set the maximum number of hits to return. Queries resulting in more than this many hits will return only the first hits, up to this limit. Although this specifies a maximum, search engines might return fewer than this number.

Set which result should be returned next time next_result() is called. Results are zero-indexed.

The only guaranteed valid offset is 0, which will replay the results from the beginning. In particular, seeking past the end of the current cached results probably will not do what you might think it should.

Results are cached, so this does not re-issue the query or cause IO (unless you go off the end of the results). To re-do the query, create a new search object.

This function gives an application a place to store one opaque data element (or many, via a Perl reference). This facility is useful, for example, for maintaining client-specific information in each active query when you have multiple concurrent queries.

Escape a query. Before queries are sent to the internet, special characters must be escaped so that a proper URL can be formed. This is like escaping a URL, but all non-alphanumeric characters are escaped and spaces are converted to "+"s.
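The escaping rule described above can be illustrated with a short stand-alone sketch. It is not the module's own code: sketch_escape_query is a hypothetical name, and the sketch assumes the input is a byte string (characters above 0xFF are not handled).

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics the rule described above.
sub sketch_escape_query {
    my ($text) = @_;
    $text = '' unless defined $text;
    # Percent-encode everything that is not a space or an ASCII letter/digit.
    $text =~ s/([^ A-Za-z0-9])/sprintf('%%%02X', ord $1)/ge;
    # Then convert the remaining spaces to plus signs.
    $text =~ tr/ /+/;
    return $text;
}

print sketch_escape_query('C++ & Perl'), "\n";   # prints: C%2B%2B+%26+Perl
```

Literal plus signs are encoded before spaces are converted, so a "+" in the result always stands for a space in the original query.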

This internal routine creates a user-agent for derived classes that query the web. If any non-false argument is given, a normal LWP::UserAgent (rather than an LWP::RobotUA) is used.

Returns the user-agent object.

If a backend needs the low-level LWP::UserAgent or LWP::RobotUA to have a particular name, $oSearch->agent_name() and possibly $oSearch->agent_email() should be called to set the desired values *before* calling $oSearch->user_agent().

If the environment variable WWW_SEARCH_USERAGENT has a value, it will be used as the class for a new user agent object. This class should be a subclass of LWP::UserAgent. For example,
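A minimal sketch of setting the variable from inside a Perl program, before the user agent is created. The class name My::Own::UserAgent is hypothetical; substitute your own subclass of LWP::UserAgent.

```perl
use strict;
use warnings;

# Hypothetical class name; it should name a subclass of LWP::UserAgent.
$ENV{WWW_SEARCH_USERAGENT} = 'My::Own::UserAgent';

# ... create the search object and run queries here ...

# Remove the variable again so that it no longer has a value.
delete $ENV{WWW_SEARCH_USERAGENT};
```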

Get / set the value of the HTTP_REFERER variable for this search object. Some search engines might only accept requests that originated at some specific previous page. This method lets backend authors "fake" the previous page. Call this method before calling http_request.

Get / set the method to be used for the HTTP request. Must be either 'GET' or 'POST'. Call this method before calling http_request. (Normally you would set this during _native_setup_search().) The default is 'GET'.

Get or set the URL for the next backend request. This can be used to save the WWW::Search state between sessions (e.g. if you are showing pages of results to the user in a web browser). Before closing down a session, save the value of next_url:
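The save and restore steps might be wrapped as in the sketch below. The helper names are hypothetical, and the restore order (re-specify the query with native_query, then set next_url, then ask for results) is an assumption that may not hold for every backend.

```perl
use strict;
use warnings;

# Hypothetical helper: call at the end of a session, after the last
# next_result() for the page being shown.
sub save_search_position {
    my ($search) = @_;
    return $search->next_url;                 # getter form: no argument
}

# Hypothetical helper: call at the start of the next session, before
# asking for any results.
sub restore_search_position {
    my ($search, $escaped_query, $saved_url) = @_;
    $search->native_query($escaped_query);    # re-specify the query first
    $search->next_url($saved_url);            # setter form: one argument
    return $search;
}
```

The saved URL is an ordinary string, so it can be kept wherever your application stores session state.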

WARNING: It is entirely up to you to keep your interface in sync with the number of hits per page being returned from the backend. And, we make no guarantees whether this method will work for any given backend. (Their caching scheme might not enable you to jump into the middle of a list of search results, for example.)

This internal routine splits data (typically the result of the web page retrieval) into lines in a way that is OS independent. If the first argument is a reference to an array, that array is taken to be a list of possible delimiters for this split. For example, Yahoo.pm uses <p> and <dd><li> as "line" delimiters for convenience.
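The splitting behaviour can be approximated with the stand-alone sketch below. sketch_split_lines is a hypothetical function, not the module's method; treating the extra delimiters as literal strings and matching them case-insensitively are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in: split on CRLF or LF, plus any extra delimiters
# passed as a leading array reference.
sub sketch_split_lines {
    my @args   = @_;
    my @delims = ('\015?\012');               # optional CR, then LF
    if (ref($args[0]) eq 'ARRAY') {
        # Extra delimiters are quoted so they match literally.
        push @delims, map { quotemeta($_) } @{ shift @args };
    }
    my $text    = shift @args;
    my $pattern = join '|', @delims;
    return split /(?:$pattern)/i, $text;
}

my @parts = sketch_split_lines(['<p>'], "first\r\nsecond<p>third\nfourth");
print join('|', @parts), "\n";                # prints: first|second|third|fourth
```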

Fetch the next page of results from the web engine, parse the results, and prepare for the next page of results.

If a backend defines this method, it is in total control of the WWW fetch, parsing, and preparing for the next page of results. See the WWW::Search::AltaVista module for example usage of the _native_retrieve_some method.

An easier way to achieve this in a backend is to inherit _native_retrieve_some from WWW::Search, and do only the HTML parsing. Simply define a method _parse_tree which takes one argument, an HTML::TreeBuilder object, and returns an integer, the number of results found on this page. See the WWW::Search::Yahoo module for example usage of the _parse_tree method.

A backend should, in general, define either _parse_tree() or _native_retrieve_some(), but not both.

Additional features of the default _native_retrieve_some method:

Sets $self->{_prev_url} to the URL of the page just retrieved.

Calls $self->preprocess_results_page() on the raw HTML of the page.

Then, parses the page with an HTML::TreeBuilder object and passes that populated object to $self->_parse_tree().

Additional notes on using the _parse_tree method:

The built-in HTML::TreeBuilder object used to parse the page has store_comments turned ON. If a backend needs to use a subclassed or modified HTML::TreeBuilder object, the backend should set $self->{'_treebuilder'} to that object before any results are retrieved. The best place to do this is at the end of _native_setup_search.

When _parse_tree() is called, $self->next_url is cleared. During parsing, the backend should set $self->next_url to the appropriate URL for the next page of results. (If _parse_tree() does not set the value, the search will end after parsing this page of results.)

When _parse_tree() is called, the URL for the page being parsed can be found in $self->{_prev_url}.

Given a reference to a hash of string => string, constructs a CGI parameter string that looks like 'key1=value1&key2=value2'.

If the value is undef, the key will not be added to the string.
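The behaviour described above can be sketched in a few lines of plain Perl. sketch_hash_to_cgi_string is a hypothetical stand-in, not the module's code; sorting the keys is an assumption made here so that the output is reproducible, and values are assumed to be escaped already.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the routine described above.
sub sketch_hash_to_cgi_string {
    my ($options) = @_;
    my @pairs;
    # Sorted keys give the same string on every run (an assumption here).
    for my $key (sort keys %$options) {
        my $value = $options->{$key};
        next unless defined $value;           # undef values are left out
        push @pairs, "$key=$value";           # values are not escaped here
    }
    return join '&', @pairs;
}

print sketch_hash_to_cgi_string({ q => 'perl+search', hits => 10, lang => undef }), "\n";
# prints: hits=10&q=perl+search
```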

At one time, for testing purposes, we asked backends to use this function rather than piecing the URL together by hand, to ensure that URLs are identical across platforms and software versions. But this is no longer necessary.

WWW::Search supports backends for separate search engines. Each backend is implemented as a subclass of WWW::Search. WWW::Search::Yahoo provides a good sample backend.

A backend must have the routine _native_setup_search(). A backend must have the routine _native_retrieve_some() or _parse_tree().

_native_setup_search() is invoked before the search. It is passed a single argument: the escaped, native version of the query.

_native_retrieve_some() is the core of a backend. It will be called periodically to fetch URLs. It should retrieve several hits from the search service and add them to the cache. It should return the number of hits found, or undef when there are no more hits.

Internally, _native_retrieve_some() typically sends an HTTP request to the search service, parses the HTML, extracts the links and descriptions, then saves the URL for the next page of results. See the code for the WWW::Search::AltaVista module for an example.

Alternatively, a backend can define the method _parse_tree() instead of _native_retrieve_some(). See the WWW::Search::Ebay module for a good example.

A portable query language would allow you to move queries easily between different search engines. A query abstraction is non-trivial and unfortunately will not be done any time soon by the current maintainer. If you want to take a shot at it, please let me know.

Copyright (c) 1996 University of Southern California. All rights reserved.

Redistribution and use in source and binary forms are permitted provided that the above copyright notice and this paragraph are duplicated in all such forms and that any documentation, advertising materials, and other materials related to such distribution and use acknowledge that the software was developed by the University of Southern California, Information Sciences Institute. The name of the University may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.