GeoSearch: A Geographically-Aware Search Engine

System Overview

As a proof of concept, GeoSearch searches news articles from more
than 300 online newspapers based in the United States. Offline,
GeoSearch estimates the geographical scope of each newspaper from
the distribution of hyperlinks pointing to it. For example:

The geographical scope of The New York Times is
automatically estimated to be the entire United States (see
below), which intuitively indicates that this newspaper
is generally relevant to users across the country.

In contrast, the geographical scope of The Stanford Daily
is automatically estimated to be mostly the Palo Alto,
California area (see below), which intuitively indicates
that this newspaper is generally not relevant to users,
say, in New York City.
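
To make the idea concrete, the following is a minimal Python sketch
of how a scope might be derived from the locations of the pages that
link to a newspaper. It is an illustration under stated assumptions,
not the estimation procedure GeoSearch itself uses (that procedure is
described in the VLDB '00 paper). The function name estimate_scope,
the use of state codes as regions, and the thresholds min_links,
min_share and national_fraction are all hypothetical.

```python
from collections import Counter


def estimate_scope(link_origins, all_regions,
                   min_links=1, min_share=0.2, national_fraction=0.75):
    """Guess the set of regions a newspaper is relevant to.

    link_origins: one region label per page linking to the newspaper.
    all_regions:  every region label in the country.
    All thresholds are illustrative assumptions, not values from GeoSearch.
    """
    counts = Counter(link_origins)
    total = sum(counts.values())
    if total == 0:
        return set()  # no incoming links, so no evidence of any scope

    # Regions that link to the newspaper at all (at least min_links times).
    linked = {r for r, n in counts.items() if n >= min_links}

    # Links arriving from most of the country suggest a national scope.
    if len(linked) >= national_fraction * len(all_regions):
        return set(all_regions)

    # Otherwise keep only the regions that contribute a sizable share of links.
    return {r for r, n in counts.items() if n / total >= min_share}


regions = ["CA", "NY", "TX", "WA"]
national_paper = estimate_scope(["CA", "NY", "TX", "WA", "NY"], regions)
local_paper = estimate_scope(["CA"] * 9 + ["NY"], regions)
```

In this toy data, the first newspaper is linked from every region and
is treated as national, while the second receives almost all of its
links from one region and is given a local scope.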

Then, for a query consisting of a list of keywords (e.g., [startups
business]) and the US ZIP code of the user's location (e.g.,
94043), GeoSearch:

Uses just the keywords to rank the newspaper articles
using a standard, off-the-shelf text search engine called
Swish.

Filters out all pages coming from newspapers whose
geographical scope does not include the user's specified
ZIP code.

Recomputes the score for each surviving page and returns
the pages ranked in the resulting order. A page's new
score is a combination of the Swish-generated score for
the page and a score related to the geographical scope of
the page (see the VLDB '00 paper).
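
The three query-time steps above can be sketched in Python as
follows. This is a simplified illustration, not the GeoSearch
implementation: the text scores are assumed to have already been
produced by the text search engine, the per-ZIP geographical scores
are made up, and the weighted sum used to combine the two scores is
an assumption (the actual combination is given in the VLDB '00
paper). The names geo_search, text_hits, scopes and weight are
hypothetical.

```python
def geo_search(text_hits, scopes, user_zip, weight=0.5):
    """Filter and re-rank keyword hits by geographical scope.

    text_hits: list of (article_id, newspaper, text_score) tuples,
               as a text search engine might return them (step 1).
    scopes:    dict mapping newspaper -> {zip_code: geo_score in [0, 1]};
               a ZIP code absent from the inner dict is outside the scope.
    weight:    assumed mixing weight between text and geo scores.
    """
    results = []
    for article_id, newspaper, text_score in text_hits:
        geo_score = scopes.get(newspaper, {}).get(user_zip)
        if geo_score is None:
            continue  # step 2: the newspaper's scope excludes this ZIP code
        # Step 3: combine the two scores (a plain weighted sum, by assumption).
        combined = weight * text_score + (1 - weight) * geo_score
        results.append((article_id, combined))
    # Return the surviving articles, best combined score first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


hits = [
    ("a1", "The New York Times", 0.6),
    ("a2", "The Stanford Daily", 0.9),
    ("a3", "The Stanford Daily", 0.4),
]
scopes = {
    "The New York Times": {"94043": 0.5, "10001": 0.5},
    "The Stanford Daily": {"94043": 1.0},
}
near_stanford = geo_search(hits, scopes, "94043")
in_new_york = geo_search(hits, scopes, "10001")
```

With this toy data, a user at ZIP code 94043 sees articles from both
newspapers, while a user at 10001 sees only the article from the
newspaper whose scope covers that ZIP code.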

Example Geographical Scopes

Some newspaper geographical scopes, derived automatically
from the distribution of hyperlinks to the newspaper
homepages:

This material is based upon work supported
by the National Science Foundation under Grants No. 9733880 and
9619124. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the
National Science Foundation.