How a Search Engine Might Determine Whether a Search Involves a Geographical Intent

Many web sites involve businesses or organizations that provide goods, services, or information relevant to people at a specific location, like the location of a hotel or a dentist’s office in a certain city, or building regulations for a specific town. Many searchers, though, use queries that don’t include geographic information in a way that makes it easy for a search engine to connect them with those web sites.

If a search engine can understand whether a search involves a specific geographical location from a searcher’s query, it can provide a richer set of results that include information about that location.

This is true regardless of whether or not the location was even part of the query. For example, if I search for “pizza,” there’s a decent chance that I’m looking for a pizza place nearby.

This is also true if I include something like a landmark in my search, rather than the name of a location, such as a search for “space needle restaurants,” looking for restaurants near Seattle’s Space Needle.

This can be difficult, because some queries might seem to be about locations on their face, but aren’t, while similar-looking queries are. For instance, search for “New York Style cheese cake,” and chances are good that you want to see recipes, not pages about New York. Search for “manhattan coffee,” though, and chances are good that you want to see information about coffee shops in or near Manhattan.

A paper from Xing Yi of the University of Massachusetts, and Hema Raghavan and Chris Leggetter of Yahoo! Labs, Discovering Users’ Specific Geo Intention in Web Search (pdf), explores a geographical intent analysis program that can determine whether searchers intend to see geographically related information as part of their search results. The program involves the use of a “city language model,” which calculates the probabilities that certain words and language in a query indicate an interest in information about a particular city.

We build a geo intent analysis system that uses minimal supervision to learn a model from large amounts of web-search logs for this discovery.

We build a city language model, which is a probabilistic representation of the language surrounding the mention of a city in web queries.

We use several features derived from these language models to:

(1) identify users’ implicit geo intent and pinpoint the city corresponding to this intent,

(2) determine whether the geo-intent is localized around the users’ current geographic location,

(3) predict cities for queries that have a mention of an entity that is located in a specific place.

Experimental results demonstrate the effectiveness of using features derived from the city language model. We find that

(1) the system has over 90% precision and more than 74% accuracy for the task of detecting users’ implicit city level geo intent

(2) the system achieves more than 96% accuracy in determining whether implicit geo queries are local geo queries, neighbor region geo queries, or none of these

(3) the city language model can effectively retrieve cities in location-specific queries with high precision (88%) and recall (74%); human evaluation shows that the language model predicts city labels for location-specific queries with high accuracy (84.5%).
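The core idea behind the city language model can be sketched in a few lines. The following is a minimal illustration, not the paper’s actual system: it builds a unigram model per city from explicit geo queries (city name stripped out), with add-one smoothing, then uses those models to pin a city on an implicit geo query. The training pairs here are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training pairs mined from query logs: queries that
# explicitly mentioned a city, with the city name stripped out so only
# the non-location words remain. (Illustrative data, not from the paper.)
explicit_geo_queries = [
    ("space needle restaurants", "seattle"),
    ("space needle tickets", "seattle"),
    ("pike place market parking", "seattle"),
    ("macys parade hotel", "new york"),
    ("macys parade route", "new york"),
]

# Build a unigram "city language model": word counts per city, plus a
# shared vocabulary for add-one smoothing.
city_counts = defaultdict(Counter)
vocab = set()
for query, city in explicit_geo_queries:
    words = query.split()
    city_counts[city].update(words)
    vocab.update(words)

def p_word_given_city(word, city):
    """Smoothed probability of a word under a city's language model."""
    counts = city_counts[city]
    return (counts[word] + 1) / (sum(counts.values()) + len(vocab))

def score_city(query, city):
    """Log-probability of the query's words under a city's model."""
    return sum(math.log(p_word_given_city(w, city)) for w in query.split())

# An implicit geo query: no city is mentioned, but "space needle"
# points at Seattle under the learned models.
query = "space needle restaurants"
best_city = max(city_counts, key=lambda c: score_city(query, c))
print(best_city)  # → seattle
```

The real system described in the paper works from months of query logs and many more cities, but the shape of the calculation, comparing a query’s words against per-city probability models, is the same.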

Some interesting statistics mentioned in the paper:

– 13% of searches involve some level of geographic intent

– 50% of searches involving geographic intent don’t actually use a location in the query (like searches for “pizza” or “dentist”)

– 84% of queries that include locations do so at the city level

Some other interesting points from the paper:

– If a query with a geographically related intent is one where the location isn’t stated, but is likely to be associated with the location of the searcher, it can make sense for a search engine to look at information like the IP address (or GPS information if the searcher is using a mobile phone) of the user to deliver locally relevant results.

– Some queries that show a geographic intent may have a geographic region of relevance that is larger or smaller than others – people might be willing to travel only 10 miles for “pizza” but will go 30 miles away for a good “dentist.”

– A query with an explicit geographical intent consists of two parts – location and non-location. Studying non-location parts of those explicit queries can help a search engine understand queries where the location part isn’t included in a search.

– If the language model is retrained quickly and regularly from query log and click-through information, it might help a search engine adapt to seasonal changes and provide timely information, allowing it to deliver location-aware answers to queries like “next red sox game.”

– The geographical intent analysis program described in this paper was trained with a month of Yahoo query logs from May 2008, and tested with a month of Yahoo query logs from June 2008. The paper provides some detailed information about the processes involved in detecting whether or not a query has some kind of geographical intent, and where the location behind that intent might be.

– In the training set of data, approximately 96.2 million U.S. city-level geo queries were identified. In the testing set, 96.7 million U.S. city-level geo queries were identified. Overall, the researchers involved in this study found 1,614 distinct cities from the data in both sets. Language models were created for each of those cities, so that when non-location information relevant to a specific city is seen in a query, it may be examined to see if it is related to that city (such as “space needle” being related to Seattle, or “Macy’s Parade” being related to New York City).
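The location/non-location split that makes this training possible can be illustrated with a toy gazetteer lookup. This is a hypothetical sketch; a real system would use a far larger city dictionary and handle multi-word city names and ambiguity.

```python
# Hypothetical gazetteer of single-word city names.
CITIES = {"seattle", "houston", "manhattan", "boston"}

def split_geo_query(query):
    """Split an explicit geo query into (non_location_words, city).
    Returns (words, None) when no city name is found in the query."""
    words = query.lower().split()
    for i, w in enumerate(words):
        if w in CITIES:
            return words[:i] + words[i + 1:], w
    return words, None

print(split_geo_query("cosmetic dentist houston"))  # → (['cosmetic', 'dentist'], 'houston')
print(split_geo_query("pizza"))                     # → (['pizza'], None)
```

Each (non-location words, city) pair harvested this way becomes a training example for that city’s language model.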

If you’re interested in how a search engine might decide whether or not a query has a location or geographical element to it, regardless of whether or not the query actually states a specific location, you may want to spend some time with this paper.

If you have a web site that offers goods or services or information tied to a particular location, the processes described in this paper are some that may help searchers stand a better chance of finding your site online the next time that they search for “attorney’s office,” or “camping near shenandoah park,” or “Macy’s Parade Hotel,” or use some other query that may involve a geographical intent without including an actual location.

Search terms with and without a geo modifier are difficult. For example, “California Pizza” could be a search for California Pizza Kitchen or a search for the best pizza in California. When Greg Sterling, Atiq, and I were discussing this at SMX West on the local panel, we felt that 20-30% of searches had local intent, with the 30% being on the high side if you include terms such as “pizza” with no geo modifier.

Even terms such as “plumber”, if you run a query through Wordtracker or your favorite keyword tool, will yield a surprising number of results for items that are unmentionable in public. I’d love to see an analysis that excludes pornographic intent, or an analysis of what share of internet traffic is adult-related, or adult and local (cf. craigslist’s recent problems).

Thanks for the informative post William.
I do agree with Dennis that it would be interesting to see this broken down by segments. If you do a search for “real estate,” are 90% of those local without a location given? How about “chicken soup”? I am guessing that 75% of those people are looking for recipes.
Lots to think about.

Thank you. I’ve been thinking about all of the businesses and organizations that I know about that use geographic terms as part of their business names, and how many products and services use geographic terms as part of their product or service names. Those are just a couple of the things that can make determining a geographical intent in a search difficult. I really liked the approach taken in this paper, and was happy that they shared some statistics and some results of their experiments with us. I’d love to see more statistics that break queries down into different types as well.

I’m becoming more and more disenchanted with many of the keyword research tools that I see online. It’s discouraging to see at least one very popular one provide a keyword density tool, and none of them seem to have picked up on how Google seems to now treat stopwords.

It’s not hard to imagine that 20-30 percent of searches at the major search engines have some amount of local intent, including ones that don’t explicitly include a location within the actual query. I expect that number to grow as more businesses, even ones without websites, start paying more attention to services like Google Maps and Yahoo local, and the payoff of such searches increases.

You’re welcome. Thanks for your thoughts. I’d love to see more data from the search engines as well. I know that the amount of information that the search engines collect about searches, queries, web browsing, and other user activity can be overwhelming, and may yield some unexpected results.

For instance, you mention that 75 percent of people searching for “chicken soup” may be looking for recipes. But interestingly, the first result in Google for that term is for the book series “chicken soup for the soul.” And a number of other results involve health-related information about chicken soup. I’d love to be able to see what the breakdown among those three categories (recipes, self-help, and health) might be in searches as well.

If you optimize the pages of a site for specific keywords, gauging the intent behind queries that use those keywords, and how a search engine might interpret that intent, is becoming more and more a part of the analysis you need to perform.

Good question. A search engine might be able to tell a little about the possible intent behind your searches by looking at some settings from your browser and from the preferences that you set with the search engine. Those settings might indicate a preference for sites in a certain language, for instance, but they won’t tell a search engine your location, or many other things about you. For some types of searches, such as local movie times or weather, Google may ask you if you want to set a preferred location. But that doesn’t help if you’re planning a trip to a distant location and want to find information about businesses there, or if you’re traveling and using the search engine from somewhere outside of that “preferred” location.

And the paper also discusses how it might try to understand if a search has a local intent, which isn’t something that you can set beforehand at a search engine or in your browser. For instance, if you search for [houston flowers], are you looking for the location of a local flower shop named “Houston Flowers,” or flower shops in the Houston area, or information about flowers that are known to grow near Houston, or originate from the Houston area? Deciding whether a query is associated with a location can be hard, and determining what that location is can be just as difficult.

Thoroughly interesting post and comments. I’m slightly fearful of the way that search engines are developing to predict ever further what you are actually searching for. Due to the complex nature of the whole issue of geographical intent – for some of the reasons outlined above amongst others – I’m impressed (and a little surprised) by the claimed precision and accuracy figures quoted.

Overall, I can’t help thinking that no matter how clever the search engines get, it’d be nice to have a big red button that turns these features off and lets you just see what turns up. For instance, with the Houston flowers example in your last comment William, although someone may indeed have been searching for flower shops in the Houston area, if a result popped up that gave info about what flowers were known to grow nearby, that may be interesting in itself and take the browser off in a direction they themselves didn’t know they wanted to go. Just a thought.

I find this to be very valuable information because a geographically targeted search very often indicates a higher commercial intent. Understanding how search engines approach geographical search can be very helpful in ranking for the best keywords (the ones that lead to sales). I appreciate all the input.

I’d like to see a big red button like that myself. The intent behind my search may have nothing to do with my past searches, or the searches of others. I may just be more likely to want to find what kinds of flowers grow around Houston than looking for a flower shop there, even if millions of other searchers were looking for florists in Texas.

Do predictive queries take away some of the “discoverability” that searchers may want – the chance not to find the same thing that everyone else is looking for, but rather something interesting, new, and unique?

Well, in one way they do have a list of keywords – and this paper explains one way how they might come up with those keywords. In the language model they describe building, if they look at queries that include both a location and a non-location, and the non-location words keep on showing up, then those non-location terms and phrases may be ones where a geographic intent may be implied.
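That mining step, watching for non-location terms that repeatedly co-occur with explicit locations, could be sketched roughly like this. The data and the 0.5 threshold below are invented for illustration; the paper’s actual features and thresholds differ.

```python
from collections import Counter

# Hypothetical log entries: (non-location part of a query, whether an
# explicit location appeared alongside it). Illustrative data only.
log = [
    ("pizza", True), ("pizza", True), ("pizza", False),
    ("chicken soup recipe", False), ("chicken soup recipe", False),
    ("dentist", True), ("dentist", False), ("dentist", True),
]

with_loc = Counter(term for term, has_loc in log if has_loc)
total = Counter(term for term, _ in log)

# Terms whose non-location part frequently co-occurs with an explicit
# location are candidates for carrying an implied geographic intent.
local_candidates = {t for t in total if with_loc[t] / total[t] > 0.5}
print(sorted(local_candidates))  # → ['dentist', 'pizza']
```

Under this toy rule, “pizza” and “dentist” look implicitly local, while “chicken soup recipe” does not, which matches the intuition in the comments above.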

You have me wondering how much of a correlation there is between commercial intent and geographic intent. I would really like to see data about that relationship. Of course, some geographically related searches for things like local schools, parks, and churches aren’t going to have commercial intentions behind them. But many local searches are likely going to be for businesses.

Search engines are becoming smarter day by day, and they now understand the importance of geography in search results. The 50 percent of searchers who have a geographical intent but don’t specify it in their query will benefit from this and get better results from search engines.

As Dennis wrote, those 30 percent of searches will yield better results, which will definitely improve the effectiveness of search engines.

It feels like Google is already using IP Location and GPS (when search is done through a mobile phone) for keyword searches that imply the user is looking for local results.

I know (at least I think I know) this is happening at the mobile phone level because I see it when I search for a word like “pizza” by itself.

Try searching for “pizza” on Google, through a non-GPS-enabled source like your desktop, and make sure you’re not logged in to your Google Account. Do you get local results? I do, but they’re pretty far off from where I’m at. Is this Google experimenting with IP location for such keywords?

Thanks. Yes, I think if a search engine can identify when people are actually searching with some geographical intent behind their search that it does benefit those searches. After reading that paper, it was easier to see why Google started showing Google Maps results for searchers typing in queries like “pizza” or “dentist” without specifying an actual location. I really enjoy when we get to see some of the research behind changes that the search engines start making.

Google has been working on providing local results for mobile searchers for a while. Don’t know if you’ve seen their “My Location” service for mobile users – see this page (warning – video starts playing as soon as you arrive) – http://www.youtube.com/watch?v=v6gqipmbcok

The “local” results that I’ve been seeing for searches like “pizza” have actually been pretty accurate for me.

Google may be getting your location from your IP address, or if you’ve set a “preferred location” at some point while using Google Maps, or searching for local weather or local movie listings, and saving that information on a cookie. Or it might take it from your profile if you are logged into your Google Account. It appears that Google has made this change for most searchers, at least in the United States.

Haven’t figured out how this helps rather than hurts a service company. My business is home-based in a small town 14 miles from a metro center. I go to the customer; the customer never comes to me. Because of this type of profiling, my info only shows on the search engine when a search is made for the small town, and not the larger city where most of the work is needed. I can overcome this by spending tons of money (that I do not have in this economy) with companies that will promote my business on the web. In this economy with spending down, I’ll spend more than I make.

There is no doubt that Google is starting to give more results that are “locally focused.” This actually makes natural SEO a lot easier for professionals who are trying to optimize a website. For instance, we have a dental practice in Houston, and it is much more “natural” to optimize for the term “cosmetic dentist” than “Houston cosmetic dentist.”

That’s a very good point. It can get pretty awkward if you try to fit long phrases like “houston cosmetic dentist” within the content of pages, and it’s something that I’ve been avoiding doing as much as possible for years, with some success.

Search terms with and without a geo modifier are pretty much dead. With the new Google Venice update, I think global and local search are being mixed, and nobody really knows how Google is mixing them. I think they are testing it.
I can see that my local geo modifier results have changed drastically in the last few weeks, with no reason to it.
So this is something evolving, and many of the previous comments may be outdated by Venice. Bummer.

I think there have been a number of signals of how Google is mixing localized organic results and non-localized results, and it’s been my experience that it’s something they seem to have been experimenting with since at least 2009.

The “venice” update referred to in the Google Inside Search blog post on “40 algorithmic changes” appears to refer to including more Google Maps results into Web results rather than localized organic results, and it does seem like they’ve been experimenting with both from many of the searches I’ve performed recently.