Data Quality: The Local Achilles Heel

Google is starting to use the community here to try to improve the database. Because Google buys databases, gets feeds from other local search engines and IYPs, and crawls for local information, it arguably has the most comprehensive local database out there. (Yahoo! and MSFT would probably dispute that.) But until now Google has not directly involved the community in updating and improving the database, aside from business owners themselves.

There are basically four local databases and everyone in local uses one or more of these (in addition to crawling):

These databases have varying degrees of latency and inaccuracy, and I’ve heard people complain about each of them (except GeoSign, actually). Yahoo! Local has for some time sought to use the community to improve the quality of the database and update listings.

Involving the community is critical to augmenting the commercial databases and Web crawling, and to enhancing the accuracy and freshness of the local database. There are obviously additional benefits that go to engagement and user-generated content. But what I’m saying now is that it’s almost a necessity for purposes of accuracy.

You can have tons of bells and whistles, but if your data is limited or inaccurate then all of that functionality doesn’t really matter.

10 Responses to “Data Quality: The Local Achilles Heel”

Good update, Greg. Unfortunately, correction is limited to business owners, not third parties, and to date, G isn’t cleaning the data internally.

If you find lots of erroneous or old/outdated data (businesses no longer operating), there is currently no way to remove the information.

In certain cases G is polluting SERPs with old, outdated, erroneous information and, in so doing, is pushing listings from current businesses and service providers down the SERPs and off page 1.

Correction or edit: I might be wrong on this. Just got off the phone with Mike Blumenthal. While he sees the correction option today, I don’t. It may not have propagated across Google’s data centers. Check again tomorrow. Also, according to Mike, there is an option for a third-party correction for sites not entered through G business centers.

Ahmed, that is a real problem. Data should be cleaned before going out to the public.
dave

[…] issue of accuracy in local data is an important one (see Greg Sterling’s recent post: Data Quality: The Local Achilles Heel) and there is value in allowing the community to correct any errors. In fact as a tactic, the […]

Greg: I’ve been working on this for several days and asked for help and additional eyes on this from keen observers–specifically Mike Blumenthal.

While I’ve focused on my own business, website, and industry rather than a broad variety of industries or types of businesses, it seems to me that Google has taken a step backwards. This is surprising in that in February 2005 they took a quantum leap forward in adding logic to searches for local businesses/services.*

*At that time they altered their algorithms for searches for local businesses/services, essentially eliminating mega sites such as Amazon, eBay, and mega directories from the highest rankings. They established an algorithm that tied search logic to an existing website with clear local address/phone number information, and simultaneously eliminated the mega sites wherein a link to the address/phone number was more than 3-5 pages deep.

Bill Slawski wrote about a Google patent in summer ’05 (when it was issued) that seemed to describe the changes that occurred in February ’05.

The current methodology orients the key SERP rankings to a map pulled from G Maps.

Unfortunately that map may well be populated with an extraordinary amount of wrong information pulled from unreliable data sources.

In my example I’ve seen the highest maps rankings populated by businesses that have not been in existence for at least 7-9 years.

Whatever data sources Google is pulling from may be widespread, but the wider the net, potentially the more shallow or more prone to mistakes the information might be.

In light of how many small businesses fail within a year or two, Google could be populating its G Maps info with an enormous volume of sites and businesses with wrong information. Businesses relocate, they change ownership and names, they change product lines and services. There are endless opportunities for erroneous information. Cripes, large hotels change chains and names, let alone small businesses.

While Google would be smart to involve the community in making changes, they should also look to additional methods to refine the data. This would have the same impact as the improvements they made in February ’05. With that algorithm change, someone looking up a plumber in Albany, New York might well have come up with an Albany plumber in Google rather than Amazon or eBay.

I’d suggest Google look to refine and limit the data sources and pull more current information, taking it from paid directories such as the print Yellow Pages. Typically that information is no older than one year.

Google should put more weight on existing websites rather than old information. That would at least substantiate current information, rather than relying on outdated data from a huge variety of sources that may not have been rigorous in initially developing the data or maintaining it.

The current situation will right itself in time, I suspect. But in the meantime it appears that Yahoo! again better reflects local businesses, as it did prior to February 2005.