@Mapperz that is a good start, the lat/lon coordinates are not the city / state location, but the city is the nearest city. I could do the query with some tolerance and get better results than just >/<0.
–
DavidNov 2 '11 at 18:02

3

I would ardently advise against making automatic fixes, especially on the basis of such crude checks. You risk making things irretrievably worse. Instead, when a potential error is detected, flag the record and move on. In a later step you can review the flags (or apply better checks), update the data, and leave a related record containing the original values (in case it turns out the fix was incorrect).
–
whuber♦Nov 2 '11 at 18:54

As far as simple checks go, in addition to looking for sign errors, look for lat-lon reversals. Also search for points with coordinates extremely close to (0,0) (many databases store missing values as zeros, so a lot of junk gets dumped off the west African coast :-) and other points with out-of-bounds latitudes or longitudes. For more general advice I would (in all modesty) point you to essential data checking tests (on another SE site).
–
whuber♦Nov 2 '11 at 18:56

3 Answers
3

In a global dataset you will have cases where a point is close to the equator, or Greenwich meridian. At these points a sign change could leave your location in the same country, but at the wrong location.

An alternative approach is to geocode the locations based on the city, county, and country fields. Create a field measuring the distance from the geocoded point to the location in your database. Where the distance is excessive attempt to invert the sign and test the new distance.

Geocoding is a well understood process, and gives a way to actively test all your points, not just the ones that look a bit odd.

One potential pitfall: if your table stores territories with the Country value as the "parent" Country (e.g. Northern Mariana Islands or Guam have Country=US), then your check for incorrect signs will be wrong. To continue with Guam and the Northern Marianas as an example, they are located in eastern (positive) longitudes, so the check for Lon>0 would be true, but it would be wrong to convert longitude to negative. Similar issues will exist for many British territories, as well.

Also, make sure your error checking looks for 0,0 in Lat and Long. These are often entered when coordinates are unknown, and may cause other problems in your error-checking if not accounted for.

For example, the GeoNames dataset (see under "download server") already contains the Lat/Long values of many city/state/country locations. Doing some simple trigonometry (or if you have the data loaded into a spatial database like PostGIS) you could identify the distances of all features to their "known" location in the GeoNames database. Anything that is outside the a user-defined distance tolerance should be investigated further.