IP address can now pin down your location to within a half mile

A clever new research technique can geolocate an IP address to within a few …

On the Internet, nobody knows you're a dog—but they might now have an easy time finding your kennel.

In a research paper and technical report presented at the USENIX Networked Systems Design and Implementation (NSDI) conference at the beginning of April, researchers from Northwestern University described new methods for estimating the physical location of an IP address tens or hundreds of times more accurately than previously thought possible. The technique builds on existing approaches but adds a new element: it uses local businesses, government agencies, and educational institutions as landmarks, helping it achieve a median accuracy of just 690m, less than half a mile.

The researchers, led by Yong Wang, used a variety of statistical techniques to combine data from 163 public ping servers and 136 traceroute servers into a precise estimate of the range of possible physical locations for a particular IP address. They state that, despite the large number of data sources they need to combine, their technique is capable of real-time use, giving results in just one or two seconds in real-world applications. The novel technique uses several iterations to successively home in on a target's location.

How it works

Step one: a signal travels through optical cables at about two-thirds the speed of light, which drops to about four-ninths the speed of light once you account for queuing at uncongested routers. The researchers' first iteration takes advantage of this fact by pinging the targeted address from multiple servers, then recording how long each signal takes to return. Because a signal cannot outrun that effective speed, each round-trip time caps how far the target can be from the server that measured it. Since the servers have known locations, this absolute timing yields a circle around each ping server, and the target must lie within the area where all of these circles overlap.
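As a rough illustration of step one, the sketch below turns a round-trip ping time into a maximum distance using the four-ninths figure quoted above, then checks whether a candidate point falls inside every server's circle. It is not the researchers' code: the function names, the halving of the round trip into a one-way time, and the great-circle (haversine) distance are assumptions made for the example.

```python
import math

C_KM_PER_MS = 299_792.458 / 1000.0  # speed of light, km per millisecond
EFFECTIVE_FRACTION = 4.0 / 9.0      # effective signal speed quoted in the article
EARTH_RADIUS_KM = 6371.0            # mean Earth radius (assumption for the sketch)


def max_distance_km(rtt_ms):
    """Upper bound on server-to-target distance for a round-trip time in ms."""
    one_way_ms = rtt_ms / 2.0  # assume the outbound and return legs take equal time
    return one_way_ms * C_KM_PER_MS * EFFECTIVE_FRACTION


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_feasible_region(point, measurements):
    """True if `point` lies inside every circle.

    `measurements` is a list of ((server_lat, server_lon), rtt_ms) pairs.
    """
    lat, lon = point
    return all(
        haversine_km(lat, lon, s_lat, s_lon) <= max_distance_km(rtt_ms)
        for (s_lat, s_lon), rtt_ms in measurements
    )
```

Calling `in_feasible_region` on a grid of candidate points would trace out the overlap area; a real system would also have to cope with measurement noise, which this sketch ignores.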

At this point, the researchers have a pretty good idea of the general area of the target address (to within several miles), so they can start homing in by looking for local landmarks.

Step two: the researchers sample points within the possible area and convert these geographic points into their corresponding postal ZIP codes. For each ZIP code found, a commercial mapping service is queried for businesses, schools, and other institutions in the area. The researchers are looking for organizations that publish their street address on their website and also host that website at the same physical address. The websites of the candidate businesses are scraped for a street address.

Meanwhile, a couple of clever techniques are used to weed out websites that are hosted by a CDN, on a shared hosting service, or otherwise located away from the physical address. The resulting places are very important landmarks, because they combine a known location on the network with a precise geographic point.

Step three: now that the researchers have reliable pairs of IP and physical addresses, they can start searching for Internet backbone routers in the vicinity. They send traceroute requests from as many servers as possible to both the nearby landmarks and to the target IP address. Comparing some of these traces and the geographic locations of the known landmarks, they can deduce which nearby routers are connected to both the target and the landmark.
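One simple way to picture the trace comparison is to treat each traceroute as an ordered list of router addresses and look for the routers that appear in both. The sketch below does just that; the list representation, the function names, and the choice of the last shared hop as the "nearby" router are assumptions for illustration, not details taken from the paper.

```python
def shared_routers(path_to_target, path_to_landmark):
    """Routers present in both traceroute paths, in target-path hop order.

    Each path is a list of router IP strings ordered from the vantage
    point toward the destination.
    """
    landmark_hops = set(path_to_landmark)
    return [hop for hop in path_to_target if hop in landmark_hops]


def closest_shared_router(path_to_target, path_to_landmark):
    """Last router the two paths have in common, or None if they share none."""
    shared = shared_routers(path_to_target, path_to_landmark)
    return shared[-1] if shared else None
```

The last shared hop is the point where the two paths diverge, so it is the shared router nearest in hop count to both the landmark and the target.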

Then, using timing data from the pings, they eliminate congested routers which add too much delay to be reliable sources of distance data. The time it takes these nearby routers to ping the target allows for another, more fine-grained set of circles which constrain the target's location again, this time down to the area of just a few city blocks.

It turns out that physical distances vary in close proportion with relative ping times of nearby landmarks. The researchers can look at a particular router and see how long it takes pings through that router to reach landmarks and the target. The relative ping times can then be translated into quite accurate local distances. Now, the research team can guess how close the target is to the small number of landmarks which remain in the possible area, and associate its physical location with that of the nearest, most reliable landmark.
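The last two paragraphs can be sketched as a small selection routine: for each landmark, subtract the ping time to the shared router from the ping times to the landmark and to the target, discard pairs where either leg looks congested, and adopt the location of the landmark with the smallest combined delay. The data layout, the 5 ms congestion cutoff, and the "sum of both legs" score are hypothetical choices made for this example; the article does not give the researchers' exact formula.

```python
from dataclasses import dataclass


@dataclass
class LandmarkObservation:
    name: str
    location: tuple         # (latitude, longitude) of the landmark
    rtt_router_ms: float    # ping time to the shared router
    rtt_landmark_ms: float  # ping time to the landmark via that router
    rtt_target_ms: float    # ping time to the target via that router


def leg_delays_ms(obs):
    """Delays from the shared router to the landmark and to the target."""
    return (obs.rtt_landmark_ms - obs.rtt_router_ms,
            obs.rtt_target_ms - obs.rtt_router_ms)


def relative_delay_ms(obs):
    """Combined router-to-landmark plus router-to-target delay."""
    return sum(leg_delays_ms(obs))


def estimate_location(observations, max_leg_delay_ms=5.0):
    """Location of the landmark with the smallest relative delay.

    Observations with a negative leg (measurement noise) or a leg above
    `max_leg_delay_ms` (a hypothetical congestion cutoff) are discarded.
    Returns None if nothing usable remains.
    """
    usable = [
        obs for obs in observations
        if all(0.0 <= leg <= max_leg_delay_ms for leg in leg_delays_ms(obs))
    ]
    if not usable:
        return None
    return min(usable, key=relative_delay_ms).location
```

Picking a single nearest landmark mirrors the article's description of associating the target with "the nearest, most reliable landmark"; a weighted average of several landmarks would be another reasonable design, but it is not what the article describes.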

This final analysis gives a very good guess at the target's location: the median estimate is about 690m away from the target's actual position. That's almost close enough to send in the black helicopters—or the lawyers.

Here come the ads

The most important part of the research is that the method described is completely client-independent: it doesn't require any particular software on (or even permission from) the computer being targeted. This makes it particularly valuable to advertisers, who can now choose to target ads for the burger joint down the street or the record shop a block over.

But the technique also has some serious privacy implications. Before this, turning an IP address into a truly accurate location required a lot of work and some human interaction. With this method, the barriers to accessing real location data are considerably lower.

OK, maybe that's nice for static IPs, but what about DHCP? What about when Comcast decides to rebalance their nodes and put you on a different CMTS? What if your ISP changes your IP weekly?

Great, more Hardees ads for me while living on the west coast. >.<

Seriously, with all the different ISPs, different equipment, different ping times, and different lag times (even in an HFC network), I am surprised, and alarmed, that they can get to that level of geolocation through aggregate data.

I wonder what the ads would be like for a satellite connection? *smirk*

...and of course all someone has to do to bollix this is to patch the router to respond to pings with a variable time delay. In fact, their own approach could be used in reverse to identify the locations of the landmark servers so as to optimize the ability to spoof the router's location, or to merely inject sufficient noise in the timing signals. Of course, the locations of the routers which are used to scope out the landmark servers could presumably be compromised while doing so, so best of all is to have someone maintain a list of landmark servers, and distribute it via firmware update to your router. Maybe this'd go well with the tomato firmware. :-)

I like that idea! You could even program your router(s) to purposely delay a ping response if coming from a non-whitelisted address (non-whitelisted being your own privacy-centered router(s) advertised to customers as being safer for them).

The bright side of a somewhat better geolocated address space is that it may help ferret some weasels out of their burrows. I’m tired of Nigerian impostors calling from Tennessee or Lithuania. Better geolocation will allow me to consider only genuine offers from authentic Nigerian princes.

If this technique relies on timing pings, couldn't you hide your location by configuring your router to add a random delay (say 100 to 300ms) when responding to pings?

You'd likely just get advertisements targeted at the larger area implied by the increased delay. It is also quite possible on most routers to add extra hop counts, depending on the type of routing you are using (BGP for instance), which is commonly used to prefer a secondary path on redundant routers. I'm sure there is some way to configure routers to at least confuse the system.

Since it relies on pings, I would think this approach is pretty much defeated by most firewalls, which already don't respond to ping packets. You're back to whatever ISP machine is closest to you and can be pinged.

Neat research, but I would think this would be of limited use to advertisers. If the target does not respond to a ping request, the whole thing falls apart (you can traceroute and find the last responding ISP router, but that's it). I believe the firewalls on most computers and consumer routers/NAT boxes block pings by default these days anyways...

With widespread use in advertising, this would probably just result in a lot more people blocking pings -- particularly the unrelated "landmarks" who would have to pay for additional bandwidth unrelated to their business.

C Boy: http://en.wikipedia.org/wiki/Content_delivery_network ... and no, there's not really a reason for your average consumer to allow their router to respond to pings. It's mostly for diagnostic purposes, and not even relevant there unless you're running some kind of server.

Back in the good ol' days, @Home, AT&T Broadband, and now Comcast used a 10.x.x.x IP to communicate internally with modems (provisioning and interrogating status) over their HFC network, which seemed like a good idea, but within the last few years they have switched to public IPs. Why? I have no idea; it seems to be a waste of IPs. Maybe they are just banking them for future use. I see no reason for the communications between the CMTS and the modem to use anything other than private network IPs, as they would be a little more secure... even though they are using MD5 hashes to encrypt the data between the modem and CMTS.

Now if there were only a way to not respond to signals from unwanted servers or programs... Wait a minute, Noscript, peerblock, adblock+, ghostery, vpn tunneling, and a hundred other ad killing and privacy enhancing applications/protocols are available.

It's not a bad idea for serving personalized ads, but if someone doesn't want to be found online this won't do much to stop them.

It is also quite possible on most routers to add extra hop counts, depending on the type of routing you are using (BGP for instance), which is commonly used to prefer a secondary path on redundant routers. I'm sure there is some way to configure routers to at least confuse the system.

Didn't adding to the BGP hop count bring down the internet in half of Europe a few years ago?

Now if there were only a way to not respond to signals from unwanted servers or programs... Wait a minute, Noscript, peerblock, adblock+, ghostery, vpn tunneling, and a hundred other ad killing and privacy enhancing applications/protocols are available.

It's not a bad idea for serving personalized ads, but if someone doesn't want to be found online this won't do much to stop them.

That's all true; anything that masks your IP will work, as well as those other techniques. But if it's effective against your average internet user, the technique would still see widespread use. Thankfully, this technique really isn't very effective at all, despite the slightly alarmist tone of the last section of the article.

"This makes it particularly valuable to advertisers, who can now choose to target ads for the burger joint down the street or the record shop a block over."-Can't they already get this from your browser or computer location data?

Probably has already been said, but I'm pretty sure the internet doesn't work like they think it does, and I would be surprised if this actually works well enough to get a 5-mile radius on a real user on a real network. Physical distance is only an indirect input to internet routing. Additionally, self-hosting from your published physical address seems awfully rare these days.

So, where can I test this to see if it actually works? The theory is interesting, but it seems to rely on a lot of assumptions and guesses. Until I see a live demo, I don't believe it actually works as well as they claim.

Why do you think Apple and Google are collecting all that location information on their cell phones? They know where we live! It could be argued that all that data could be cross-referenced with your home IP. In other words, your mobile phone gave you up! I have always said Google is the NSA, maybe Apple too, and other technology companies too!