I'm working on the backend for an iOS/Android app that collects locations every hour and syncs with the server every three hours. The mobile apps themselves are not under my control, so I can't tweak or even read their code (although I could ask the developers about specifics if need be). The server receives those "batches" of location data, each reading carrying a timestamp and vertical/horizontal accuracy information, and tries to determine visits to states/countries: if, for example, it gets some locations that can be placed in New Jersey on a given day, the user is said to have visited New Jersey once that day. Places visited are determined from reverse-geocoded addresses.

The problem is that some of those locations can be noise: bad accuracy, "jumps", or readings from the very boundaries of states can throw off the algorithm, which is very naïve: every location it finds in a different place is considered the beginning of another visit, unless it's too unique or too inaccurate.

I was reading earlier about Kalman filters but, as I understand it, not only do they require a very good understanding of the mathematical model involved, they are also better suited to real-time measurements. In my case, I already have enough inputs that I need smoothing more than prediction. I'm starting to read about the Douglas–Peucker algorithm to maybe find a solution, but I don't know if I'm on the right track. What do you guys suggest?

I read this paper: "Compression and Mining of GPS Trace Data: New Techniques and Applications", and it seems that a modified version of the Douglas–Peucker algorithm would be my best shot, since I have multimodal data. But I'm still not sure how to deal with points of varying accuracy: what if the compressed curve ends up including points that are actually noise? (Like appearing in Cuba when you're in Florida; that's certainly far from the line segment!)
– lfborjas Jul 18 '12 at 23:28
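
To make the concern in the comment above concrete, here is a minimal sketch of plain Douglas–Peucker, not the modified version from the paper. It assumes the points are already projected to planar `(x, y)` coordinates in some consistent unit (an assumption; raw latitude/longitude would need projecting first), and the function names and `epsilon` value are illustrative only. It measures distance to the chord as a segment, a common variant of the perpendicular-distance formulation.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, all planar (x, y) tuples."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return hypot(px - ax, py - ay)
    # Project p onto the line and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every interior point is close enough to the chord: drop them all.
        return [first, last]
    # Otherwise keep the farthest point and recurse on both halves.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right


# A nearly straight track with one wild "jump" at (3, -300).
track = [(0, 0), (1, 0.1), (2, -0.1), (3, -300), (4, 0.05), (5, 0)]
simplified = douglas_peucker(track, epsilon=1.0)
```

Because the algorithm always retains the point farthest from the chord, the jump at `(3, -300)` survives simplification. This suggests that outliers would have to be removed before compression rather than by it.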

Does each device grab a location exactly once per hour? And how are the devices supposed to be traveling? (E.g. is it all driving, or can they be travelling by air?) One very simple and actually very effective filter is to reject any point that would require a travel speed, both from the previous point and to the subsequent point, greater than the maximum expected speed for the subject.
– blord-castillo Aug 8 '12 at 20:04
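
The speed filter suggested in the comment above could be sketched roughly as follows. The `Fix` type, the function names, and the `max_kmh` default are all hypothetical, and the great-circle (haversine) distance on a spherical Earth is an assumption about how distance would be measured.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


@dataclass
class Fix:
    t: float    # Unix timestamp in seconds
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a, b):
    """Great-circle distance between two fixes in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def speed_kmh(a, b):
    """Speed implied by moving from fix a to fix b."""
    dist = haversine_km(a, b)
    hours = abs(b.t - a.t) / 3600.0
    if hours == 0:
        # Same timestamp: only plausible if the position did not change.
        return 0.0 if dist == 0 else float("inf")
    return dist / hours


def reject_speed_outliers(fixes, max_kmh=1000.0):
    """Drop fixes that need an implausible speed both to reach and to leave."""
    fixes = sorted(fixes, key=lambda f: f.t)
    kept = []
    for i, f in enumerate(fixes):
        interior = 0 < i < len(fixes) - 1
        if (interior
                and speed_kmh(fixes[i - 1], f) > max_kmh
                and speed_kmh(f, fixes[i + 1]) > max_kmh):
            continue  # implausible jump in and back out: treat as noise
        kept.append(f)
    return kept


# Hourly fixes: Miami, a jump to Havana, then back near Miami.
a = Fix(0, 25.76, -80.19)
b = Fix(3600, 23.11, -82.37)
c = Fix(7200, 25.80, -80.20)
kept = reject_speed_outliers([a, b, c], max_kmh=150.0)
```

With a driving-style limit of 150 km/h the Havana fix is dropped, but with an air-travel limit near 1000 km/h it would be kept, since the jump is only a few hundred kilometres in an hour. The sketch also compares each fix only with its immediate neighbours, so two consecutive bad fixes at the same wrong place would both pass.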

They can be travelling by any means (they're mobile phones carried by people who travel a lot), and the sampling interval could be shorter or longer than an hour, depending on the user settings and the data coverage.
– lfborjas Aug 17 '12 at 20:57