I have a large number of GPS tracks from a number of different users. I'd like to identify tracks that take the same route and remove those duplicates. For example, a user takes the same streets from the same origin/destination, but obviously with GPS accuracy issues, the lat/longs aren't exactly the same.

Any sort of intersect function simply provides a large set of intersecting sections, but doesn't compare a full track from start to finish.

Currently using PostGIS, but trialing in ArcGIS to visualize and test a procedure before applying it to the 1,000+ tracks and 8 million datapoints.

4 Answers

@relet seems to be going in the right direction, but I think it would be better to create an ST_Buffer around each line, take the ST_Area of the intersection of the two buffers, and compare it to the area of each track's buffer. Determining whether the tracks go in the same direction would be more difficult, though; the segment-by-segment comparison suggested earlier seems to be the best option for that.

@Llaves Maybe. Checking whether the other track lies completely within the buffer should work too, but a single point that's too far away would ruin the result (like parking 20 meters farther along at the end of the road, or a difference in precision on a turn or curve). It seems faster, though, so using this check first and the area comparison second might be good.
– Jakub Kania, May 6 '13 at 8:14
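
As a rough illustration of the containment check discussed in this comment, here is a minimal Python sketch. It assumes tracks are lists of (x, y) tuples in a projected coordinate system (so distances are in meters), and the function names and the 10 m radius in the usage note are made up for the example. Instead of demanding that every vertex fall inside the buffer, it reports the fraction that does, which is one possible way to tolerate the single stray point mentioned above.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def fraction_within_buffer(track, reference, radius):
    """Share of vertices of `track` within `radius` of the `reference` polyline.

    `reference` needs at least two vertices, `track` at least one.
    """
    segments = list(zip(reference, reference[1:]))
    inside = 0
    for p in track:
        nearest = min(point_segment_distance(p, a, b) for a, b in segments)
        if nearest <= radius:
            inside += 1
    return inside / len(track)
```

For example, `fraction_within_buffer([(0, 5), (50, -5), (100, 5), (100, 40)], [(0, 0), (100, 0)], 10.0)` gives 0.75, because only the last vertex falls outside the buffer. What fraction counts as "the same route" is a judgment call for your data.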


You might want to think about a multi-step procedure: start with cheap, fast tests that quickly determine that two tracks aren't similar, then apply increasingly expensive tests, ending with the buffer intersection test to make your final determination. For example, compare the lengths of the two tracks. If the tracks are the same, the lengths should match to some precision, say 95%. Once you have tracks of the same length, check whether the two start points are close together, and likewise the end points (this will also distinguish tracks that are identical except for direction traveled).
– Llaves, May 6 '13 at 19:05
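
The cheap pre-filter described in this comment could look something like the following Python sketch. It assumes tracks are lists of (x, y) tuples in a projected coordinate system with units of meters. The 95% length ratio comes from the comment; the 50 m endpoint gap and the function names are assumptions made up for the example.

```python
import math

def track_length(track):
    """Total planar length of a track given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))

def passes_cheap_filters(a, b, min_length_ratio=0.95, max_endpoint_gap=50.0):
    """Return False as soon as a cheap test shows the tracks cannot match."""
    len_a, len_b = track_length(a), track_length(b)
    longer = max(len_a, len_b)
    if longer == 0.0:
        return False  # Nothing to compare.
    # Test 1: lengths must agree to within the given ratio.
    if min(len_a, len_b) / longer < min_length_ratio:
        return False
    # Test 2: start points must be close together.
    if math.dist(a[0], b[0]) > max_endpoint_gap:
        return False
    # Test 3: end points must be close together. Together with test 2,
    # this rejects the same route traveled in the opposite direction.
    if math.dist(a[-1], b[-1]) > max_endpoint_gap:
        return False
    return True
```

Only pairs that pass all three tests would go on to the more expensive buffer comparison.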

From the way you write it, the order of the points in the track is not important.

Assuming this, you can split each track into its individual points.
Now, for every pair of tracks, calculate the sum of the distances (or their squares) from each point in one track to the nearest point in the other track. If the sum remains below your threshold, the tracks can be considered equal.

If that sum reaches a certain threshold, you can assume that the two tracks are distinct, and continue. In most of the cases, that threshold will be reached with the first point you consider, so the loop should complete reasonably fast. Depending on your threshold function and the differences you expect, you may want to divide by the total length of the track.
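
A minimal Python sketch of this idea, with the early exit once the threshold is exceeded. It assumes tracks are lists of (x, y) tuples in a projected coordinate system, takes the distance from each point to the nearest point of the other track, and ignores point order, as this answer assumes. The function name and the `budget` parameter are made up for the example.

```python
import math

def within_distance_budget(a, b, budget):
    """True if the summed nearest-point distances from a to b stay within budget.

    Stops as soon as the running sum exceeds the budget, so clearly
    different tracks are usually rejected after the first few points.
    """
    total = 0.0
    for p in a:
        total += min(math.dist(p, q) for q in b)
        if total > budget:
            return False
    return True
```

Note that the sum is one-directional (from `a` to `b`), so a short track lying along part of a long one would pass; running the check in both directions, or dividing by track length as suggested above, are possible refinements. The inner loop is also quadratic in the number of points, so a spatial index may be worth it on large tracks.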

The order does matter; sorry that wasn't clear. I was thinking of something along those lines. I will use ST_Simplify to hopefully reduce the number of points to compare.
– Aaron G, May 2 '13 at 15:25

Now that I think about it, this probably won't work as a difference in speed traveled will result in incorrect results. So the 40th point on a track may be at a very different point of a different track even if the same route were taken.
– Aaron G, May 2 '13 at 15:39


You can still compare the tracks segment by segment, increasing the index on track A until it passes the current index on track B, then increasing the index on B and repeat.
– relet, May 3 '13 at 12:30

Depending on how accurately you must do the matching and how valuable it is to you, and assuming these tracks lie on streets (as you hint in your question), you could convert your tracks to routes on the street network. Once you've done the conversion, the routes are directly comparable. Of course, this raises the question of how to convert the tracks to routes on the street network (and requires that you have a database of all the streets.)

I would approach the conversion by taking the street network, buffering each street, then intersecting the track with the street buffers. If the track really is following streets, you should be able to find each segment on the street network from the beginning intersection to the ending one on each street. Not a trivial undertaking, but it would be very accurate for matching purposes.

ST_HausdorffDistance can be used to compare each line_geom against every other one with the same user_id. Before computing the Hausdorff distance, I check that the two trips' origin points are within 500 feet of each other (to confirm the same route direction) and determine the difference in trip lengths.
For our project's purposes, a Hausdorff distance of less than 5% of the total trip length denotes two routes that take essentially the same path (that part is in a separate query).
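
To show what the 5% rule amounts to outside the database, here is a small Python sketch. It is not the query used in the project: it computes a vertex-to-vertex (discrete) Hausdorff distance on lists of (x, y) tuples in a projected coordinate system, and it assumes "total trip length" means the length of the shorter of the two trips. All function names are made up for the example.

```python
import math

def track_length(track):
    """Total planar length of a track given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))

def directed_hausdorff(a, b):
    """Largest distance from any vertex of a to its nearest vertex of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Discrete Hausdorff distance: the worse of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def same_route(a, b, max_fraction=0.05):
    """True if the Hausdorff distance is under max_fraction of trip length.

    Assumption: the shorter of the two trips is used as the reference length.
    """
    reference_length = min(track_length(a), track_length(b))
    return hausdorff(a, b) < max_fraction * reference_length
```

Because only vertices are compared, sparsely sampled tracks can give a larger distance than the true geometric one; resampling the tracks to a regular spacing first is one way to reduce that effect.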