Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I have a massive table in SQL Server 2008, it contains the position reported by technicians every minute. I need to report on this table but in order to control the amount of records that are displayed in the report both a time and distance separation factors need to be taken into account.

So, a query may look like : "Return all records with no less than 5 minutes and/or 300 feet between them".

The time part is done, but I'm having a hard time with the distance factor. I have the latitude and longitude for each point, and I have no problem if I need to include a SQL Server 2008 spatial UDT in order to resolve the problem.

Things I have considered:

Bring the records by the time factor, and apply the separation constrain in the client by calculating the distance between adjacent points and discard those which falls inside the the factor. (the easiest, but it must be the one consuming more resources).

Keep the last record per technician in a cache, pre-calculate the distance between the record and its predecessor, and resolve the constrain in the client. (should consume less resources than 1) since the distance is pre-calculated, however and since the table is BIG It will increase the size of the dataset, not sure if the space is worth the processing savings).

Use the spatial functions in SQL Server 2008, but honestly I had been reading and I couldn't find anything that helps me resolve this type of requirement. Any GIS expert??

I would like to go with the best option possible (maybe not listed above?) and IMO should be the one using the SQL Server features most efficiently.

1 Answer
1

Prior to SQL 2008, the most common solution was to use a UDF to calculate the great-circle distance between two points on a sphere. The Haversine formula is probably the most commonly used method.

Of course the Earth is not actually a perfect sphere, but this was considered "good enough" for most uses.

In SQL 2008, as you anticipated, such calculations are simplified and made more accurate by the introduction of the Geography and Geometry data types. Here's a brief sample of how you can use them to simplify distance calculations.

The SRID is the key to the improved accuracy. The WGS 84 specification to which it refers includes a standardized coordinate system and a reference ellipsoid. In other words, it accounts for the non-spherical nature of the Earth, giving better results than a pure spherical Great Circle calculation.

If GIS accuracy is important to your work, this is the simplest way to implement it in SQL 2008.

Jonathan, while this is great to calculate the distance between 2 points my problem actually is how to (given a set of points) extract only those that are above certain distance threshold, or (maybe the same) consider them a cluster so X points near each other below the threshold can be considered a single point. <br /><br /> Ideally I want to do this w/o having to iterate through the entire collection; and I was hoping to find a spatial function that does that for me.
–
Raciel R.Dec 14 '11 at 14:36

Finding points within a threshold is quite simple. I added an additional line to the code block to demonstrate how you can test for the intersection of a point with a buffer around another point. If you need to do nearest-neighbor clustering that is also feasible, but more complex and more computationally expensive.
–
Jonathan Van MatreDec 14 '11 at 19:17

Jonathan, sorry I'm not a GIS guy so I'm not 100% familiar with the terms.
–
Raciel R.Dec 14 '11 at 19:56

Let's say you started walking and turn on an app that reports your location every minute. Let's say that at certain speed you are reporting your position every 100 feet, but for the type of analysis I need to do at the end of the day, I'm just interested in your position every 500 feet. If you move at a constant speed (lets assume so for simplicity) that means I need to report every 4 points.
–
Raciel R.Dec 14 '11 at 20:02

So, the whole deal is not about 2 points, but about how to return the number of records given the threshold fir the whole set w/o having to loop through all records at least not in the client or by using a SQL cursor.
–
Raciel R.Dec 14 '11 at 20:04