Calculate the Distance Between Two Latitude/Longitude Points using Plain SQL

13062011

June 13, 2011 (Modified June 14, 2011)

A question recently appeared on the comp.databases.oracle.server Usenet group that oddly made me recall a lesson from a mathematics class that I taught in the early 1990s. A bit strange how a question related to Oracle Database would trigger such a memory, but it happened. The question posed in the thread initially asked how to calculate the distance between the points:

So, if the earth is flat, the Pythagorean Theorem seems to give a sensible answer… maybe. My recollection of the formulas used for the correct calculation is a bit fuzzy. Good news, I found the three page lesson plan that I put together in the early 1990s to solve a problem similar to the one posed by the OP in the Usenet thread. The lesson plan begins by stating that part of the point of the lesson was to destroy Euclid’s Elements, which were introduced 23 centuries ago… a bit ambitious, looking back. Even though the lesson plan is not too terribly clear given the amount of mathematics knowledge that I have forgotten, I managed to find the following formula in the lesson plan:

That looks complicated, even without the help of a computer… what is alpha, and what is theta – it’s Greek to me?

A Google search found another helpful page that describes how to calculate distances between pairs of latitude and longitude coordinates. Included in that page is the following formula that is compatible with Microsoft Excel:

=ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371 KM

Other than outputting kilometers rather than miles, and using SIN where my formula uses COS, the two formulas are close to being identical. Based on that observation, alpha must be the latitude coordinate and theta must be the longitude coordinate in the formula that I found in my lesson plan.

Note in the above that I had to convert the degrees and minutes notation to decimal numbers and then also to equivalent radian values. It would probably be best just to store the radian values, because those values can be used directly in the calculations. Let’s take a look at the contents of this table:

The values returned are identical to the value I wrote into the lesson plan, so the formulas are probably correct.

The OP’s second question is to identify all locations that are within a given radius of a specified location. This might be easy, or it might not. Someone in the Usenet thread suggested using Oracle Spatial, which is an additional cost licensed item that may be added to the Enterprise Edition for a list price of $17,500 per CPU plus $3,850 (USD) annual maintenance. As I am not familiar with the product, I do not know if Oracle Spatial could answer this question (does someone know for certain?).

We need some more test data to see if we can find a solution for the OP. We will use the SIN and COS functions just to insert some “random” yet repeatable data:

If you tried either of the above two SQL statements, you probably found that it took a couple of seconds to make it through the 10,004 or so rows in the table. All of the calculations are likely hammering the server’s CPU (go parallel and hammer more than one CPU?).

Maybe we should try submitting bind variables in place of some of the calculations that we are requesting for the server to perform? Let’s set up the bind variables to prepare another test:

Still 13 rows returned, but the distances are a bit off due to the curvature of the earth. What happened to the other 17 rows… did I make a mistake, or is the world not flat?

Note that the original version of this article made a critical mistake – latitude values cannot exceed 90 degrees. While trying to implement my suggestion in a comment to use +-0.0002525254004999719 radians to speed up the query (note, still need to take into account the wrap-around for longitude values) I discovered inconsistent results. This article is corrected to fix that mistake.

I just had another thought regarding how to further decrease the CPU requirements for this SQL statement, and thus to speed up the SQL statement… the mathematics logic is starting to come back to me. The latitude values must be in the range of +- the radian value that is associated with the distance along the surface of the earth. The same is true for the longitude values.

With that logic in mind, it would be possible to reduce the number of rows leaving the inline view, and thus entering the WHERE clause of the main SQL statement. If the radius of the earth is roughly 3960 miles, the circumference is 3960 * 2 * 3.14159275 = 24881 miles. 1 degree = 69.115 miles, 1 mile = 0.0144686 degrees, 1 mile = 2.525254004999719e-4 radians. Thus, for each mile of the radius, the search range beyond the central latitude/longitude pair would be extended by +-2.525254004999719e-4 radians to pre-filter the majority of the rows that would exit the inline view and enter the WHERE clause of the main portion of the query.

sdo_distance is one of the SDO_GEOM permitted under Locator (or as I prefer to call it sub-spatial). [Why they can’t put the permitted procedures in a separate package from the non-permitted ones is a question only the Auditors can answer.]

I don’t recall comparing the long equation mechanism with Locator/Spatial, but I did switch to the ‘flat earth’ calculation because it was a lot less cpu-intensive that SDO_GEOM and the error wasn’t significant for my purposes.

Interesting analyses and optimizations, including the care folks stated about results being within requirements. If you need to be extremely precise though, remember that the earth is neither flat nor an ideal sphere.

It probably would be a good idea to point out that the calculations do not determine driving distances between locations, nor do the calculations exactly determine flying distances (if a plane is a couple miles up in the air, that effectively makes the earth’s radius a couple of miles larger from the perspective of the plane. I now have a little more respect for the GPS manufacturers whose products attempt to determine the final destination arrival time.

Hi, I just stumbled upon this, in an effort to figure distances between radio towers for which I have the Lat/Lon coordinates.

I typed your first ‘flat Earth’ formula into Excel and it returned a figure very close to what I was expecting (tho’ a bit larger than the county’s GIS mapping system’s ‘digitize distance’ function provides). I’m not worried at this point about absolute precision.

BUT….then I typed into Excel the kilometer formula that started with acos — and I got blown out of the water. The 2.9 miles of the first equation swelled to -I forget, some couple thousand km’s. So, is there a decimal point missing somewhere in there?

I probably won’t return to learn your answer, but wanted to point out the ‘mis-spelling’. (I checked the formula I had typed and it perfectly reflected yours.)

Any way, thanks for the interesting post! I also came out with lots of respect for the challenges GPS software (and users) must cope!

I am not sure what might have happened with your formula in Excel. In this article I converted the Excel formula to work in a SQL statement, and that formula appeared to be producing the expected results. A couple of things to keep in mind:
* North and West positions should be entered as negative values
* Two negatives make a positive ( -p1 – -p2 ) = ( -p1 + p2)
* Accidentally reversing the longitude and latitude values will cause the calculations to be significantly off
* Excel, SQL, and various programming languages expect SIN and COS values in radians, so you need to convert the longitude and latitude values to their radian equivalents before plugging the values into the formula in Excel. This article describes how to perform the conversion:

=SIN(degree / 180 * 3.141592)

Once the conversion to radians is performed, you should be able to obtain the correct answer. The following is from the SQL statement:

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: