I need to obtain on each element on one table the closest point of another table. The first table contains traffic signs and the second one the Entrance Halls of the town.
The thing is that I can't use ST_ClosestPoint function and I have to use ST_Distance function and get the min(ST_distance) record but I am quite stuck building the query.

I need to obtain the id of the closest entrnce_hall of every traffic_sign.

My query so far:

SELECT senal.id,port.id,ST_Distance(port."GEOMETRY",senal."GEOMETRY") as dist
FROM traffic_signs As senal, entrance_halls As port
ORDER BY senal.id,port.id,ST_Distance(port."GEOMETRY",senal."GEOMETRY")

With this I am getting the distance from every traffic_sign to every entrance_hall. But how can I get only the minimun distance?

4 Answers
4

You are nearly there. There is a little trick which is to use Postgres's distinct operator, which will return the first match of each combination -- as you are ordering by ST_Distance, effectively it will return the closest point from each senal to each port.

SELECT
DISTINCT ON (senal.id) senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY") as dist
FROM traffic_signs As senal, entrance_halls As port
ORDER BY senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY");

If you know that the minimum distance in each case is no more than some amount x, (and you have a spatial index on your tables), you can speed this up by putting a WHERE ST_DWithin(port."GEOMETRY", senal."GEOMETRY", distance), eg, if all the minumum distances are known to be no more than 10km, then:

Obviously, this needs to be used with caution, as if the minimum distance is greater, you will simply get no row for that combination of senal and port.

Note: The order by order must match the distinct on order, which makes sense, as distinct is taking the first distinct group based on some ordering.

It is assumed that you have a spatial index on both tables.

EDIT 1. There is another option, which is to use Postgres's <-> and <#> operators, (center point and bounding box distance calculations, respectively) which make more efficient use of the spatial index and don't require the ST_DWithin hack to avoid n^2 comparisons. There is a good blog article explaining how they work. The general thing to note is that these two operators work in the ORDER BY clause.

SELECT senal.id,
(SELECT port.id
FROM entrance_halls as port
ORDER BY senal.geom <#> port.geom LIMIT 1)
FROM traffic_signs as senal;

EDIT 2. As this question has received a lot of attention and k-nearest neighbours (kNN) is generally a hard problem (in terms of algorithmic run-time) in GIS, it seems worthwhile to expand somewhat on the original scope of this question.

The standard way for find the x nearest neighbours of one object is to use a LATERAL JOIN (conceptually similar to a for each loop). Borrowing shamelessly from dbaston's answer, you would do something like:

So, if you want to find the nearest 10 ports, ordered by distance, you simply have to change the LIMIT clause in the lateral sub-query. This is much harder to do without LATERAL JOINS and involves using ARRAY type logic.
While this approach works well, it can be sped up enormously if you know you only have to search out to a given distance. In this instance, you can use ST_DWithin(signs.geom, ports.geom, 1000) in the subquery, which because of the way indexing works with the <-> operator -- one of the geometries should be a constant, rather than a column reference -- may be much faster. So, for example, to get the 3 nearest ports, within 10km, you could write something like the following.

It should be noted that this won't perform well with big ammounts of data.
– Jakub KaniaFeb 23 '15 at 14:08

@JakubKania. It depends on whether you can use ST_DWithin or not. But, yes, point taken. Unfortunately, the Order by <->/<#> operator requires one of the geometries to be a constant, no?
– John PowellFeb 23 '15 at 14:09

@JohnPowellakaBarça any chance you know where that blog post lives nowadays? - or, a similar explanation of the <-> and <#> operators? Thanks!!
– DPSSpatialJul 3 '17 at 15:20

@DPSSpatial, that's annoying. I don't, but there is this and this which talk a bit about this approach. The 2nd one, using lateral joins too, which is another interesting enhancement.
– John PowellJul 3 '17 at 16:39

@DPSSpatial. It is all a bit slippery this <->, <#> and lateral join stuff. I have done this with very large datasets and performance has been horrible, without using ST_DWithin, which all of this is supposed to avoid. Ultimately, knn is a compllicated problem, so usage may vary. Good luck :-)
– John PowellJul 3 '17 at 16:51

Cool use of a lateral join. Hadn't seen that before in this context.
– John PowellFeb 23 '15 at 14:33

1

@JohnBarça It's one of the best contexts I've seen. I also suspect it would be helpful when you really need to use ST_DISTANCE() to find nearest polygon and cross join is causing the server to run out of memory. The nearest polygon query is still unsolved AFAIK.
– Jakub KaniaFeb 23 '15 at 14:54

The correct one looks like this (i used points and lines): SELECT DISTINCT ON (points.id) points.id, lines.id, ST_Distance(lines.geom, points.geom) as dist FROM development.passed_entries As points, development."de_muc_rawSections_cleaned" As lines ORDER BY points.id, ST_Distance(lines.geom, points.geom),lines.id;
– blackgisJan 11 '18 at 12:30

1

OK, I get you now. It is actually probably better to use the LATERAL JOIN approach, as in @dbaston's answer, which makes it clear what thing is being compared to what other thing in terms of closeness. I don't use the approach above any more.
– John PowellJan 11 '18 at 17:03