Tuesday, 30 January 2018

distance - PostGIS nearest points with ST_Distance, kNN


I need to obtain on each element on one table the closest point of another table. The first table contains traffic signs and the second one the Entrance Halls of the town. The thing is that I can't use ST_ClosestPoint function and I have to use ST_Distance function and get the min(ST_distance) record but I am quite stuck building the query.


CREATE TABLE traffic_signs
(
id numeric(8,0) ),
"GEOMETRY" geometry,
CONSTRAINT traffic_signs_pkey PRIMARY KEY (id),
CONSTRAINT traffic_signs_id_key UNIQUE (id)
)

WITH (
OIDS=TRUE
);

CREATE TABLE entrance_halls
(
id numeric(8,0) ),
"GEOMETRY" geometry,
CONSTRAINT entrance_halls_pkey PRIMARY KEY (id),
CONSTRAINT entrance_halls_id_key UNIQUE (id)

)
WITH (
OIDS=TRUE
);

I need to obtain the id of the closest entrnce_hall of every traffic_sign.


My query so far:


SELECT senal.id,port.id,ST_Distance(port."GEOMETRY",senal."GEOMETRY")  as dist
FROM traffic_signs As senal, entrance_halls As port
ORDER BY senal.id,port.id,ST_Distance(port."GEOMETRY",senal."GEOMETRY")


With this I am getting the distance from every traffic_sign to every entrance_hall. But how can I get only the minimun distance?


Regards,



Answer



You are nearly there. There is a little trick which is to use Postgres's distinct operator, which will return the first match of each combination -- as you are ordering by ST_Distance, effectively it will return the closest point from each senal to each port.


SELECT 
DISTINCT ON (senal.id) senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY") as dist
FROM traffic_signs As senal, entrance_halls As port
ORDER BY senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY");


If you know that the minimum distance in each case is no more than some amount x, (and you have a spatial index on your tables), you can speed this up by putting a WHERE ST_DWithin(port."GEOMETRY", senal."GEOMETRY", distance), eg, if all the minumum distances are known to be no more than 10km, then:


SELECT 
DISTINCT ON (senal.id) senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY") as dist
FROM traffic_signs As senal, entrance_halls As port
WHERE ST_DWithin(port."GEOMETRY", senal."GEOMETRY", 10000)
ORDER BY senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY");

Obviously, this needs to be used with caution, as if the minimum distance is greater, you will simply get no row for that combination of senal and port.


Note: The order by order must match the distinct on order, which makes sense, as distinct is taking the first distinct group based on some ordering.


It is assumed that you have a spatial index on both tables.



EDIT 1. There is another option, which is to use Postgres's <-> and <#> operators, (center point and bounding box distance calculations, respectively) which make more efficient use of the spatial index and don't require the ST_DWithin hack to avoid n^2 comparisons. There is a good blog article explaining how they work. The general thing to note is that these two operators work in the ORDER BY clause.


SELECT senal.id, 
(SELECT port.id
FROM entrance_halls as port
ORDER BY senal.geom <#> port.geom LIMIT 1)
FROM traffic_signs as senal;

EDIT 2. As this question has received a lot of attention and k-nearest neighbours (kNN) is generally a hard problem (in terms of algorithmic run-time) in GIS, it seems worthwhile to expand somewhat on the original scope of this question.


The standard way for find the x nearest neighbours of one object is to use a LATERAL JOIN (conceptually similar to a for each loop). Borrowing shamelessly from dbaston's answer, you would do something like:


SELECT

signs.id,
closest_port.id,
closest_port.dist
FROM traffic_signs
CROSS JOIN LATERAL
(SELECT
id,
ST_Distance(ports.geom, signs.geom) as dist
FROM ports
ORDER BY signs.geom <-> ports.geom

LIMIT 1
) AS closest_port

So, if you want to find the nearest 10 ports, ordered by distance, you simply have to change the LIMIT clause in the lateral sub-query. This is much harder to do without LATERAL JOINS and involves using ARRAY type logic. While this approach works well, it can be sped up enormously if you know you only have to search out to a given distance. In this instance, you can use ST_DWithin(signs.geom, ports.geom, 1000) in the subquery, which because of the way indexing works with the <-> operator -- one of the geometries should be a constant, rather than a column reference -- may be much faster. So, for example, to get the 3 nearest ports, within 10km, you could write something like the following.


 SELECT
signs.id,
closest_port.id,
closest_port.dist
FROM traffic_signs
CROSS JOIN LATERAL

(SELECT
id,
ST_Distance(ports.geom, signs.geom) as dist
FROM ports
WHERE ST_DWithin(ports.geom, signs.geom, 10000)
ORDER BY ST_Distance(ports.geom, signs.geom)
LIMIT 3
) AS closest_port;

As always, usage will vary depending on your data distribution and queries, so EXPLAIN is your best friend.



Finally, there is a minor gotcha, if using LEFT instead of CROSS JOIN LATERAL in that you have to add ON TRUE after the lateral queries alias, eg,


SELECT
signs.id,
closest_port.id,
closest_port.dist
FROM traffic_signs
LEFT JOIN LATERAL
(SELECT
id,
ST_Distance(ports.geom, signs.geom) as dist

FROM ports
ORDER BY signs.geom <-> ports.geom
LIMIT 1
) AS closest_port
ON TRUE;

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...