Tuesday 24 April 2018

postgis - How to UPDATE with LATERAL Nearest-Neighbour query?


Say I have table gps like:


CREATE TABLE gps(
gps_id serial primary key,
measured_timestamp timestamp,

geom Geometry(Point,4326),
stop_id int --Null
);

And I have a table of stops like:


CREATE TABLE stops(
stop_id serial primary key,
geom Geometry(Point,4326)
);


If I want to do an UPDATE on gps to find the nearest stop to each point, is there a way to use a LATERAL query?


I tried something like


UPDATE gps
SET stop_id = nearest.stop_id
FROM LATERAL(SELECT stop_id FROM stops ORDER BY stops.geom <-> gps.geom LIMIT 1) nearest

but that told me


ERROR:  invalid reference to FROM-clause entry for table "gps"
^
HINT: There is an entry for table "gps", but it cannot be referenced from this part of the query.


So is the only way to do?


UPDATE gps
SET stop_id = nearest.stop_id
FROM (SELECT gps.gps_id, stop_id
FROM gps
LATERAL JOIN (SELECT stop_id FROM stops ORDER BY stops.geom <-> gps.geom LIMIT 1) stops) nearest
WHERE nearest.gps_id = gps.gps_id

This feels like joining the same table to itself, which wouldn't need to happen with a SELECT INTO




Answer



No need for JOIN LATERAL (or do you really just want to use it?); an UPDATE will pass each processing row to the following query, which is the same concept as using a JOIN LATERAL.[*]

Try


UPDATE gps
SET stop_id = (
SELECT stops.stop_id
FROM stops
ORDER BY gps.geom <-> stops.geom
LIMIT 1
);



You are a very experienced PostGIS user; still, let me add some notes on distances and the KNN operator ,)

For better precision, consider casting to geography; the <-> operator then measures on a sphere, while ST_Distance uses the actual spheroid. In my experience, for points the <-> operator tends to perform only slightly faster than with plain old ST_Distance with geography type and a limit condition (check if the index scan actually kicks in with <->; it should consider the passed in geometry as a constant, but sometimes it doesn't for me).

If the tables are large, you can add a WHERE ST_DWithin(gps.geom, stops.geom, ) (or, if the planner denies an index only scan, use ST_Expand(gps.geom, ) && stops.geom; for point on point KNN and geometry type, this is ultimatively fast) to only compare those stops in each gps points' vicinity (note that the distance given uses the CRS units (i.e. degrees for EPSG:4326) for geometry, but meter for geography).

[*] Just to give an example on that; consider a SELECT instead to find the closest stop to each gps point using JOIN LATERAL:


SELECT a.gps_id,
a.measured_timestamp,
a.geom,
b.stop_id
FROM gps AS a
JOIN LATERAL ( -- you can use 'CROSS JOIN LATERAL' without 'ON true',
SELECT stops.stop_id -- but I get slightly faster results this way
FROM stops

ORDER BY a.geom <-> stops.geom
LIMIT 1
) AS b
ON true;

Each row in gps is now passed individually and subsequentially to the JOIN LATERAL sub-query to be processed; this (sort of) mimicks the UPDATE command (note how it is the same sub-query).


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...