Thursday 20 October 2016

postgis - How to efficiently find the closest point over the dateline?


I have a PostgreSQL 9.1 table with hundreds of thousands of PostGIS POINTs. For each of these I'd like to find the closest point in another table of POINTs. The points in the second table represent a grid over the whole world, so I know there is always going to be a match within 1 degree. This is the query I'm using so far, which makes use of GIST indexes, so it's reasonably fast (about 30 seconds total).


SELECT DISTINCT ON (p.id)
p.id, ST_AsText(p.pos)
, ST_AsText(first_value(g.location) OVER (PARTITION BY p.id ORDER BY ST_Distance(p.pos, g.location::geography)))
FROM point p
JOIN grid g ON ST_DWithin(p.pos::geometry, g.location, 1)


The only problem is the dateline. The grid points only have latitude 180, not -180. When using the geometry version of ST_Distance this does not return points on the other side of the dateline. Eg. if p.pos is POINT(-179.88056 -16.68833) the nearest grid point might be POINT(180 -16.25), but the above query doesn't return it. What's the best way to fix this?


I don't really want to have two coordinates for a single grid point (-180 and +180). I tried adding in my own function which checks for this specific case, but then the query doesn't return in 5 minutes, probably because it can no longer use the index. I also tried using the geography version of ST_DWithin and that query also didn't return after 5 minutes.



Answer



OK, I finally figure out a way to hack it that not only works around the dateline issue, but is also faster.


CREATE OR REPLACE FUNCTION nearest_grid_point(point geography(Point))
RETURNS integer
AS $BODY$
SELECT pointid
FROM
(

-- The normal case
SELECT pointid, location
FROM grid
WHERE ST_DWithin($1::geometry, location, 1)

UNION ALL

-- The dateline hack
SELECT pointid, location
FROM grid

WHERE (ST_X($1::geometry) < -178.75 AND longitude = 180)
) sub
ORDER BY ST_Distance($1, location::geography)
LIMIT 1;
$BODY$ LANGUAGE SQL STABLE;

SELECT p.id, ST_AsText(p.pos), g.pointid, ST_AsText(g.location)
FROM point p
JOIN grid g ON nearest_grid_point(p.pos) = g.pointid


I was very surprised to see that this function, which is called for every row, is faster than the original window function, but it is - over 10 times faster. PostgreSQL performance really is a black art!


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...