Monday 31 December 2018

postgresql 10 - Speed up point sampling with ST_Value function in PostGIS, raster/vector overlay


I managed to extract raster values in PostGIS based on a table of point geometries by following Point sampling raster with PostGIS


SELECT id, ST_Value(rastertable.rast, 1, pointtable.geom) as RasterValue

FROM rastertable, pointtable
WHERE ST_Intersects(rastertable.rast, pointtable.geom)
LIMIT 20

The above query takes roughly 20 seconds, which is one second per case. In the end I want to extract raster values for over 10 million points. With the current query that would take roughly 115 days; not really an option. How could I adjust the above query to increase the performance? Or do I maybe even need to change the method?


Edit: Using QGIS point sampling plugin also results in a 3 second per 10 point speed, reducing the waiting time to about 40 days. That still is not workable.


Version info: enter image description here



Answer



I was facing the same issues as you, and simply going from 1000x1000 to 100x100 less to a speed up of several orders of magnitude. In order to tile your input raster use the -t switch in raster2pgsql, eg,


raster2pgsql -t 200x200 -I -s .......


My understanding is that ST_Value works as an offset into an array (2D or 3D, depending on bands) and that as each individual raster is being stored as a blob, you will need to open each one in order to find the pixel at x, y or that intersects with a geometry. So, it follows that smaller tiles lead to quicker lookups, especially as the index structure will be very evenly balanced boxes.


There is obviously a sweet spot for tile size, which will depend on which form of ST_Value you are using and the work load in question -- I would expect polygons to behave very differently to points. There are some good pointers in this blog post by Duncan Golicher, showing a continual decline in speed as tile size falls for point lookups and an initial decline with polygons, with a sweet spot at around 150, before increasing again -- which makes sense a priori.


Here are a couple of images from Duncan's post illustrating the point.


enter image description here


enter image description here


I found that occasionally I got an error from ST_Value about lookups being outside the raster bounds and eventually used ST_NearestValue instead, with the same performance characteristics.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...