I managed to extract raster values in PostGIS based on a table of point geometries by following Point sampling raster with PostGIS
SELECT id, ST_Value(rastertable.rast, 1, pointtable.geom) as RasterValue
FROM rastertable, pointtable
WHERE ST_Intersects(rastertable.rast, pointtable.geom)
LIMIT 20
The above query takes roughly 20 seconds, which is one second per case. In the end I want to extract raster values for over 10 million points. With the current query that would take roughly 115 days; not really an option. How could I adjust the above query to increase the performance? Or do I maybe even need to change the method?
Edit: Using QGIS point sampling plugin also results in a 3 second per 10 point speed, reducing the waiting time to about 40 days. That still is not workable.
Answer
I was facing the same issues as you, and simply going from 1000x1000 to 100x100 less to a speed up of several orders of magnitude. In order to tile your input raster use the -t switch in raster2pgsql, eg,
raster2pgsql -t 200x200 -I -s .......
My understanding is that ST_Value works as an offset into an array (2D or 3D, depending on bands) and that as each individual raster is being stored as a blob, you will need to open each one in order to find the pixel at x, y or that intersects with a geometry. So, it follows that smaller tiles lead to quicker lookups, especially as the index structure will be very evenly balanced boxes.
There is obviously a sweet spot for tile size, which will depend on which form of ST_Value you are using and the work load in question -- I would expect polygons to behave very differently to points. There are some good pointers in this blog post by Duncan Golicher, showing a continual decline in speed as tile size falls for point lookups and an initial decline with polygons, with a sweet spot at around 150, before increasing again -- which makes sense a priori.
Here are a couple of images from Duncan's post illustrating the point.
I found that occasionally I got an error from ST_Value about lookups being outside the raster bounds and eventually used ST_NearestValue instead, with the same performance characteristics.
No comments:
Post a Comment