Tuesday, 17 July 2018

postgresql - Performance in calculating raster statistics in PostGIS


I'm trying to calculate raster statistics (min, max, mean) for each polygon in a vector layer using PostgreSQL/PostGIS.


This GIS.SE answer describes how to do this, by calculating the intersection between the polygon and the raster and then calculating a weighted average: https://gis.stackexchange.com/a/19858/12420


I'm using the following query (where dem is my raster, topo_area_su_region is my vector, and toid is a unique ID:


SELECT toid, Min((gv).val) As MinElevation, Max((gv).val) As MaxElevation, Sum(ST_Area((gv).geom) * (gv).val) / Sum(ST_Area((gv).geom)) as MeanElevation FROM (SELECT toid, ST_Intersection(rast, geom) AS gv FROM topo_area_su_region,dem WHERE ST_Intersects(rast, geom)) foo GROUP BY toid ORDER BY toid;

This works, but it's too slow. My vector layer has 2489k features, with each one taking around 90ms to process - it would take days to process the entire layer. The speed of the calculation doesn't seem to be significantly improved if I only calculate the min and max (which avoids the calls to ST_Area).


If I do a similar calculation using Python (GDAL, NumPy and PIL) I can significantly reduce the amount of time it takes to process the data, if instead of vectorizing the raster (using ST_Intersection) I rasterize the vector. See code here: https://gist.github.com/snorfalorpagus/7320167



I don't really need a weighted average - a "if it touches, it's in" approach is good enough - and I'm reasonably sure this is what is slowing things down.


Question: Is there any way to get PostGIS to behave like this? i.e. to return the values of all the cells from the raster that a polygon touches, rather than the exact intersection.


I'm very new to PostgreSQL/PostGIS, so maybe there is something else I'm not doing right. I'm running PostgreSQL 9.3.1 and PostGIS 2.1 on Windows 7 (2.9GHz i7, 8GB RAM) and have tweaked the database config as suggested here: http://postgis.net/workshops/postgis-intro/tuning.html


enter image description here



Answer



You're right, using ST_Intersection slows down your query noticeable.


Instead of using ST_Intersection it is better to clip (ST_Clip) your raster with the polygons (your fields) and dump the result as polygons (ST_DumpAsPolygons). So every raster cell will be converted into a little polygon rectangle with distinct values.


For receiving min, max or mean from the dumps you can use the same statements.


This query should do the trick:


SELECT 

toid,
Min((gv).val) As MinElevation,
Max((gv).val) As MaxElevation,
Sum(ST_Area((gv).geom) * (gv).val) / Sum(ST_Area((gv).geom)) as MeanElevation
FROM (
SELECT
toid,
ST_DumpAsPolygons(ST_Clip(rast, 1, geom, true)) AS gv
FROM topo_area_su_region,dem
WHERE ST_Intersects(rast, geom)) AS foo

GROUP BY toid
ORDER BY toid;

In the statement ST_Clip you define the raster, the raster band (=1), the polygon and if the crop should be TRUE or FALSE.


Besides you can use avg((gv).val) to calculate the mean value.


EDIT


The result of your approach is the more exact, but the slower one. The results of the combination of ST_Clip and ST_DumpAsPolygons are ignoring the raster cells that are intersecting with less than 50% (or 51%) of their size.


These two screen shots from a CORINE Land Use intersection show the difference. First picture with ST_Intersection, second one with ST_Clip and ST_DumpAsPolygons.


enter image description here


enter image description here



No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...