I'm trying to calculate raster statistics (min, max, mean) for each polygon in a vector layer using PostgreSQL/PostGIS.
This GIS.SE answer describes how to do this, by calculating the intersection between the polygon and the raster and then calculating a weighted average: https://gis.stackexchange.com/a/19858/12420
I'm using the following query (where dem
is my raster, topo_area_su_region
is my vector, and toid
is a unique ID:
SELECT toid, Min((gv).val) As MinElevation, Max((gv).val) As MaxElevation, Sum(ST_Area((gv).geom) * (gv).val) / Sum(ST_Area((gv).geom)) as MeanElevation FROM (SELECT toid, ST_Intersection(rast, geom) AS gv FROM topo_area_su_region,dem WHERE ST_Intersects(rast, geom)) foo GROUP BY toid ORDER BY toid;
This works, but it's too slow. My vector layer has 2489k features, with each one taking around 90ms to process - it would take days to process the entire layer. The speed of the calculation doesn't seem to be significantly improved if I only calculate the min and max (which avoids the calls to ST_Area).
If I do a similar calculation using Python (GDAL, NumPy and PIL) I can significantly reduce the amount of time it takes to process the data, if instead of vectorizing the raster (using ST_Intersection) I rasterize the vector. See code here: https://gist.github.com/snorfalorpagus/7320167
I don't really need a weighted average - a "if it touches, it's in" approach is good enough - and I'm reasonably sure this is what is slowing things down.
Question: Is there any way to get PostGIS to behave like this? i.e. to return the values of all the cells from the raster that a polygon touches, rather than the exact intersection.
I'm very new to PostgreSQL/PostGIS, so maybe there is something else I'm not doing right. I'm running PostgreSQL 9.3.1 and PostGIS 2.1 on Windows 7 (2.9GHz i7, 8GB RAM) and have tweaked the database config as suggested here: http://postgis.net/workshops/postgis-intro/tuning.html
Answer
You're right, using ST_Intersection
slows down your query noticeable.
Instead of using ST_Intersection
it is better to clip (ST_Clip
) your raster with the polygons (your fields) and dump the result as polygons (ST_DumpAsPolygons
). So every raster cell will be converted into a little polygon rectangle with distinct values.
For receiving min, max or mean from the dumps you can use the same statements.
This query should do the trick:
SELECT
toid,
Min((gv).val) As MinElevation,
Max((gv).val) As MaxElevation,
Sum(ST_Area((gv).geom) * (gv).val) / Sum(ST_Area((gv).geom)) as MeanElevation
FROM (
SELECT
toid,
ST_DumpAsPolygons(ST_Clip(rast, 1, geom, true)) AS gv
FROM topo_area_su_region,dem
WHERE ST_Intersects(rast, geom)) AS foo
GROUP BY toid
ORDER BY toid;
In the statement ST_Clip
you define the raster, the raster band (=1), the polygon and if the crop should be TRUE or FALSE.
Besides you can use avg((gv).val)
to calculate the mean value.
EDIT
The result of your approach is the more exact, but the slower one. The results of the combination of ST_Clip
and ST_DumpAsPolygons
are ignoring the raster cells that are intersecting with less than 50% (or 51%) of their size.
These two screen shots from a CORINE Land Use intersection show the difference. First picture with ST_Intersection
, second one with ST_Clip
and ST_DumpAsPolygons
.
No comments:
Post a Comment