Monday, 27 January 2020

Spatial clustering with PostGIS including attributes



Spatial clustering with PostGIS including attributes


I have a points table with some attributes like : year, type, etc. And I'd like to do a points clustering in PostGIS. I followed the subject in Spatial clustering with PostGIS


But I have the following problems:


first problem without considering the attributes: I got a table with points and I can visualise it in Qgis but not in ArcMap and the result has no SRID (the original table has srid :4326) and I'd like the output with geom as point or polygon (not geometry collection); second problem: I'd like to do a clustering for every year (2014,2015, 2016) and every type (a, b, c) and get the result in one table but I do not know if it is possible and how to do it?


CREATE TABLE table_clusterd AS (
SELECT row_number() over () AS id,
ST_NumGeometries(gc) as radius,
gc AS geom_collection,
ST_Centroid(gc) AS centroid,
ST_MinimumBoundingCircle(gc) AS circle,

FROM (
SELECT unnest(ST_ClusterWithin(geom, 100)) gc
FROM myTable
)
f);

Answer



To answer your main question, to cluster by attributes, you simply use GROUP BY attribute. AS ST_ClusterWithin will operate on any geometry type, it returns a GeometryCollection. To only pull out points, use ST_CollectionExtract(unnest(ST_ClusterWithin(geom, dist)),1). Finally, you can call ST_SetSRID(geom, 4326) to convert back to 4326. Here is an example that clusters on attributes, a, b, c and years, 2014, 2015 and 2016.


WITH 
testvals (att, year, geom) AS
(SELECT

('[0:2]={a,b,c}'::text[])[trunc(random()*3)],
('[0:2]={2014, 2015, 2016}'::int[])[trunc(random()*3)],
ST_MakePoint(random()*100, random()*100)
FROM generate_series(1, 1000)),
clusters(att, year, geom) AS
(SELECT
att,
year,
ST_CollectionExtract(unnest(ST_ClusterWithin(geom, 50)), 1)
FROM testvals

GROUP BY att, year
ORDER BY att, year)
SELECT
row_number() over() AS id,
att,
year,
ST_Numgeometries(geom) as num_geoms,
ST_AsText(
ST_SetSRID(
ST_Centroid(

ST_MinimumBoundingCircle(geom))
, 4326)
)
FROM clusters;

To convert this in to a create table, simply remove the ST_AsText (that is for debugging) and the WITH clause and replace testvals with your own table name and SELECT from clusters.


This produces something like the following:


id | att | year | num_geoms | geom
----+-----+------+------------------+-----------------------------


1 | a | 2014 | 104 | POINT(47.8713526488699 47.5714471330078)



2 | a | 2015 | 133 | POINT(50.5456980698809 48.7293153893785)


3 | a | 2016 | 92 | POINT(51.6473020685052 48.6573579298721)


4 | b | 2014 | 95 | POINT(53.6228174459529 45.5578126914714)


etc. As I have used 1000 points spread over a 100 x 100 grid with the distance parameter of 50, there are 9 clusters produced, as you would expect, for each attribute, year combo. For real world data, you will likely get more clusters. Note, if there is only one point in the cluster, then ST_Centroid(ST_MinimumBoundingCircle(... will return EMTPY POINT, which you might need to handle.


This crazy looking construct,


('[0:2]={a,b,c}'::text[])[trunc(random()*3)]

is a quick way to randomly generate a series of a, b, and c. For more see this answer and generate_series is used like a loop to create x number of samples.


Depending on the distance parameter to ST_ClusterWithin, and the number of rows created by generate_series, the above demo will return anything from 9 rows (distance 100), ie, all combinations of a, b, c and 2014, 2015, 2016 to 100 rows (distance 1). Usage may vary.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...