Wednesday, 1 February 2017

clustering - Using GROUP BY linear spatial clusters with PostGIS


I have a series of points in a horizontal line, arranged into discrete spatial clusters by the underlying data pattern. Here's an example of what they look like:



enter image description here


How can I use PostGIS to group them spatially into the highlighted sets A, B and C?


This is a harder problem than it initially seemed to me. So far I've tried:




  • Using ST_X() to order them, then lag() to identify points between which there are gaps. Unfortunately this just groups the points into "gap" groups and "interior" groups.




  • Using convex hulls around arbitrary spatial thresholds. This both failed to separate all clusters and had the side effect of creating mid-cluster separations. (For future reference, this was a dumb idea.)





  • Using ST_GeoHash() to cluster them. This doesn't accurately catch the linear nature of the geometry.





Answer



I'm not at a computer that has access to PostGIS right now, but I feel as though this algorithm might work. Of course if you have vertical groups, you would need to use an exclusion or inclusion clause for ST_Y().


DECLARE @totalUnique int = 0
DECLARE @lastUnique int = 1

CREATE TABLE #TABLEX (ID1 int, ID2 int)

CREATE TABLE #TABLEX2 (ID1 int, ID2 int)

--Get distances of objects
INSERT INTO #TABLEX
SELECT ID1, ID2
FROM (
SELECT T1.ID AS ID1,
T2.ID AS ID2
FROM BaseTable AS T1
INNER JOIN

BaseTable AS T2
ON ST_Distance(T1.Shape, T2.Shape) <= SeparationDistance
) AS X

--Loop for as long as new connections can be made
WHILE(@lastUnique <> @totalUnique)
BEGIN
--Count the number of current connections
SELECT @lastUnique = COUNT(*)
FROM (

SELECT * FROM #TABLEX
GROUP BY ID1, ID2
) AS XX

--Look for new connections via current known paths
INSERT INTO #TABLEX (ID1, ID2)
SELECT A.ID1, B.ID2
FROM #TABLEX AS A
INNER JOIN
#TABLEX AS B

ON A.ID2 = B.ID1
AND
A.ID1 <> B.ID2

--Count the number of current connections
SELECT @totalUnique = COUNT(*)
FROM (
SELECT * FROM #TABLEX
GROUP BY ID1, ID2
) AS XX


--Group each path set by the lowest ID
INSERT INTO #TABLEX2(ID1, ID2)
SELECT MIN(ID1) AS theGroup, ID2
FROM #TABLEX
GROUP BY ID2

TRUNCATE TABLE #TABLEX

--Reload our new path sets

INSERT INTO #TABLEX (ID1, ID2)
SELECT ID1, ID2 FROM #TABLEX2

TRUNCATE TABLE #TABLEX2
END

--Show final results
SELECT ID1 AS theGroup, ID2
FROM #TABLEX


DROP TABLE #TABLEX
DROP TABLE #TABLEX2

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...