clustering - Using GROUP BY linear spatial clusters with PostGIS

Wednesday, 1 February 2017

clustering - Using GROUP BY linear spatial clusters with PostGIS

I have a series of points in a horizontal line, arranged into discrete spatial clusters by the underlying data pattern. Here's an example of what they look like:

enter image description here

How can I use PostGIS to group them spatially into the highlighted sets A, B and C?

This is a harder problem than it initially seemed to me. So far I've tried:

Using ST_X() to order them, then lag() to identify points between which there are gaps. Unfortunately this just groups the points into "gap" groups and "interior" groups.

Using convex hulls around arbitrary spatial thresholds. This both failed to separate all clusters and had the side effect of creating mid-cluster separations. (For future reference, this was a dumb idea.)

Using ST_GeoHash() to cluster them. This doesn't accurately catch the linear nature of the geometry.

Answer

I'm not at a computer that has access to PostGIS right now, but I feel as though this algorithm might work. Of course if you have vertical groups, you would need to use an exclusion or inclusion clause for ST_Y().

DECLARE @totalUnique int = 0
DECLARE @lastUnique int = 1

CREATE TABLE #TABLEX (ID1 int, ID2 int)

CREATE TABLE #TABLEX2 (ID1 int, ID2 int)

--Get distances of objects
INSERT INTO #TABLEX
SELECT ID1, ID2
FROM   (
         SELECT T1.ID AS ID1, 
                T2.ID AS ID2
         FROM   BaseTable AS T1
                INNER JOIN

                BaseTable AS T2
                    ON  ST_Distance(T1.Shape, T2.Shape) <= SeparationDistance
       ) AS X

--Loop for as long as new connections can be made
WHILE(@lastUnique <> @totalUnique)
BEGIN
  --Count the number of current connections
  SELECT @lastUnique = COUNT(*)
  FROM (

         SELECT * FROM #TABLEX
         GROUP BY ID1, ID2
       ) AS XX 

  --Look for new connections via current known paths
  INSERT INTO #TABLEX (ID1, ID2)
  SELECT A.ID1, B.ID2
  FROM #TABLEX AS A
       INNER JOIN
       #TABLEX AS B

       ON A.ID2 = B.ID1
          AND
          A.ID1 <> B.ID2

  --Count the number of current connections   
  SELECT @totalUnique = COUNT(*)
  FROM (
         SELECT * FROM #TABLEX
         GROUP BY ID1, ID2
       ) AS XX


  --Group each path set by the lowest ID
  INSERT INTO #TABLEX2(ID1, ID2)
  SELECT MIN(ID1) AS theGroup, ID2
  FROM   #TABLEX
  GROUP BY ID2

  TRUNCATE TABLE #TABLEX

  --Reload our new path sets

  INSERT INTO #TABLEX (ID1, ID2)
  SELECT ID1, ID2 FROM #TABLEX2

  TRUNCATE TABLE #TABLEX2
END

--Show final results        
SELECT ID1 AS theGroup, ID2
FROM   #TABLEX


DROP TABLE #TABLEX
DROP TABLE #TABLEX2

Blog

Wednesday, 1 February 2017

clustering - Using GROUP BY linear spatial clusters with PostGIS

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?