postgis - Problems with ST_ClusterDBSCAN cluster sizes

Sunday, 7 August 2016

postgis - Problems with ST_ClusterDBSCAN cluster sizes

I want to cluster all villages (and towns, ...) around the world. So instead of having millions of them, I'd like to reduce them by combining villages that are close enough (like 10km or so) to each other.

So I was looking into ST_ClusterDBSCAN and it's doing quite a good job. This is how I made my table:

INSERT INTO villages_clustered
  SELECT
    name,
    way,

    ST_ClusterDBSCAN(way, eps := 2000, minpoints := 1) over () AS cluster_id
  FROM villages;

Now when I look at my data, I get lots of small clusters which is what I wanted. But there are also some, which are just suuuuper large that should split into smaller ones.

Would really love to know how to improve my query to get better results. What I basically want is clusters of like 10km or so.

Answer

The eps distance is the maximum distance between points in the cluster, not the maximum width of the entire cluster.

So if you have points A, B, and C, as long as each point is within the eps distance of one other point, then it gets included in the cluster. If the eps distance was 1 km, A could be within 1 km of B, and C can be within 1 km of B, but A can be 2 km from C and ABC are still a cluster because A & C are within 1 km of B.

https://en.wikipedia.org/wiki/DBSCAN

Blog

Sunday, 7 August 2016

postgis - Problems with ST_ClusterDBSCAN cluster sizes

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?