Sunday 7 August 2016

postgis - Problems with ST_ClusterDBSCAN cluster sizes


I want to cluster all villages (and towns, ...) around the world. So instead of having millions of them, I'd like to reduce them by combining villages that are close enough (like 10km or so) to each other.


So I was looking into ST_ClusterDBSCAN and it's doing quite a good job. This is how I made my table:


INSERT INTO villages_clustered
SELECT
    name,
    way,
    ST_ClusterDBSCAN(way, eps := 2000, minpoints := 1) OVER () AS cluster_id
FROM villages;

Now when I look at my data, I get lots of small clusters, which is what I wanted. But there are also some that are extremely large and should be split into smaller ones.


I would really like to know how to improve my query to get better results. What I basically want is clusters of roughly 10 km.





Answer



The eps distance is the maximum distance between neighbouring points in a cluster, not the maximum width of the entire cluster.


So if you have points A, B, and C, each point is included in the cluster as long as it is within the eps distance of at least one other point in that cluster. If the eps distance is 1 km, A can be within 1 km of B and C can be within 1 km of B, while A is 2 km from C; A, B, and C still form one cluster because A and C are each within 1 km of B. Chains like this can keep growing through densely settled areas, which is how the very large clusters arise.
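To make the chaining concrete, here is a minimal pure-Python sketch. It is not PostGIS's implementation, and the function name `cluster_min1` and the coordinates are made up for illustration. It assumes planar coordinates in metres and relies on the fact that with minpoints = 1 every point counts as a core point, so the clusters are simply the groups of points linked by hops of at most eps.

```python
from math import hypot


def cluster_min1(points, eps):
    """Label points the way DBSCAN does when minpoints = 1.

    points: dict mapping a name to an (x, y) tuple in metres (assumed planar).
    Returns a dict mapping each name to an integer cluster id.
    """
    labels = {}
    next_id = 0
    for start in points:
        if start in labels:
            continue
        # Start a new cluster and grow it by following eps-sized hops.
        labels[start] = next_id
        stack = [start]
        while stack:
            cur = stack.pop()
            cx, cy = points[cur]
            for other, (ox, oy) in points.items():
                if other not in labels and hypot(ox - cx, oy - cy) <= eps:
                    labels[other] = next_id
                    stack.append(other)
        next_id += 1
    return labels


# Hypothetical villages on a line: A-B and B-C are 900 m apart, D is far away.
villages = {"A": (0, 0), "B": (900, 0), "C": (1800, 0), "D": (10000, 0)}
labels = cluster_min1(villages, eps=1000)

# A and C share a cluster even though they are 1800 m apart, more than eps.
print(labels)
```

In this sketch A, B, and C share one cluster id and D gets its own, even though A and C are farther apart than eps. Adding more villages at 900 m intervals would extend the same cluster indefinitely.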


https://en.wikipedia.org/wiki/DBSCAN


