Thursday 13 October 2016

algorithm - Clustering undirected lines


I'm looking for an efficient way to cluster lines independent of their direction. That means that a line between New York and Los Angeles should be in the same cluster as a line in the other direction between Los Angeles and New York. The start / end point locations should be similar (i.e. San Diego to Long Island should be in the same cluster as LA-NY but probably not San Francisco to Boston) and there are no intermediate points. Input data would be similar to this example:


enter image description here (By Cassiopeia sweet at Japanese Wikipedia GFDL or CC-BY-SA-3.0, via Wikimedia Commons)


I have previously tried to sort the lines in advance, e.g. to make them all run from west to east, but this does not solve the problem for lines running from north to south and other way around.


Do you know any algorithm dealing with this problem? I've been looking but besides Algorithm to compute average direction of undirected segments I haven't found anything remotely helpful, so I must be using the wrong search terms.



Answer



If I understand you right you want to cluster lines that is about the same without respect to direction.



Here is an idea that I think could work.




  1. split the lines in start point and end point




  2. Cluster the points and get cluster id




  3. Find lines with the same combination of cluster id. Those are a cluster





This should be possible in PostGIS (of course :-) ) version 2.3


I haven't tested the ST_ClusterDBSCAN function, but it should do the work.


If you have a line table like this:


CREATE TABLE the_lines
(
geom geometry(linestring),
id integer primary key
)


And you want to create the cluster where start and end points is max 10 km apart. And there must be at least 2 points to be a cluster then the query could be something like:


WITH point_id AS
(SELECT (ST_DumpPoints(geom)).geom, id FROM the_lines),
point_clusters as
(SELECT ST_ClusterDBSCAN(geom, 10000, 2) cluster_id, id line_id FROM point_id)
SELECT array_agg(a.line_id), a.cluster_id, b.cluster_id
FROM point_clusters a
INNER JOIN point_clusters b
ON a.line_id = b.line_id AND a.cluster_id < b.cluster_id

GROUP BY a.cluster_id, b.cluster_id

By joining with a.cluster_id you get comparable cluster id independent of direction.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...