I'm trying to cluster the geometries in a sf
object that touch each other. For the moment I'm using the same approach that it's explained in the answer here, but the problem is that the same idea doesn't work if I consider a bigger number of geometries because the touching matrix becomes too memory-heavy and I'm not using its sparse properties.
A small and (I hope) reproducible example
# packages
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
# download data
milan_primary_highways <- opq("Milan, Italy") %>%
add_osm_feature(key = "highway", value = "primary") %>%
osmdata_sf() %>%
trim_osmdata(bb_poly = getbb("Milan, Italy", format_out = "polygon")[[1]]) %>%
magrittr::use_series(osm_lines)
# Determine the non-sparse matrix of touching geometries
touching_matrix <- st_touches(milan_primary_highways, sparse = FALSE)
#> although coordinates are longitude/latitude, st_touches assumes that they are planar
# Cluster the geometries
hc <- hclust(as.dist(!touching_matrix), method="single")
# Cut the dendrogram
groups <- cutree(hc, h=0.5)
# Results
table(groups)
#> groups
#> 1 2 3 4 5 6 7 8 9 10 11 12
#> 1444 21 18 8 4 5 8 2 3 5 2 2
plot(st_geometry(milan_primary_highways), col = groups)
Is there an alternative approach that takes advantage of the sparse property of the touching matrix (that I cannot use because as.dist just accepts a numeric matrix). Do you want me to provide an example when this approach does not work? The error is simply "R cannot allocate a vector of size n GB" when I use the function hclust(as.dist(x))
.
I don't know if it's important but I tried to explain the reason why I need this clustering here
Answer
Make an adjacency list:
touching_list = st_touches(milan_primary_highways)
this isn't NxN so won't grow - its a list of length N where each element is only the adjacent elements. This might help it scale up to larger problems.
Now use igraph
to create a graph objects and get the connectivity:
library(igraph)
g = graph.adjlist(touching_list)
c = components(g)
The membership
element of c
is of length N and says which cluster it is in. It summarises the same as your example:
> table(c$membership)
1 2 3 4 5 6 7 8 9 10 11 12
1444 21 18 8 4 5 8 2 3 5 2 2
So if you want to assign it to the data, do:
milan_primary_highways$groups = c$membership
No comments:
Post a Comment