This is a follow up to my earlier question (Spatial join in R - Adding points to polygons with multiple corresponding points).
I have successfully joined a spatial points file to a polygon file in R using the st_join function within the sf package with more than one point being assigned to a polygon if necessary, duplicating rows but keeping all points which fall within a polygon.
st_join(polygons, points)
However I also need to join points which fall outside of the polygons but within 500m of a polygon to their nearest polygon. Points which are >500m away from a polygon can be discarded.
I thought that combining the above with st_nn from the nngeo package should work using the following:
st_join(polygons, points, join = st_nn, maxdist = 500)
However in this case only 1 point is assigned to a polygon, even if more than one point falls within a polygon or within 500m of a polygon. i.e. the rows are not duplicated.
Here is a screenshot of a sample of points and polygons:
And here is table showing how the points should be assigned to the polygons and how they have been assigned in the respective methods:
I find it a little strange that the second method does not keep the duplicates, even though it is based on the same function. Can anyone tell me what I'm doing wrong here?
Edit: I tried adjusting the k parameter but this simply joins the first points within the given distance up to the max number given and therefore can assign 1 point to 2 polygons. e.g.
st_join(polygons, points, join = st_nn, k = 10, maxdist = 500)
returns 5 points for polygon 89028 as there are 5 points within 500m, when in fact only 1 point should be returned (011-05-0529) as the other 4 points are already assigned to the polygon in which they fall. A point should only be assigned to one polygon.
Answer
I used a combination of @Michael's answer plus some additional manipulation to get the correct format. The resulting file is a polygon file with no duplicate polygons. If a polygon has >1 associated point, then the point columns from the join are repeated until every associated point is included.
library(sf)
library(data.table)
library(nngeo)
#Load files
Poly <-st_read("Path/Poly.shp")
Pts <- st_read("Path/Pts.shp")
names(Pts) #Get list of names for selecting required columns
col_interest <- c("col1", "col2") #add column names here
Join Pts to Poly resulting in a pts file with the ID of the nearest polygon within 500m attached in the polygon attributes
Poly_Pts_pts <- st_join(Pts, Poly, join = st_nn, maxdist = 500)
#convert to data.table
Poly_Pts_pts_DT <- as.data.table(Poly_Pts_pts)
#add a new column with running number for each individual Point within each polygon ID
Poly_Pts_pts_DT <- Poly_Pts_pts_DT[, New_ID := seq_len(.N), by = ID]
#Cast into wide format
Poly_Pts_pts_wide <- dcast.data.table(Poly_Pts_pts_DT, ID ~ New_ID, value.var = col_interest)#output is data.table
#Join Pts wide format to original polygons on ID column
Poly_Pts <- merge(Poly, Poly_Pts_pts_wide, by = "ID", all.x = TRUE)
#Write to disk
st_write(Poly_Pts, "Path/Poly_Pts.shp")```
Awesome Post!!! Thanks for posting it.
ReplyDeleteData Science in Business
How Data Science Helps Business