Monday, 16 July 2018

r - Combining st_join with st_nn to include all points within, as well as within a given distance of a polygon


This is a follow up to my earlier question (Spatial join in R - Adding points to polygons with multiple corresponding points).


I have successfully joined a spatial points file to a polygon file in R using the st_join function within the sf package with more than one point being assigned to a polygon if necessary, duplicating rows but keeping all points which fall within a polygon.


st_join(polygons, points)


However I also need to join points which fall outside of the polygons but within 500m of a polygon to their nearest polygon. Points which are >500m away from a polygon can be discarded.


I thought that combining the above with st_nn from the nngeo package should work using the following:


st_join(polygons, points, join = st_nn, maxdist = 500)

However in this case only 1 point is assigned to a polygon, even if more than one point falls within a polygon or within 500m of a polygon. i.e. the rows are not duplicated.


Here is a screenshot of a sample of points and polygons:


enter image description here


And here is table showing how the points should be assigned to the polygons and how they have been assigned in the respective methods:


enter image description here



I find it a little strange that the second method does not keep the duplicates, even though it is based on the same function. Can anyone tell me what I'm doing wrong here?


Edit: I tried adjusting the k parameter but this simply joins the first points within the given distance up to the max number given and therefore can assign 1 point to 2 polygons. e.g.


st_join(polygons, points, join = st_nn, k = 10, maxdist = 500)


returns 5 points for polygon 89028 as there are 5 points within 500m, when in fact only 1 point should be returned (011-05-0529) as the other 4 points are already assigned to the polygon in which they fall. A point should only be assigned to one polygon.



Answer



I used a combination of @Michael's answer plus some additional manipulation to get the correct format. The resulting file is a polygon file with no duplicate polygons. If a polygon has >1 associated point, then the point columns from the join are repeated until every associated point is included.


library(sf)
library(data.table)
library(nngeo)


#Load files
Poly <-st_read("Path/Poly.shp")
Pts <- st_read("Path/Pts.shp")

names(Pts) #Get list of names for selecting required columns
col_interest <- c("col1", "col2") #add column names here

Join Pts to Poly resulting in a pts file with the ID of the nearest polygon within 500m attached in the polygon attributes
Poly_Pts_pts <- st_join(Pts, Poly, join = st_nn, maxdist = 500)


#convert to data.table
Poly_Pts_pts_DT <- as.data.table(Poly_Pts_pts)

#add a new column with running number for each individual Point within each polygon ID
Poly_Pts_pts_DT <- Poly_Pts_pts_DT[, New_ID := seq_len(.N), by = ID]

#Cast into wide format
Poly_Pts_pts_wide <- dcast.data.table(Poly_Pts_pts_DT, ID ~ New_ID, value.var = col_interest)#output is data.table

#Join Pts wide format to original polygons on ID column

Poly_Pts <- merge(Poly, Poly_Pts_pts_wide, by = "ID", all.x = TRUE)

#Write to disk
st_write(Poly_Pts, "Path/Poly_Pts.shp")```

1 comment:

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...