Saturday 26 May 2018

select by location - Random selection of points in SpatialPointsDataFrame R object with distance constraint


I need to randomly select points (without replacement) from a SpatialPointsDataFrame (the input object shown below) in R, with a minimum distance of 3,000 meters between the selected points. I want a random 50% of all points. I know that the "Create Random Points" tool in ArcGIS can do this, as mentioned in the earlier question "Randomly sampling points in R with minimum distance constraint", but I really need to do it inside R. I tried the sample() function, but I have not figured out how to set the geographical constraint. I also tried running QGIS from within R, but QGIS does not seem to have a tool for this.


> input
class : SpatialPointsDataFrame
features : 205
extent : 203294.5, 259880.6, 7600123, 7668676 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=23 +south +ellps=GRS80 +units=m +no_defs
variables : 3
names : Sites, Long_X, Lat_Y
min values : CPF - 050, 203295, 7600120
max values : JES - S94, 259881, 7668680


> head(input)
Sites Long_X Lat_Y
1 CPF - 050 235441 7617150
2 CPF - 052 234106 7615740
3 CPF - 054 232683 7614280
4 CPF - 056 233863 7614420
5 CPF - 058 234012 7612890
6 CPF - 062 236929 7612850

rdm_input <- sample(x= nrow(input), size=(0.5*205), replace= FALSE)
[1] 167 189 79 80 126 129 144 100 4 109 170 72 123 73 132 93 169 5 134 176 196 158 152 183 23 136 180
[28] 130 12 142 179 11 13 66 22 2 96 29 54 137 120 171 184 36 113 3 81 115 30 85 61 162 98 102
[55] 103 181 90 133 56 174 76 201 150 197 14 86 118 121 28 97 160 178 1 186 141 163 172 32 65 168 74
[82] 114 182 128 131 67 70 165 187 185 69 17 194 16 154 119 192 156 106 25 101 198 105 108 151 125 190 10
[109] 84 38 51 161 40 94 45 19 145 75 42 122 117 49 87
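
A plain sample() draw ignores geography, so one quick way to see whether a given draw happens to satisfy the constraint is to look at the smallest pairwise distance among the selected points. The following is only a minimal sketch: it uses the six sites printed by head(input) above as stand-in coordinates, assumes they are projected coordinates in meters, and the seed and sample size of 3 are arbitrary choices for illustration.

```r
# Coordinates of the six sites shown by head(input) (UTM zone 23S, meters)
xy <- cbind(
  x = c(235441, 234106, 232683, 233863, 234012, 236929),
  y = c(7617150, 7615740, 7614280, 7614420, 7612890, 7612850)
)

set.seed(1)  # arbitrary seed, for reproducibility only
idx <- sample(nrow(xy), size = 3, replace = FALSE)

# dist() returns the Euclidean distance between every pair of rows
min_d <- min(dist(xy[idx, ]))

# TRUE only if every selected pair is at least 3,000 m apart
ok <- min_d >= 3000
```

A check like this only reports whether a draw passes; it does not construct a draw that is guaranteed to pass.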

Answer



I just added a function, "sample.distance", to the development version of the spatialEco package. You can install the development version from GitHub using: devtools::install_github("jeffreyevans/spatialEco")


I included a replacement argument to allow for sampling with or without replacement. I also added a d.max argument that allows a maximum sampling distance to be set in addition to the minimum (d). The defaults are no replacement (FALSE) and no maximum sampling distance. The trace argument prints the min/max sample distances for each random sample, as well as the number of iterations needed for distance convergence.


Please note that specifying a condition does not mean that your data can actually meet it. Here is an example using the meuse data. The data cannot satisfy a 500 m minimum sampling distance for more than ~15 points (n for a 50% sample is 78). The exact number depends on the random draw, but it should not vary much. I added error checking for non-convergence: the function returns a subsample of however many points can be identified under the given conditions.


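library(sp)
library(spatialEco)

# Load the meuse data and promote it to a SpatialPointsDataFrame
data(meuse)
coordinates(meuse) <- ~ x+y

# Target sample size: 50% of the points
p <- round(nrow(meuse) * 0.50, 0)

# Draw the distance-constrained subsample (500 m minimum distance)
sub.meuse <- sample.distance(meuse, n = p, d = 500, trace = TRUE)

# Plot all points, with the subsample in red
plot(meuse, pch = 19, main = "min dist = 500")
points(sub.meuse, pch = 19, col = "red")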

If you end up using this function in your research, please cite the package.
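
For readers who want to see the general idea without installing the package, here is a minimal base-R sketch of one simple way to draw a distance-constrained random subsample. It is not the spatialEco implementation: the function name greedy_min_dist_sample and its arguments are hypothetical, the approach is a plain greedy pass over the points in random order, and the coordinates are synthetic values made up for illustration.

```r
# Hypothetical helper, not part of spatialEco: visit the points in random
# order and keep a point only if it lies at least d away from every point
# kept so far. Stops once n points are kept or the points run out.
greedy_min_dist_sample <- function(xy, n, d) {
  xy <- as.matrix(xy)
  keep <- integer(0)
  for (i in sample.int(nrow(xy))) {          # random visiting order
    if (length(keep) >= n) break
    if (length(keep) > 0) {
      dx <- xy[keep, 1] - xy[i, 1]
      dy <- xy[keep, 2] - xy[i, 2]
      if (any(sqrt(dx^2 + dy^2) < d)) next   # too close to a kept point
    }
    keep <- c(keep, i)
  }
  if (length(keep) < n) {
    warning("only ", length(keep), " of ", n,
            " points satisfy the distance constraint")
  }
  keep  # row indices of the selected points
}

# Synthetic coordinates in meters (made up for illustration)
set.seed(42)
xy <- cbind(x = runif(200, 0, 50000), y = runif(200, 0, 60000))
idx <- greedy_min_dist_sample(xy, n = 20, d = 3000)
```

With a SpatialPointsDataFrame such as input, the same helper could presumably be applied to the coordinate matrix, for example input[greedy_min_dist_sample(coordinates(input), n = 102, d = 3000), ]. As with the package function, the requested n may not be reachable for a given point configuration, in which case fewer points are returned with a warning.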

