I'm extracting the area and percent cover of different land use types from a raster, based on several thousand polygon boundaries. I've found that the extract function works much faster if I iterate through the polygons individually, cropping and then masking the raster down to each polygon's extent. Even so, it's pretty slow, and I'm wondering if anyone has suggestions for improving the efficiency and speed of my code.
The only thing I've found related to this is a response by Roger Bivand, who suggested using GDAL.open() and GDAL.close(), as well as getRasterTable() and getRasterData(). I looked into those, but I've had trouble with GDAL in the past and don't know it well enough to implement it.
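For reference, the rgdal route would look roughly like the minimal sketch below, assuming the raster lives on disk as a GeoTIFF; the file name "landuse.tif" and the 100x100 window are placeholders. The point is only that you open the dataset once and read just the pixel window covering each polygon's extent:
library(rgdal)
ds <- GDAL.open("landuse.tif") #open the dataset once, outside any loop
vals <- getRasterData(ds, offset = c(0, 0), region.dim = c(100, 100)) #read only a window; offset and region.dim are in rows/columns
GDAL.close(ds) #release the dataset handle when finished
table(vals) #tabulate the land use codes within the window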
Reproducible Example:
library(maptools) ## For wrld_simpl
library(raster)
## Example SpatialPolygonsDataFrame
data(wrld_simpl) #polygon of world countries
bound <- wrld_simpl[1:25,] #subset to 25 countries; named 'bound' because the loop below uses that variable
## Example RasterLayer
ras <- raster(nrow=2e3, ncol=2e3, crs=proj4string(wrld_simpl), xmn=-180, xmx=180, ymn=-90, ymx=90) #named 'ras' rather than 'c' to avoid masking base::c()
ras[] <- 1:ncell(ras)
#plot, so you can see it
plot(ras)
plot(bound, add=TRUE)
Fastest Method so far
result <- data.frame() #empty result dataframe
system.time(
for (i in 1:nrow(bound)) { #the number of polygons to iterate through
  single <- bound[i, ] #select a single polygon
  clip1 <- crop(ras, extent(single)) #crop the raster to the extent of the polygon first, because it speeds the mask up
  clip2 <- mask(clip1, single) #crop the raster to the polygon boundary
  ext <- extract(clip2, single) #extract data from the raster based on the polygon boundary
  tab <- lapply(ext, table) #make a table of the extract output
  s <- sum(tab[[1]]) #sum the table for the percentage calculation
  mat <- as.data.frame(tab)
  mat2 <- as.data.frame(tab[[1]] / s) #calculate percent
  final <- cbind(single@data$NAME, mat, mat2$Freq) #combine into a single dataframe
  result <- rbind(final, result)
})
user system elapsed
39.39 0.11 39.52
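As an aside, growing result with rbind() on every iteration copies the whole data frame each time; collecting per-polygon pieces in a list and binding once at the end is usually faster. A sketch of the same loop in that style (the column names are illustrative):
pieces <- vector("list", nrow(bound)) #pre-allocate one slot per polygon
for (i in 1:nrow(bound)) {
  single <- bound[i, ]
  clip2 <- mask(crop(ras, extent(single)), single) #crop then mask, as above
  tab <- table(extract(clip2, single)[[1]]) #tabulate cell values for this polygon
  pieces[[i]] <- data.frame(NAME = single@data$NAME, value = names(tab),
                            count = as.vector(tab), pct = as.vector(tab) / sum(tab))
}
result <- do.call(rbind, pieces) #bind everything once at the end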
Parallel Processing
Parallel processing cut the user time roughly in half, but the benefit was negated by the increase in system time. raster uses the cluster for the extract function, but unfortunately not for crop or mask. The net result is slightly more total elapsed time, apparently from waiting on I/O.
beginCluster(detectCores() - 1) #use all but one core
Run the loop code from above, now on multiple cores:
user system elapsed
23.31 0.68 42.01
Then end the cluster:
endCluster()
Slow Method: The alternative method of doing an extract directly from the raster function takes a lot longer, and I'm not sure about the data management to get it into the form I want:
system.time(ext <- extract(ras, bound))
user system elapsed
1170.64 14.41 1186.14
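For the data management part: extract() returns a list with one element per polygon, in the order of bound, so it can be reshaped into a per-polygon percentage table along these lines (a sketch; the column names are illustrative):
tabs <- lapply(ext, table) #one table of cell values per polygon
result2 <- do.call(rbind, lapply(seq_along(tabs), function(i) {
  data.frame(NAME = bound$NAME[i], value = names(tabs[[i]]),
             count = as.vector(tabs[[i]]), pct = as.vector(tabs[[i]]) / sum(tabs[[i]]))
}))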
Answer
I have finally gotten around to improving this function. I found that, for my purposes, it was fastest to rasterize() the polygon first and use getValues() instead of extract(). Rasterizing isn't much faster than the original code for tabulating raster values in small polygons, but it shines for large polygon areas that have large rasters to be cropped and values extracted; getValues() was also much faster than extract(). I also figured out multi-core processing using foreach().
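The core idea for a single polygon, before adding any parallelism, is just this (using the objects from the example above):
single <- bound[1, ]
clip1 <- crop(ras, extent(single)) #crop to the polygon's extent
clip2 <- rasterize(single, clip1, mask = TRUE) #mask by rasterizing the polygon
tab <- table(getValues(clip2)) #tabulate cell values; getValues() beats extract() here
tab / sum(tab) #percent cover per land use class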
I hope this is useful for other people who want an R solution for extracting raster values with a polygon overlay. This is similar to the "Tabulate Intersection" tool in ArcGIS, which did not work well for me, returning empty outputs after hours of processing, as this user also found.
#initiate multicore cluster and load packages
library(foreach)
library(doParallel)
library(tcltk)
library(sp)
library(raster)
cores <- 7
cl <- makeCluster(cores, outfile = "") #outfile = "" sends worker output and errors to the console
registerDoParallel(cl)
Here's the function:
multicore.tabulate.intersect <- function(cores, polygonlist, rasterlayer){
  foreach(i = 1:cores, .packages = c("raster", "tcltk", "foreach"), .combine = rbind) %dopar% {
    mypb <- tkProgressBar(title = "R progress bar", label = "", min = 0, max = length(polygonlist[[i]]), initial = 0, width = 300)
    foreach(j = 1:length(polygonlist[[i]]), .combine = rbind) %do% {
      final <- data.frame()
      tryCatch({ #probably not necessary under foreach, but useful so one bad polygon doesn't kill the loop
        single <- polygonlist[[i]][j, ] #pull out the individual polygon to be tabulated
        dir.create(file.path("c:/rtemp", i, j, single@data$OWNER), recursive = TRUE, showWarnings = FALSE) #create a unique temp directory; recursive = TRUE so parent folders are created too
        rasterOptions(tmpdir = file.path("c:/rtemp", i, j, single@data$OWNER)) #set the temp directory - important b/c raster temp files can fill a hard drive when processing many polygons
        clip1 <- crop(rasterlayer, extent(single)) #crop to the extent of the polygon
        clip2 <- rasterize(single, clip1, mask = TRUE) #crop to the polygon edge & convert to raster
        ext <- getValues(clip2) #much faster than extract
        tab <- table(ext) #tabulate the values of the raster within the polygon
        mat <- as.data.frame(tab)
        final <- cbind(single@data$OWNER, mat) #combine with the name of the polygon
        unlink(file.path("c:/rtemp", i, j, single@data$OWNER), recursive = TRUE, force = TRUE) #delete temporary files
        setTkProgressBar(mypb, j, title = "number complete", label = j)
      }, error = function(e){cat("ERROR :", conditionMessage(e), "\n")}) #catch errors so they don't kill the loop
      final #the value returned to foreach for this polygon
    }
    #close(mypb) #closing the progress bar while operating caused it to return an empty final dataset, so it is left open
  }
}
To use it, adjust single@data$OWNER to match the column name that identifies your polygons (that could probably have been built into the function...) and run:
myoutput <- multicore.tabulate.intersect(cores, polygonlist, rasterlayer)
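Note that polygonlist must be a list with one SpatialPolygonsDataFrame per core, since the function indexes it as polygonlist[[i]][j, ]. One way to build it from a single SpatialPolygonsDataFrame (a sketch using bound from the example above):
idx <- split(1:nrow(bound), cut(1:nrow(bound), cores, labels = FALSE)) #split the row indices into 'cores' chunks
polygonlist <- lapply(idx, function(rows) bound[rows, ]) #one SpatialPolygonsDataFrame per core
And remember to shut the cluster down when you're finished:
stopCluster(cl)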