Saturday 24 November 2018

Processing vector to raster faster with R


I am converting vector to raster in R. However the process was too long. Is there possibility to put the script into multithread or GPU processing in order to do it more faster?


My script to rasterized vector.


r.raster = raster()
extent(r.raster) = extent(setor) #definindo o extent do raster
res(r.raster) = 10 #definindo o tamanho do pixel
setor.r = rasterize(setor, r.raster, 'dens_imov')


r.raster


class : RasterLayer dimensions : 9636, 11476, 110582736 (nrow, ncol, ncell) resolution : 10, 10 (x, y) extent : 505755, 620515, 8555432, 8651792 (xmin, xmax, ymin, ymax) coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0


setor


class : SpatialPolygonsDataFrame features : 5419 extent : 505755, 620515.4, 8555429, 8651792 (xmin, xmax, ymin, ymax) coord. ref. : +proj=utm +zone=24 +south +ellps=GRS80 +units=m +no_defs variables : 6 names : ID,CD_GEOCODI, TIPO, dens_imov,area_m,domicilios1 min values : 35464, 290110605000001, RURAL, 0.00000003,100004,1.0000 max values : 58468, 293320820000042, URBANO, 0.54581673,99996, 99.0000


Print of setor enter image description here



Answer



I tried to "parallelize" the function rasterize using the R package parallel in this way:



  1. split the SpatialPolygonsDataFrame object in n parts


  2. rasterize every part separately

  3. merge all the parts into one raster


In my computer, the parallelized rasterize function took 2.75 times less than the no-parallelized rasterize function.


Note: the code below download a polygon shapefile (~26.2 MB) from the web. You can use any SpatialPolygonDataFrame object. This is only an example.


Load libraries and example data:


# Load libraries
library('raster')
library('rgdal')


# Load a SpatialPolygonsDataFrame example
# Load Brazil administrative level 2 shapefile
BRA_adm2 <- raster::getData(country = "BRA", level = 2)

# Convert NAMES level 2 to factor
BRA_adm2$NAME_2 <- as.factor(BRA_adm2$NAME_2)

# Plot BRA_adm2
plot(BRA_adm2)
box()


# Define RasterLayer object
r.raster <- raster()

# Define raster extent
extent(r.raster) <- extent(BRA_adm2)

# Define pixel size
res(r.raster) <- 0.1


BrazilSPDF


Figure 1: Brazil SpatialPolygonsDataFrame plot


Simple thread example


# Simple thread -----------------------------------------------------------

# Rasterize
system.time(BRA_adm2.r <- rasterize(BRA_adm2, r.raster, 'NAME_2'))

Time in my laptop:


# Output:

# user system elapsed
# 23.883 0.010 23.891

Multithread thread example


# Multithread -------------------------------------------------------------

# Load 'parallel' package for support Parallel computation in R
library('parallel')

# Calculate the number of cores

no_cores <- detectCores() - 1

# Number of polygons features in SPDF
features <- 1:nrow(BRA_adm2[,])

# Split features in n parts
n <- 50
parts <- split(features, cut(features, n))

# Initiate cluster (after loading all the necessary object to R environment: BRA_adm2, parts, r.raster, n)

cl <- makeCluster(no_cores, type = "FORK")
print(cl)

# Parallelize rasterize function
system.time(rParts <- parLapply(cl = cl, X = 1:n, fun = function(x) rasterize(BRA_adm2[parts[[x]],], r.raster, 'NAME_2')))

# Finish
stopCluster(cl)

# Merge all raster parts

rMerge <- do.call(merge, rParts)

# Plot raster
plot(rMerge)

BrazilRaster


Figure 2: Brazil Raster plot


Time in my laptop:


# Output:
# user system elapsed

# 0.203 0.033 8.688

More info about parallelization in R:



No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...