Tuesday 25 June 2019

shapefile - raster::rasterize set wrong values with enclosed polygons


Similarly to this post


rasterize (R package raster) fail to rasterize island polygons?


the rasterize function in the raster package is giving me problems with enclosed polygons. In my case, some enclosed polygons are rasterized with cell values equal to the enclosing polygons, as they were somehow 'merged'.


Here's my sample code


require(sp)
require(rgdal)
require(rgeos)
require(raster)


## set wd
wd <- 'M:/foo'
setwd(wd)

## open shapefile
sample.poly <- readOGR(dsn=wd, layer='sample')

## rasterize
resol <- 10

r <- raster(res=resol, ext=extent(sample.poly))
sample.r <- rasterize(sample.poly, r, 'Code')

## export files
writeRaster(x=sample.r, filename='sample.tif', format='GTiff')
writeOGR(obj=sample.poly, dsn=wd, layer='sample', driver='ESRI Shapefile')

Here's the structure of table of attributes


str(sample.poly@data)
'data.frame': 11 obs. of 2 variables:

$ OBJECTID: num 6249 14593 14614 15434 15683 ...
$ Code : int 33200 32110 33300 32410 32210 31210 33300 32110 32210 31210 ..

data$Code are CORINE LCM-like codes. I already tried to set them as characters instead of integers, but the same error occurred.


Here's how the sample shapefile looks like (in QGIS)


enter image description here


and here's the result of rasterize


enter image description here


As you can see, the enclosed polygon marked with yellow arrow is properly rasterized, while the one marked with the red arrow is 'dissolved' in the enclosing polygon.


Any suggestion? Is it a bug? This is just a sample taken from a larger shapefile, with dozens of such errors.




Answer



This is not a bug but an expected behavior. Note that rasterize takes a fun argument that handles grid cells with two or more values. By default it uses that last function, namely the value that appears last on the data data.frame is used. Similarly first will use the value that appears first. Other functions include count, mean, etc.


Here is a short example with a proposed solution. In short, if you are able to sort the shape file features by their area from smallest to largest, the first function promise to get all islands rasterized.


QGIS vector with islands


The data is given here; islands are in bold.



str(shape@data) 'data.frame': 4 obs. of 2 variables: $ id : int 1 2 3 4 $ type: int 1 2 2 1



Using the first function, as in: r2first <- rasterize(shape, r, "type", fun = "first") does not rasterize the islands, since they appear last in the attribute table.


Islands are not being rasterized



The following code fixes the problem:


area <- sapply(1:nrow(shape), function(x) {shape@polygons[[x]]@Polygons[[1]]@area}) shape1 <- shape[order(area), ] r2first_1 <- rasterize(shape1, r, "type", fun = "first") plot(r2first_1)


enter image description here


The idea of using area as the sort criteria is that islands must be smaller that their accommodating polygons.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...