Monday 1 June 2015

qgis - How can I determine which raster attribute covers the highest area of a polygon?


I'm relatively new to GIS and QGIS but I have a problem that I'm not certain how to solve. I searched here for relevant information but found none. (Or, I used incorrect search terms.) My problem is thus:


I have a raster file of crop data determined from satellite imagery. Pixel size is 30 meter x 30 meter.



I have a shapefile with over 270,000 polygons of varying sizes.


I can get general 'zonal statistics' through QGIS but these are relatively meaningless: count, mean, and sum don't help me in determining the predominant--by area--crop type within a given polygon.


Ideally, I'd like to be able to determine which crop type covers the highest area in every polygon. As an example, in the image below, the parcel in the center of the image has at least six crop types within it's boundaries. However, clearly there is one predominant crop type. How can I associate that single crop type with that parcel? Is this possible?


An example image


I'm using QGIS version 2.4.0 on Windows 7.


If there is an R solution, I would enjoy hearing that, too.



Answer



I ended up using an R solution almost identical to that provided by @JeffreyEvans.


First, I was able to pare down the number of polygons of my shapefile I was genuinely interested in. I pared these data down by eliminating those polygons that were outside of city limits in the county in which I was interested. This was a simple process done in QGIS to intersect my parcel layer with a city limits layer.


I used the extract() function from the Raster package with the below function for mathematical Mode to extract the singular value for each polygon (of only ~22,000 rather than ~290,000):



  Mode <- function(x, na.rm = FALSE) {
if(na.rm){
x = subset(x, !is.na(x))
}

ux <- unique(x)
return(ux[which.max(tabulate(match(x, ux)))])
}

(I wrote this Mode() function to avoid the density() function returning values that were not integers.)



Then it was just an issue of applying the extract() function with my raster and SpatialPolygonsDataFrame, and Mode() function as values:


    ep <- as.data.frame(extract(myRaster, myShapefile, fun = Mode, na.rm = T))
names(ep) <- c("cropValue")

then inserting the crop data back into my SpatialPolygonsDataFrame:


    myShapefile@data$cropValue <- ep[,1]

I timed how long the data processing took. Initially, I ran the extract function on only the first 10 and 100 rows of my data. For the original SpatialPolygonsDataFrame, I calculated that it would have taken > 100 hours to do these analyses. I contemplated running it on a super-fast virtual machine via AWS EC2 but was able to pare down the spatial data frame to < 10% of its initial size. Still, running the above function on only ~22,000 polygons still took > 6 hours on my moderately fast desktop.


Thanks for the tips and advice! I was unable to discover a QGIS solution but was intrigued enough to start learning Python to remedy that in the future. However, R did the job.


Thanks again.



No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...