Monday 26 September 2016

Joining polygons in R



I am wondering how to join spatial polygons using R code?


I'm working with census data where certain areas change over time and I wish to join the polygons and the corresponding data and simply report on the joined areas. I am maintaining a list of polygons that have changes census to census and that I plan to merge. I'd like to use this list of area names as a lookup list to apply to census data from different years.


I'm wondering what R function to use to merge selected polygons and respective data. I have googled it but simply become confused by results.



Answer



The following solution is based on a post by Roger Bivand on R-sig-Geo. I took his example replacing the German shapefile with some census data from Oregon you can download from here (take all shapefile components from 'Oregon counties and census data').


Let's start with loading the required packages and importing the shapefile into R.


# Required packages
libs <- c("rgdal", "maptools", "gridExtra")
lapply(libs, require, character.only = TRUE)


# Import Oregon census data
oregon <- readOGR(dsn = "path/to/data", layer = "orcounty")
oregon.coords <- coordinates(oregon)

Next, you need some grouping variable in order to aggregate the data. In our example, grouping is simply based on the single county coordinates. See the image below, black borders indicate the original polygons, whereas red borders represent polygons aggregated by oregon.id.


# Generate IDs for grouping
oregon.id <- cut(oregon.coords[,1], quantile(oregon.coords[,1]), include.lowest=TRUE)

# Merge polygons by ID
oregon.union <- unionSpatialPolygons(oregon, oregon.id)


# Plotting
plot(oregon)
plot(oregon.union, add = TRUE, border = "red", lwd = 2)

Original and grouped Oregon shapefile


So far, so good. However, data attributes related to the original shapefile's subregions (e.g. population density, area, etc.) get lost when performing unionSpatialPolygons. I guess you'd like to aggregate your census data associated to the shapefile as well, so you'll need an intermediate step.


You first have to convert your polygons to a dataframe in order to perform aggregation. Now let's take data attribute columns six to eight ("AREA", "POP1990", "POP1997") and aggregate them according to the above IDs applying function sum.


# Convert SpatialPolygons to data frame
oregon.df <- as(oregon, "data.frame")


# Aggregate and sum desired data attributes by ID list
oregon.df.agg <- aggregate(oregon.df[, 6:8], list(oregon.id), sum)
row.names(oregon.df.agg) <- as.character(oregon.df.agg$Group.1)

Finally, reconvert your dataframe back to a SpatialPolygonsDataFrame providing the previously unified shapefile oregon.union and you obtain both generalized polygons and your census data derived from above summarization aggregation step.


# Reconvert data frame to SpatialPolygons
oregon.shp.agg <- SpatialPolygonsDataFrame(oregon.union, oregon.df.agg)

# Plotting

grid.arrange(spplot(oregon, "AREA", main = "Oregon: original county area"),
spplot(oregon.shp.agg, "AREA", main = "Oregon: aggregated county area"), ncol = 1)

Oregon areas


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...