Saturday 20 October 2018

spatial statistics - How to aggregate values of different statistical areas that are overlapping?


Let's say I have one polygon layer, counties, with the polygons of administrative borders, and another layer, postal codes, with the polygons of postal code areas.


I have some statistical values per polygon for both, let's say the population per county and the number of people who read Magazine X per postal code.



Both datasets have the same total coverage, so they always overlap, but the boundaries of the polygons inside are very different.


What options are there to combine values here? For example, what if I would like to calculate the number of people in county Y who read Magazine X? I know this is a hard problem, especially because, in the example I chose, population is in reality not uniformly distributed within a county or a postal code area. Still, surely there are some existing techniques?


I am looking for general terms and ideas behind techniques or algorithms, not specific tools in software.



Answer



In your example, the population value of the county is irrelevant to the question (it would have to be something like percentage of people in county Y who read Magazine X). The county borders simply serve as boundaries for aggregating the zip code values. However, if the issue is that zip code polys cross the boundary of a county and you only want the portion of the zip in the county, you need to first apportion (opposite of aggregation) the zip data and then aggregate it by county.


It appears you're familiar with aggregation, or the combination of smaller units of data into larger units. The opposite is known as apportioning, or allocating some part of an attribute value of a whole shape to the individual parts created when that shape is split up in some manner. Overlay operations, such as intersect or union, can split up your two layers so that you have non-overlapping polygons with boundaries of the areas each has in common. However, by themselves those tools typically don't account for attribute values. Depending on the software, either a processing environment setting controls how the values are split, or you have to do the calculations manually.


The most common method of apportioning is by area (aka area weighting). You determine the area of each smaller piece as a percentage of the total area of the original feature. Then you multiply those percentages (which should total 100 for each original feature) by the original value, which gives you the new value for each piece. As you mention, this method is limited in that it assumes a uniform distribution of the value throughout the original shape.
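To make the apportion-then-aggregate idea concrete, here is a minimal sketch of area weighting in plain Python. It assumes the overlay (intersect) step has already been done and that you have the area of each resulting piece; the function name `apportion_by_area`, the IDs, and the numbers are made up for illustration and do not come from any particular GIS package.

```python
from collections import defaultdict

def apportion_by_area(source_values, pieces):
    """Area-weight values from source polygons onto target polygons.

    source_values: {source_id: value}, e.g. readers per postal code
    pieces: list of (source_id, target_id, area), one entry per polygon
            produced by intersecting the two layers (hypothetical input)
    """
    # Total area of each source polygon, summed over its pieces
    total_area = defaultdict(float)
    for source_id, _target_id, area in pieces:
        total_area[source_id] += area

    # Apportion each source value by area share, then aggregate per target
    target_totals = defaultdict(float)
    for source_id, target_id, area in pieces:
        share = area / total_area[source_id]
        target_totals[target_id] += source_values[source_id] * share
    return dict(target_totals)

# Illustrative data: postal code Z1 straddles counties A and B,
# postal code Z2 lies entirely inside county B.
readers = {"Z1": 100, "Z2": 60}
pieces = [("Z1", "A", 3.0), ("Z1", "B", 1.0), ("Z2", "B", 2.0)]
result = apportion_by_area(readers, pieces)
```

With these numbers, three quarters of Z1's readers go to county A and the rest to county B, which also receives all of Z2's readers. The shares for each postal code sum to 1, so the overall total is preserved.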


There are other methods. Esri has a presentation in pdf form that outlines a couple using their ArcGIS software, specifically mask area weighting and filtered area weighting, both of which require a third ancillary dataset to give additional criteria on how to split the value up. The general concepts in those methods could apply to any software. For instance, they use the term Ratio Policy within the software and documentation, but the general concept is 'in what ratio should this value be split up as the feature is split?'.
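As a rough illustration of how an ancillary dataset can change the split, the sketch below shows one possible reading of the mask idea: part of each piece is excluded (for example water or other uninhabited land) and the value is weighted only by the remaining area. This is an assumption about the general concept, not Esri's implementation; the function name `apportion_with_mask` and the data are hypothetical.

```python
from collections import defaultdict

def apportion_with_mask(source_values, pieces):
    """Weight by usable (unmasked) area instead of total area.

    source_values: {source_id: value}
    pieces: list of (source_id, target_id, area, masked_area), where
            masked_area is the part of the piece covered by the
            ancillary mask layer (hypothetical input)
    """
    # Usable area of each source polygon after removing masked parts
    usable = defaultdict(float)
    for source_id, _target_id, area, masked_area in pieces:
        usable[source_id] += area - masked_area

    target_totals = defaultdict(float)
    for source_id, target_id, area, masked_area in pieces:
        if usable[source_id] <= 0:
            # Fully masked source polygon: there is no basis for a split
            raise ValueError(f"no usable area in {source_id}")
        share = (area - masked_area) / usable[source_id]
        target_totals[target_id] += source_values[source_id] * share
    return dict(target_totals)

# Z1 has 3 area units in county A (2 of them masked) and 1 in county B.
readers = {"Z1": 100}
pieces = [("Z1", "A", 3.0, 2.0), ("Z1", "B", 1.0, 0.0)]
masked_result = apportion_with_mask(readers, pieces)
```

Here each county ends up with the same usable area, so the readers are split evenly, whereas plain area weighting would have given county A three quarters of them.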

