qgis - Seeking method for automatic aggregation of polys with sparse data?

Monday, 23 February 2015

I am part of a project presenting health-related data on maps. To maximize the analytical usefulness of the data, we want to divide them into the smallest possible areas. However, to protect individuals, we also have to make sure that each area contains enough cases to preserve anonymity. We are therefore looking for methods to automatically aggregate an area with too sparse data into a neighboring area, or, a bit more advanced, into a non-neighboring area of the map that resembles the area in question according to sociodemographic parameters.

I am looking for any input on methods to automatically:

1. Aggregate an area with too few cases into a neighboring area, for example the neighbor with the lowest number of cases.

2. Aggregate an area A into the area B (neighboring or not) that most closely resembles area A on a number of parameters.

The work will be done in QGIS and/or ArcGIS Desktop, so any methods that can be readily implemented in either system are welcome, but general theoretical descriptions of statistical methods (probably most relevant for the second case) are also of interest.
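
For the second case, one common way to quantify "closest resemblance on a number of parameters" is the Euclidean distance between standardized (z-scored) attribute values, so that variables measured in different units contribute comparably. The sketch below is illustrative only: the area names, parameter names and values are hypothetical, and other measures (for example Mahalanobis distance, or weighting some variables more heavily) may suit real data better. The optional candidates argument restricts the search, for example to an area's neighbors.

```python
import math
from statistics import mean, pstdev

# Hypothetical sociodemographic parameters per area (illustrative values only).
areas = {
    "A": {"median_age": 41.0, "income": 31000.0, "pct_elderly": 18.0},
    "B": {"median_age": 39.5, "income": 30500.0, "pct_elderly": 17.0},
    "C": {"median_age": 29.0, "income": 45000.0, "pct_elderly": 8.0},
    "D": {"median_age": 52.0, "income": 27000.0, "pct_elderly": 29.0},
}

def standardize(areas):
    """Convert every parameter to z-scores so no single unit dominates."""
    params = sorted(next(iter(areas.values())))
    stats = {}
    for p in params:
        values = [a[p] for a in areas.values()]
        sd = pstdev(values)
        stats[p] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns
    return {
        name: [(attrs[p] - stats[p][0]) / stats[p][1] for p in params]
        for name, attrs in areas.items()
    }

def most_similar(target, areas, candidates=None):
    """Return the area (optionally limited to `candidates`) closest to `target`."""
    z = standardize(areas)
    pool = candidates if candidates is not None else [n for n in areas if n != target]
    return min(pool, key=lambda n: math.dist(z[target], z[n]))

print(most_similar("A", areas))              # search all other areas
print(most_similar("A", areas, ["C", "D"]))  # search only a given subset
```

Standardizing matters here: without it, income (in the tens of thousands) would swamp the other two parameters in the distance.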

Answer

Further research turned up a related question that describes what you are trying to do not just as aggregation but, more specifically, as Zone Design. There may already be tools designed to aggregate data for exactly this type of problem; based on an answer to that question, AZTool may fit your needs. If not, doing this fully "automatically" will require a model or script.

I agree with GISKid's comment that merging into non-neighboring areas would be a mistake, since your data has a geographic relationship component. No matter how many other parameters match, if two areas are in completely different places, combining them changes the spatial distribution rather than aggregating it. Beyond that, I can't speak to the statistical rationale for particular choices, only to how to carry them out.

The specific attributes and structure of your data are going to play a significant role here, so I'm just going to outline a general workflow. I'm assuming you have a single polygon layer with attributes.

1. Create a new attribute field called ToMerge with a default value of 0.

2. Select all records where cases < cut-off, and set ToMerge to 1.

3. Iterate through each of those records to find a polygon to merge it with. This is the tricky part: it may require multiple steps or a significant decision tree, and you'll have to make the rules yourself. If two sparse polygons are already adjacent, just merge them? What about three or more? Merge to the neighbor with the lowest count? The highest? Match specific parameters? Check for more than one condition? Iterate through all the neighbors and find the best match, or take the first one? Account for polygon size (merging can significantly alter density, say a very large polygon with too few cases next to a very small polygon with many cases)? Preserve case density? However you decide, you'll eventually pick one neighbor and set its ToMerge value to 1 before moving on to the next feature that needs a neighbor selected.

4. After going through all of step 2's records and assigning each a neighbor to merge with, run a Dissolve on the layer using ToMerge as the dissolve field, do not allow multipart features, and set up the required statistics fields so that your attributes are summed (or combined by whatever method suits each attribute). Note that I have ArcGIS's Dissolve tool in mind here. Be aware that Dissolve merges all touching features that share a dissolve value, so a plain 0/1 flag will also fuse adjacent unflagged polygons together, and touching merge pairs into one; in practice it is safer to give each sparse polygon and its chosen neighbor a shared unique ID (and every other polygon its own ID) and dissolve on that.

5. With the resulting dataset, ensure that there is still a ToMerge field, reset all its values to 0, and repeat from step 2. The repeat may not be necessary if you handle the check in step 3, but you do need to make sure that every dissolved polygon meets the cut-off, since two sparse polygons added together can still fall short.

Once all loops are complete, you'll have an output dataset that has been aggregated as required.
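
The select/merge/repeat loop described above can be sketched in plain Python, independent of ArcGIS or QGIS. This is a minimal sketch under several assumptions: the case counts and the polygon adjacency (which polygons share a boundary) have already been exported from the layer into dictionaries, the function name aggregate_sparse and the sample IDs are hypothetical, and the decision rule is simply "merge into the neighbor with the lowest count", one of the options listed above. It merges one group at a time and re-checks, so a merged group that is still below the cut-off is picked up again on a later pass.

```python
def aggregate_sparse(cases, neighbors, cutoff):
    """Greedily merge polygons until every group reaches `cutoff` cases.

    cases:     {poly_id: case_count}
    neighbors: {poly_id: iterable of poly_ids sharing a boundary}
    Returns (assignment, totals, unresolved):
      assignment maps each poly_id to a group id (the id of one member),
      totals maps each group id to its summed case count,
      unresolved lists sparse groups with no neighbor left to merge with.
    """
    members = {p: {p} for p in cases}      # group id -> member polygons
    totals = dict(cases)                   # group id -> summed cases
    adj = {p: set() for p in cases}        # group id -> adjacent group ids
    for p, ns in neighbors.items():        # make adjacency symmetric
        for n in ns:
            if n != p:
                adj[p].add(n)
                adj[n].add(p)

    while True:
        sparse = [g for g in totals if totals[g] < cutoff and adj[g]]
        if not sparse:
            break
        # Handle the sparsest group first; one merge per pass, then re-check.
        g = min(sparse, key=lambda k: (totals[k], k))
        # Hypothetical decision rule: merge into the lowest-count neighbor.
        target = min(adj[g], key=lambda k: (totals[k], k))
        members[target] |= members.pop(g)
        totals[target] += totals.pop(g)
        # Rewire adjacency so g's former neighbors now touch the merged group.
        for n in adj.pop(g):
            adj[n].discard(g)
            if n != target:
                adj[n].add(target)
                adj[target].add(n)

    assignment = {p: g for g, ps in members.items() for p in ps}
    unresolved = sorted(g for g in totals if totals[g] < cutoff)
    return assignment, totals, unresolved

# Hypothetical example: five polygons, cut-off of 5 cases.
cases = {"A": 2, "B": 9, "C": 4, "D": 12, "E": 1}
neighbors = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"]}
assignment, totals, unresolved = aggregate_sparse(cases, neighbors, cutoff=5)
print(assignment)
print(totals)
print(unresolved)
```

Each group ID is unique per merged group, so it could be written back to a field and used as the dissolve field. Sparse polygons with no neighbors at all (islands, for example) cannot be fixed this way and come back in unresolved for manual handling.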

Another approach, rather than Dissolve (again in ArcGIS), would be the Aggregate Polygons tool, followed by some joins and field calculations to aggregate the attribute values. You would still have to pick which polygons to aggregate, though.
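
Whichever tool handles the geometry, the attribute side amounts to a group-by summary. A minimal sketch, assuming each polygon already carries the ID of the group it was merged into; the field names group, cases and area_km2 and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical polygon records: assigned group id, case count and area.
polygons = [
    {"id": "A", "group": "C", "cases": 2, "area_km2": 10.0},
    {"id": "B", "group": "B", "cases": 9, "area_km2": 20.0},
    {"id": "C", "group": "C", "cases": 4, "area_km2": 5.0},
    {"id": "D", "group": "D", "cases": 12, "area_km2": 30.0},
    {"id": "E", "group": "C", "cases": 1, "area_km2": 15.0},
]

def summarize_groups(records):
    """Sum counts and areas per group, then recompute case density."""
    sums = defaultdict(lambda: {"cases": 0, "area_km2": 0.0})
    for rec in records:
        s = sums[rec["group"]]
        s["cases"] += rec["cases"]        # counts are additive, so SUM is right
        s["area_km2"] += rec["area_km2"]
    for s in sums.values():
        # Densities and rates are not additive; derive them from the sums.
        s["density"] = s["cases"] / s["area_km2"]
    return dict(sums)

groups = summarize_groups(polygons)
for gid, s in sorted(groups.items()):
    print(gid, s["cases"], round(s["density"], 3))
```

The design point carries over to the GIS tools: counts can be summed directly, but a rate or density should be recalculated from the summed numerator and denominator rather than summed or averaged.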
