I have the following problem: given the locations of insured properties (say, houses with fire insurance), I have to find the largest sum of insured values that fits into a 300 m radius. My first approach would be to divide the map into grid cells of appropriate size (e.g. 55 m), assign the properties to the cells and sum the values per cell in a first step, then aggregate blockwise (10x10 cells), and finally search for the largest block. But I have a feeling that there must be a more elegant approach.
Answer
Raster solution
The first approach works well provided you use an efficient algorithm. The most efficient is to compute the Fast Fourier Transform of a grid representing the data (cells are either zero or contain the total insured value of all properties falling within them), multiply by the FFT of a "simple" kernel representing an average over a 300 m circle, and take the inverse FFT (which is just the FFT itself up to a constant). A scan over the resulting grid then identifies the local (and global) maxima. The cell(s) with the highest values designate the centers of 300 m circles containing the highest mean (or total) insured value.
(The FFT is not needed when a typical circle occupies only a handful of cells, but for small cell sizes--say, around 10 m or less, where a circle can occupy thousands of cells--the FFT can be orders of magnitude faster than the obvious brute-force method of cumulatively allocating each property's value to all cells within 300 m of it. However, if properties are so spread out that the great majority of cells contain no properties, the brute-force method tends to be competitive, although you usually have to hand-code it to achieve this performance.)
Raster-based GISes and matrix-oriented computational platforms (such as MATLAB and Mathematica) support this "spread-and-accumulate" operation. In GIS terminology it is variously called (up to the irrelevant multiplicative constant) a "focal sum," a "focal mean," a "neighborhood average," and a "simple kernel density."
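As a rough sketch of this spread-and-accumulate idea, here is one way it might look in Python, assuming NumPy and SciPy are available. The grid dimensions, cell size, and randomly generated property data are all illustrative assumptions, not from the original problem beyond the 300 m radius.

```python
import numpy as np
from scipy.signal import fftconvolve

# Hypothetical setup: a 200 x 200 grid of 55 m cells and 500 random
# properties (all numbers here are illustrative).
cell = 55.0      # cell size in metres
radius = 300.0   # search radius in metres
rng = np.random.default_rng(0)

# Grid of total insured values: zero where no property falls, otherwise
# the sum of the values of all properties located in that cell.
grid = np.zeros((200, 200))
rows = rng.integers(0, 200, size=500)
cols = rng.integers(0, 200, size=500)
vals = rng.uniform(1e5, 1e6, size=500)
np.add.at(grid, (rows, cols), vals)

# "Simple" kernel: 1 inside the 300 m circle, 0 outside.
r = int(np.ceil(radius / cell))
y, x = np.mgrid[-r:r + 1, -r:r + 1]
kernel = ((x * cell) ** 2 + (y * cell) ** 2 <= radius ** 2).astype(float)

# Focal sum via FFT convolution: each cell receives the total insured
# value within 300 m of its centre.
focal_sum = fftconvolve(grid, kernel, mode="same")

# The global maximum marks the centre of the richest 300 m circle.
i, j = np.unravel_index(np.argmax(focal_sum), focal_sum.shape)
```

Dividing `focal_sum` by `kernel.sum()` would turn the focal sum into the focal mean, which differs only by the constant mentioned above.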
Vector solution
Another approach creates a "vector" representation of each property as a circular feature (of 300m radius) in a "feature layer." The GIS "union" operation (of this layer with itself) breaks all overlaps of features into discrete patches of mutual overlap. Summing the property values associated with each patch gives the desired total. One more pass through this summary table will identify all global maxima.
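One way to sketch this union-and-sum idea outside a GIS is with the Shapely library (an assumption; any planar-geometry library would do): buffer each property into a 300 m disk, node the disk boundaries and polygonize them into patches of mutual overlap, then total the contributing values per patch. The three example properties below are made up for illustration.

```python
# Hedged sketch of the vector approach using Shapely (assumed available).
from shapely.geometry import Point
from shapely.ops import polygonize, unary_union

# Hypothetical properties: ((x, y) in metres, insured value).
props = [((0.0, 0.0), 100.0), ((100.0, 0.0), 200.0), ((1000.0, 0.0), 50.0)]
circles = [(Point(x, y).buffer(300.0), v) for (x, y), v in props]

# Noding the circle boundaries and polygonizing yields the discrete
# patches of mutual overlap (the planar union of the layer with itself).
boundaries = unary_union([c.boundary for c, _ in circles])
patches = list(polygonize(boundaries))

# Each patch's total is the sum of the values of all circles covering it.
def patch_total(patch):
    pt = patch.representative_point()
    return sum(v for c, v in circles if c.contains(pt))

best_sum = max(patch_total(p) for p in patches)  # 300.0: the two nearby
# properties (100 m apart) share an overlap patch; the third is isolated.
```

A final pass over the per-patch totals, as here, identifies the global maxima.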
Comparisons
The raster approach can require immense amounts of data storage if the properties cover a wide area (such as a large state), because for reasonable precision, cell sizes substantially smaller than 300 m are needed. However, it will handle extremely large numbers of properties (tens or even hundreds of millions) with aplomb. The vector approach can be efficient when properties are relatively few in number, but will bog down--and in most GISes take an impossibly long time to execute--when many millions of properties are involved. It will be slightly more accurate than the raster calculation, because there is no need to discretize space into finite cells, but be aware that property locations are themselves slightly inaccurate and spatially diffuse anyway, so requiring locational precision better than about 10 m is probably overkill.