Thursday, 24 March 2016

Making linear regression across multiple raster layers using ArcGIS Desktop?


I have a series of 5 rasters which represent a value for 5 consecutive years.


Is there a way to create a linear regression for each cell over the 5 layers and output the slope?


The result would be a raster with cell values equal to the slope of the linear regression line.


Essentially it is exactly what is done in How to represent trend over time?



Answer




For a problem this small the slopes are easily computed with a simple raster calculation. Given that the years are consecutive, let's name the rasters [y.1], [y.2], [y.3], [y.4], and [y.5] in temporal order. The slope grid is


(2/10) * ([y.5] - [y.1]) + (1/10) * ([y.4] - [y.2])



For other than five rasters--but still assuming they represent consecutive times--there is a similar formula. Each raster [y.i], for i = 1, 2, ..., through n, gets multiplied by a coefficient and all these results are added up. The coefficients are obtained by writing down the numbers


12, 24, 36, ..., 12n

and subtracting 6(n+1) from them. For instance, with n=8 we would subtract 6(8+1) = 54 from each, giving the eight numbers


-42, -30, -18, -6, 6, 18, 30, 42


These would multiply the rasters in temporal order. It's convenient to pair them by common coefficient sizes so you could write this out as


42 * ([y.8] - [y.1]) + 30 * ([y.7] - [y.2]) + 18 * ([y.6] - [y.3]) + 6 * ([y.5] - [y.4])

That reduces the amount of writing and the number of grid multiplications that are done. Finally, divide the result by n^3 - n. In the case n = 8, n^3 - n = 512 - 8 = 504. The net effect (if you want to compare this to other formulas) would be to multiply the input rasters by the coefficients


-1/12, -5/84, -1/28, -1/84, 1/84, 1/28, 5/84, 1/12

and add up the results.


In more general situations, where there may be varying intervals between the rasters, there is still a similar formula: the slope grid is always a linear combination of the rasters, but the coefficients will be less regular. The coefficients can be found from the general formula (X'X)^(-1)X' where X is the n by 2 "design matrix" having a column of n 1's and a second column set to the times of the grids.




Usually the wrong way to do this is to loop over all the cells, pick out the cell values in the n rasters, and send them (and the times) to a line-fitting routine. That's much more work than necessary, because in effect each call to that routine is working out the same coefficients millions of times over (once for each cell). If, however, the rasters have a large number of missing values occurring in many different patterns, this longer way would make sense, for then you could obtain slopes even where one or more of the rasters is missing a value but the remaining rasters do have values.



No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...