I'm looking for recommendations on best practices for dealing with raster data layers that have different resolutions and projections. The advice I've been given is to always resample to the layer with the lowest resolution before performing any analysis, but this seems like a huge waste of precision to me, and I've never been given a solid explanation for why it should be done.
When is it reasonable to resample to match a higher resolution grid and what are the implications compared to resampling to a lower resolution?
I realize that this is likely highly situation dependent. I'm mostly looking for general guidelines, but here's my specific scenario for reference:
Scenario: I'm looking to construct a spatial regression model predicting land use based on a variety of environmental and socio-economic layers. My land use map is Landsat derived and hence 30m resolution. Examples of explanatory layers include the SRTM DEM (3 arc-seconds, ~90m) and Bioclim climate layers (30 arc-seconds, ~1km).
Answer
Actually, it's not all that situation-dependent; it comes down to statistical error.
Any time you resample to a higher resolution, you are introducing false accuracy. Consider a set of data measured in feet at whole numbers only. Any given point may be +/- 0.5 feet from its actual location. If you resample to the nearest tenth of a foot, you are now implying that any given number is within +/- 0.05 feet of its actual location. Yet you know your original measurements were not that accurate, so you are now operating inside the original margin of error. However, if you go the other way and resample to the lower resolution, you know that any given point value is contained within the larger sample's margin of error.
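Here is a minimal numeric sketch of that point, using plain NumPy and synthetic values invented for illustration: measurements recorded to the nearest whole foot carry an actual error of up to half a foot, and adding a decimal place to them does nothing to reduce that error, it only implies a precision the data never had.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" positions in feet (unknowable in practice; invented here).
true_ft = rng.uniform(0.0, 100.0, size=10_000)

# The instrument only records whole feet, so the real error is up to +/- 0.5 ft.
measured_ft = np.round(true_ft)

# Re-expressing those same measurements to the nearest tenth adds a decimal
# place (implied precision +/- 0.05 ft) but cannot recover the lost information.
reexpressed_ft = np.round(measured_ft, 1)

actual_error = np.abs(reexpressed_ft - true_ft)
print(f"largest actual error: {actual_error.max():.3f} ft")  # still ~0.5 ft
print("precision implied by one decimal place: +/- 0.05 ft")
```

The same logic applies cell by cell to a raster: splitting a coarse cell into finer cells re-expresses the old measurement at a precision it never had.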
Outside of statistical math, the first place this comes to mind is land surveying. Older surveys only specified bearings down to the nearest half-minute and distances to the tenth of a foot. Plotting a boundary traverse from those measurements can often result in a misclosure (the start point and end point should be the same but are not) measured in feet. Modern surveys go to at least the nearest second and hundredth of a foot. Derived values (such as the area of a lot) can be significantly affected by the difference in precision, and the derived value itself can also be reported as overly precise.
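As a rough sketch of how recording precision feeds into misclosure, here is a toy traverse calculation. The legs are invented and a real survey has many other error sources; this only illustrates the effect of rounding bearings and distances before plotting.

```python
import math

def leg_vector(azimuth_deg, dist_ft):
    """One traverse leg as (easting, northing) components in feet."""
    az = math.radians(azimuth_deg)
    return dist_ft * math.sin(az), dist_ft * math.cos(az)

def rounded_misclosure(legs, bearing_arcsec, dist_step_ft):
    """Misclosure (ft) when bearings/distances are recorded at the given precision."""
    bearing_step = bearing_arcsec / 3600.0
    east = north = 0.0
    for az, dist in legs:
        az_r = round(az / bearing_step) * bearing_step
        dist_r = round(dist / dist_step_ft) * dist_step_ft
        de, dn = leg_vector(az_r, dist_r)
        east += de
        north += dn
    return math.hypot(east, north)

# Three invented legs; a fourth is computed so the exact figure closes perfectly.
legs = [(38.2471, 2640.55), (131.9042, 1980.20), (229.6325, 2710.75)]
east = sum(leg_vector(az, d)[0] for az, d in legs)
north = sum(leg_vector(az, d)[1] for az, d in legs)
legs.append((math.degrees(math.atan2(-east, -north)) % 360.0, math.hypot(east, north)))

# Old-style record: nearest half-minute (30") and 0.1 ft.
print(f"half-minute, 0.1 ft : misclosure = {rounded_misclosure(legs, 30, 0.1):.2f} ft")
# Modern record: nearest second (1") and 0.01 ft.
print(f"one second, 0.01 ft : misclosure = {rounded_misclosure(legs, 1, 0.01):.3f} ft")
```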
In your analysis case, if you resample to the higher resolution, your results will imply much greater accuracy than the data on which they are based. Consider your SRTM at 90m: whatever method is used to assign each cell's elevation (avg/max/mean return), the smallest unit (pixel) that can be differentiated from its neighbors is 90m. If you resample that to 30m, one of two things happens:
- you assume all nine of the resulting pixels have that same elevation, when in truth perhaps only one of them (the center, or the top left, or none at all!) actually does
- you interpolate between pixels, creating derived values not present before
Thus in both cases you introduce false accuracy, because your new subsamples were never actually measured; the sketch below makes the two cases concrete.
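A minimal sketch of those two cases, using a toy 3x3 array as a stand-in for a patch of 90 m DEM (real SRTM data would be read and resampled with GDAL/rasterio or similar during reprojection):

```python
import numpy as np
from scipy.ndimage import zoom

# Toy stand-in for a 90 m DEM tile (metres); the values are invented.
dem_90m = np.array([[110., 125., 140.],
                    [115., 130., 150.],
                    [120., 135., 160.]])

# Case 1: nearest neighbour. Each 90 m cell is simply copied into a 3x3 block
# of 30 m cells, so nine pixels now claim the elevation that was measured once.
dem_30m_nearest = np.repeat(np.repeat(dem_90m, 3, axis=0), 3, axis=1)

# Case 2: bilinear. The 30 m cells get interpolated elevations that were never
# actually measured at all.
dem_30m_bilinear = zoom(dem_90m, 3, order=1)

print(dem_30m_nearest.shape, dem_30m_bilinear.shape)  # (9, 9) (9, 9)
print(np.unique(dem_30m_nearest).size)                # 9: no new values, just duplicates
print(np.unique(dem_30m_bilinear).size)               # >9: derived values appear
```

Either way, the 30 m output carries no more information than the 90 m input it came from.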
Related question: What practices are available for modelling land suitability?