arcmap - Calculating Flow Direction and Delineating Basins from Projected vs. Unprojected DEM Data

This is somewhat of a theoretical question stemming from some discussions with colleagues on the topic of implications with delineating basins with projected (e.g., Albers Equal Area) vs. unprojected (NAD 83) data derived from a 10m DEM that's in NAD 83.

Some have stated that it's not an issue as the values calculated from unprojected data simply get adjusted if you do decide to project.

I'm not sure this is the case though, as there are inherent differences between data in a geographic coordinate system and projected data. I tried one example going through the routine starting with unprojected DEM data, then tested the same site with projected DEM data. Steps taken for both were done (all work done in ArcGIS 9.3.1) using 10m DEM data.

One run was done using a DEM in NAD 83, and the second run was done by projecting the same DEM into USA_Contiguous_Albers_Equal_Area_Conic_USGS_version.

In comparing the two I could notice a visual difference between the display of the Flow Direction grids.

NOTE: After more subsequent research I believe the striping effect is due to not using a CUBIC resampling but by mistakenly going with the default of NEAREST in the ArcGIS Project Raster tool. I don't believe this provides any sort of resolution to this debate though...

Flow directions using unprojected DEM

enter image description here

Flow directions using projected DEM

enter image description here

I understand that visual comparison is not 100% scientific but can be a good starting point.

Accordingly, there was a difference between the pour point with how it snapped for each run. And, there was a definite difference in the derived watersheds given how the snap pour point tool decided to snap based on the respective projected/unprojected datasets. The watershed shown in green is the watershed derived using the projected DEM and subsequent projected-derived elevation derivative data. The watershed shown in the purple outline is the watershed derived using the unprojected DEM data.

The watershed

enter image description here

I've come across these two other GIS forum threads (links below) that discuss this issue in the old ESRI forums, but I'm still not clear as to how the Flow Direction tool works relative to projected vs. unprojected data (I understand the concept of hydrologic flow and flow direction though). If each cell still has the same elevation value in a projected DEM vs. an unprojected DEM (is this correct?), why is there a difference in a flow direction raster derived from projected data versus one derived from DEM data in NAD83?

http://forums.esri.com/Thread.asp?c=93&f=995&t=292503

http://forums.esri.com/Thread.asp?c=93&f=995&t=290652

Also, would any differences theoretically be less of an issue if doing delineations in a higher Latitude such as, Shenandoah National Park in Virginia versus doing delineations in the state of Texas?

I spoke with one cartography expert that thought that the east-west distortion you get as you move away from the equator could likely be an issue (like how in some maps Canada is extremely bloated and distorted), in that if you're more than 10 degrees of latitude away from the equator they thought projected data is the way to go if you're concerned with accuracy.

One major unknown is the level of uncertainty with basins delineated using unprojected data that we're trying to get a handle on. There is a difference, but what is the magnitude?

Thanks to anyone that can provide a straight-forward answer to this discussion, or just some helpful insight into this.

Edit

The main issue we're interested/concerned with is if there will be accuracy issues with the delineated watersheds as a result of starting the process using an unprojected DEM.

So, if I'm understanding the reply, the delineated basins should be fine in terms of representing the drainage area for a pour point? It seems though if the flow directions are wrong that will result in some error in the final delineated watershed.

This is a very interesting and really an important topic - I have yet to see a report or documentation stating it's OK to use UN-projected data for delineating watersheds. I have set through ESRI User Conference technical talks led by the lead developer engineer of the Spatial Analyst extension (which houses the Hydrology tools) where they said you should use an equal area projection (such as Albers equal area) as well.

As well, there doesn't seem to be any authoritative "bible" standard for how to go about this - just seems that it's an almost acknowledged de facto approach to project the data before calculating your elevation derivatives.

Nowhere have I been able to find a concise and straightforward answer as to how this impacts flow direction calculation and subsequently the delineation of a watershed.

And, if you end up working with watersheds delineated using unprojected DEM data and then you project those watersheds, isn't the inaccuracy still there (e.g., in terms of determining a watershed area or any other characteristics such as land cover proportions etc)?

Furthermore, I'm assuming that projecting a flow direction raster that was derived from an unprojected DEM does not correct the errors either since the source data were unprojected....

thanks - appreciate any additional insight you can provide

EDIT - 20110331

@whuber:

thanks for this extensive discussion. We've been researching this issue more and actually have come across some references that suggest that it's actually better to not project the DEM before getting flow dir., flow accum., and delineating.

One email response from an anonymous source (but who is a pretty reputable person), when posed the question of 1.) project DEM 2.) produce derivatives OR 1.) produce derivatives 2.) project DEM said:

In a nutshell, it depends on the derivative. For continuous derivatives that will be visualized, you should derive and then project—this reduces the risk of tile boundary artifacts being enhanced or introduced (by the projection algorithm) and then passed along to the derivative if you were to project the DEM first. The exception to this is when you are also using distance or area as the basis for your derivative calculation. This is of course relative to how large the distances/areas are and how far you can acceptably get away from the equator. So imagine that for derivatives like slope or hillshade, which depend on the cellsize, there are consequences. These derivatives will be most accurate at the equator, and the accuracy will degrade significantly past 60 degrees north or south. In both cases, I am assuming the DEM covers a very large area (wider than 1.5 UTM Zones) and a traditional tile-based approach where the tiles are either arbitrary or conform to existing standards like USGS Quad sheet boundaries. So saying the implication is that much of this thinking predates mosaic datasets, which I am less able to comment on. The main concern for me would be wanting to know how well matched the DEM tiles are. If they are well matched (like NED) then I expect things to work well, with derivatives being derived from tiles (as functions applied to the mosaic dataset) and then these are displayed on the fly. If they are not well matched, then garbage in, garbage out. Back to your original question, I think if it is just watershed boundaries, it would be possible to derive these without projecting because it’s not how much curvature or slope that matters, just where it is and that it exists.

They went on to say:

The reason I would stick to the un-projected methodology is that we are using rasters which are in and of themselves a derivative of DEM (which we typically don’t have, but think LiDAR point cloud). For rasters that cover very large areas, like continents at relatively fine levels of resolution, projecting to something like Albers will result in loss or introduction of information, when the raster uses regular sized cells (like Esri’s rasters do). That means tools like Flow Accumulation will produce results based on partial or interpolated information. Basically all projection algorithms applied to rasters will cause problems as soon as there is a expansion or shrinkage of more than the distance of a pixel width (projections like Albers can introduce error by introducing new pixels between two old ones). Deriving from these means the potential for cumulative error is high.

This seems to suggest the opposite - that projecting introduces more noise, unless you get above 60 degrees latitude.

We've also come across some published sources that have suggested similar that unprojected is an acceptable approach for smaller watersheds (last 2 paragraphs of section 1.6) from Distributed Hydrologic Modeling for GIS (Vieux, 2004): http://www.springerlink.com/content/x877238532533g20/fulltext.pdf

So, in the end, does it just boil down to a matter of 1.) where you're doing work on the earth's surface 2.) the scale you're working at, and 3.) whether the noise introduced by a projection that will better preserve attributes that affect the flow direction algorithm is less than the distortion introduced by unprojected data (the benefit increasing as you move towards the poles) to determine whether you should project to something like conformal, or if it doesn't matter?

When you start digging into this topic it seems like the larger consensus is to project, but there are some that seem to say that's not a hard and fast rule.

Answer

You are correct that distortions in the projection can bias flow direction (and flow accumulation) estimates. (Using "unprojected" data is tantamount to using the highly distorting Plate Carree projection.)

For merely delineating basins, though, there actually is little problem: although the flow directions and flow amounts will be wrong, the projection won't cause water to appear to flow into areas it doesn't go. Downhill is still downhill.

By means of simple examples, it's not hard to see where the bias comes from. Consider two points 141 meters apart, one northeast of the other and immediately downgradient. The flow direction therefore is due northeast. In coordinates, the downgradient point is offset 100 meters in the x direction and 100 meters in the y direction. If you are at (say) latitude 60 degrees using unprojected data, the offsets will actually look like 200 meters in the x direction and 100 meters in the y direction. (200 = 100/cos(60).) That translates into a bearing of 63 degrees east of north rather than 45 degrees. In many flow direction/flow accumulation/delineation algorithms only 8 cardinal directions are possible. Thus, instead of indicating a northeast flow, the grid might shift this into a due easterly flow.

(The 63 degrees is computed trigonometrically as a function of the relative distortion in the projection between the direction of maximum distortion and the direction of minimum distortion. This begins to quantify the effect of using unprojected data.)

A good way to visualize this is to draw the 8 compass directions correctly on a sheet of rubber. Stretch the rubber sideways (with more stretch for higher latitudes): the more you stretch, the more the arrows all tend to point east-west. In those directions the angles shrink, while towards the north and south the angles expand. In the meantime, the elevations on the grid remain unchanged. The result is that both the slope and the aspect of the land are distorted, because they depend on the rate of change of elevation with respect to the positional coordinates.

Cardinal directions

Distorted directions

There will actually be more of an issue in Virginia than in Texas because of this. Your cartographer is correct. (I don't know where the 10 degree cutoff comes from, though. It sounds reasonable but rules of thumb like this need to be assessed in light of your accuracy requirements. In some cases you can get away with no projection and in others you might want much more accuracy.)

Most of these issues become moot when you adopt an appropriate workflow. Begin by projecting your data with the best conformal projection you can find (because there are no distortions of relative angles). Compute flow and anything else that involves the direction information. Then unproject (or reproject) the results back to whatever coordinate system you want to use for follow-on analysis or mapping. For instance, to compute areas of the delineated basins, reproject with an equal-area projection. The point is that reprojecting is simple enough that you can afford to, and should, change projections as needed to accommodate the calculations and mapping you are performing: you're not stuck with a single compromise projection.

An addendum to the original question focuses on watershed delineation. Let's address this. To do so, we need to understand how flow directions are estimated.

The ArcGIS method for computing slopes and aspects is documented:

The direction of flow is determined by the direction of steepest descent from each cell.

Specifically, let x[0,0] designate the value in a cell and let x[i,j] designate the value in the cell i columns to the right and j rows below. Apart from some special cases dealing with sinks and resolving ties, the algorithm selects the largest of the eight directional slope estimates (x[0,0]-x[i,j])/Sqrt[i^2+j^2] where |i| <= 1 and |j| <= 1 and assumes that is the direction of flow. These numbers are ratios: the numerators are differences in elevation and the denominators are distances computed via the Pythagorean Theorem in whatever coordinates are in use.

Upon reprojecting the grid, two things happen: (1) the cells are moved (and distorted as this happens) and therefore (2) the grid values (elevations) are resampled onto the lattice of cells for the new grid. Small changes in elevation can occur due to resampling and these could induce occasional changes in the estimated flow direction. Typically such changes should be rare, so let's ignore them. These changes will be dwarfed by changes induced by metric distortions in the reprojection. For instance, in reprojecting from Plate Carree (essentially a geographic coordinate system) into a conformal projection, the east-west direction will shrink by the cosine of the latitude. In the space (along a row) where one cell used to fit, 1/cos(latitude) cells now have to fit. This will typically magnify any apparent slope estimate in any direction having an east-west component (i.e., the NE, E, SE, SW, W, and NW directions). Whereas earlier such slopes might not have appeared to be the largest, and therefore were not selected by the ArcGIS algorithm, by being made larger they might now be selected as the flow direction. Accordingly, at many places a north or south flow direction will be converted into NE, NW, SE, or SW, and a NE direction might be converted into due E, etc.

The effects of any reprojection can be predicted using a similar calculation: you need to know the directional distortions that occur in going from one to the other.

Let's consider what it means to "be in the watershed" of a "pour point" x. Let's agree that any location y "lies in the watershed of x" means that if the surface were bare, frictionless, impermeable, and smooth, and if water were to flow without spreading (purely advective flow), then it would flow from y to x. That, anyway, is what the GIS does in computing flow accumulation (which is at the heart of watershed delineation).

In most locations, when the pour point x lies along a stream bed, the distortions from reprojection make no essential difference: they cause the apparent flow path from y to x to change, but ultimately the water arrives in the same stream bed anyway, albeit perhaps by a slightly different route. If any discrepancy occurs, it must be because either (a) the flow path arrives further downgradient along the stream from x (and so y is no longer considered to be in the watershed of x), (a') points y' that flowed into points downstream of x now flow into x (and so now are included in the watershed of x) or (b) the new flow path goes into a different stream (which is really a special case of (a) and (a')). The first (a and a') might happen a lot, but it will create differences primarily for pour points along stream segments, not within parts of watersheds bordered by confluent streams. The second change can happen whenever a flow path runs close to a gap in a ridge. Whereas in one projection it might have been steered to one side of the gap, in another it might--due to the slight differences in distortion--get steered to the other side. I suspect this is relatively rare and it should primarily affect minor subwatersheds high along the periphery of any major watershed.

Thus, ultimately, the qualitative nature of the watershed structure should change little, but quantitatively (in terms of relative area) it could change noticeably upon reprojection.

What to do then? If you're stuck with this eight-direction-only algorithm, the key is to get the relative directions right. By definition, this requires the use of a conformal projection, or at least one which is very close to conformal. But, because conformal projections cannot be (exactly) equal area, for large-area work you don't want to use conformal projections to compute watershed areas. The solution is what I originally proposed:

Compute flow directions and delineate watersheds using a conformal projection.

Compute the areas (and land cover percentage, etc.) of delineated watersheds using (of course) an equal-area projection.

(Note that this does not guarantee accurate flow accumulation calculations. Those require good estimates of the areas while at the same time getting the flow directions right. One approach is to recognize that so much uncertainty, fudging, and assuming is going on to get to this point that we might just be splitting hairs. Another approach--worth considering when doing continent-level calculations--is that one can do flow accumulations in a conformal projection but adjust the inputs (the amount of "rain" falling in the watershed) according to the areal distortion. This is easier than it sounds when you use simple conformal projections such as Mercator or Stereographic, where the areal distortion is easy to compute mathematically.)

For small-area calculations, there always exist projections that are so close to being conformal and equal area that you don't have to bother using two projections (e.g., for areas that fit within a single UTM zone, use the UTM coordinates). This stuff really matters for study areas that are state or country or continent sized.

Because a GCS is reasonably free of distortion only near the equator (where (lat, lon) is approximately conformal and equal area), a good rule of thumb is do not do your grid calculations in lat-lon coordinates!

I still haven't covered all the nuances (for instance, small nearly random changes in estimated flow directions will occur when you uniformly rotate a grid except by multiples of 90 degrees, I glossed over all discussion of sinks and flat areas, and I haven't mentioned alternative (non-ArcGIS) algorithms), but I hope this analysis helps clarify the key aspects of the situation.

Blog

Tuesday, 10 July 2018