This is a somewhat theoretical question arising from discussions with colleagues about the implications of delineating basins with projected (e.g., Albers Equal Area) vs. unprojected (NAD 83) data derived from a 10m DEM that's in NAD 83.

Some have stated that it's not an issue as the values calculated from unprojected data simply get adjusted if you do decide to project.

I'm not sure this is the case, though, as there are inherent differences between data in a geographic coordinate system and projected data. I tried one example, running the routine first with unprojected DEM data and then for the same site with projected DEM data. Both runs followed the same steps and used 10m DEM data (all work done in ArcGIS 9.3.1).

One run was done using a DEM in NAD 83, and the second run was done by projecting the same DEM into USA_Contiguous_Albers_Equal_Area_Conic_USGS_version.

In comparing the two I could notice a visual difference between the display of the Flow Direction grids.

NOTE: After further research I believe the striping effect is due to mistakenly going with the default of NEAREST rather than CUBIC resampling in the ArcGIS Project Raster tool. I don't believe this resolves the debate, though...

Flow directions using unprojected DEM

Flow directions using projected DEM

I understand that visual comparison is not 100% scientific but can be a good starting point.

Accordingly, the pour point snapped differently in each run. And there was a definite difference in the derived watersheds, given how the Snap Pour Point tool snapped based on the respective projected/unprojected datasets. The watershed shown in green is the watershed derived using the projected DEM and subsequent projected-derived elevation derivative data. The watershed shown in the purple outline is the watershed derived using the unprojected DEM data.

The watershed

I've come across these two other GIS forum threads (links below) that discuss this issue in the old ESRI forums, but I'm still not clear as to how the Flow Direction tool works relative to projected vs. unprojected data (I understand the concept of hydrologic flow and flow direction though). If each cell still has the same elevation value in a projected DEM vs. an unprojected DEM (is this correct?), why is there a difference in a flow direction raster derived from projected data versus one derived from DEM data in NAD83?

Also, would any differences theoretically be less of an issue when doing delineations at a higher latitude, such as Shenandoah National Park in Virginia, versus doing delineations in the state of Texas?

I spoke with one cartography expert who thought that the east-west distortion you get as you move away from the equator could well be an issue (like how Canada is extremely bloated and distorted in some maps). Their view was that if you're more than 10 degrees of latitude from the equator and you're concerned with accuracy, projected data is the way to go.

One major unknown is the level of uncertainty with basins delineated using unprojected data that we're trying to get a handle on. There is a difference, but what is the magnitude?

Thanks to anyone who can provide a straightforward answer to this discussion, or just some helpful insight into this.

Edit

The main issue we're interested/concerned with is if there will be accuracy issues with the delineated watersheds as a result of starting the process using an unprojected DEM.

So, if I'm understanding the reply, the delineated basins should be fine in terms of representing the drainage area for a pour point? It seems though if the flow directions are wrong that will result in some error in the final delineated watershed.

This is a very interesting and really important topic - I have yet to see a report or documentation stating it's OK to use unprojected data for delineating watersheds. I have also sat through ESRI User Conference technical talks led by the lead developer of the Spatial Analyst extension (which houses the Hydrology tools) where they said you should use an equal area projection (such as Albers equal area).

As well, there doesn't seem to be any authoritative "bible" standard for how to go about this - just seems that it's an almost acknowledged de facto approach to project the data before calculating your elevation derivatives.

Nowhere have I been able to find a concise and straightforward answer as to how this impacts flow direction calculation and subsequently the delineation of a watershed.

And, if you end up working with watersheds delineated using unprojected DEM data and then you project those watersheds, isn't the inaccuracy still there (e.g., in terms of determining a watershed area or any other characteristics such as land cover proportions etc)?

Furthermore, I'm assuming that projecting a flow direction raster that was derived from an unprojected DEM does not correct the errors either since the source data were unprojected....

thanks - appreciate any additional insight you can provide

EDIT - 20110331

@whuber:

thanks for this extensive discussion. We've been researching this issue more and actually have come across some references that suggest that it's actually better to not project the DEM before getting flow dir., flow accum., and delineating.

One email response from an anonymous source (but who is a pretty reputable person), when posed the question of 1.) project DEM 2.) produce derivatives OR 1.) produce derivatives 2.) project DEM said:

In a nutshell, it depends on the derivative. For continuous derivatives that will be visualized, you should derive and then project; this reduces the risk of tile boundary artifacts being enhanced or introduced (by the projection algorithm) and then passed along to the derivative if you were to project the DEM first. The exception to this is when you are also using distance or area as the basis for your derivative calculation. This is of course relative to how large the distances/areas are and how far you can acceptably get away from the equator. So imagine that for derivatives like slope or hillshade, which depend on the cellsize, there are consequences. These derivatives will be most accurate at the equator, and the accuracy will degrade significantly past 60 degrees north or south. In both cases, I am assuming the DEM covers a very large area (wider than 1.5 UTM Zones) and a traditional tile-based approach where the tiles are either arbitrary or conform to existing standards like USGS Quad sheet boundaries. So saying the implication is that much of this thinking predates mosaic datasets, which I am less able to comment on. The main concern for me would be wanting to know how well matched the DEM tiles are. If they are well matched (like NED) then I expect things to work well, with derivatives being derived from tiles (as functions applied to the mosaic dataset) and then these are displayed on the fly. If they are not well matched, then garbage in, garbage out. Back to your original question, I think if it is just watershed boundaries, it would be possible to derive these without projecting because it’s not how much curvature or slope that matters, just where it is and that it exists.

They went on to say:

The reason I would stick to the un-projected methodology is that we are using rasters which are in and of themselves a derivative of DEM (which we typically don’t have, but think LiDAR point cloud). For rasters that cover very large areas, like continents at relatively fine levels of resolution, projecting to something like Albers will result in loss or introduction of information, when the raster uses regular sized cells (like Esri’s rasters do). That means tools like Flow Accumulation will produce results based on partial or interpolated information. Basically all projection algorithms applied to rasters will cause problems as soon as there is an expansion or shrinkage of more than the distance of a pixel width (projections like Albers can introduce error by introducing new pixels between two old ones). Deriving from these means the potential for cumulative error is high.

This seems to suggest the opposite - that projecting introduces more noise, unless you get above 60 degrees latitude.

So, in the end, does the decision of whether to project (to something like a conformal projection), or whether it matters at all, just boil down to 1.) where on the earth's surface you're working, 2.) the scale you're working at, and 3.) whether the noise introduced by a projection that better preserves the attributes affecting the flow direction algorithm is less than the distortion introduced by unprojected data (the benefit of projecting increasing as you move towards the poles)?

When you start digging into this topic it seems like the larger consensus is to project, but there are some that seem to say that's not a hard and fast rule.

Are the differences between projected/unprojected greater when working in a really high latitude?
– Kirk Kuykendall, Mar 29 '11 at 22:38

@user If you have a good illustration and don't yet have the reputation to post it, make it available on the Web and provide the URL. Often a moderator will convert that into an embedded image on your behalf. (This is such an interesting question, though, that I expect you will quickly acquire the needed reputation through positive votes on the question alone. :-)
– whuber♦, Mar 29 '11 at 23:11

I disagree with some of the 20110331 conclusions. (1) is correct; (2) is irrelevant; (3) is correct but I think it is based on misunderstanding what projections do and how they work. Projection per se does not "introduce noise" but the resampling method can. However, that can be controlled and cleaned up so AFAIC it's a non-issue when you do things right. The advice you quote is generally good but it succumbs to a misperception I had before carrying out the analysis in my reply: even when "it's just watershed boundaries," it can matter whether you project or not.
– whuber♦, Apr 2 '11 at 3:30

(Continued) I do agree there are no "hard rules" here, but there definitely are principles and there is a good basis for quantitative analysis to make informed, effective processing decisions for any given dataset and investigation objective. I tried to show those principles in the edit to my reply. In the end the principles will stand up; if you understand them, you should rely on them and your own thinking, not on any authority or "consensus."
– whuber♦, Apr 2 '11 at 3:33

1 Answer

You are correct that distortions in the projection can bias flow direction (and flow accumulation) estimates. (Using "unprojected" data is tantamount to using the highly distorting Plate Carree projection.)

For merely delineating basins, though, there actually is little problem: although the flow directions and flow amounts will be wrong, the projection won't cause water to appear to flow into areas it doesn't go. Downhill is still downhill.

By means of simple examples, it's not hard to see where the bias comes from. Consider two points 141 meters apart, one northeast of the other and immediately downgradient. The flow direction therefore is due northeast. In coordinates, the downgradient point is offset 100 meters in the x direction and 100 meters in the y direction. If you are at (say) latitude 60 degrees using unprojected data, the offsets will actually look like 200 meters in the x direction and 100 meters in the y direction. (200 = 100/cos(60 degrees).) That translates into a bearing of 63 degrees east of north rather than 45 degrees. In many flow direction/flow accumulation/delineation algorithms only 8 directions are possible (the four cardinal directions and the four diagonals). Thus, instead of indicating a northeast flow, the grid might shift this into a due easterly flow.

(The 63 degrees is computed trigonometrically as a function of the relative distortion in the projection between the direction of maximum distortion and the direction of minimum distortion. This begins to quantify the effect of using unprojected data.)
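The trigonometry behind the 63 degree figure can be sketched in a few lines of Python. This is only an illustration under a simple assumption: raw lat/lon coordinates stretch east-west offsets by 1/cos(latitude) relative to north-south offsets. The function name `apparent_bearing` is made up for this example, and the latitudes 31 and 38.5 are rough stand-ins for Texas and Shenandoah National Park.

```python
import math

def apparent_bearing(true_bearing_deg, lat_deg):
    """Bearing (degrees east of north) as it appears in raw lat/lon coordinates.

    Assumes east-west offsets are stretched by 1/cos(latitude) relative to
    north-south offsets, as in a Plate Carree display of unprojected data.
    """
    theta = math.radians(true_bearing_deg)
    east, north = math.sin(theta), math.cos(theta)
    stretch = 1.0 / math.cos(math.radians(lat_deg))
    return math.degrees(math.atan2(east * stretch, north))

# A true northeast (45 degree) flow at a few illustrative latitudes.
# At latitude 60 this reproduces the roughly 63 degree bearing discussed above.
for lat in (0.0, 31.0, 38.5, 60.0):
    print(f"lat {lat:5.1f}: NE flow appears at {apparent_bearing(45.0, lat):5.1f} deg")
```

Whether a shift of a few degrees matters depends on how close the true bearing already is to the boundary between two of the eight allowed directions.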

A good way to visualize this is to draw the 8 compass directions correctly on a sheet of rubber. Stretch the rubber sideways (with more stretch for higher latitudes): the more you stretch, the more the arrows all tend to point east-west. In those directions the angles shrink, while towards the north and south the angles expand. In the meantime, the elevations on the grid remain unchanged. The result is that both the slope and the aspect of the land are distorted, because they depend on the rate of change of elevation with respect to the positional coordinates.

There will actually be more of an issue in Virginia than in Texas because of this. Your cartographer is correct. (I don't know where the 10 degree cutoff comes from, though. It sounds reasonable but rules of thumb like this need to be assessed in light of your accuracy requirements. In some cases you can get away with no projection and in others you might want much more accuracy.)

Most of these issues become moot when you adopt an appropriate workflow. Begin by projecting your data with the best conformal projection you can find (because there are no distortions of relative angles). Compute flow and anything else that involves the direction information. Then unproject (or reproject) the results back to whatever coordinate system you want to use for follow-on analysis or mapping. For instance, to compute areas of the delineated basins, reproject with an equal-area projection. The point is that reprojecting is simple enough that you can afford to, and should, change projections as needed to accommodate the calculations and mapping you are performing: you're not stuck with a single compromise projection.

Edit

An addendum to the original question focuses on watershed delineation. Let's address this. To do so, we need to understand how flow directions are estimated.

The direction of flow is determined by the direction of steepest descent from each cell.

Specifically, let x[0,0] designate the value in a cell and let x[i,j] designate the value in the cell i columns to the right and j rows below. Apart from some special cases dealing with sinks and resolving ties, the algorithm selects the largest of the eight directional slope estimates (x[0,0]-x[i,j])/Sqrt[i^2+j^2], where |i| <= 1 and |j| <= 1 (and i and j are not both zero), and assumes that is the direction of flow. These numbers are ratios: the numerators are differences in elevation and the denominators are distances computed via the Pythagorean Theorem in whatever coordinates are in use.
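As a rough illustration of this rule (not the actual ArcGIS implementation), here is a minimal Python sketch of the eight-direction choice for a single 3x3 window. The function name `d8_direction` and the `dx`/`dy` spacing parameters are assumptions added for the example; they let the same elevations be evaluated with different east-west and north-south cell spacings. Sinks and ties are not handled.

```python
import math

def d8_direction(window, dx=1.0, dy=1.0):
    """Return (i, j), the column/row offset of steepest descent in a 3x3 window.

    window[1][1] is the centre cell; i counts columns to the right, j rows down.
    dx and dy are the east-west and north-south cell spacings (hypothetical
    parameters for illustration). Returns None if no neighbour is lower.
    """
    centre = window[1][1]
    best, best_slope = None, 0.0
    for j in (-1, 0, 1):
        for i in (-1, 0, 1):
            if i == 0 and j == 0:
                continue
            drop = centre - window[1 + j][1 + i]
            slope = drop / math.hypot(i * dx, j * dy)
            if slope > best_slope:
                best, best_slope = (i, j), slope
    return best

window = [
    [100, 100, 100],
    [100, 100,  95],
    [100,  94,  93],
]
print(d8_direction(window))          # square cells: steepest descent is south, (0, 1)
print(d8_direction(window, dx=0.5))  # east-west spacing halved: east, (1, 0)
```

The elevations are identical in both calls; only the distances in the denominators change, and that alone is enough to switch the selected direction.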

Upon reprojecting the grid, two things happen: (1) the cells are moved (and distorted as this happens) and therefore (2) the grid values (elevations) are resampled onto the lattice of cells for the new grid. Small changes in elevation can occur due to resampling and these could induce occasional changes in the estimated flow direction. Typically such changes should be rare, so let's ignore them. These changes will be dwarfed by changes induced by metric distortions in the reprojection. For instance, in reprojecting from Plate Carree (essentially a geographic coordinate system) into a conformal projection, the east-west direction will shrink by the cosine of the latitude. In the space (along a row) where one cell used to fit, 1/cos(latitude) cells now have to fit. This will typically magnify any apparent slope estimate in any direction having an east-west component (i.e., the NE, E, SE, SW, W, and NW directions). Whereas earlier such slopes might not have appeared to be the largest, and therefore were not selected by the ArcGIS algorithm, by being made larger they might now be selected as the flow direction. Accordingly, at many places a north or south flow direction will be converted into NE, NW, SE, or SW, and a NE direction might be converted into due E, etc.

The effects of any reprojection can be predicted using a similar calculation: you need to know the directional distortions that occur in going from one to the other.

Let's consider what it means to "be in the watershed" of a "pour point" x. Let's agree that any location y "lies in the watershed of x" means that if the surface were bare, frictionless, impermeable, and smooth, and if water were to flow without spreading (purely advective flow), then it would flow from y to x. That, anyway, is what the GIS does in computing flow accumulation (which is at the heart of watershed delineation).

In most locations, when the pour point x lies along a stream bed, the distortions from reprojection make no essential difference: they cause the apparent flow path from y to x to change, but ultimately the water arrives in the same stream bed anyway, albeit perhaps by a slightly different route. If any discrepancy occurs, it must be because either (a) the flow path arrives further downgradient along the stream from x (and so y is no longer considered to be in the watershed of x), (a') points y' that flowed into points downstream of x now flow into x (and so now are included in the watershed of x) or (b) the new flow path goes into a different stream (which is really a special case of (a) and (a')). The first (a and a') might happen a lot, but it will create differences primarily for pour points along stream segments, not within parts of watersheds bordered by confluent streams. The second change can happen whenever a flow path runs close to a gap in a ridge. Whereas in one projection it might have been steered to one side of the gap, in another it might--due to the slight differences in distortion--get steered to the other side. I suspect this is relatively rare and it should primarily affect minor subwatersheds high along the periphery of any major watershed.

Thus, ultimately, the qualitative nature of the watershed structure should change little, but quantitatively (in terms of relative area) it could change noticeably upon reprojection.

What to do then? If you're stuck with this eight-direction-only algorithm, the key is to get the relative directions right. By definition, this requires the use of a conformal projection, or at least one which is very close to conformal. But, because conformal projections cannot be (exactly) equal area, for large-area work you don't want to use conformal projections to compute watershed areas. The solution is what I originally proposed:

Compute flow directions and delineate watersheds using a conformal projection.

(Note that this does not guarantee accurate flow accumulation calculations. Those require good estimates of the areas while at the same time getting the flow directions right. One approach is to recognize that so much uncertainty, fudging, and assuming is going on to get to this point that we might just be splitting hairs. Another approach--worth considering when doing continent-level calculations--is that one can do flow accumulations in a conformal projection but adjust the inputs (the amount of "rain" falling in the watershed) according to the areal distortion. This is easier than it sounds when you use simple conformal projections such as Mercator or Stereographic, where the areal distortion is easy to compute mathematically.)
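To make the Mercator case concrete: on a spherical model the linear scale factor at latitude φ is sec φ, so map areas are inflated by sec² φ, and a cell's true ground area is its nominal map area times cos² φ. The Python sketch below computes that per-cell weight. The function name `mercator_area_weight` is invented for this example, the spherical model is an assumption, and whether your flow accumulation tool accepts such a weight grid as its "rain" input is something to check for your own software.

```python
import math

def mercator_area_weight(lat_deg):
    """Fraction of a Mercator cell's nominal (equatorial-scale) area that is
    true ground area, assuming a spherical earth: cos(latitude) squared."""
    return math.cos(math.radians(lat_deg)) ** 2

# Weight for a cell centred at latitude 60: one quarter of its nominal area.
print(round(mercator_area_weight(60.0), 4))  # 0.25
```

Supplying these weights as the input amounts, rather than a uniform 1 per cell, would compensate for cells at higher latitudes representing less ground than cells nearer the equator.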

For small-area calculations, there always exist projections that are so close to being conformal and equal area that you don't have to bother using two projections (e.g., for areas that fit within a single UTM zone, use the UTM coordinates). This stuff really matters for study areas that are state or country or continent sized.

Because a GCS is reasonably free of distortion only near the equator (where (lat, lon) is approximately conformal and equal area), a good rule of thumb is do not do your grid calculations in lat-lon coordinates!

I still haven't covered all the nuances. For instance, small, nearly random changes in estimated flow directions will occur when you uniformly rotate a grid by anything other than a multiple of 90 degrees; I glossed over all discussion of sinks and flat areas; and I haven't mentioned alternative (non-ArcGIS) algorithms. Still, I hope this analysis helps clarify the key aspects of the situation.