I have to calculate the average upslope percent slope for a large dataset; the basic method is detailed here. However, I've begun to wonder whether the harmonic mean might be more appropriate than the standard arithmetic mean, since slope is technically a rate of change. I haven't seen this come up in any of the other discussions on averaging slope over points, areas, lines, etc. It should be fairly straightforward to accomplish.

Edit: The purpose of calculating average slope in this case is to generate one parameter (of many) to be used in modeling channel initiation thresholds. I have a set of field-collected channel head locations for which I will collect the flow accumulation, various average upslope parameters, etc., and I will use multiple linear regression to try to describe accumulation thresholds in terms of the other parameters.

It depends on why you're computing the average slope. What's the purpose? What physical quantity are you trying to measure? Although many forms of average are legitimate, beware of the harmonic mean: it causes problems when any slope is zero, which frequently happens.
– whuber♦ Mar 1 '13 at 22:18

3 Answers

Average slope sounds like a natural quantity but it's rather a strange thing. For instance, the average slope of a flat horizontal plain is zero, but when you add a tiny bit of random, zero-average noise to a DEM of that plain, the average slope can only go up. Other strange behaviors are the dependence of the average slope on DEM resolution, which I have documented here, and its dependence on how the DEM was created. For instance, some DEMs created from contour maps are actually slightly terraced--with tiny abrupt jumps where the contour lines lie--but otherwise are accurate representations of the surface on the whole. Those abrupt jumps, if given too much or too little weight in the averaging process, can change the average slope.

Bringing up weighting is relevant because, in effect, a harmonic mean (like other means) weights the slopes differentially. To understand this, consider the harmonic mean of just two positive numbers x and y. By definition,

H(x, y) = 2 / (1/x + 1/y) = 2xy/(x+y) = ax + by

where the weights are a = y/(x+y) and b = x/(x+y). (These deserve to be called "weights" because they are positive and sum to unity. For the arithmetic mean, the weights are a = 1/2 and b = 1/2.) Evidently, the weight attached to x, equal to y/(x+y), is large when x is small compared to y. Thus harmonic means over-weight the smaller values.
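As a quick numerical check of the weighting view, here is a minimal sketch in Python. The function names and the example values x = 1, y = 9 are illustrative assumptions, not part of the original answer.

```python
def harmonic_mean_2(x, y):
    """Harmonic mean of two positive numbers: reciprocal of the mean reciprocal."""
    return 2.0 / (1.0 / x + 1.0 / y)


def harmonic_weights(x, y):
    """Weights (a, b) such that a*x + b*y equals the harmonic mean of x and y."""
    return y / (x + y), x / (x + y)


# Hypothetical pair of slopes: one small, one large.
x, y = 1.0, 9.0
a, b = harmonic_weights(x, y)
h = harmonic_mean_2(x, y)
weighted = a * x + b * y

print("weights:", a, b)            # the smaller value x receives the larger weight
print("harmonic mean:", h)
print("weighted average:", weighted)
```

With these values the smaller number gets nine times the weight of the larger one, which is why the result sits so close to the smaller value.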

It may help to broaden the question. The harmonic mean is one of a family of averages parameterized by a real value p. Just as the harmonic mean is obtained by averaging the reciprocals of x and y (and then taking the reciprocal of their average), in general we may average the pth powers of x and y (and then take the 1/pth power of the result). The cases p=1 and p=-1 are the arithmetic and harmonic means, respectively. (We can define a mean for p = 0 by taking limits and thereby obtain the geometric mean as a member of this family, too.) As p decreases from 1, the smaller values are more and more heavily weighted; and as p increases from 1, the larger values are more and more heavily weighted. It follows that the mean can only increase as p increases and must decrease as p decreases. (This is evident in the second figure below, in which all three lines are either flat or increasing from left to right.)
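The family of means described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `power_mean` and the sample slopes are hypothetical, and all values are assumed strictly positive.

```python
import math


def power_mean(values, p):
    """Power mean of strictly positive values with exponent p.

    p = 1 is the arithmetic mean, p = -1 the harmonic mean,
    and p = 0 is taken as the limiting case, the geometric mean.
    """
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


# Hypothetical percent slopes, for illustration only.
slopes = [2.0, 5.0, 10.0, 40.0]
ps = [-1, -0.5, 0, 0.5, 1, 2]
means = [power_mean(slopes, p) for p in ps]

for p, m in zip(ps, means):
    print(f"p = {p:>4}: mean = {m:.3f}")
```

Printing the means in order of increasing p shows them never decreasing, consistent with the monotonicity claim in the paragraph above.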

Taking a practical view of the matter, we might instead study the behavior of various means of slopes and add this knowledge to our analytical toolbox: when we expect slopes to enter into a relationship in such a way that smaller slopes ought to be given more of an influence, we might choose a mean with p less than 1; and conversely, we might increase p above 1 in order to emphasize the largest slopes. To this end, let's consider various forms of drainage profiles in the vicinity of a point.

To show what could go on, I have considered three qualitatively different local terrains: one is where all slopes are equal (which makes a good reference); another is where locally we are situated at the bottom of a bowl: around us the slopes are zero, but then gradually increase and eventually, around the rim, become arbitrarily large. The inverse of this situation occurs where near us the slopes are moderate but then level off away from us. That would seem to cover a fairly extreme range of behaviors.

Here are pseudo 3D plots of these three types of drainage forms:

Here I have computed the mean slope of each--with the same color coding--as a function of p, letting p range from -1 (harmonic mean) through 2.

Of course the blue line is horizontal: no matter what value p takes on, the mean of a constant slope cannot be anything other than that constant (which has been set to 1 for reference). The high slopes around the far rim of the red bowl strongly influence the mean slopes as p varies: notice how large they become once p exceeds 1. The horizontal rim in the third (gold-green) surface causes the harmonic mean (p=-1) to be zero.

It is noteworthy that the relative positions of the three curves change at p=0 (the geometric mean): for p greater than 0, the red bowl has larger average slopes than the blue, while for negative p, the red bowl has smaller average slopes than the blue. Thus, your choice of p can alter even the relative position of average slopes.

The profound effect of the harmonic mean (p=-1) on the yellow-green shape should give us pause: it shows that when there are enough small slopes in the drainage, the harmonic mean can be so small that it overwhelms any influence of all the other slopes.
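The sensitivity of the harmonic mean to flat cells is easy to reproduce. The sketch below is an assumption-laden illustration: the slope lists are made up, and returning 0 when any slope is zero and p is not positive is a convention adopted here (it is the limiting value as that slope tends to zero), not something prescribed above.

```python
import math


def power_mean(values, p):
    """Power mean of non-negative values.

    Convention assumed here: for p <= 0, any zero value forces the mean to 0,
    which is the limit as that value tends to zero.
    """
    n = len(values)
    if p <= 0 and any(v == 0 for v in values):
        return 0.0
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


# Hypothetical percent slopes: one flat (or nearly flat) cell among steeper ones.
with_flat_cell = [0.0, 10.0, 20.0, 30.0]
nearly_flat_cell = [0.01, 10.0, 20.0, 30.0]

for label, slopes in [("flat cell", with_flat_cell),
                      ("nearly flat cell", nearly_flat_cell)]:
    print(label,
          "| harmonic:", round(power_mean(slopes, -1), 4),
          "| arithmetic:", round(power_mean(slopes, 1), 4))
```

A single flat or nearly flat cell pulls the harmonic mean to essentially zero, while the arithmetic mean stays near 15 percent.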

In the spirit of an exploratory data analysis, you might consider varying p--perhaps letting it range from 0 to slightly greater than 1 in order to avoid extreme weights--and finding which value creates the best relationship between mean slope and the variable you are modeling (such as channel initiation thresholds). "Best" usually is understood in the sense of "most linear" or "creating constant [homoscedastic] residuals" in a regression model.
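One possible way to run such a sweep is sketched below. Everything in it is hypothetical: the basins are synthetic random slopes, the response is deliberately constructed from the p = 0.5 mean so that the sweep has a known answer, and absolute Pearson correlation stands in for "most linear". With real data you would substitute the upslope slope values and the measured threshold for each channel head, and you would likely also inspect residual plots.

```python
import math
import random


def power_mean(values, p):
    """Power mean of strictly positive values (p = 0 gives the geometric mean)."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic stand-in data: 30 "basins", each with 20 upslope percent slopes.
rng = random.Random(0)
basins = [[rng.uniform(1.0, 50.0) for _ in range(20)] for _ in range(30)]

# Synthetic response built from the p = 0.5 mean, so the sweep should recover 0.5.
response = [3.0 * power_mean(b, 0.5) + 2.0 for b in basins]

candidates = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]
scores = {
    p: abs(pearson_r([power_mean(b, p) for b in basins], response))
    for p in candidates
}
best_p = max(scores, key=scores.get)

for p in candidates:
    print(f"p = {p:<4}: |r| = {scores[p]:.6f}")
print("best p:", best_p)
```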

I undertook an empirical approach to find a complementary answer to the excellent theoretical answer by whuber. I decided to calculate the slope in degrees and average that using an angular average. Next, I calculated the arithmetic and harmonic means of the percent slope. I then created a set of sample points randomly located in the study area. I requested 2000 points with a minimum distance of 100m, which yielded 1326 points. I sampled the values of each mean slope raster at each point, and converted the percentage means to degrees using the formula Degrees = atan(percent/100). My assumption here is that the angular mean will produce the "correct" mean slope in degrees, and whichever percentage mean came closer to it would be the correct procedure.
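For readers who want to reproduce the conversion step, here is a minimal Python sketch. Two assumptions are worth flagging: `atan` returns radians in most languages, so an explicit conversion to degrees is included, and the "angular average" is taken here to be the usual circular mean (the direction of the summed unit vectors), which may differ from the tool actually used above. The sample slopes are made up.

```python
import math


def percent_to_degrees(percent):
    """Convert percent slope (rise/run * 100) to slope angle in degrees."""
    return math.degrees(math.atan(percent / 100.0))


def angular_mean_degrees(angles_deg):
    """Circular mean of angles in degrees (assumed definition of 'angular average')."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


# Hypothetical percent slopes for a handful of cells.
percent_slopes = [10.0, 50.0, 100.0]
degree_slopes = [percent_to_degrees(p) for p in percent_slopes]

arithmetic_pct = sum(percent_slopes) / len(percent_slopes)
harmonic_pct = len(percent_slopes) / sum(1.0 / p for p in percent_slopes)

print("angular mean (deg):      ", angular_mean_degrees(degree_slopes))
print("arithmetic mean -> deg:  ", percent_to_degrees(arithmetic_pct))
print("harmonic mean -> deg:    ", percent_to_degrees(harmonic_pct))
```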

Next, I compared all non-zero values using a Kruskal-Wallis test (the assumptions being that for most zero slope values, it would be zero in all three, and that zero values would mask the differences between methods). I found a significant difference between the three (chi-square = 17.9570, DF = 2, p = 0.0001), so I further examined the data using Dunn's procedure with alpha = 0.05 (Elliot and Hynan 2011). The end result is that the arithmetic and harmonic means are significantly different from each other, but neither is significantly different from the angular mean:

If my assumptions were all correct (they very well might not be), this means that while the harmonic and arithmetic means create different values from each other, they are both "close enough" to the angular mean to be acceptable. There are two other caveats here that I can think of (please add any others if you think of them):

A larger sample size might find a significant difference between the percentage means and the angular mean. However, my sample size was ~1000 points for just the non-zero values.

Since my sample points were placed without regard to drainage basins, there may be some pseudo-replication involved, as any mean slope is going to be related to mean slopes above it.

This is interesting (+1), but beware of the limitations. (1) Yes, if you choose a larger sample size, you will find that all differences are significant. It therefore makes no sense to conduct a statistical hypothesis test: you want to focus on the amount of difference among the procedures. (2) Your results depend entirely on the actual properties of your data. They will vary with other datasets. (3) The angular mean is useful as a reference but it is by no means a preferred value. Which to use as a reference depends entirely on how the mean will be used in further analyses or mapping.
– whuber♦ Mar 7 '13 at 22:27

Given the assumption that no parameters defining the slope are known, any statistician would say to use the slope that minimizes the RMS deviations of the data from it. (Of course, whuber's examples don't qualify since he's chosen mathematically-generated landforms, but for real landforms the no-known-parameters assumption should be valid.)

This reply is appreciated, but I think it misunderstands the situation. Most significantly, these slopes are not used to fit curves: the concept of "RMS deviations of the data" is just not applicable. Second, I have chosen qualitative landform types to span a wide spectrum of what will really be encountered, so I maintain they give useful information about what to expect. Real datasets don't contribute much to understanding what is going on here, because there is no such thing as a "true" average slope. The main question is what averages will be useful or informative.
– whuber♦ Mar 7 '13 at 22:31

BTW, I believe I have some qualifications as a statistician. That does not make my opinion about this matter any better or any worse: as with anyone else, I need to back it up as clearly and objectively as I can, and I am quite susceptible to being wrong and having to change my mind :-). I just offer this point as a counter to your "any statistician" remark.
– whuber♦ Mar 7 '13 at 22:34

The question of what fit is useful, I submit, depends on what the slope is to be used for. For land slump potential, for example, the steeper slopes would be weighted more heavily than mild slopes in accordance with a slump potential vs. slope model; the RMS fit approach should then be valid. Other weighting models would be used to match other uses. In short, what I'm suggesting is to model everything we know by weighting or other means, then rely on RMS as the model for everything we don't.
– johnsankey Mar 20 '13 at 12:09

I agree with the premise of that comment, John, but I do not see how your conclusion follows. If the steeper slopes are to receive heavier weights, then it seems RMS is just what you do not want to do, because it weights all deviations equally, regardless of slope. Moreover, RMS, as a quadratic loss function, cannot be a universal replacement for what other techniques can achieve, including nonlinear re-expressions of the slope and the use of alternative loss functions (as exploited by robust fitting methods for instance).
– whuber♦ Mar 20 '13 at 14:44