
The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

$$\text{MAPE} = 100\,\text{mean}\left(|y_t - \hat{y}_t| / |y_t|\right),$$

where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$.

Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that “it has a bias favoring estimates that are below the actual values”. A few years later, Armstrong and Collopy (1992) argued that the MAPE “puts a heavier penalty on forecasts that exceed the actual than those that are less than the actual”. Makridakis (1993) took up the argument, saying that “equal errors above the actual value result in a greater APE than those below the actual value”. He provided an example where $y_t = 150$ and $\hat{y}_t = 100$, so that the relative error is 50÷150 = 0.33, in contrast to the situation where $y_t = 100$ and $\hat{y}_t = 150$, when the relative error would be 50÷100 = 0.50.
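Makridakis's example can be checked numerically. The following is a minimal sketch in Python; the helper name `ape` is purely illustrative, and it assumes the usual absolute percentage error, 100 × |actual − forecast| / |actual|.

```python
def ape(actual, forecast):
    """Absolute percentage error: 100 * |y - yhat| / |y| (undefined when y == 0)."""
    return 100 * abs(actual - forecast) / abs(actual)

# The absolute error is 50 in both cases; only the actual value in the
# denominator changes.
under = ape(actual=150, forecast=100)  # forecast below the actual
over = ape(actual=100, forecast=150)   # forecast above the actual

print(round(under, 2), round(over, 2))
```

The forecast that exceeds the actual is charged 50%, while the forecast that falls short is charged about 33%, which is the asymmetry described in the quotations above.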

Thus, the MAPE puts a heavier penalty on negative errors (when $y_t < \hat{y}_t$) than on positive errors. This is what is stated in my textbook. Unfortunately, Anne Koehler and I got it the wrong way around in our 2006 paper on measures of forecast accuracy, where we said the heavier penalty was on positive errors. We were probably thinking that a forecast that is too large is a positive error. However, forecast errors are defined as $y_t - \hat{y}_t$, so positive errors arise only when the forecast is too small.

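To avoid the asymmetry of the MAPE, Armstrong (1985) proposed the “adjusted MAPE”, defined as

$$\overline{\text{MAPE}} = 100\,\text{mean}\left(2|y_t - \hat{y}_t| / (y_t + \hat{y}_t)\right).$$

By that definition, the adjusted MAPE can be negative (if $y_t + \hat{y}_t < 0$), or infinite (if $y_t + \hat{y}_t = 0$), although Armstrong claims that it has a range of (0,200). Presumably he never imagined that data and forecasts can take negative values. Strangely, there is no reference to this measure in Armstrong and Collopy (1992).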

Makridakis (1993) proposed almost the same measure, calling it the “symmetric MAPE” (sMAPE), but without crediting Armstrong (1985), defining it as

$$\text{sMAPE} = 100\,\text{mean}\left(2|y_t - \hat{y}_t| / |y_t + \hat{y}_t|\right).$$

However, in the M3 competition paper by Makridakis and Hibon (2000), sMAPE is defined equivalently to Armstrong’s adjusted MAPE (without the absolute values in the denominator), again without reference to Armstrong (1985). Makridakis and Hibon claim that this version of sMAPE has a range of (-200,200).

Flores (1986) proposed a modified version of Armstrong’s measure, defined as exactly half of the adjusted MAPE defined above. He claimed (again incorrectly) that it had an upper bound of 100.

Of course, the true range of the adjusted MAPE is $(-\infty,\infty)$, as is easily seen by considering the two cases $y_t + \hat{y}_t = \varepsilon$ and $y_t + \hat{y}_t = -\varepsilon$, where $\varepsilon > 0$, and letting $\varepsilon \rightarrow 0$. Similarly, the true range of the sMAPE defined by Makridakis (1993) is $(0,\infty)$. I’m not sure that these errors have previously been documented, although they have surely been noticed.
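The limiting argument can be illustrated numerically. This is only a sketch: the function names are hypothetical, and it assumes the per-observation terms are 200|y − ŷ|/(y + ŷ) for the adjusted MAPE and 200|y − ŷ|/|y + ŷ| for the Makridakis (1993) sMAPE.

```python
def adjusted_term(y, f):
    """Adjusted-MAPE term (assumed form): no absolute value in the denominator."""
    return 200 * abs(y - f) / (y + f)

def smape_1993_term(y, f):
    """Makridakis (1993) sMAPE term (assumed form): |y + f| in the denominator."""
    return 200 * abs(y - f) / abs(y + f)

y = 1.0
for eps in (1e-1, 1e-3, 1e-6):
    f_plus = -y + eps    # makes y + f slightly positive
    f_minus = -y - eps   # makes y + f slightly negative
    print(eps,
          adjusted_term(y, f_plus),      # grows without bound
          adjusted_term(y, f_minus),     # decreases without bound
          smape_1993_term(y, f_minus))   # grows without bound, stays positive
```

As the sum of the actual and the forecast shrinks towards zero from either side, the adjusted term heads to plus or minus infinity, while the 1993 version heads to plus infinity only.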

Goodwin and Lawton (1999) point out that on a percentage scale, the MAPE is symmetric and the sMAPE is asymmetric. For example, if $y_t = 100$, then $\hat{y}_t = 110$ gives a 10% error, as does $\hat{y}_t = 90$. Either would contribute the same increment to MAPE, but a different increment to sMAPE.
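A quick check of this point, as a hedged sketch: the actual value of 100 and forecasts of 110 and 90 are chosen here purely for illustration, the helper names are hypothetical, and the sMAPE term is assumed to be 200|y − ŷ|/|y + ŷ|.

```python
def ape(y, f):
    """MAPE term: 100 * |y - f| / |y|."""
    return 100 * abs(y - f) / abs(y)

def sape(y, f):
    """sMAPE term (assumed form): 200 * |y - f| / |y + f|."""
    return 200 * abs(y - f) / abs(y + f)

y = 100
for f in (110, 90):
    # Both forecasts miss by 10% of the actual value.
    print(f, ape(y, f), round(sape(y, f), 2))
# 110 10.0 9.52
# 90 10.0 10.53
```

Both forecasts add 10 to the MAPE, but the over-forecast adds less to the sMAPE than the under-forecast does.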

Anne Koehler (2001), in a commentary on the M3 competition, made the same point, but without reference to Goodwin and Lawton.

Whether symmetry matters or not, and whether we want to work on a percentage or absolute scale, depends entirely on the problem, so these discussions over (a)symmetry don’t seem particularly useful to me.

Chen and Yang defined the sMAPE as

$$\text{sMAPE} = \text{mean}\left(2|y_t - \hat{y}_t| / (|y_t| + |\hat{y}_t|)\right).$$

They still called it a measure of “percentage error” even though they dropped the multiplier 100. At least they got the range correct, stating that this measure has a maximum value of two when either $y_t$ or $\hat{y}_t$ is zero, but is undefined when both are zero. The range of this version of sMAPE is (0,2). Perhaps this is the definition that Makridakis and Armstrong intended all along, although neither has ever managed to include it correctly in one of their papers or books.

As will be clear by now, the literature on this topic is littered with errors. The Wikipedia page on sMAPE contains several as well, which a reader might like to correct.

If all data and forecasts are non-negative, then the same values are obtained from all three definitions of sMAPE. But more generally, the last definition above from Chen and Yang is clearly the most sensible, if the sMAPE is to be used at all. In the M3 competition, all data were positive, but some forecasts were negative, so the differences are important. However, I can’t match the published results for any definition of sMAPE, so I’m not sure how the calculations were actually done.
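The agreement and disagreement between the three definitions can be sketched in Python. The function and variable names below are illustrative, the data are made up, and the Chen and Yang variant is multiplied by 100 here only so that all three results sit on the same scale.

```python
from statistics import mean

def smape(actuals, forecasts, denom):
    """100 * mean(2|y - f| / denom(y, f)); `denom` selects the variant."""
    terms = [2 * abs(y - f) / denom(y, f) for y, f in zip(actuals, forecasts)]
    return 100 * mean(terms)

def denom_adjusted(y, f):   # adjusted MAPE / Makridakis and Hibon (2000)
    return y + f

def denom_1993(y, f):       # Makridakis (1993)
    return abs(y + f)

def denom_chen_yang(y, f):  # Chen and Yang
    return abs(y) + abs(f)

variants = (denom_adjusted, denom_1993, denom_chen_yang)
actuals = [10, 20, 30]

# Non-negative data and forecasts: all three denominators coincide.
same = [smape(actuals, [12, 18, 33], d) for d in variants]

# One negative forecast (made-up value): the three variants disagree.
diff = [smape(actuals, [12, -25, 33], d) for d in variants]

print(same)
print(diff)
```

With the negative forecast, the first variant goes negative, the second exceeds 200, and only the Chen and Yang form stays between 0 and 200.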

Personally, I would much prefer that either the original MAPE be used (when it makes sense), or the mean absolute scaled error (MASE) be used instead. There seems little point in using the sMAPE except that it makes it easy to compare the performance of a new forecasting algorithm against the published M3 results. But even there, it is not necessary, as the forecasts submitted to the M3 competition are all available in the Mcomp package for R, so a comparison can easily be made using whatever measure you prefer.
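For readers unfamiliar with the MASE, here is a minimal sketch for non-seasonal data. It assumes the common definition in which the out-of-sample mean absolute error is divided by the in-sample mean absolute error of one-step naive forecasts; the function name and the numbers are illustrative only.

```python
def mase(train, actuals, forecasts):
    """MAE of the forecasts, scaled by the in-sample MAE of naive forecasts."""
    # In-sample naive forecast: each value is predicted by the previous one.
    naive_errors = [abs(train[i] - train[i - 1]) for i in range(1, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(y - f) for y, f in zip(actuals, forecasts)) / len(actuals)
    return mae / scale

train = [10, 12, 11, 13, 12]   # made-up training series
actuals = [14, 13]             # made-up test observations
forecasts = [12, 12]           # made-up forecasts

print(mase(train, actuals, forecasts))
```

A value below 1 means the forecasts have a smaller average error than the naive method had on the training data. The scale is zero if the training series is constant, in which case this sketch would fail with a division by zero.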

Thanks to Andrey Kostenko for alerting me to the different definitions of sMAPE in the literature.


I’d like a better understanding of how the heavier penalty MAPE puts on over-forecasting is relevant for forecast evaluation and model selection.

In some sense, I don’t see the asymmetry: if we hold the actual value fixed, MAPE for over-forecasting and under-forecasting of the same absolute magnitude will be the same. E.g. for actual value 100, forecasts of 50 and 150 give equivalent MAPE (50%). Doesn’t this imply that, given an expected value for the actual observation over the forecast horizon, MAPE treats over- and under-forecasting equally whenever the magnitude of forecast error is the same?

We only get the asymmetry, it seems, if we hold the magnitude of forecast error the same and vary the expected value for the actuals, which doesn’t seem practically relevant.

It’s not true, in other words, that you can “cheat” by low-balling a forecast in order to improve forecast MAPE; as long as that’s the case, what is the problem with using it, as it’s not going to favor models that under-forecast over those that over-forecast? (I’m assuming here that we don’t need to worry about intermittent demand.)

Any direction here would be most appreciated; your blog has been an invaluable resource in my business forecasting education.

http://robjhyndman.com/ Rob J Hyndman

I agree that it makes more sense to consider the case where the actual stays the same and the forecasts vary, because we can’t change actuals; we can only change forecasts.

Matt

Thanks, good to get some clarity here. It would be a shame to avoid a simple metric like MAPE based on a misunderstanding. MASE is helpful too, though in some cases one won’t have a naïve forecast to work with (e.g. for the first period of a new product’s sales).

Matt

I should add (and this is from your Armstrong reference) that it’s true that under-forecasting has a maximum MAPE of 100% (in the case where the forecast is always zero), whereas over-forecasting has no upper bound; this is assuming that the forecast is always positive, of course. This still seems to have limited significance to the question of whether one should use MAPE in assessing forecasts, provided that zero forecasts are not common in practice.

http://robjhyndman.com/ Rob J Hyndman

It’s zero (or very small) actuals that are the issue, not zero forecasts. They come up a lot, e.g., if you are trying to predict stock returns.

Matt

Absolutely right, that was a slip on my part.

Chad Scherrer

For most applications of this, the values are positive, and it makes sense to either use a model with a log link (as in a GLM) or to just log-transform the response. So is there any reason to prefer MAPE over some statistic (MSE or MAE, perhaps) of the residuals on the log scale? If the big deal is having them as percentages, I guess you could do something weird like use a base 1.01 for the log. Still seems more sensible and less arbitrary than MAPE, which has no connection to the loss function of any model I’ve ever seen.
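The log-scale idea can be sketched as follows. This is one possible reading of the suggestion, not an established measure: the function names are hypothetical, and it assumes strictly positive actuals and forecasts.

```python
import math

def log_error(actual, forecast, base=1.01):
    """Log of the ratio forecast/actual, in units of roughly 1% steps."""
    return math.log(forecast / actual) / math.log(base)

def mean_abs_log_error(actuals, forecasts, base=1.01):
    """MAE of the log-scale residuals (requires positive values throughout)."""
    errors = [abs(log_error(y, f, base)) for y, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)

# Doubling and halving the actual are penalised equally on this scale.
print(log_error(100, 200), log_error(100, 50))

# Made-up data for the averaged version.
print(mean_abs_log_error([100, 100], [110, 90]))
```

On this scale a forecast of twice the actual and a forecast of half the actual receive errors of the same size with opposite signs, unlike their MAPE contributions of 100% and 50%.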