This paper analyzes the origins, implications, and solutions for the Asian financial crisis. From the perspective of a member of the Executive Board of the IMF, as Asian problems were building, the IMF overlooked weaknesses in bank and corporate balance sheets in much of Asia: the IMF was unaware of the extraordinary leverage of Korean companies, which in some cases reached a ratio of 600/1 debt to equity. The IMF did not focus on the weak accounting and disclosure practices of banks and nonbanks or generous rollovers of banks to their key clients.

In recent years, a number of researchers have claimed success in systematically predicting which countries are more likely to suffer currency crises, most notably Kaminsky, Lizondo, and Reinhart (1998) and its extension in Kaminsky (1998). In this note, we assess the success of this approach, emphasizing out-of-sample testing. First, we try to answer the question: If we had been using the KLR model in late 1996, how well armed would we have been to predict the Asia crisis? Second, we analyze a more general probit-based model for predicting currency crises. In the process, we test several basic assumptions underlying the signals approach.

1. INTRODUCTION

In recent years, a number of researchers have claimed success in systematically predicting which countries are more likely to suffer currency crises. Perhaps the most prominent model for predicting currency crises is the signals approach of Kaminsky, Lizondo, and Reinhart (1998) (hereafter KLR), who monitor a large set of monthly indicators that signal a crisis whenever they cross a certain threshold.

Most recent claims of success in predicting crises have focussed on in-sample prediction, that is, on formulating and estimating a model using data on a set of crises, then judging success by the plausibility of the estimated parameters and the size of the prediction errors for this set of crises. The key test, however, is not the ability to fit a set of observations after the fact but the prediction of future crises.

Kaminsky (1998) is an important paper for at least two reasons. First, she makes substantial methodological advances compared to KLR in aggregating information from the various indicators into a composite estimate of the probability of crisis. Second, she applies this approach to the 1997 crises in an out-of-sample fashion, that is, using pre-1997 data to estimate the model. However, while she presents out-of-sample estimates of the probability of currency crisis, she does not provide tests of whether these forecasts are better than, for example, guesswork.

In this note, we assess the success of the KLR approach and its extension in Kaminsky (1998), emphasizing out-of-sample testing.1 First, we address the question: If we had been using the KLR model in late 1996, how well armed would we have been to predict the Asia crisis? Second, we estimate a set of alternative models (BP probit-based models) using the data and crisis definition of the KLR method but with a different approach to generating crisis probabilities from the data. In the process, we test several basic assumptions underlying the signals approach. These BP models did not exist prior to the crises they attempt to predict and to that extent do not generate pure out-of-sample forecasts. However, the methodological innovations were not inspired by events in 1997, nor did we use success or failure in predicting 1997 outcomes to aid in the specification of the alternative models.

2. PREDICTING 1997 WITH THE ORIGINAL KLR MODEL

We can, following Kaminsky (1998), calculate the weighted-sum-based probabilities of crisis.2 This produces a series of estimated probabilities of crisis for each country.3 How good are these forecasts? Kaminsky (1998) concentrates on plots of these probabilities against actual crisis dates. Clearly, some crises are called and some are missed, but beyond that it is difficult to draw conclusions from these figures.4 More generally, one of the challenges in assessing the predictive success of the model is that the estimated probability of crisis cannot be compared with the unobservable actual probability of crisis, but only with the occurrence or not of a crisis. For zero/one dependent variables, it is natural to ask what fraction of the observations are “correctly called.” First, a cut-off level for the predicted probability of crisis is defined such that a crisis is predicted if the predicted probability is above this threshold. An observation is then correctly called if either (i) the predicted probability of crisis is above the threshold (an alarm is issued) and a crisis in fact follows within twenty-four months, or (ii) the probability of crisis is below the threshold and no crisis in fact ensues. A false alarm occurs when the estimated probability is above the threshold and there is no crisis within twenty-four months. Such “goodness-of-fit” data are shown in the first two columns of Table 1 for two cutoffs: fifty percent and twenty-five percent.
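The goodness-of-fit measures just described can be sketched in a few lines of Python. This is a minimal illustration, not the code behind Table 1: the function name, the dictionary keys, and the sample numbers are hypothetical, and an alarm is taken to mean a predicted probability strictly above the cutoff.

```python
def goodness_of_fit(probabilities, crisis_within_24m, cutoff):
    """Summarize alarms against realized outcomes for one cutoff.

    probabilities      -- predicted probability of crisis, one per observation
    crisis_within_24m  -- 1 if a crisis followed within 24 months, else 0
    cutoff             -- an alarm is issued when the probability exceeds this
    """
    alarms = [p > cutoff for p in probabilities]
    # Alarm flags for observations that did / did not precede a crisis.
    pre_crisis = [a for a, c in zip(alarms, crisis_within_24m) if c]
    tranquil = [a for a, c in zip(alarms, crisis_within_24m) if not c]

    def share(count, total):
        # Guard against empty groups (hypothetical convention: report NaN).
        return count / total if total else float("nan")

    n_alarms = sum(alarms)
    return {
        "pre_crisis_correct": share(sum(pre_crisis), len(pre_crisis)),
        "tranquil_correct": share(sum(not a for a in tranquil), len(tranquil)),
        "false_alarm_share": share(sum(tranquil), n_alarms),
        "crisis_given_alarm": share(sum(pre_crisis), n_alarms),
    }


# Hypothetical data, evaluated at the twenty-five percent cutoff.
probs = [0.60, 0.30, 0.10, 0.40, 0.20, 0.05, 0.30, 0.10]
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
summary = goodness_of_fit(probs, outcomes, cutoff=0.25)
```

By construction, the false-alarm share and the probability of crisis conditional on an alarm sum to one whenever at least one alarm is issued.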

1 A pre-crisis period is correctly called when the estimated probability of crisis is above the cut-off probability and a crisis ensues within 24 months.

2 A tranquil period is correctly called when the estimated probability of crisis is below the cut-off probability and no crisis ensues within 24 months.

3 A false alarm is an observation with an estimated probability of crisis above the cut-off (an alarm) that is not followed by a crisis within 24 months.

What can we conclude? A natural question is whether the estimated probability of crisis is above fifty percent prior to actual crises. The goodness-of-fit rows show that only four percent of the time was the predicted probability of crisis above fifty percent in cases when there was a crisis within the next twenty-four months, during the 1995:5 to 1997:12 period.5

If we are more interested in predicting crises than predicting tranquil periods and are not so worried about calling too many crises, we may want to consider an alarm to be issued when the estimated probability of crisis is above a lower cutoff. Table 1 shows that the Kaminsky (1998) probability estimates are above twenty-five percent in twenty-five percent of the precrisis observations. Even with only one quarter of the crisis observations correctly called, sixty-three percent of alarms are false, in that no crisis in fact ensues within twenty-four months. We also generate weighted-sum based probabilities drawing on the fifteen KLR variables augmented with two additional variables, the level of M2 to reserves and the ratio of the current account to GDP.6 The addition of these variables improves out-of-sample performance slightly, as shown in the second column. In particular, thirty-two percent of the precrisis observations are called correctly. Most alarms, however, are still false.

This may sound like poor performance. It is worth noting, though, that these forecasts are significantly better than random guesses, both economically and statistically. The forecasts from the augmented KLR model in column (2), for example, suggest that the probability of a crisis within twenty-four months conditional on an alarm (using the twenty-five percent cut-off) is forty percent, which is somewhat higher than the unconditional probability of twenty-seven percent. A Pesaran-Timmermann test rejects, at the one percent level of significance, the hypothesis that the forecasts are no better than guesses based on the unconditional probability of crisis in the sample.
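For readers who want to see the mechanics of such a test, the following is a sketch of the Pesaran-Timmermann (1992) statistic for binary outcomes and binary forecasts (alarm or no alarm). Which variant of the test was applied here is not stated, so this particular form, the function name, and the one-sided p-value are assumptions made for illustration.

```python
import math


def pesaran_timmermann(actual, predicted):
    """Return (statistic, one-sided p-value) for 0/1 outcomes and 0/1 forecasts.

    Under the null that forecasts and outcomes are independent, the
    statistic is asymptotically standard normal.
    """
    n = len(actual)
    p_y = sum(actual) / n          # share of observations followed by a crisis
    p_x = sum(predicted) / n       # share of observations with an alarm
    p_hat = sum(1 for y, x in zip(actual, predicted) if y == x) / n
    # Expected share of correct calls if alarms were independent of outcomes.
    p_star = p_y * p_x + (1 - p_y) * (1 - p_x)
    var_hat = p_star * (1 - p_star) / n
    var_star = (
        (2 * p_y - 1) ** 2 * p_x * (1 - p_x) / n
        + (2 * p_x - 1) ** 2 * p_y * (1 - p_y) / n
        + 4 * p_y * p_x * (1 - p_y) * (1 - p_x) / n ** 2
    )
    denominator = var_hat - var_star
    if denominator <= 0:
        raise ValueError("statistic undefined (e.g. forecasts never vary)")
    statistic = (p_hat - p_star) / math.sqrt(denominator)
    p_value = 0.5 * math.erfc(statistic / math.sqrt(2.0))
    return statistic, p_value


# Hypothetical samples: perfect forecasts, then forecasts unrelated to outcomes.
perfect = pesaran_timmermann([1, 1, 0, 0] * 25, [1, 1, 0, 0] * 25)
unrelated = pesaran_timmermann([1, 0, 1, 0] * 25, [1, 1, 0, 0] * 25)
```

A large positive statistic (small p-value) rejects the hypothesis that the alarms do no better than guesses based on the unconditional frequency of crisis.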

So far we have examined the ability of the models to predict the approximate timing of crises for each country. We can also evaluate the cross-sectional success of the models’ predictions in identifying which countries are vulnerable in a period of global financial turmoil such as 1997. The question here is whether the models assign higher predicted probabilities of crisis to those countries that had the biggest crises. Forecasting performance can be evaluated in this manner by comparing rankings of countries based on the predicted and actual crisis indices. Table 2 shows countries’ actual crisis index and predicted probability of crisis in 1997 for the various different forecasting methods.7 The table also shows the Spearman correlation between the actual and predicted rankings and its associated p-value, as well as the R2 from a bivariate regression of the actual rankings on the predictions.
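The ranking comparison can be illustrated with a short Python sketch. It computes the Spearman correlation as the Pearson correlation of ranks (ties receive average ranks) and takes the R2 to be the squared rank correlation, which is what a bivariate regression of one ranking on the other would give; whether the regression behind Table 2 used ranks or levels of the predictions is an assumption here. The p-value is omitted, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(actual, predicted):
    """Spearman rank correlation between two equal-length sequences."""
    ra, rp = average_ranks(actual), average_ranks(predicted)
    n = len(ra)
    mean_a, mean_p = sum(ra) / n, sum(rp) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(ra, rp))
    var_a = sum((a - mean_a) ** 2 for a in ra)
    var_p = sum((p - mean_p) ** 2 for p in rp)
    return cov / (var_a * var_p) ** 0.5


# Hypothetical crisis indices and predicted probabilities for four countries.
crisis_index = [5.0, 1.0, 3.0, 0.5]
predicted_probability = [0.6, 0.2, 0.5, 0.3]
rho = spearman(crisis_index, predicted_probability)
r_squared = rho ** 2
```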

Table 2. Correlation of Actual and Predicted Rankings Based on KLR Approach

1 The KLR crisis index (a weighted average of percentage changes in the exchange rate and reserves) is standardized by subtracting the mean and dividing by the standard deviation. Values above three are defined as a crisis and are shown in bold.

2 Based on the average of noise-to-signal weighted probabilities during 1996:1-12, using out-of-sample estimates.

3 Augmented with the inclusion of the current account and M2/reserves in levels.

4 All probit model probabilities are average predicted probabilities for 1996:1-12, where the model was estimated up to 1995:4.

5 Spearman rank correlation of the fitted values and the actual crisis index and its p-value. The R2 is from a regression of fitted values on actual values.

The KLR-based forecasts are clearly somewhat successful at ranking countries by severity of crisis. The actual rankings of countries in 1997 by their crisis index are significantly correlated with forecasts from the weighted-sum of indicators-based probabilities. With the original KLR variables, twenty-eight percent of the variance is explained. The addition of the current account and the level of M2/reserves brings the R2 up to thirty-six percent.

In sum, the KLR approach shows some promise. In particular, the fitted probabilities from the weighted-sum of indicators are significant predictors of crisis probability in 1997. This suggests the model may be useful in identifying which countries are vulnerable in a period following a global financial shock. Still, the overall explanatory power is fairly low. In addition, the overall goodness-of-fit for the out-of-sample predictions illustrates the low power of the weighted-sum-based probabilities in predicting the timing of crisis. Although the model does significantly better than guesses based on the unconditional probability of crisis, most crises are still missed and most alarms are false.

3. A PROBIT-BASED ALTERNATIVE MODEL

3.1 Methodology

In this section, we depart from the entire “signals” methodology that looks for discrete thresholds and calculates noise-to-signal ratios. Instead, we apply a probit regression technique to the same data and crisis definition as in KLR. In the process we test some of the basic assumptions of the KLR approach. Specifically, we embed the KLR approach in a multivariate probit framework in which the dependent variable takes a value of one if there is a crisis in the subsequent twenty-four months and zero otherwise. This has three advantages: we can test the usefulness of the threshold concept; we can aggregate predictive variables more satisfactorily into a composite index, taking account of correlations among different variables; and we can easily test for the statistical significance of individual variables and the constancy of coefficients across time and countries.

KLR assume that the probability of crisis in the subsequent twenty-four months is a step function of the value of the indicator, equal to zero when the indicator variable is below the threshold and one at or above the threshold. They assume, for example, that when the real exchange rate continues to appreciate after it is already above the threshold, this does not increase the probability of crisis. In general, the relationship between a given indicator variable and the probability of crisis could take many more forms than a simple step function. Figure 1 presents various possible relationships between the probability of crisis (on the vertical axis) and the value of a variable P, measured as in KLR in percentiles (on the horizontal axis).8 The KLR assumption, in terms of Figure 1, is that α1 and α3 are zero while α2 is equal to 1. Other possibilities are also plausible. For example if α1 is nonzero and equal to α3 while α2 is equal to zero, then there is a linear relationship between the indicator measured in percentiles and the probability of a crisis. That is, to continue the example, increases in the degree of overvaluation increase the risk no matter how overvalued the exchange rate already is.

Figure 1. Relationship Between Predictive Variable and Probability of Crisis

We propose to let the data resolve the question of whether a step-function is in fact a reasonable description of the relationship between indicator variables and the probability of a crisis. To this end, we run bivariate probit regressions on the pooled panel in which the dependent variable is the KLR variable that takes a value of one if there is a crisis in the subsequent twenty-four months and zero otherwise. For each indicator we estimate equations of the form:

where c24 = 1 if there is a crisis in the next twenty-four months, p(x) = the percentile of the variable x, and I = 1 if the percentile is above some threshold T and zero otherwise. Thus, α1, α2, and α3 in equation 1 correspond to the α’s in Figure 1. We use the thresholds T calculated from the KLR algorithm, since we are interested primarily in testing their approach against a more general alternative.9
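One piecewise-linear probit specification consistent with these definitions and with the description of Figure 1 is sketched below. The exact parameterization of equation 1 is an assumption here: the intercept α0 and the use of the standard normal distribution function Φ are introduced for illustration, α1 is taken to be the slope below the threshold, α2 the jump at the threshold, and α3 the slope above it.

```latex
\Pr\left(c24 = 1\right)
  = \Phi\Bigl(\alpha_0
      + \alpha_1 \min\{p(x),\, T\}
      + \alpha_2 I
      + \alpha_3 I \,\bigl(p(x) - T\bigr)\Bigr)
```

Under this form, the KLR step function corresponds to α1 = α3 = 0 with a positive α2, while α1 = α3 and α2 = 0 collapses the index to α0 + α1 p(x), the linear case. An equivalent parameterization would let the last coefficient measure the change in slope at the threshold rather than the slope itself.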

Figure 2 graphs these estimates for three important predictive variables: deviations of the real exchange rate (RER) from trend, the current account deficit as a share of GDP, and the growth rate of the ratio of M2 to reserves. Consider, for example, the RER. The first panel of Figure 2 gives a richer view of the relationship between overvaluation and the probability of crisis. The choppy line in this figure presents the fraction of times the observation of a given percentile for RER deviations is followed within twenty-four months by a crisis in the pooled data. The other line represents the estimated relationship discussed above. The message of this figure is that while the jump at the threshold is significant, it does not capture an important part of the variation in the probability of crisis as a function of RER deviations.

Figure 2. Average No. of Crises in Next 24 Months by Percentile of Variable (panels: RER Deviations from Trend; Current Account as Percent of GDP; M2/Reserves Growth Rate)

The outcome of this analysis varies somewhat across indicators, but the general lesson is that, although the jump in the probability of crisis at the threshold is often statistically significant, the underlying percentile variable is usually also important in explaining the variation in crisis probability.

Multivariate probits are the natural extension to the bivariate probits discussed so far. We have estimated three types of probit models that explain whether a crisis occurs in the next twenty-four months (hereafter designated BP models). Model 1 uses the indicator form of the variables, where the indicator equals one above the threshold and zero otherwise. In Model 2 the variables enter linearly, expressed as percentiles of the country-specific distribution of observations. Model 3 is the result of a simplification starting with the most general piecewise-linear specification for all the variables. From a starting point that allowed the estimation, for each variable, of the slope below the threshold, the jump at the threshold, and the slope above the threshold, we used a general-to-specific procedure to simplify to the most parsimonious representation of the data.
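To make the three specifications concrete, the sketch below builds the regressors each model would use for a single variable. It is an illustration under stated assumptions: the percentile is computed as the share of a country's own observations at or below each value, the variable is oriented so that higher percentiles mean higher risk, and the function names and the general (unsimplified) form shown for Model 3 are hypothetical.

```python
def country_percentiles(values):
    """Express each observation as a percentile (0-100] of the country's own history."""
    n = len(values)
    return [100.0 * sum(v <= x for v in values) / n for x in values]


def model_regressors(percentile, threshold):
    """Regressors for one variable under the three probit specifications."""
    above = 1.0 if percentile > threshold else 0.0
    return {
        # Model 1: indicator only (one above the threshold, zero otherwise).
        "indicator": [above],
        # Model 2: the percentile enters linearly.
        "linear": [percentile],
        # Model 3 starting point: slope below, jump at, and slope above the threshold.
        "piecewise": [min(percentile, threshold), above, above * (percentile - threshold)],
    }


# Hypothetical series for one country and a hypothetical threshold at the 80th percentile.
pcts = country_percentiles([3.0, 1.0, 2.0, 4.0])
high = model_regressors(90.0, threshold=80.0)
low = model_regressors(50.0, threshold=80.0)
```

The general-to-specific step then amounts to dropping, variable by variable, whichever of the three piecewise terms is not statistically significant.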

The results show that in Model 1 the probability of crisis is increased when the following variables exceed their thresholds: RER deviations, the current account, reserve growth, export growth, and both the level and growth rate of M2/Reserves. These variables also increase the probability of crisis when entered linearly in Model 2, except for the growth rate of M2/reserves, while reserve growth itself is now significant. In the simplified piecewise-linear Model 3, two variables (RER deviations and current account) enter with a significant slope below the threshold, a jump at the threshold, and a steeper slope above the threshold; two variables enter linearly (reserve and export growth); and for two variables (M2/reserves and M2/reserve growth) only the jump at the threshold is significant.

3.2 Predicting 1997 with the BP Models

To test the various probit models out-of-sample, we use data through 1995:4 to estimate the regression coefficients, then extend the explanatory variables to generate predictions for the period 1995:5–1997:12. The estimated probabilities can be evaluated using the probability scores and goodness-of-fit measures discussed above.
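The out-of-sample procedure can be sketched as follows for a single regressor: estimate a probit on observations through 1995:4, then apply the fitted coefficients to later observations. This is a toy illustration, not the estimation code used for the results reported here; the Fisher-scoring routine, the function names, and the handful of observations are all hypothetical.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_probit(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood probit with intercept a and slope b, by Fisher scoring."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = a + b * xi
            p = min(max(norm_cdf(eta), 1e-10), 1.0 - 1e-10)
            d = norm_pdf(eta)
            score = d * (yi - p) / (p * (1.0 - p))   # gradient contribution
            w = d * d / (p * (1.0 - p))              # expected-information weight
            g0 += score
            g1 += score * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical rows: (year, month), indicator percentile / 100, crisis within 24 months.
panel = [
    ((1994, 1), 0.10, 0), ((1994, 4), 0.30, 0), ((1994, 7), 0.55, 1),
    ((1994, 10), 0.20, 0), ((1995, 1), 0.80, 1), ((1995, 2), 0.65, 0),
    ((1995, 3), 0.90, 1), ((1995, 4), 0.40, 1),
    ((1995, 5), 0.15, 0), ((1996, 6), 0.85, 1),
]
estimation = [(p, c) for period, p, c in panel if period <= (1995, 4)]
forecast = [(period, p) for period, p, c in panel if (1995, 5) <= period <= (1997, 12)]

a, b = fit_probit([p for p, _ in estimation], [c for _, c in estimation])
predicted = {period: norm_cdf(a + b * p) for period, p in forecast}
```

Outcomes from the forecast window play no part in estimating the coefficients, which is what makes the resulting probabilities out-of-sample.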

Table 1 shows that on all the scoring measures,10 the probits perform better than the probabilities based on the weighted-sum of indicators signaling. The linear model has the best scores, though the piecewise-linear model is close behind. Using the standard whereby a probability of crisis above twenty-five percent is considered an alarm, the linear and piecewise-linear probits perform well, much better than the weighted-sum based probabilities. The linear probit generates a probability of crisis above twenty-five percent in eighty percent of the periods that precede a crisis. Reflecting their greater prediction success, the probit models have a lower share of false alarms (crisis calls not followed by a crisis as a share of total crisis calls), as low as forty-nine percent for the linear model. Putting it slightly differently, for this model the probability of crisis within twenty-four months conditional on an alarm (using the twenty-five percent cutoff) is fifty-one percent, much higher than the unconditional probability of twenty-two percent.11

The linear model performs much better out-of-sample than the more general piecewise-linear model that includes a role for discrete jumps in the risk of crisis at the KLR thresholds. This suggests that the threshold and indicator concepts add little to the explanatory power of the simple linear model in predicting crisis timing, at least for 1997. The worse out-of-sample performance of the indicator and piecewise-linear models (despite similar or better in-sample performance) is consistent with the greater risk of data-mining in the indicator and piecewise-linear approaches.

As with the KLR models, we can also evaluate the performance of the probit models in predicting the cross-country incidence of crisis in 1997. Table 2 shows that country rankings based on all the probit forecasts are significantly correlated with actual crisis rankings in 1997. Forecasts based on the indicator probit rank countries more accurately than the weighted-sum of indicators-based forecasts, with an R2 close to one half. This superior performance is consistent with previous results that the KLR weighted-sum-of-indicators forecasts are outperformed by the analogous probit model. Somewhat anomalously, the other two probit models perform worse than the indicator probit. In particular, the ranking based on the linear model that had the best goodness-of-fit has the lowest, though still significant, correlation with the actual ranking.12

4. CONCLUSION

This paper has examined the extent to which the KLR signals model, originally formulated and estimated prior to 1997, would have helped predict the 1997 currency crises. The KLR-based probabilities of crisis perform fairly well out-of-sample. When this model issued an alarm (predicted probability above twenty-five percent) during the 1995:5 to 1996:12 period, a crisis would actually have followed in 1997 thirty-seven percent of the time. This compares to a twenty-seven percent unconditional probability of crisis in 1997. Moreover, its forecasted cross-country ranking of severity of crisis is a significant predictor of the actual ranking, with an R2 of twenty-eight percent. The addition of two variables to the KLR model, the level of the current account and M2/reserves, improves performance somewhat.

We have also compared the predictions of this model with a probit-based alternative, which we dub the BP model. The KLR forecasts perform better than some of the probits on a few of the measures, so this comparison is not unambiguous. Overall, though, the BP probit models provide generally better forecasts than the KLR models. Moreover, in contrasting the BP probit methodology with the KLR probabilities, the most direct comparison involves the indicator probit, as it also uses indicator predictive variables. Here in particular the probit generally outperforms. We also examine other probit specifications that do not embody the KLR indicators assumption and find that, while the results are not unambiguous, the linear model is the most successful.13

The testing performed here may give insight into the nature and causes of these crises independent of the value of the models as predictors.

Both models make significant out-of-sample predictions despite the omission of some heavily emphasized phenomena such as poor banking supervision and weak corporate governance.

The alternative method reproduces most of the KLR conclusions regarding which variables are important predictors of crisis. In particular, both approaches demonstrate that the probability of a currency crisis increases when the bilateral real exchange rate is overvalued relative to trend, reserve growth and export growth are low, and the growth of M2/reserves is high. Our analysis suggests, in addition to KLR, that a large current account deficit and a high ratio of M2 to reserves are important risk factors.

The out-of-sample comparison of different approaches provides some insight into important issues in the empirical modeling of currency crises. Most importantly, the data do not clearly support one of the basic ideas of the KLR indicator approach: that it is useful to interpret predictive variables in terms of discrete thresholds, the crossing of which is particularly significant for signaling a crisis. Both direct statistical tests and the generally superior performance of the BP linear model suggest that a better simple assumption is that the probability of crisis goes up linearly with changes in the predictive variables. There is, however, some evidence for nonlinearities of the sort assumed in KLR.

The probabilities are based on a weighted average of the indicators, with the weights based on the noise-to-signal ratios of each indicator. Note that in assessing the usefulness of individual indicators in predicting crises, our results are broadly similar to KLR both when we replicate their sample and for a modified sample (omitting the five European countries and adding other emerging market economies). The discussion in this note is based on a twenty-three country sample comprising Argentina, Bolivia, Brazil, Chile, Colombia, Indonesia, Israel, Jordan, Malaysia, Mexico, Pakistan, Peru, Philippines, South Africa, Sri Lanka, Taiwan, Thailand, Turkey, Uruguay, Venezuela, and Zimbabwe.

We use data through 1995:4 to estimate the weighted-sum based composite indices, then extend the explanatory variables to generate out-of-sample crisis probabilities for the 1995:5-1997:12 period. Unlike Kaminsky (1998), we use only the good indicators, i.e., those with noise-to-signal ratio less than one.
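As an illustration of the weighting, the sketch below computes a noise-to-signal ratio from the usual two-by-two classification of signals and outcomes, then forms a composite index from the good indicators only. Weighting each signal by the inverse of its noise-to-signal ratio is an assumption made here for concreteness, the step that maps the composite index into a probability is not shown, and all names and numbers are hypothetical.

```python
def noise_to_signal(signals, crisis_within_24m):
    """Ratio of the false-signal rate to the good-signal rate for one indicator."""
    pairs = list(zip(signals, crisis_within_24m))
    a = sum(1 for s, c in pairs if s and c)          # signal, crisis follows
    b = sum(1 for s, c in pairs if s and not c)      # signal, no crisis (noise)
    c_ = sum(1 for s, c in pairs if not s and c)     # no signal, crisis (missed)
    d = sum(1 for s, c in pairs if not s and not c)  # no signal, no crisis
    return (b / (b + d)) / (a / (a + c_))


def composite_index(current_signals, ratios):
    """Sum of current signals, each weighted by 1 / noise-to-signal ratio.

    Only indicators with a ratio below one (the "good" indicators) are used.
    """
    return sum(
        current_signals[name] / ratio
        for name, ratio in ratios.items()
        if ratio < 1.0
    )


# Hypothetical history for one indicator, and hypothetical ratios for three.
nsr = noise_to_signal([1, 1, 0, 0, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0])
index = composite_index(
    {"rer": 1, "m2_reserves": 1, "noisy": 1},
    {"rer": 0.25, "m2_reserves": 0.5, "noisy": 1.2},
)
```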

Kaminsky (1998) also reports various scores measuring the overall size of the errors, analogous to the mean-squared error measure (not specifically reported for the out-of-sample forecasts, however). These are more helpful for comparing different models than for assessing how well a given model works in an absolute sense.

At the time of collection (April 1998), data were not available for December 1997 for all countries. More recent analysis has confirmed that the results reported here both for the KLR and BP models are not significantly affected by the addition of complete data for 1997 (some partial exceptions are noted below).

The predicted crisis probability is the average of the probabilities during 1996:1–12, using the out-of-sample estimates. Averaging over for example 1996:1 to 1996:6 gives somewhat different results. The actual crisis index used to rank the countries for 1997 is the maximum value of the monthly crisis index for each country during 1997.

P is measured in percentiles in that the observations on the underlying predictive variable for a given country, for example twelve-month percent changes in real domestic credit, are expressed in terms of percentiles of the distribution of that variable for the country in question.

This procedure is biased in favor of finding significant jump coefficients. Since we use the data itself to identify the biggest jump (through the KLR method), the subsequent tests will tend to find that the jumps we have found are unusually large. The tests we perform thus overestimate the statistical significance of the jump coefficient α2.

Probability forecasts can also be evaluated with accuracy and calibration scores that are analogs of a mean-squared error measure. Lower numbers indicate better performance. See Kaminsky (1998) and Berg and Pattillo (1998) for more discussion.
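One common accuracy score of this kind is the quadratic probability score, sketched below. Whether it matches the exact scores reported in Table 1 is an assumption; the function name and sample values are hypothetical.

```python
def quadratic_probability_score(probabilities, outcomes):
    """Average of 2 * (P - R)^2 over observations.

    P is the predicted probability and R the 0/1 realization. The score
    ranges from 0 (perfect) to 2 (always certain and always wrong).
    """
    n = len(probabilities)
    return sum(2.0 * (p - r) ** 2 for p, r in zip(probabilities, outcomes)) / n


# Hypothetical forecasts: one uninformative call and two confident, correct calls.
qps = quadratic_probability_score([0.5, 0.0, 1.0], [1, 0, 1])
```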

The contrast between the results of the rankings and goodness-of-fit comparisons is somewhat surprising but not inexplicable. The two measures are somewhat different and they need not correspond. The goodness-of-fit measure examines only whether crisis calls are correct or not and ignores the size of errors. The rankings comparison considers whether the highest probabilities of crisis are associated with the largest crises; the magnitude of the crisis, however, as distinct from whether or not there is a crisis, is not a factor in any of the models. These results are sensitive to the exact sample of countries involved in the ranking comparison. For example, eliminating Israel (one of the largest outliers) from the sample increases the R2 of the rankings predictions of the percentile probit model from twenty-three to forty-two percent. The addition of December 1997 data (not available as of April 1998 when the data for these results were collected) reverses the order of the ranking correlations, with the linear BP model performing somewhat better than KLR.

Performance of both the BP and KLR models is somewhat better in-sample than out-of-sample. In terms of relative performance of the different models, the conclusions drawn from the out-of-sample experience remain, except that the piecewise-linear probit model tends to perform somewhat better than the others in-sample. This result is not surprising given that it is a generalization of the other probit models. Its relatively greater degradation in out-of-sample performance is also not surprising in light of the greater risk of overfitting inherent in the more general specification.

Kaminsky, Graciela, and Carmen M. Reinhart. 1996. “The Twin Crises: The Causes of Banking and Balance-of-Payments Problems,” Board of Governors of the Federal Reserve System, International Finance Discussion Paper No. 544, March.