CAUSALITY (Social Science)

In economics and the social sciences the values of many variables (such as the price of a product, the crime rate, the level of illiteracy, personal income, and consumption) are observed with great regularity. As a result, an empirical generating mechanism can be postulated that produces the observed values of the variable of interest. Investigating and understanding this mechanism is one of the main tasks of social scientists, and in doing so the issue of causation inevitably arises. Causation can be discussed in very general, abstract terms, or the discussion can focus on the specific question of whether it is possible to test for causation using the available data.

The latter requires an operational definition and procedure, usually formulated in terms of mechanisms and subsystems. Such a formulation is needed because the working of a complex system as a whole is not fully understood. In this formulation, each mechanism, which might be represented by an equation, determines the value of a particular variable as a function of some others. The variable whose value is so determined (the dependent or endogenous variable) is called the effect of the working of that particular mechanism, while the other variables entering into the mechanism (the independent or exogenous variables) are the causes of that effect.

As a specific example, theoretical analysis may say that "y is a function of x," that is, changes in the independent variable x generate changes in the dependent variable y. One might write this as y = f(x). Empirical analysis then attempts to estimate the actual strength of the relationship between y and x.

Much of economics and social science is concerned with cause-and-effect propositions, insofar as these disciplines pose causal relationships in which a dependent variable’s movements are causally determined by movements in a number of specific independent variables. However, one should not be deceived by the words dependent and independent. Although many theoretical economic relationships are causal by their nature, statistical analysis, for example linear regression, cannot prove causality. All regression analysis can do is test whether a significant quantitative relationship exists, measure the strength of this relationship, and indicate the direction of the quantitative relationships involved. Judgments about causality are instead made through various causality tests.
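One way to see why regression alone cannot settle the question is that the fit of a two-variable regression is the same whichever variable is placed on the left-hand side. The following is a minimal sketch in Python under stated assumptions: the observations are invented purely for illustration, and the helper name fit is hypothetical rather than taken from any library.

```python
def fit(x, y):
    """Simple OLS of y on x: return (slope, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Hypothetical observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_y_on_x, r2_y_on_x = fit(x, y)  # treats x as the "cause"
slope_x_on_y, r2_x_on_y = fit(y, x)  # treats y as the "cause"

print("y on x: slope", slope_y_on_x, "R^2", r2_y_on_x)
print("x on y: slope", slope_x_on_y, "R^2", r2_x_on_y)
```

Both regressions report the same R-squared, so the strength of the fitted relationship by itself gives no indication of which variable causes which.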

The objective of any causal analysis is to influence the degree of belief held by an individual about the correctness of some causal theory. Hence, the task of the analysis is not to be complete in itself, but rather to have enough value to make one reconsider one’s belief. There are basically two types of causal testing situations. In a cross-sectional causality analysis the question asked is why one variable behaves differently from another. In a temporal causality analysis the question asked is why a variable changes behavior from period to period. Although many important economic questions can be phrased in the cross-section causal situation, they have received little causal testing in that context, and many tests have been conducted for economic questions that can be stated as temporal causation. The definitions of causality and their interpretations may differ between cross-section and time-series cases. In all cases, however, the classification of variables into exogenous and endogenous and the causal structure of the mechanism (the econometric model) are under scrutiny.

The relation between exogeneity and causality is at the heart of any investigation into causal analysis. Three definitions of exogeneity are commonly distinguished: weak, strong, and super exogeneity. A variable is said to be weakly exogenous for estimating a set of parameters if inference on the parameters conditional on this variable involves no loss of information. The concept of superexogeneity is related to the Lucas critique: a variable is said to be superexogenous if it is weakly exogenous and the parameters in the equation remain invariant to changes in the marginal distribution of the variable. A variable is strongly exogenous if it is weakly exogenous and is not preceded by any of the endogenous variables of the model, that is, if it is not caused by any of them in the Granger sense. The concept of strong exogeneity is thus linked to the concept of Granger causality, which should be considered a test of precedence rather than of causality as such. In the usual simultaneous equations literature, however, there is doubt as to what extent the test for Granger noncausality is useful as a test for exogeneity. Nevertheless, some argue that Granger noncausality is useful as a descriptive device for time-series data.

The Granger causality test is based on two axioms: that the cause occurs before the effect, and that the cause contains unique information about the effect. In practice, Granger causation tests whether A precedes B, or B precedes A, or they are contemporaneous. It is not a causality analysis as it is usually understood, and in this limited sense Clive Granger (1969) devised some tests, which proceed as follows. Consider two time series, x_t and y_t. The series x_t fails to Granger cause y_t if, in a regression of y_t on lagged y’s and lagged x’s, the coefficients of the latter are zero. The lag length is, to some extent, arbitrary. An alternative test provided by Sims states that y_t fails to cause x_t in the Granger sense if, in a regression of y_t on lagged, current, and future x’s, the coefficients of the future x’s are zero. Although there are some econometric differences between the two tests, they basically test the same hypothesis of precedence. This is why many econometricians have suggested the use of the term precedence rather than Granger causality: all one is testing is whether or not a certain variable precedes another, not causality as it is usually defined and understood.
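The Granger procedure can be illustrated with a small simulation. The sketch below is only one possible implementation, written under stated assumptions: the data are simulated so that lagged x enters the equation for y but not the reverse, the lag length of 2 is arbitrary, and the helper names ols_rss and granger_f are hypothetical. The statistic is the usual F statistic comparing a regression of y on its own lags with a regression that also includes the lags of x.

```python
import random


def ols_rss(rows, y):
    """Residual sum of squares from OLS of y on the regressors in rows."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for i in range(c + 1, k):
            factor = a[i][c] / a[c][c]
            for j in range(c, k + 1):
                a[i][j] -= factor * a[c][j]
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (a[i][k] - tail) / a[i][i]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def granger_f(cause, effect, p):
    """F statistic for H0: all p lagged 'cause' coefficients are zero."""
    times = range(p, len(effect))
    target = [effect[t] for t in times]
    restricted = [[1.0] + [effect[t - j] for j in range(1, p + 1)]
                  for t in times]
    unrestricted = [row + [cause[t - j] for j in range(1, p + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Simulated data (an assumption of this sketch): x precedes y by construction.
random.seed(1)
x, y = [0.0], [0.0]
for t in range(1, 500):
    x.append(0.5 * x[t - 1] + random.gauss(0, 1))
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 1))

f_x_to_y = granger_f(x, y, p=2)  # do lags of x help predict y?
f_y_to_x = granger_f(y, x, p=2)  # do lags of y help predict x?
print("F (x -> y):", f_x_to_y)
print("F (y -> x):", f_y_to_x)
```

A large F statistic in one direction and a small one in the other indicates only that x precedes y in these simulated data, which is the sense in which the test is better described as one of precedence.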

The causality issue also arises in forecasting. Forecasting is the prediction of future events, and causal models are used to derive numerical forecasts. These causal models are regression and autoregression models used to produce numerical time-series forecasts. The object of a causal model is to identify one series as the main series of interest and to use another series as the predictor for the main series. It is argued that economic theory is necessary to provide the information needed to specify the causal relationships, because forecasts by themselves need not involve causal relationships.

In a general formulation of a causal time series model, the predictor (exogenous) variable enters the equation both as a contemporaneous variable and as a lagged independent variable. Even a simple causal model is fraught with difficulties. This is typically due to the problem of distinguishing the autocorrelation within the dependent and independent variables from the cross-sectional correlation between the two. Cross-sectional correlations that appear significant but are induced by autocorrelations are called spurious correlations. The problem of spurious correlation arises because in many instances the predictor variable is stochastic, and thus its time series must itself be forecast. Several causal models have been developed to cope with this problem; the most common is the regression model with autoregressive disturbances.
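How autocorrelation can induce apparently strong correlations between unrelated series can be illustrated by simulation. The sketch below is written under stated assumptions: it uses random walks as an extreme case of autocorrelation, the threshold of 0.5, the series length of 100, and the 500 replications are arbitrary choices, and all function names are hypothetical. It compares how often two independent series show a large sample correlation when they are serially uncorrelated and when they are random walks.

```python
import random


def correlation(a, b):
    """Sample Pearson correlation between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def white_noise(n):
    """Serially uncorrelated series."""
    return [random.gauss(0, 1) for _ in range(n)]


def random_walk(n):
    """Strongly autocorrelated series: cumulative sum of white noise."""
    level, path = 0.0, []
    for shock in white_noise(n):
        level += shock
        path.append(level)
    return path


def share_large(make_series, reps=500, n=100, threshold=0.5):
    """Share of replications in which two INDEPENDENT series have |r| > threshold."""
    hits = sum(
        abs(correlation(make_series(n), make_series(n))) > threshold
        for _ in range(reps)
    )
    return hits / reps


random.seed(42)
share_noise = share_large(white_noise)
share_walks = share_large(random_walk)
print("independent white-noise pairs with |r| > 0.5:", share_noise)
print("independent random-walk pairs with |r| > 0.5:", share_walks)
```

The two series in each pair are generated independently, so any large correlation between the random walks reflects their autocorrelation rather than a relationship between them.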

The estimation of the causal effect also arises in the case of randomized experimentation. The central idea of an ideal randomized experiment is that the causal effect can be measured by randomly selecting observations from a population and then randomly giving some of the observations a treatment, the causal effect of which researchers then investigate. If the treatment is assigned at random, then the treatment level is distributed independently of any of the other determinants of the outcome, thereby eliminating the possibility of omitted variable bias. The causal effect on Y of treatment level X is a difference in expected values and thus is an unknown characteristic of a population. One way to measure the causal effect is to use data from a randomized controlled experiment. Because the treatment is randomly assigned, the causal effect can be estimated by the difference in the sample average outcomes between the treatment and control groups.
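The difference-in-means estimator, and the role that random assignment plays in it, can be sketched with simulated data. Everything in the example below is an assumption made for illustration: the true effect of 2.0, the unobserved determinant labelled ability, the sample size, and the function name difference_in_means. The same estimator is computed twice, once with treatment assigned by a coin flip and once with treatment taken up only by observations with above-average ability.

```python
import random
from statistics import mean


def difference_in_means(assign, n=10000, effect=2.0):
    """Average outcome of the treated minus average outcome of the controls."""
    treated, control = [], []
    for _ in range(n):
        ability = random.gauss(0, 1)   # unobserved determinant of the outcome
        x = assign(ability)            # treatment indicator (True or False)
        outcome = 1.0 + effect * x + ability + random.gauss(0, 1)
        (treated if x else control).append(outcome)
    return mean(treated) - mean(control)


random.seed(0)
# Treatment assigned at random: independent of ability.
randomized = difference_in_means(lambda ability: random.random() < 0.5)
# Treatment chosen by those with high ability: correlated with an omitted variable.
self_selected = difference_in_means(lambda ability: ability > 0)

print("randomized assignment:", randomized)
print("self-selected treatment:", self_selected)
```

Under random assignment the estimate should lie close to the assumed effect of 2.0, whereas under self-selection it also picks up the difference in ability between the two groups, which is the omitted variable bias that randomization removes.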

Despite the advantages of randomized controlled experiments, their application to economics faces severe hurdles, including ethical concerns and cost. The insights of experimental methods can, however, be applied to quasi experiments, which provide econometricians with a way to think about how to acquire new data sets, how to manipulate instrumental variables in their analysis, and how to evaluate the plausibility of the exogeneity assumptions that underlie Ordinary Least Squares (OLS) and instrumental variables estimation. In a quasi experiment there are special circumstances that make it seem "as if" randomization has occurred. In quasi experiments, the causal effect can be estimated using a differences-in-differences estimator, possibly augmented with additional regressors; if the "as if" randomization only partly influences the treatment, then instrumental variables regression can be used instead. An important threat confronting quasi experiments is that sometimes the "as if" randomization is not really random, so the treatment (or the instrumental variable) is correlated with omitted variables and the resulting estimator of the causal effect is biased.
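In its simplest form, without additional regressors, the differences-in-differences estimator is the change in the average outcome of the treated group minus the change in the average outcome of the control group. The following is a minimal sketch with invented numbers; the function name diff_in_diff and the four small samples are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    change_treated = mean(treat_post) - mean(treat_pre)
    change_control = mean(control_post) - mean(control_pre)
    return change_treated - change_control


# Hypothetical outcomes before and after the treatment, for illustration only.
treat_pre = [10.0, 12.0, 11.0]
treat_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [11.0, 12.0, 13.0]

estimate = diff_in_diff(treat_pre, treat_post, control_pre, control_post)
print("differences-in-differences estimate:", estimate)
```

With these invented numbers the treated group's average rises by 5 and the control group's by 2, so the estimate is 3. Subtracting the control group's change removes any movement common to both groups, which is why the estimator relies on the "as if" randomization making the two groups comparable.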

The issue of causality is very important in economic and social analysis but unfortunately not all analysts give the same meaning to this word. In discussing causal links, many economists emphasize the relevance of a sound economic theory in deriving causal propositions and they argue that caution should be applied in the inferences derived from the analysis.