Parsimony and operator methods for treatment of endogeneity and multiple sources of unobserved heterogeneity

Objective

"Unobserved heterogeneity and endogeneity are prevalent notions throughout econometrics. Most of the literature focuses on scalar unobserved heterogeneity. It implies strong restrictions on the heterogeneity of the behaviour of economic agents. This is the case in a binary treatment effect model where scalar unobserved heterogeneity and additive separability of the index in the selection equation are equivalent to the restrictive monotonicity assumption. Nonparametric random coefficients models allow for multiple sources of unobserved heterogeneity and are in line with structural economics. They are also benchmark nonseparable models and can be generalized in various ways. Due to unobserved heterogeneity, but also simultaneity or error in variables, structural models usually involve as well endogenous regressors.

Nonparametric models of unobserved heterogeneity and estimation by instrumental variables usually give rise to ill-posed inverse problems. High-dimensional methods are a new set of tools that are increasingly popular in econometrics and allow handling new data configurations with many more potential regressors than observations. They are based on convex relaxation, linear or conic programming ideas, or MCMC algorithms. When the model is well approximated by a parsimonious model where many coefficients are zero, they can usually estimate the parameter as well as an oracle who would know the best sparse approximation. They also offer new tools for adaptive nonparametric estimation. Some recent developments are concerned with hidden structured sparsity (structural breakpoints or patterns other than zeros). This research proposal is on the development of a general framework and new inference tools for flexible models – nonparametric or high-dimensional – with multiple sources of unobserved heterogeneity and endogeneity in various models from economics, in particular: programme evaluation, consumer demand, demand for differentiated products, games, etc."

The general purpose of this ERC-funded research is to study two classes of large-dimensional models: nonparametric and high-dimensional. The main reason for using large-dimensional models is that economic theory sometimes does not characterize functional forms, the distributions of unobservables and their number, or the actual variables that have a direct effect on an outcome variable. Rather than using a simplistic model, we allow for more flexible models in which the underlying relation is simple but its specific form is not known in advance. An outcome of interest is usually modelled as depending on some observed and unobserved factors. These unobservables are often modelled, for convenience, as if a single variable were missing. However, this has implications which are often undesirable. We have studied in detail the inclusion of multiple unobservables in treatment effect models and shown that treatment effect parameters can still be recovered in this case while allowing for so-called nonmonotonic selection into treatment. This means that a variable affecting selection into treatment can shift some individuals from nontreatment to treatment and others from treatment to nontreatment. Importantly, it is not necessary to restrict the distribution of these unobservables. They can be, for example, cost factors in a model where individuals choose their level of education partly because they have an information set which allows them to forecast that investing in education will be beneficial for them in the future. However, this literature too often relies on the requirement that the explanatory variables vary sufficiently. With Christophe Gaillac, we are able to allow for explanatory variables (e.g. cost shifters) which have much less variation, and we obtain estimators which are optimal and adapt to the unknown distribution of the unobservables.
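The nonmonotonic selection described above can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the project: selection follows a hypothetical random coefficients rule D = 1{A + B·Z > 0} with Gaussian A and B, and it is compared with a rule driven by a single unobservable V, D = 1{slope·Z − V > 0}. The function names, distributions and parameter values are illustrative choices, and the code only counts who changes treatment status when Z moves from z0 to z1. It does not implement any estimator from this research.

```python
import random


def random_coefficient_flows(n=10_000, z0=0.0, z1=1.0, seed=0):
    """Count switches under D = 1{A + B*Z > 0} with two unobservables (A, B).

    Assumption for illustration: A ~ N(0, 1) and B ~ N(0.5, 1), so the
    slope B is positive for some individuals and negative for others.
    """
    rng = random.Random(seed)
    into_treatment = out_of_treatment = 0
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)   # random intercept
        b = rng.gauss(0.5, 1.0)   # random slope, takes both signs
        d0 = a + b * z0 > 0       # treatment status at Z = z0
        d1 = a + b * z1 > 0       # treatment status at Z = z1
        if not d0 and d1:
            into_treatment += 1
        elif d0 and not d1:
            out_of_treatment += 1
    return into_treatment, out_of_treatment


def scalar_heterogeneity_flows(n=10_000, z0=0.0, z1=1.0, slope=0.5, seed=0):
    """Count switches under D = 1{slope*Z - V > 0} with one unobservable V.

    With an additively separable index and scalar V, raising Z can only
    move individuals in one direction (monotonicity).
    """
    rng = random.Random(seed)
    into_treatment = out_of_treatment = 0
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        d0 = slope * z0 - v > 0
        d1 = slope * z1 - v > 0
        if not d0 and d1:
            into_treatment += 1
        elif d0 and not d1:
            out_of_treatment += 1
    return into_treatment, out_of_treatment


if __name__ == "__main__":
    print("random coefficients (into, out):", random_coefficient_flows())
    print("scalar heterogeneity (into, out):", scalar_heterogeneity_flows())
```

In the random coefficients case both counts are positive, because the same change in Z moves some individuals into treatment and others out of it. In the scalar case with a positive slope, the second count is zero.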
A topic of huge importance in statistics over the last 15 years has been estimation in models with many explanatory variables, possibly many more than the sample size, when many of them, whose identity is unknown, do not have a direct effect (sparsity). With my coauthor Alexandre Tsybakov, we have been the first to extend this literature to many endogenous regressors, namely regressors which are dependent on the unobservable error term but for which one has at one's disposal so-called instrumental variables, a classical subject of study in econometrics. We obtained confidence sets for the whole high-dimensional vector which adapt to the unknown number of regressors that actually have a direct effect. They allow some instrumental variables to have a direct effect (hence fewer moments than parameters), and they allow the instruments to have arbitrary, possibly weak, correlation with the included endogenous variables (weak instruments) and to be numerous. With my coauthor Christiern Rose, we have studied panel data models of networks with endogenous and exogenous variables and unobserved variables which are the same for all individuals. This can be viewed as a system of many equations with many endogenous variables. In this research we allow for cross-equation restrictions, structured sparsity, and high-dimensional unobservables.