Can regularization be helpful if we are interested only in estimating (and interpreting) the model parameters, not in forecasting or prediction?

I see how regularization/cross-validation is extremely useful if your goal is to make good forecasts on new data. But what if you're doing traditional economics and all you care about is estimating $\beta$? Can cross-validation also be useful in that context? The conceptual difficulty I struggle with is that we can actually compute $\mathcal{L}\left(Y, \hat{Y}\right)$ on test data, but we can never compute $\mathcal{L}\left(\beta, \hat{\beta}\right)$ because the true $\beta$ is by definition never observed. (Take as given the assumption that there even is a true $\beta$, i.e. that we know the family of models from which the data were generated.)

Suppose your loss is $\mathcal{L}\left(\beta, \hat{\beta}\right) = \lVert \beta - \hat{\beta} \rVert$. You face a bias-variance tradeoff, right? So, in theory, you might be better off doing some regularization. But how can you possibly select your regularization parameter?

I'd be happy to see a simple numerical example of a linear regression model, with coefficients $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_k)$, where the researcher's loss function is e.g. $\lVert \beta - \hat{\beta} \rVert$, or even just $(\beta_1 - \hat{\beta}_1)^2$. How, in practice, could one use cross-validation to improve expected loss in those examples?
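To make the setup concrete, here is a rough simulation sketch of the kind of experiment I have in mind (the numbers, and the use of scikit-learn's `RidgeCV`, are arbitrary choices of mine). Because the data are simulated, the simulator knows the true $\beta$, so we can at least check whether a penalty tuned by ordinary prediction-based cross-validation also happens to reduce $\lVert \beta - \hat{\beta} \rVert$ relative to OLS:

```python
# Simulation sketch: does a ridge penalty chosen by *prediction* cross-validation
# also help the parameter-estimation loss ||beta - beta_hat||?
# (The true beta is known here only because we simulate the data ourselves.)
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(0)
n, k = 50, 20                       # few observations relative to predictors
beta = rng.normal(0, 1, size=k)     # the "true" coefficients

ols_loss, ridge_loss = [], []
for _ in range(200):                # repeat to approximate the expected loss
    X = rng.normal(size=(n, k))
    y = X @ beta + rng.normal(0, 3, size=n)

    b_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
    # Penalty chosen by cross-validated prediction error, not parameter error
    b_ridge = RidgeCV(alphas=np.logspace(-3, 3, 25),
                      fit_intercept=False).fit(X, y).coef_

    ols_loss.append(np.linalg.norm(beta - b_ols))
    ridge_loss.append(np.linalg.norm(beta - b_ridge))

print("mean ||beta - beta_hat||  OLS:  ", np.mean(ols_loss))
print("mean ||beta - beta_hat||  ridge:", np.mean(ridge_loss))
```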

Machine learning techniques ... provide a disciplined way to predict $\hat{Y}$ which (i) uses the data itself to decide how to make the bias-variance trade-off and (ii) allows for search over a very rich set of variables and functional forms. But everything comes at a cost: one must always keep in mind that because they are tuned for $\hat{Y}$ they do not (without many other assumptions) give very useful guarantees for $\hat{\beta}$.
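As a small illustration of that last point (my own construction, not taken from the quoted paper): a lasso whose penalty is chosen by prediction-based cross-validation can predict reasonably well, yet its coefficient estimates are typically shrunken away from the truth:

```python
# Small illustration (my own, not from the quoted paper): a lasso tuned by
# cross-validation for prediction error can still give biased coefficient
# estimates, because the shrinkage that helps prediction distorts beta_hat.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, k = 60, 20
beta = np.array([2.0, -1.5, 1.0] + [0.0] * (k - 3))   # sparse "true" coefficients

X = rng.normal(size=(n, k))
y = X @ beta + rng.normal(0, 3, size=n)

lasso = LassoCV(cv=5).fit(X, y)    # penalty chosen to minimize prediction CV error
print("true beta:      ", beta[:3])
print("lasso estimates:", lasso.coef_[:3].round(2))    # often noticeably shrunk toward zero
```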

A ... fundamental challenge to applying machine learning methods such as regression trees off-the-shelf to the problem of causal inference is that regularization approaches based on cross-validation typically rely on observing the “ground truth,” that is, actual outcomes in a cross-validation sample. However, if our goal is to minimize the mean squared error of treatment effects, we encounter what [11] calls the “fundamental problem of causal inference”: the causal effect is not observed for any individual unit, and so we don’t directly have a ground truth. We address this by proposing approaches for constructing unbiased estimates of the mean-squared error of the causal effect of the treatment.
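For concreteness, here is a sketch of one construction in that spirit (my own illustration, not necessarily the estimator the quoted paper proposes). Under random assignment with a known treatment probability $p$, the transformed outcome $Y^{*} = Y\,(W - p)/\big(p(1-p)\big)$ has conditional expectation equal to the treatment effect, so this noisy but unbiased quantity can stand in for the unobservable “ground truth” when cross-validating a treatment-effect model:

```python
# Hedged sketch (my own construction, not necessarily the quoted paper's method):
# under random assignment with known treatment probability p, the "transformed
# outcome" Y* = Y * (W - p) / (p * (1 - p)) is an unbiased (though noisy) proxy
# for the individual treatment effect, and so can serve as a CV ground truth.
import numpy as np

rng = np.random.default_rng(2)
n, p = 10_000, 0.5
X = rng.normal(size=n)
tau = 1.0 + 0.5 * X                      # heterogeneous "true" treatment effect
W = rng.binomial(1, p, size=n)           # randomized treatment assignment
Y = 2.0 * X + tau * W + rng.normal(0, 1, size=n)

Y_star = Y * (W - p) / (p * (1 - p))     # transformed outcome
print("mean true effect:        ", tau.mean())
print("mean transformed outcome:", Y_star.mean().round(3))   # close on average
```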

Cross-validation is but one method in the data mining and machine learning toolkits. ML is seeing growing use in Economics -- see Susan Athey's website at Stanford (she's an academic interested in the integration of ML techniques into economics) or this paper Prediction Policy Problems by Kleinberg, et al., in an ungated version here: cs.cornell.edu/home/kleinber/aer15-prediction.pdf
– Mike Hunter, Mar 29 '16 at 16:50


Please, folks, disambiguate: ML to many suggests machine learning and to many others suggests maximum likelihood. (Definition: you are on the machine learning side of the fence if ML automatically translates itself to you as machine learning.)
– Nick Cox, Mar 30 '16 at 9:47


@Aksakal my experience is that traditional econometrics, as it's taught to both undergrad and grad students, pays essentially zero attention to cross-validation. Look at Hayashi, which is a classic textbook. Sure, maybe cross-validation and the bias-variance tradeoff are mentioned in a course specifically on forecasting, but not in the core course that all students begin with. Does that sound right to you?
– Adrian, Mar 30 '16 at 15:38


@Adrian I see people are voting to close this question as too broad. It may be so, but as I see it you are basically asking: "Can CV be helpful if we are interested only in modeling, not in forecasting?" -- if I understand you correctly, your question can be easily edited and simplified, so it is clearer and certainly not too broad (even interesting!).
– Tim♦, Mar 31 '16 at 7:36


@Adrian So it is a very interesting question! I'm afraid you made it overly complicated, and the reference to econometrics is not crucial here (the same issue arises in other areas where statistical methods are used). I would encourage you to edit your question to simplify it.
– Tim♦, Mar 31 '16 at 7:45

If you look at the plot gung made, it becomes clear why we need regularization/shrinkage. At first it seemed strange to me that we would want biased estimates, but looking at that figure I realized that a low-variance model has a lot of advantages: for example, it is more "stable" in production use.

Yes, but how do we select the regularization parameter? When the goal is to minimize prediction error, we can use a validation set. How can we make use of a validation set if we never observe the true model parameters?
– Adrian, Feb 1 '18 at 20:53

See the quote about the "fundamental problem of causal inference" at the bottom of my question.
– Adrian, Feb 1 '18 at 22:33

Can cross-validation be helpful if we are interested only in modeling (i.e. estimating parameters), not in forecasting?

Yes, it can.
For instance, the other day I was estimating parameter importance with decision trees. Every time I built a tree, I checked the cross-validation error and tried to decrease it as much as I could before moving on to estimating the parameters' importance. If the first tree you build is very bad and you never check the error, you may end up with less accurate (if not outright wrong) answers. (A rough sketch of this workflow is at the end of this answer.)

The main reason, I believe, is the large number of control variables that each technique has. Even a slight change in one control variable can produce a different result.

How do you improve your model after checking the cross-validation error? That depends on the model. Hopefully, after a few tries you will get a sense of the most important control variables and can adjust them to bring the error down.
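A minimal sketch of that workflow, assuming a simulated dataset and `max_depth` as the control variable being tuned (both are just my choices for illustration):

```python
# Sketch of the workflow above (my own assumptions: simulated data,
# and tree depth as the "control variable" being tuned).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 5))
y = 3 * X[:, 0] + 1 * X[:, 1] + rng.normal(0, 1, size=n)   # only two features matter

# Step 1: pick the control variable (here, tree depth) by cross-validation error
cv_mse = {
    depth: -cross_val_score(DecisionTreeRegressor(max_depth=depth, random_state=0),
                            X, y, cv=5, scoring="neg_mean_squared_error").mean()
    for depth in [2, 3, 5, 8, None]
}
best_depth = min(cv_mse, key=cv_mse.get)

# Step 2: only once the CV error looks reasonable, interpret the importances
tree = DecisionTreeRegressor(max_depth=best_depth, random_state=0).fit(X, y)
print("CV MSE by depth:    ", cv_mse)
print("feature importances:", tree.feature_importances_.round(2))
```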