Can somebody help me understand a topic? Say that in my analysis I have the true variable education ($E$), which is measured by a proxy variable, number of years in school ($S$). Here, the measurement error is $E-S$.

I was told that this error is independent of $S$, but not of $E$.

I cannot understand how this can be the case. Shouldn't the error depend on both $S$ and $E$?

3 Answers

You may be thinking that it might be possible that the length of time somebody is in school may affect how uncertain we are about their education. I suspect that what you were told stems from a different way of thinking about this.

If the number of years in school is measured accurately, for example from reliable administrative records, then there should be no error in $S$. Education is a more elusive concept: how much does a person learn? And so the error $E-S$ comes from $E$.

The statement that the error is independent of $S$ sounds as if this was an example in which you have measured both $S$ and $E$ and fitted a regression model with $S$ as the predictor and $E$ as the response. In that case the errors/residuals from the model are independent (if the model is correct) of the predictor variable ($S$ in this case), but not of $E$. Those statements are facts based on the theory and assumptions of fitting regression models; if you need to understand that better, I would suggest a good book on linear models.
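For what it's worth, here is a small simulation of that regression reading. It is only a sketch: the data-generating process (uniform $S$, a linear relationship with slope 0.5, unit-variance noise) and the seed are made-up assumptions, not anything from the question. Note that it checks correlation, which is weaker than independence: least-squares residuals are uncorrelated with the predictor by construction, while they stay correlated with the response.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed, for reproducibility
n = 10_000

# Hypothetical data-generating process, chosen only for illustration
S = rng.uniform(8, 20, size=n)                    # years in school (predictor)
E = 1.0 + 0.5 * S + rng.normal(0, 1, size=n)      # "education" (response)

# Ordinary least squares fit of E on S; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(S, E, deg=1)
resid = E - (intercept + slope * S)

# Sample correlations of the residuals with predictor and response
corr_resid_S = np.corrcoef(resid, S)[0, 1]
corr_resid_E = np.corrcoef(resid, E)[0, 1]

print(f"corr(residual, S) = {corr_resid_S:.3f}")  # zero up to rounding error
print(f"corr(residual, E) = {corr_resid_E:.3f}")  # clearly positive
```

The residual is part of $E$ by definition, so it cannot be unrelated to $E$ unless it is identically zero.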

If I have misunderstood where the statement comes from, then more background on the context might help us to help you.

I know I'm very late to the party, but one thing that might help: You appear to be under the impression that there are three sources of randomness in your model: $S$, $E$, and $u = S - E$, where $u$ is the proxy measurement error. This is not correct. There are only two sources of randomness in your model. They are (in a typical setup), $E$, your latent variable, and $u$, your proxy measurement error.

$S$ is a random variable in the sense that it is equal to $E + u$, i.e. the sum of two random variables. However, because $S$ is uniquely determined by $E$ and $u$, it follows that $S$ is not a source of randomness. So to discuss the dependency between $u$, $S$, and $E$ is not really meaningful, because at most two of these three random variables can be a source of randomness in your model.

Note, in a typical framework, a common assumption is that $u$ (proxy measurement error) is independent of $E$ (latent variable). Your framework appears odd in that someone has told you that your proxy measurement error is independent of $S$. We could, of course, set things up this way, by assuming that $u$ and $S$ are the sources of randomness and that $E$ is uniquely determined by the two of them. But, as I said, this is an odd way of doing things.
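To make the two setups concrete, here is a hedged simulation sketch using $u = S - E$ as above. The normal distributions, their parameters, and the seed are my own illustrative assumptions. In the first setup $E$ and $u$ are drawn independently and $S$ is derived; in the second $S$ and $u$ are drawn independently and $E$ is derived.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed
n = 100_000

def corr(a, b):
    """Sample Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Typical setup: E and u are the sources of randomness, S = E + u
E_c = rng.normal(12, 3, size=n)   # latent variable (illustrative parameters)
u_c = rng.normal(0, 1, size=n)    # measurement error, drawn independently of E
S_c = E_c + u_c

# Alternative setup: S and u are the sources of randomness, E = S - u
S_b = rng.normal(12, 3, size=n)   # observed proxy
u_b = rng.normal(0, 1, size=n)    # error, drawn independently of S
E_b = S_b - u_b

print("typical:     corr(u, E) =", round(corr(u_c, E_c), 3),
      " corr(u, S) =", round(corr(u_c, S_c), 3))
print("alternative: corr(u, E) =", round(corr(u_b, E_b), 3),
      " corr(u, S) =", round(corr(u_b, S_b), 3))
```

In each setup the error is (nearly) uncorrelated with whichever variable it was drawn independently of, and correlated with the derived one. So the error cannot be independent of both $S$ and $E$ at once unless it is degenerate.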

The OP's framework is called the Berkson measurement error model. @Henry described in a different answer why this might be a reasonable model to entertain for this situation: four years at Stanford arguably give you more than four years at a local community college.
– StasK, Feb 4 '13 at 14:13

@StasK Interesting stuff (I just had a quick scan of the abstract and intro of the original paper). Thanks for the comment. By the way, if you edit for Wikipedia at all, feel free to improve this entry on Berkson error as it is probably the most uninformative Wikipedia page I've ever seen :-)
– Colin T Bowers, Feb 5 '13 at 6:02

I do edit Wikipedia, but I am not an expert on measurement error models. I wanted to improve the SEM entry for a very long time (introduce matrix notation and such), but never got to doing it.
– StasK, Feb 5 '13 at 13:29