I’ve been hard at work revising my DisMod-MR book, and one thing that has been fun is recognizing the jargon that is so embedding my working language that I’ve never thought about it before. What is a “covariate”? It is any data column in my tabular array that is not the most important one. But how do I know that? This word is not actually English.

A simple example occurs in crop-cutting experiments. In the Indian Statistical Institute the weight of green plants of jute or the weight of paddy immediately after harvesting are recorded on an extensive scale. In only a small fraction of cases (of the order of 10 %) the jute plant is steeped in water, retted and the dry fibre extracted and its weight determined directly, or the paddy is dried, husked and the weight of rice measured separately. These auxiliary measurements serve to supply the regression relation between the weight of green plants of jute or the weight of paddy immediately after harvesting and the yield of dry fibre of jute or of husked rice, respectively, which can then be used to estimate the corresponding final yields from the more extensive weights taken immediately after harvesting. Such a procedure simplifies the field work enormously without any appreciable loss of precision in the final results.

Such methods, in which the estimates made in later surveys are based on correlations determined in earlier surveys, may perhaps be called “covariate sampling”.