To estimate group-level statistics accurately and consistently, and to provide useful data for secondary analysis, NAEP creates plausible values. These plausible values are based on a latent regression model that contains both measurement and population-structure models. The combination of measurement models, in the form of Item Response Theory (IRT) models, and population-structure models, in the form of latent regression models with student group indicators as independent variables, provides an estimated distribution of underlying performance for the population and subgroups of interest. This distribution is

$$p(\theta \mid x, y, \alpha, \Gamma, \Sigma)$$

where θ is a vector of s domain-specific scales within a subject area, x is the matrix of item responses, y is the matrix containing group membership information, α is the matrix of IRT item parameters, and Γ and Σ are parameters from the population-structure models. The goal is to summarize characteristics of the θ distributions as a function of group membership.
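
For a single student i, a distribution of this kind is conventionally written as the product of the two model components. The expression below is a sketch under assumptions not spelled out in this section: item responses are taken to be independent of group membership given θ, P denotes the IRT likelihood of the observed responses, and φ denotes a multivariate normal density with mean Γ′y_i and covariance Σ.

```latex
p(\theta_i \mid x_i, y_i, \alpha, \Gamma, \Sigma)
  \;\propto\;
  P(x_i \mid \theta_i, \alpha)\,
  \phi(\theta_i;\ \Gamma' y_i,\ \Sigma)
```

The first factor is the measurement model and the second is the population-structure model, so group membership shifts the center of each student's distribution while the item responses sharpen it.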

Once the model parameters α, Γ, and Σ are estimated (see Estimation of NAEP Score Scales and Estimation of Parameters of the Population-Structure Models for more details), a predictive conditional distribution of possible score values is constructed for each student in the sample. The predictive conditional distribution expresses the probability for each value of θ based on both (a) the corresponding student's performance on the items x and (b) the student's membership in the population groups y.
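
The construction described above can be illustrated with a minimal sketch for one student on a single scale. Everything specific here is an assumption made for illustration and is not taken from NAEP: the two-parameter logistic item model, the normal population density, the grid approximation, the function name `posterior_on_grid`, and all parameter values.

```python
import numpy as np

def posterior_on_grid(responses, a, b, prior_mean, prior_sd, grid):
    """Normalized weights over `grid` approximating one student's
    predictive conditional distribution of theta (illustrative only)."""
    # 2PL probability of a correct response: rows are grid points, columns are items
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))
    # Log-likelihood of the observed responses (1 = correct, 0 = incorrect)
    loglik = (responses * np.log(p) + (1 - responses) * np.log1p(-p)).sum(axis=1)
    # Log of the normal population density, up to an additive constant
    logprior = -0.5 * ((grid - prior_mean) / prior_sd) ** 2
    logpost = loglik + logprior
    # Subtract the maximum before exponentiating for numerical stability
    w = np.exp(logpost - logpost.max())
    return w / w.sum()

# Hypothetical item parameters (alpha): discriminations a and difficulties b
a = np.array([1.0, 1.2, 0.8, 1.5])
b = np.array([-1.0, 0.0, 0.5, 1.0])
# Hypothetical population-structure parameters: intercept and one group effect
gamma = np.array([0.0, 0.4])
sigma = 1.0
# One student's group indicators (intercept, group member) and item responses
y = np.array([1.0, 1.0])
x = np.array([1, 1, 0, 1])

grid = np.linspace(-4.0, 4.0, 161)
post = posterior_on_grid(x, a, b, gamma @ y, sigma, grid)
post_mean = float((grid * post).sum())
```

The resulting weights reflect both sources of information named in the text: the item responses enter through the likelihood, and group membership enters through the mean of the population density.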

Any statistic of interest, t, can be calculated directly from this estimated distribution of underlying performance. However, to allow secondary analyses of NAEP data to be conducted with procedures available in most statistical packages, twenty plausible values are sampled as independent random draws from the predictive conditional distribution for each student and saved in the NAEP data files for secondary analysts (see Creation of Plausible Values for more details). The plausible values can be used in standard statistical equations for many statistics of interest and can be used to estimate the standard errors for those statistics, as long as the population-structure model sufficiently represents the groups for which statistics are calculated.
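
As a rough illustration of how a secondary analyst might use the twenty plausible values, the sketch below computes a weighted mean once per plausible value and then combines the results with the standard multiple-imputation rule. The data are simulated, each student's predictive distribution is approximated by a normal distribution with hypothetical mean and standard deviation, and all variable names are assumptions. Only the variance component due to imputation is shown; the sampling-variance component, which depends on the survey design, is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 20             # number of plausible values per student
n_students = 500   # hypothetical sample size

# Hypothetical per-student predictive distributions (normal approximation)
post_mean = rng.normal(0.0, 0.8, size=n_students)
post_sd = np.full(n_students, 0.4)
weights = np.ones(n_students)  # placeholder sampling weights

# Draw M independent plausible values for every student: shape (n_students, M)
pv = rng.normal(post_mean[:, None], post_sd[:, None], size=(n_students, M))

# Compute the statistic t (here a weighted mean) separately for each plausible value
t_m = (weights[:, None] * pv).sum(axis=0) / weights.sum()

# Final estimate: average of the M per-plausible-value statistics
estimate = t_m.mean()

# Variance component due to imputation: (1 + 1/M) times the between-draw variance
between_var = t_m.var(ddof=1)
imputation_var = (1 + 1 / M) * between_var
```

The statistic is computed on each set of plausible values and then averaged, rather than being computed on the per-student average of the plausible values. That ordering is what lets the spread across draws carry the measurement uncertainty into the standard error.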