which uses iteratively reweighted least-squares to generate a maximum-likelihood estimate. The data for all three judges were pooled because an analysis of variance (ANOVA) determined that the effect of judges was not significant at the 5% level. In the ANOVA, judges, images, and bit rates were taken to be fixed effects.

Instead of fitting the expectation of the response directly, a second approach estimates the probability $p_i$ of obtaining the response $i$ for each of the five possible responses ($i = 1, \ldots, 5$). The expectation can then be calculated from the probabilities. We can transform the responses $y$ into binary outcomes:

$$y_i = \begin{cases} 1 & \text{if } y = i \\ 0 & \text{otherwise} \end{cases}$$

Exponentiating both sides and rearranging terms yields

For that value of SNR, similar equations can be found for $p_2$, $p_3$, $p_4$, and $p_5$. Additionally, we know that $\sum_i p_i = 1$. This system can be solved and the expectation calculated from these scaled probabilities:

$$E(y) = \sum_{i=1}^{5} i \, p_i$$

For some of the $p_i$ there are slight edge effects from the spline fit. For example, $p_1$ dips very slightly below zero at SNR = 24.9, and then becomes slightly positive again for SNRs > 27.2, although there are no further responses of 1 at those SNRs. Until we have made a further study of these edge effects, they are dealt with simply by setting $p_i$ identically equal to zero beyond the point where it first crosses zero. The expectation is then calculated from these windowed probabilities. The expectation is almost indistinguishable from the curve of Fig. 4, thereby validating the Poisson model.
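The windowing rule and the expectation computed from the probabilities can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, each curve is assumed to be sampled on a grid of increasing SNR and to decay toward zero as SNR increases (as $p_1$ does), and renormalizing the windowed probabilities so that they sum to one is an assumption.

```python
import numpy as np

def window_after_first_zero(p):
    """Set a sampled probability curve to zero from its first zero crossing onward.

    Assumes `p` is ordered by increasing SNR and decays with SNR (like p_1).
    For a curve whose edge effect lies at low SNR, pass the reversed array.
    """
    p = np.array(p, dtype=float)          # work on a copy
    crossed = p <= 0.0
    if crossed.any():
        p[np.argmax(crossed):] = 0.0      # argmax gives the first True index
    return p

def expected_rating(probs, ratings=(1, 2, 3, 4, 5)):
    """Expectation sum_i i * p_i at each SNR grid point.

    `probs` has one row per response level and one column per SNR value.
    Dividing by the column sums rescales the probabilities to add to one
    (an assumption about how the scaling is applied).
    """
    probs = np.asarray(probs, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    total = probs.sum(axis=0)
    return (ratings[:, None] * probs).sum(axis=0) / total
```

Applied to a curve such as `[0.5, 0.2, 0.01, -0.02, -0.01, 0.03]`, the window keeps the first three values and zeroes the rest, including the small positive rebound at the end.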

Having established the appropriateness of the Poisson model, we use it to compare SNR against segmental SNR in their ability to predict subjective quality ratings. Segmental SNR, often used in speech quality evaluation, compensates for the underemphasis of weak-signal performance in conventional SNR. An image is divided into blocks, the SNR is calculated for each block on a log scale, thresholded below at 0 and above at 45, and the values are averaged. By converting component SNR values to decibel values prior to averaging,

The binary response variables $y_i$ can then each be fitted using the logit link:

$$\log \frac{\mu}{1 - \mu} = x^{T} \beta$$

which guarantees that the fitted mean $\mu$ (here, the probability $p_i$) is in the interval [0, 1]. The logit link together with the binomial variance function $\mu(1 - \mu)$ defines the logistic regression model. For each $y_i$ the predictor $x$ was a quadratic spline in SNR, with the knots located in each case at the mean value of the SNRs that produced that response (18.2, 20.12, 22.57, 24.61, 25.56). The probabilities $p_i$ are shown in Fig. 5 with vertical offsets so they are not superimposed.
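A logistic regression of one binary outcome on a quadratic spline in SNR can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function names are hypothetical, a single knot per response is assumed, the truncated-power basis centered at the knot is one of several equivalent parameterizations, and the fit uses plain iteratively reweighted least squares without step-size safeguards.

```python
import numpy as np

def quadratic_spline_basis(x, knot):
    """Columns 1, t, t^2, max(t, 0)^2 with t = x - knot (one knot assumed)."""
    t = np.asarray(x, dtype=float) - knot
    return np.column_stack([np.ones_like(t), t, t**2, np.maximum(t, 0.0)**2])

def predict_probability(X, beta):
    """Inverse logit of the linear predictor; always lies in [0, 1]."""
    eta = np.clip(X @ beta, -30.0, 30.0)   # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic_irls(X, y, n_iter=50, tol=1e-8):
    """Maximum-likelihood logistic fit by iteratively reweighted least squares."""
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(mu * (1.0 - mu), 1e-10)   # binomial variance function
        z = eta + (y - mu) / w                   # working response
        sw = np.sqrt(w)
        # Weighted least squares on the working response.
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

For the response $i$, `y` would be the binary indicator $y_i$ and `knot` the mean SNR that produced that response; repeating the fit for each of the five responses gives five separately estimated probability curves.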

As the five probabilities have been determined from separate regressions, they need to be scaled so that they add to one

very high SNR values corresponding to well-coded large-signal segments do not camouflage coder performance with the weak segments, as they do in conventional SNR. We examined block sizes of all powers of 2 between 2 × 2 and 256 × 256. Since the images are of size 256 × 256, the segmental SNR for that block size equals the conventional SNR.

The usefulness of the computable metric in predicting subjective quality was examined as follows. The 30 MR images were put in a random order n = 20 times, with a different order each time. Each time, a 10-fold cross-validation was performed in which three images at a time were left out, and the other 27 images were used to fit the model. All judges and levels corresponding to those 27 images were used. The three images not involved in determining the parameters of the fit comprise 45 data points (3 images × 3 judges × 5 compression levels). For these data we compute the mean outcome and the sum of squared deviations from this overall mean. This value is called $S_1$. Then we calculate the fitted values for these data, and take the sum of squared deviations between the observed and fitted values, called $S_2$. If the model is good and the test set of three images is not unlike the set of 27 images used to fit the model, we expect $S_2$ to be smaller than $S_1$. The percent reduction in mean squared error that is due to fitting the model (beyond fitting an overall constant) is a statistic that summarizes the model's predictive power:

$$100 \times \frac{S_1 - S_2}{S_1}$$

This statistic is a cross-validated analogue of the multiple correlation. The results are presented in Table 5.
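The percent-reduction statistic and the leave-three-images-out splitting can be sketched as follows. The function names are hypothetical, and how the per-fold values are combined across the 10 folds and 20 random orderings is not specified here, so the sketch only computes the statistic for a single held-out set.

```python
import numpy as np

def percent_reduction(observed, fitted):
    """100 * (S1 - S2) / S1 for one held-out set.

    S1: squared deviations of the observations from their own mean.
    S2: squared deviations of the observations from the model's fitted values.
    """
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    s1 = np.sum((observed - observed.mean()) ** 2)
    s2 = np.sum((observed - fitted) ** 2)
    return 100.0 * (s1 - s2) / s1

def leave_three_out_folds(n_images=30, fold_size=3, seed=None):
    """Randomly order the image indices and cut them into folds of three."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    return [order[k:k + fold_size] for k in range(0, n_images, fold_size)]
```

A fit that reproduces the held-out ratings exactly gives 100, a fit no better than the held-out mean gives 0, and a fit worse than the mean gives a negative value.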

It appears that segmental SNR at several different block sizes outperforms conventional SNR. The best of these (on 8 × 8 blocks) produced a 48% reduction compared to the 43% reduction for SNR. One could examine the statistical significance of these differences by sampling from the permutation distribution, and it would be of interest to compare SNR against perceptually based computable quality measures.
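For reference, the segmental SNR computation described earlier (block-wise SNR in decibels, thresholded at 0 and 45, then averaged) might be sketched as below. Several details are assumptions rather than facts from the text: the function name is hypothetical, the signal energy of a block is taken as its sum of squared pixel values, and a block with zero signal and zero error is assigned the lower threshold.

```python
import numpy as np

def segmental_snr(original, coded, block=8, lo=0.0, hi=45.0):
    """Average of block-wise SNRs in dB, each clipped to [lo, hi]."""
    original = np.asarray(original, dtype=float)
    coded = np.asarray(coded, dtype=float)
    h, w = original.shape
    if h % block or w % block:
        raise ValueError("image dimensions must be multiples of the block size")

    def block_sums(a):
        # Sum over non-overlapping block x block tiles.
        return a.reshape(h // block, block, w // block, block).sum(axis=(1, 3))

    signal = block_sums(original ** 2)            # assumed signal energy
    noise = block_sums((original - coded) ** 2)   # squared coding error
    with np.errstate(divide="ignore", invalid="ignore"):
        snr_db = 10.0 * np.log10(signal / noise)
    # Zero error gives +inf (clipped to hi); zero signal gives -inf (clipped to lo).
    snr_db = np.nan_to_num(snr_db, nan=lo, posinf=hi, neginf=lo)
    return float(np.clip(snr_db, lo, hi).mean())
```

With `block` equal to the image size there is a single block, so the result reduces to the (thresholded) conventional SNR under the same energy definition.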

In studies like ours, one frequently wants a measure of the predictive power of the model, as well as measures of its goodness of fit. One diagnostic as to the appropriateness of the Poisson regression model (how median-biased it is) is $z_0$