We introduce a new CMB temperature likelihood approximation called the
Gaussianized Blackwell-Rao (GBR) estimator. This estimator is derived by
transforming the observed marginal power spectrum distributions obtained by the
CMB Gibbs sampler into standard univariate Gaussians, and then approximating
their joint transformed distribution by a multivariate Gaussian. The method is
exact for full-sky coverage and uniform noise, and an excellent approximation
for sky cuts and scanning patterns relevant for modern satellite experiments
such as WMAP and Planck. A single evaluation of this estimator between l=2 and
200 takes ~0.2 CPU milliseconds, while for comparison, a single pixel space
likelihood evaluation between l=2 and 30 for a map with ~2500 pixels requires
~20 seconds. We apply this tool to the 5-year WMAP temperature data, and
re-estimate the angular temperature power spectrum, $C_{\ell}$, and likelihood,
L(C_l), for l<=200, and derive new cosmological parameters for the standard
six-parameter LambdaCDM model. Our spectrum is in excellent agreement with the
official WMAP spectrum, but we find slight differences in the derived
cosmological parameters. Most importantly, the spectral index of scalar
perturbations is n_s=0.973 +/- 0.014, 1.9 sigma away from unity and 0.6 sigma
higher than the official WMAP result, n_s = 0.965 +/- 0.014. This suggests that
an exact likelihood treatment is required to higher l's than previously
believed, reinforcing and extending our conclusions from the 3-year WMAP
analysis. In that case, we found that the sub-optimal likelihood approximation
adopted between l=12 and 30 by the WMAP team biased n_s low by 0.4 sigma, while
here we find that the same approximation between l=30 and 200 introduces a bias
of 0.6 sigma in n_s.
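
The construction described in the abstract (Gaussianize each marginal, then fit a multivariate Gaussian to the transformed variables) can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the authors' implementation. The marginals are taken from empirical quantiles of mock samples rather than from the Blackwell-Rao estimator, the "Gibbs samples" are correlated log-normal draws invented for the example, and the names `fit_gbr` and `gbr_loglike` are hypothetical. The log-likelihood is assumed to be the Gaussian term in x-space plus the log-Jacobian of the transformation.

```python
import numpy as np
from statistics import NormalDist

# Inverse CDF of the standard normal, vectorized over arrays
_inv_cdf = np.vectorize(NormalDist().inv_cdf)

def fit_gbr(samples, n_grid=201):
    """samples: (n_samples, n_ell) array of C_l draws (toy stand-in for Gibbs output)."""
    q = np.linspace(0.001, 0.999, n_grid)        # quantile levels
    c_grid = np.quantile(samples, q, axis=0)     # marginal quantiles, shape (n_grid, n_ell)
    x_grid = _inv_cdf(q)                         # Gaussianized value at each level
    # Gaussianize every sample, then estimate mean and covariance in x-space
    x = np.column_stack([np.interp(samples[:, i], c_grid[:, i], x_grid)
                         for i in range(samples.shape[1])])
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    return c_grid, x_grid, mu, np.linalg.inv(cov)

def gbr_loglike(c_l, model):
    """Approximate log-likelihood (up to a constant) at the spectrum c_l."""
    c_grid, x_grid, mu, inv_cov = model
    n_ell = c_grid.shape[1]
    x = np.empty(n_ell)
    log_jac = 0.0
    for i in range(n_ell):
        x[i] = np.interp(c_l[i], c_grid[:, i], x_grid)
        # Finite-difference estimate of dx/dC on the quantile grid
        dxdc = np.gradient(x_grid, c_grid[:, i])
        log_jac += np.log(np.interp(c_l[i], c_grid[:, i], dxdc))
    d = x - mu
    return log_jac - 0.5 * d @ inv_cov @ d

# Toy data: two correlated, skewed "multipoles" (assumed, not WMAP data)
rng = np.random.default_rng(0)
corr = np.array([[1.0, 0.3], [0.3, 1.0]])
z = rng.multivariate_normal(np.zeros(2), corr, size=20000)
samples = 1000.0 * np.exp(0.3 * z)

model = fit_gbr(samples)
print(gbr_loglike([1000.0, 1000.0], model))   # near the marginal medians
print(gbr_loglike([2000.0, 2000.0], model))   # out in the upper tails
```

In this toy case the samples are an exact monotonic transform of a correlated Gaussian, so the approximation should be close to exact by construction. Real Gibbs output with sky cuts and non-uniform noise is exactly where the quality of the approximation would have to be checked.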

This paper investigates approximations for the CMB temperature likelihood function at low l, in particular an approximation that can be calibrated from Gibbs sampling. They claim a surprisingly large shift in some cosmological parameters (ns by 0.6 sigma) when using the new approximation compared to WMAP. If correct this is quite interesting and surprising.

Some comments:

* Compressing the data into power spectrum estimators, as WMAP do at high l, should be suboptimal but unbiased as long as a valid likelihood approximation is used. I'm therefore surprised to see apparent shifts in parameters rather than changes in the error bar. When testing with simulations, I've found that the WMAP-like likelihood approximations work just fine. Even evident deviations from the assumed likelihood model have almost no effect because the errors tend to cancel between l (see 0804.3865). Indeed just using a pure-Gaussian likelihood approximation works fine in almost all realisations. I'm sure WMAP have also extensively tested their method for biases in simulations. I wonder if the authors reproduce such shifts in idealized simulations?

* This being the case, could the shifts be due to something else, e.g. differences in beam modelling? The paper doesn't comment on what they do about the beams at low l, even though the beam transfer function is not unity even at l<200.

* In the conclusions they comment that their approach allows seamless propagation of systematic effects such as beam errors. I'm not convinced by this: a beam error essentially shifts the entire spectrum up and down. The approximation used in the paper fits the marginalized Cl distributions at each l separately, which in the case of beam errors are actually strongly correlated between l. Is there any reason to expect this to work?

* Compressing the data into power spectrum estimators, as WMAP do at high l, should be suboptimal but unbiased as long as a valid likelihood approximation is used. I'm therefore surprised to see apparent shifts in parameters rather than changes in the error bar. When testing with simulations, I've found that the WMAP-like likelihood approximations work just fine. Even evident deviations from the assumed likelihood model have almost no effect because the errors tend to cancel between l (see 0804.3865). Indeed just using a pure-Gaussian likelihood approximation works fine in almost all realisations. I'm sure WMAP have also extensively tested their method for biases in simulations. I wonder if the authors reproduce such shifts in idealized simulations?

No, we haven't compared to simulations. Instead, we have checked that the approximation matches the exact likelihood, which of course is an even better approach – not only is it statistically unbiased (which is all you can check with MC simulations), but it gives the right answer in each particular realization. The loophole, of course, is that the validation was done at low resolution, while the WMAP5 analysis is at high resolution. But it's reasonable to assume that if it works at low l's, it works at high l's, I think, since there are fewer correlations there, and the distributions are intrinsically more Gaussian. But there's a loophole there, yes.

Still, the observed shift is somewhat surprising, yes, but no more so than it was when we saw the same thing in WMAP3: In that case, we found a shift of 0.4 sigma in ns when increasing lmax from 12 to 30 for the exact part. Then, at first we thought the shift was due to diffuse foregrounds and processing errors, and it took some time before Eiichiro Komatsu and we figured out that it was actually the likelihood approximation that caused this: Switching from the approximate MASTER-likelihood to the exact likelihood between l=12 and 30 increased ns by 0.4 sigma. And this is why the WMAP team adopted lmax=30 in their 5-year analysis.

And now we find a very similar effect by increasing lmax from 30 to 200...

Also, note that statistical unbiasedness does not mean "the same as the exact likelihood answer in every realization".

Antony Lewis wrote:

* This being the case, could the shifts be due to something else, e.g. differences in beam modelling? The paper doesn't comment on what they do about the beams at low l, even though the beam transfer function is not unity even at l<200.

Note that we do take into account the actual DA specific beams for each map (V1 and V2). The analysis is done at full WMAP resolution of Nside=512 (not a degraded version of them), so this is straightforward to do in the Gibbs sampler.

Antony Lewis wrote:

* In the conclusions they comment that their approach allows seamless propagation of systematic effects such as beam errors. I'm not convinced by this: a beam error essentially shifts the entire spectrum up and down. The approximation used in the paper fits the marginalized Cl distributions at each l separately, which in the case of beam errors are actually strongly correlated between l. Is there any reason to expect this to work?

It's at least as likely as for the WMAP approach, which also does this quadratically. Note that the assumption in the GBR estimator is that only 2-point correlations in l are important, not 3-point. Essentially, what beam uncertainties mean is that there will be large-scale correlations in the correlation matrix, and we can either compute these by MC (as in the Gibbs sampler), or by converting the known beam covariance matrix to Gaussianized x-space and then adding it to C. Both should work quite well, I think, but there is of course always a question of convergence, and this needs to be tested. But I think there are good reasons to expect that it will work, yes – and it should at least work better than the current WMAP approach.

* Compressing the data into power spectrum estimators, as WMAP do at high l, should be suboptimal but unbiased as long as a valid likelihood approximation is used. I'm therefore surprised to see apparent shifts in parameters rather than changes in the error bar. When testing with si

If this was the case, you could just state by how much you are suboptimal and decrease the error bars accordingly... :) Unbiasedness in my understanding means that it is unbiased over the ensemble of realizations and that, on every given sky, the true value will be within the errors, but better methods can produce smaller errors and shift values around... (but otherwise I agree with your comments)

Yes, I agree that in any given realization a likelihood approximation can be significantly wrong without being biased on average. But shifts in non-pathological cases should be of the order of the change in the error bar, and I'm surprised if pseudo-Cl methods are suboptimal at the level of the shift in ns the paper claims.
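
The statement that shifts should be of the order of the change in the error bar can be illustrated with a toy Monte Carlo. This is a sketch under idealized assumptions and has nothing CMB-specific in it: the "optimal" estimator is the mean of all data points, and the "suboptimal but unbiased" one is the mean of half of them. In this setup the scatter of the per-realization shift between the two should equal sqrt(sigma_sub^2 - sigma_opt^2).

```python
import numpy as np

rng = np.random.default_rng(1)
n_real, n_data = 20000, 100
# Each row is one simulated realization with true mean 0 and unit variance
data = rng.normal(0.0, 1.0, size=(n_real, n_data))

opt = data.mean(axis=1)                     # uses all data: minimum-variance estimator
sub = data[:, : n_data // 2].mean(axis=1)   # discards half the data: unbiased but noisier

sigma_opt, sigma_sub = opt.std(), sub.std()
shift = sub - opt                           # per-realization difference between methods
expected = np.sqrt(sigma_sub**2 - sigma_opt**2)

print("ensemble means:", opt.mean(), sub.mean())
print("sigma_opt, sigma_sub:", sigma_opt, sigma_sub)
print("rms shift, expected:", shift.std(), expected)
```

Read the other way around, and still under these idealized assumptions, a typical shift of 0.6 sigma between an optimal and a suboptimal unbiased estimator would go with an error bar about 17% larger for the suboptimal one (sqrt(1 + 0.6^2) ≈ 1.17), whereas the abstract quotes +/- 0.014 for both n_s values.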

But if the claim is that our particular sky is very unusual, giving a large difference when in almost all simulations the difference is small, this may be indicating that something is wrong with the model, e.g. breakdown of statistical isotropy, Gaussianity, etc. If that is the case then all methods based on statistically isotropic Gaussian assumptions are wrong.
So I'd find it more compelling as a likelihood-modelling issue if you can reproduce similar shifts in idealized simulations.

Incidentally, in the paper it's not very clear what is done with the noise (WMAP high-l has no noise bias), and how the polarization is included in the two cases. I also wonder if there's some issue with how different likelihood regimes are patched together (e.g. leakage of parameter-dependent high-l into the low l; my thoughts on how to do it here).

But if the claim is that our particular sky is very unusual, giving a large difference when in almost all simulations the difference is small, this may be indicating that something is wrong with the model, e.g. breakdown of statistical isotropy, Gaussianity, etc. If that is the case then all methods based on statistically isotropic Gaussian assumptions are wrong.
So I'd find it more compelling as a likelihood-modelling issue if you can reproduce similar shifts in idealized simulations.

Hmm... I think perhaps implementing a high-l likelihood code for uniform noise and symmetric sky cut is an even better approach. It's always much nicer with exact analytic comparisons than with MC results.

However, when it comes to "our sky being unusual", the first thing that comes to mind is the outliers. We spent some time checking out multipoles like l=21, 40, 121 and 181, since these are quite far out in the tails in the WMAP spectrum. So one question is how well the offset log-normal+Gaussian WMAP likelihood performs in the far tails, and how much these multipoles affect the overall results. Right now, at least, I trust our approach more than the analytic fit used by WMAP in these cases – the univariate BR estimator works great even in the tails of a marginal distribution. But it would perhaps be interesting to implement a test likelihood where these outliers are removed, and see how the resulting parameters shift around.

Antony Lewis wrote:

Incidentally, in the paper it's not very clear what is done with the noise (WMAP high-l has no noise bias), and how the polarization is included in the two cases. I also wonder if there's some issue with how different likelihood regimes are patched together (e.g. leakage of parameter-dependent high-l into the low l; my thoughts on how to do it here).

Well.. As stated in the paper, the V-band is strongly cosmic variance dominated at l<200, so the noise isn't terribly important here. And even if there are uncertainties in the overall noise level, it's certainly not off by some 5%.. :-)

(Or are you actually asking how the Gibbs sampler handles noise in general here..? If so, I recommend taking a look at one of the earlier Gibbs method papers, but in short, it does the same as a brute-force likelihood evaluation does.)

Perhaps we weren't explicit about polarization, but it is of course unchanged in the two cases – as stated in the paper, all we did was change lmax for the low-l part from l=30 to 200, nothing else. Remember also that the polarization likelihood sector in the WMAP code is completely independent of the TT sector.

Finally, as far as patching between high and low l's goes, this should be *easier* at l=200 than at l=30, because the correlations are better behaved here. And, of course, a potential mis-characterization of the correlations among those four multipoles does not contribute *that* much to the overall likelihood anyway..