Abstract

We describe an exact, flexible, and computationally efficient algorithm for a joint estimation of the large-scale structure and its power-spectrum, building on a Gibbs sampling framework and present its implementation ARES (Algorithm for REconstruction and Sampling). ARES is designed to reconstruct the 3D power-spectrum together with the underlying dark matter density field in a Bayesian framework, under the reasonable assumption that the long wavelength Fourier components are Gaussian distributed. As a result ARES does not only provide a single estimate but samples from the joint posterior of the power-spectrum and density field conditional on a set of observations. This enables us to calculate any desired statistical summary, in particular we are able to provide joint uncertainty estimates. We apply our method to mock catalogs, with highly structured observational masks and selection functions, in order to demonstrate its ability to reconstruct the power-spectrum from real data sets, while fully accounting for any mask induced mode coupling.

Throughout cosmic history a wealth of information on the origin and evolution of our Universe has been imprinted to the large scale structure via the gravitational amplification of primordial density perturbations. Harvesting this information from probes of the large scale structure, such as large galaxy surveys, therefore is an important scientific task to further our knowledge and to establish a conclusive cosmological picture. In recent years great advances have been made, both in retrieving huge amounts of data and increasing sensitivity in galaxy redshift surveys. Especially the recent galaxy surveys, the 2dF Galaxy Redshift Survey [8] and the Sloan Digital Sky Survey [1] provide sufficient redshifts to probe the 3D galaxy distribution on large scales. In particular, the two point statistics of the matter distribution contains valuable information to test standard inflation and cosmological models, which describe the origin and evolution of all observed structure in the Universe. Measuring the power-spectrum from galaxy observations therefore has always attracted great interest. Precise determination of the overall shape of the power-spectrum can for instance place important constraints on neutrino masses, help to identify the primordial power-spectrum, and break degeneracies for cosmological parameter estimation from CMB data [28]. In addition, several characteristic length scales have been imprinted to the matter distribution throughout cosmic history, which can serve as new standard rulers to measure the Universe. A prominent example of these length scales is the sound horizon, which yields oscillatory features in the power-spectrum, the so called baryonacoustic oscillations (BAO) [51]. Since the physics governing these oscillatory features is well understood, precise measurements of the BAO will allow us to establish a new, precise standard ruler to measure the Universe through the distance redshift relation [5]. Precision analysis of large scale structure data therefore is a crucial step in developing a conclusive cosmological theory.

Unfortunately, contact between theory and observations cannot be made directly, since observational data is subject to a variety of systematic effects and statistical uncertainties. Such systematics and uncertainties arise either from the observational strategy or are due to intrinsic clustering behavior of the galaxy sample itself [48]. Some of the most prominent uncertainties and systematics are:

survey geometry and selection effects

close pair incompleteness due to fiber collisions

galaxy biases

redshift space distortions

The details of galaxy clustering, and how galaxies trace the underlying density field are in general very complicated. The bias between galaxies and mass density is most likely non-linear and stochastical, so that the estimated galaxy spectrum is expected to differ from that of the mass [12]. Even in the limit where a linear bias could be employed, it still differs for different classes of galaxies [6]. In addition, the apparent density field, obtained from redshift surveys, will generally be distorted along the line-of-sight due to the existence of peculiar velocities.

However, the main cause for the systematic uncertainties in large scale power-spectrum estimations is the treatment of the survey geometry [57]. Due to the survey geometry the raw power-spectrum yields an expectation value for the power-spectrum, which is the true cosmic power-spectrum convolved with the survey mask [6]. This convolution causes an overall distortion of the power-spectrum shape, and drastically decreases the visibility of the baryonic features.

The problems, mentioned above, have been discussed extensively in literature, and many different approaches to power-spectrum analysis have been proposed. Some of the main techniques to recover the power-spectrum from galaxy surveys are Fourier transform based, such as the optimal weighting scheme, which assigns a weight to the galaxy fluctuation field, in order to reduce the error in the estimated power [17]. Alternative methods rely on Karhunen-Loève decompositions [59] or decompositions into spherical harmonics, which is especially suited to address the redshift space distortions problematic [18]. In addition, to these deconvolution methods there exists a variety of likelihood methods to estimate the real space power-spectrum [2]. In order to not just provide the maximum likelihood estimate but also conditional errors, [43] proposed a Markov Chain Monte Carlo method to map out the likelihood surface.

Nevertheless, as the precision of large scale structure experiments has improved, the requirement on the control and characterization of systematic effects, as discussed above, also steadily increases. It is of critical importance to propagate properly the uncertainties caused by these effects through to the matter power-spectrum and cosmological parameters estimates, in order to not underestimate the final uncertainties and thereby draw incorrect conclusions on the cosmological model.

We therefore felt inspired to develope a new Bayesian approach to extract information on the two point statistics from a given large scale structure dataset. We prefer Bayesian methods to conventional likelihood methods, as the yield more general and profound statements about measurements. In example, conventional likelihood methods can only answer questions of the inner form like :“ Given the true value s of a signal, what is the probability distribution of the measured values d?” A Bayesian method, on the other hand, answers questions of the type :“Given the observations d, what is the probability distribution of the true underlying signal s?” For this reason, Bayesian statistics answers the underlying question to every measurement problem, of how to estimate the true value of the signal from observations, while conventional likelihood methods do not [38].Since the result of any Bayesian method is a complete probability distribution they permit fully global analyses, taking into account all systematic effects and statistical uncertainties. In particular, here, we aim at evaluating the power-spectrum posterior distribution P({P(ki)}|{di}), with P(ki) being the power-spectrum coefficients of the kith mode and di=d(→xi) is an observation at position →xi in three dimensional configuration space. This probability distribution would then contain all information on the two point statistics supported by the data. In order to explore this posterior distribution we employ a Gibbs sampling method, previously applied to CMB data analysis [61].

Since direct sampling from P({P(ki)}|{di}) is impossible or at least difficult, they propose instead to draw samples from the full joint posterior distribution P({P(ki)},{si}|{di}) of the power-spectrum coefficients P(ki) and the 3D matter density contrast amplitudes si conditional on a given set of data points {di}. This is achieved by iteratively drawing density samples from a Wiener-posterior distribution and power-spectrum samples via an efficient Gibbs sampling scheme (see figure for an illustration). Here, artificial mode coupling, as introduced by survey geometry and selection function, is resolved by solving the Wiener-filtering equation, which naturally regularizes inversions of the observational response operator in unobserved regions. In this fashion, we obtain a set of Monte Carlo samples from the joint posterior, which allows us to compute any desired property of the joint posterior density, with the accuracy only limited by the sample size. In particular, we obtain the power spectrum posteriorP({P(ki)}|{di}) by simply marginalizing the joint posteriorP({P(ki)},{si}|{di}) over the auxiliary density amplitudes si, which is trivially achieved by ignoring the si samples.

The Gibbs sampler also offers unique capabilities for propagating systematic uncertainties end-to-end. Any effect, for which we can define a sampling algorithm, either jointly with or conditionally on other quantities, can be propagated seamlessly through to the final posterior.

It is worth noting, that our method differs from traditional methods of analyzing galaxy surveys in a fundamental aspect. Traditional methods consider the analysis task as a set of steps, each of which arrives at intermediate outputs which are then fed as inputs to the next step in the pipeline. Our approach is a truly global analysis, in the sense that the statistics of all science products are computed jointly, respecting and exploiting the full statistical dependence structure between various components.

In this paper we present ARES (Algorithm for REconstruction and Sampling), a computer algorithm to perform a full Bayesian data analysis of 3D redshift surveys. In Section 3 we give an introduction to the general idea of the large scale structure Gibbs sampling approach, followed by Section 4 and Section 5, where we describe and derive in detail the necessary ingredients to sample the 3D density distribution and the power-spectrum respectively. The choice of the prior and the relevance for the cosmic variance are discussed in Section 6. Details concerning the numerical implementation are discussed in Section 7. We then test ARES thouroughly in Section 8, particularly focussing on the treatment of survey masks and selection functions. In Section 9 we demonstrate the running median filter, and use it as an example to demonstrate how uncertainties can be propagated to all inferences based on the set of Gibbs samples. Finally we conclude in Section 10, by discussing the results of the method and giving an outlook for future extensions and application of our method.

In this section, we describe the basic notation used throughout this work. Let the quantity ρi=ρ(→xi) be the field amplitude of the three dimensional field ρ(→x) at position →xi. Then the index i has to be understood as a multi index, which labels the three components of the position vector:

→xi=[x1i,x2i,x3i],(1)

where xji is the jth component of the ith position vector. Alternatively one can understand the index i as a set of three indices {r,s,t} so that for an equidistant grid along the three axes the position vector can be expressed as:

→xi=→xr,s,t=[Δxr,Δys,Δzt],(2)

with Δx, Δy and Δz being the grid spacing along the three axes. With this definition we yield:

ρi≡ρr,s,t.(3)

Also note that any summation running over the multi index i is defined as the three sums over the three indices r, s and t:

∑i≡∑r∑s∑t.(4)

Further, we will frequently use the notation {ρi}, which denotes the set of field amplitudes at different positions →xi. In particular:

As already described in the introduction, we seek to sample from the joint posterior distribution P({P(ki)},{si}|{di}) of the power-spectrum coefficients P(ki) and the 3D matter density contrast amplitudes si given a set of observations {di}.

In principle, this joint posterior distribution could be mapped out over a grid in the multi-dimensional space of the signal amplitudes si and power-spectrum coefficients P(ki). But since the number of grid points required for such an analysis scales exponentially with the number of free parameters, this approach cannot be realized efficiently. For this reason, we propose a Gibbs sampling approach to this problem.

The theory of Gibbs sampling [22] states, that if it is possible to sample from the conditional densities P({si}|{P(ki)},{di}) and P({P(ki)}|{si},{di}), then iterating the following two sampling equations will, after an initial burn-in period, lead to samples from the joint posterior P({si},{P(ki)}|{di}):

{si}(j+1)↶P({si}|{P(ki)}(j),{di}),(6)

{P(ki)}(j+1)↶P({P(ki)}|{si}(j+1),{di}),(7)

where the symbol ↶ denotes a random draw from the probability density on its right.

Once a set of samples from P({si},{P(ki)}|{di}) has been obtained, the properties of this probability density can be summarized in terms of any preferred statistic, such as its multivariate mean, mode or variance.

As our approach probes the joint distribution, we are able to quantify joint uncertainties of the signal amplitudes and the power-spectrum conditional just on the data. For this reason, the Gibbs sampling approach should not be considered as yet another maximum likelihood technique, although it certainly is able to produce such an estimate.

In the following we are going to describe the necessary methods and procedures required for iterating the processes Equation 6 and Equation 7 of signal and power-spectrum sampling.

Assuming a Gaussian signal posterior distribution P({si}|{P(ki)},{di}), the task of drawing a random signal sample can be split into two steps.

First, we estimate the maximum a postiori values for the signal amplitudes si, which in the Gaussian case coincide with the mean values. Then a fluctuation term, being consistent with the correct covariance, is added to the mean signal. The sum of the mean and the fluctuation term will then yield a sample from the conditional posterior.

The most challenging procedure in this signal sampling step is to calculate the a postiori values for the signal amplitudes si. Assuming a Gaussian posterior will directly lead to a Wiener filtering procedure, described below. However, this method requires to invert huge matrices which consists of the sum of the inverse signal S and inverse noise N covariance matrices. The matrix inversion is a numerically very demanding step, and at the same time presents the bottleneck for our method, as it defines the computational speed with which a signal sample can be produced. The efficient implementation of these matrix inversion step, as described by [32], allows for the production of many thousands of samples in computational feasible times. In the following sections we will describe the details of the signal sampling procedure.

As already described above, the main task for the signal sampling step is to derive the maximum a postiori values for the signal amplitudes si. According to Bayes’ theorem the conditional signal posterior can be written as the product of a signal prior and a likelihood normalized by the so called evidence. Further, here we will use the signal covariance matrix S rather than the power-spectrum {P(ki)}. It is well known, that the power-spectrum is just the Fourier transform of the signal covariance in configuration space. Since the Fourier transform is a basis transformation with a unitary transformation matrix, the signal covariance matrix S and the power-spectrum {P(ki)} can be used interchangeably for a normalized Fourier transform (see Section 5.1 and Appendix B for more details).

where we assume that the data amplitudes di are conditionally independent of the signal covariance matrix S, once the signal amplitudes si are given. Following [3], we describe the signal prior for the large scale matter distribution as a multivariate Gaussian, with zero mean and the signal covariance S. We can then write:

P({si}|S)=1√det(2πS)e−12∑i∑jsiSij−1sj.(9)

The Fourier transform of the signal covariance matrix S has an especially appealing form in Fourier space. It is well known, that in an homogeneous and isotropic universe the Fourier transform of the signal covariance is a diagonal matrix, with the diagonal elements being the power-spectrum. Hence, we can express the Fourier representation of the signal covariance as:

^^Skl=δKklPk(10)

where the ^-symbol denotes a Fourier transform, δKij is the Kronecker delta and Pk=P(kk) is the power-spectrum coefficient at the Fourier mode →kk in three dimensional pixel space [?].

The choice of the Gaussian prior can be justified by inflationary theories, which predict the matter field amplitudes to be Gaussian distributed in the linear regimes at scales k≲0.15h/Mpc[41].

At nonlinear scales the Gaussian prior does not represent the full statistical behavior of the matter field anymore. During nonlinear gravitational structure formation the statistics of the initial density field has evolved from a Gaussian distribution towards a log normal distribution as commonly assumed in literature [7].

However, note that in this case the Gaussian prior still describes the two point statistics of the underlying density field even in the nonlinear regime. The Gaussian prior should therefore be regarded as our a priori knowledge of the matter distribution, which is only formulated up to two point statistics. Next, we discuss the likelihood P({di}|{si}) given in equation (Equation 8).

As we seek to recover the maximum a postiori signal si from the set of observations di we must assume a model which relates these both quantities. The most straight forward data model is linear, and can be written as:

di=∑kKiksk+ϵi,(11)

where Kij is an observation response operator and ϵi is an additive noise contribution, which will be defined in more detail in the next section. If we assume the noise ϵi to be Gaussian distributed, with zero mean and covariance N, we can express the likelihood as:

P({di}|{si})=1√det(2πN)e−12(∑i∑j[di−∑kKiksk]Nij−1[dj−∑lKjlsl]),(12)

where we simply inserted the data model given in equation (Equation 11) into the Gaussian noise distribution.

With these definitions the signal posterior is a multivariate Gaussian distribution and can be written as:

P({si}|S,{di})∝e−12(∑i∑jsiSij−1sj+[di−∑kKiksk]Nij−1[dj−∑lKjlsl]),(13)

where we omitted the normalization factor, which is of no interest in the following.

Note, that omitting the normalization of the likelihood, requires that the additive noise term is independent of any signal contribution, as otherwise the noise covariance matrix would carry signal information and could not be neglected in the following. This assumption, however, is in general not true for the Poissonian noise, as described, in the next section.

The maximum of this signal posterior can then easily be found by either completing the square in the exponent of equation (Equation 13), or by extremizing P({si}|S,{di}) with respect to the signal amplitudes si. The latter approach allows us to directly read of the Wiener filter equation from equation (Equation 13), by simply differentiating the exponent with respect to si and setting the result to zero. The result is the famous Wiener filter equation which is given as:

∑j[S−1ij+∑m∑lKmiN−1mlKlj]mj=∑m∑lKmiN−1mldl,(14)

where we denoted the variable mj as a Wiener mean signal amplitude, to clarify that this reconstruction is the mean and not a typical sample of the distribution. The solution of this equation requires to invert the matrix:

Dij=S−1ij+∑m∑lKmiN−1mlKlj,(15)

which leads to the solution for the signal amplitudes

mi=∑jD−1ij∑m∑lKmjN−1mldl.(16)

This result demonstrates that estimating the maximum a postiori values mi for the signal amplitudes si involves inversions of the Wiener filter operator D. Therefore, the signal-sampling operation is by far the most demanding step of our Gibbs sampling scheme, as it requires the solution of a very large linear system.

Formally speaking, in practice, this corresponds to inverting matrices of order ∼106×106 or larger, which clearly is not computationally feasible through brute-force methods. For example, matrix inversion algorithms, based on usual linear algebra methods, have a numerically prohibitive O(N3pix) scaling, in order to transform to the eigenspace of the system, which bars sampling from the signal posterior.

This is the situation in which [32] proposed a particular operator based inversion technique to allow for computationally efficient calculation of the Wiener filter equation in three dimensional space.

In this implementation, the system of equations (Equation 16) can be solved by means of conjugate gradients (CGs). The computational scaling of this method is thus reduced to the most expensive step for applying the operator on the right-hand side of the equations, which in our case is the Fast Fourier transform, which scales as O(Npixlog(Npix)) .

In order to adapt the Wiener filter procedure for the specific application to galaxy observations, we are going to present the galaxy data model together with the according Poissonian noise covariance matrix.

It is possible to describe the observed galaxy distribution as a realization of an inhomogeneous Poissonian process [36]. We can therefore assume the observed galaxy numbers NOi=NO(→xi) at position →xi in three dimensional configuration space to be drawn from a Poissonian distribution [36].

NOi↶P(NOi|λOi)=λOiNOie−λOiNOi!,(17)

where the arrow denotes a random draw from the probability distribution and λOi is the mean observable galaxy number at position →xi. We can then write the observed galaxy numbers at discrete positions as:

NOi=⟨NOi⟩+ϵOi=λOi+ϵOi,(18)

where the noise term ϵOi denotes the difference between the observed galaxy number and the mean observable galaxy number. The Poissonian noise covariance matrix can then easily be obtained by:

NPij=⟨ϵOiϵOj⟩=⟨[NOi−⟨NOi⟩][NOj−⟨NOj⟩]⟩=δKij⟨NOi⟩=δKijλOi,(19)

where we simply calculated the Poissionian variance for the observed galaxy number assuming the galaxies to be independent and identically distributed (i.i.d). The mean observable galaxy number can be related to the true mean galaxy number λi by applying the observation response operator Rij as:

λOi=∑jRijλj,(20)

The true mean galaxy number, on the other hand, can be related to the dark matter over density field, the signal si, by introducing a physical model in the form of a bias operator Bij, e.g. a scale dependent bias:

For the case of galaxy redshift surveys the response operator Rij is the product of the sky mask and the selection function, which are both local in configuration space, and hence the response operator turns to:

Rij=δKijMiFi,(23)

where Mi is the value of the sky mask and Fi is the value of the selection function at position i. We therefore arrive at the data model already described in equation (Equation 11), which reads:

di=MiFi∑kBiksk+ϵOi¯λ=∑kKiksk+ϵi,(24)

where we introduced the effective observation response operator Kij=MiFiBij and the noise contribution ϵi=ϵOi/¯λ. This is the galaxy data model which we derived from the assumption of the Poissonian distribution of galaxies.

The Wiener filter operator requires the definition of the noise covariance matrix N, which for the Poissonian noise can be expressed as:

Nij=⟨ϵiϵj⟩=⟨ϵOiϵOj⟩¯λ2=δKijλOi¯λ2,(25)

where we used the Poissonian noise covariance matrix given in equation (Equation 19).

which immediately reveals, that there is a correlation between the underlying signal amplitudes si and the level of shot noise produced by the discrete distribution of galaxies [?].

Nevertheless, as pointed out in the previous section, the Wiener filter relies on the fact, that the additive noise contribution is uncorrelated with the signal. Hence, we have to assume the noise covariance as uncorrelated with the signal, but it may have some structure.

Therefore, we provide two approaches to effectively approximate the noise covariance matrix given in equation (Equation 26).

In the first approach we calculate an effective noise covariance matrix by averaging the noise covariance matrix given in equation (Equation 26) over the signal. We then obtain:

¯Nij=⟨Nij⟩s=δKij1¯λ[∑kRik(1+∑lBkl⟨sl⟩s)]=δKij1¯λ[∑kRik],(27)

where we used the fact, that the ensemble mean of the signal amplitudes for the density contrast vanishes. Note, that this model also arises when persuing a least squares approach to matter field reconstructions rather than the Bayesian approach as described in this work [32].

In the other approach we introduce a noise structure function nSFi given as:

nSFi=λOi¯λ2.(28)

With this definition the noise is approximated as being uncorrelated to the signal, but nonuniform. The noise covariance matrix then reads:

NSFij=δKijnSFi.(29)

In order to use this noise structure function we have to estimate λOi from the observed galaxy numbers NOi. By applying Bayes’ theorem to the Poissonian distribution given in relation (Equation 39) we yield:

P(λOi|NOi)=P(λOi)P(NOi|λOi)P(NOi).(30)

In the absence of any further a priori knowledge on λOi we assume a flat prior and search for the maximum of:

P(λOi|NOi)=λOiNOie−λOiΓ(NOi+1),(31)

which is normalized to yield unity when integrated over all λOi.

The noise structure function nSFi can then be estimated by searching the most likely value for λOi from equation (Equation 31). This yields:

nSFi=NOi¯λ2.(32)

Another estimator for nSFi is based on evaluating the mean of the probability distribution given in equation (Equation 31). The ensemble mean is calculated as:

⟨λOi⟩=∫∞0dλOiλOiλOiNOie−λOiΓ(NOi+1)=NOi+1.(33)

in this case the noise structure function nSFi can be written as:

nSFi=NOi+1¯λ2.(34)

A more thorough discussion on Poissonian noise models and their numerical implications for matter field reconstructions can be found in [33].

In the previous sections, we described the Wiener filter and the galaxy data model, which are required to estimate the mean of the signal posterior. However, this mean signal is no sample from the signal posterior yet, neither does it represent a physical density field, as it lacks power in the low signal to noise regions. To create a true sample from the signal posterior, one must therefore add a fluctuation term yi, which compensates the power lost due to noise filtering. The signal sample can then be written as the sum of the signal mean and a fluctuation term:

si=mi+yi.(35)

In our approach we realize the fluctuation term by generating a mock signal s∗i and a mock observation d∗i consistent with the data model given in equation (Equation 24). This kind of mock observation generation is well known in literature and has been applied to various scientific applications, as for instance the generation of constrained initial conditions for Nbody simulations [?]. The fluctuation term can then simply be calculated as:

yi=s∗i−∑jD−1ij∑m∑lKmjN−1mld∗l.(36)

The interpretation of this equation is simple. In the high signal to noise regime, the Wiener filter is nearly a pass-through operator, meaning the reconstructed signal is nearly identical to the true underlying signal. Therefore, as the variance is low, the fluctuation term tends towards zero. In the low signal to noise regime, on the other hand, the Wiener filter will block, and no signal can be reconstructed. The fluctuation term will therefore be nearly identical to the underlying mock signal s∗i.

In this fashion we add the correct power to the Wiener mean reconstruction. The effect of adding the fluctuation term to the Wiener mean is presented in figure ?, where we see the Wiener mean reconstruction, the fluctuation term and the sum of both.

The mock data d∗i is generated to obey the data model described in equation (Equation 24) and the Wiener variance.

We therefore first draw a mock signal s∗i with correct statistics from the multivariate Gaussian signal prior given in equation (Equation 9). Such a mock signal is best generated in Fourier space following the description of [?]. One first draws two Gaussian random numbers, χa and χb, with zero mean and unit variance and then calculates the real and imaginary part of the signal in Fourier space as:

RE(^sk)=√Pk2χaIM(^sk)=√Pk2χb,(37)

where Pk is the power-spectrum coefficient at the kth position in Fourier space. Note, that the mock signal s∗i is supposed to be a real quantity, and therefore hermiticity has to be imposed in Fourier space before performing the inverse Fourier transform [?].

Next we have to generate the additive noise contribution. In order to draw a noise term with the correct Poissonian statistics, we first draw a random number N∗i from the Poissonian distribution:

N∗i↶P(N∗i|λ∗i),(38)

where we choose the mean observed galaxy number to be λ∗i=nSFi¯λ2. According to equations (Equation 18) and (Equation 24) the mock noise term ϵ∗i can be calculated as:

ϵ∗i=N∗i−nSFi¯λ2¯λ.(39)

It is clear by construction that this mock noise term has vanishing mean and the correct noise covariance matrix. Then, according to equation (Equation 24) the mock observation is given as:

d∗i=∑kKiks∗k+ϵ∗i.(40)

The proof, that the fluctuation term yi as generated by equation (Equation 36) truly generates the correct variance is given in Appendix C.

Note, that the application of the Wiener operator is a linear operation, and we can therefore rewrite equation (Equation 35) as:

si=s∗i+∑jD−1ij∑m∑lKmjN−1ml(dl−d∗l),(41)

where the Wiener operator is applied to the true data di and the mock observation d∗i simultaneously. This greately reduces the CPU time required for the generation of one signal sample.

As described above, the signal sampling step provides a noise-less full sky signal sample si consistent with the data. The next step in the Gibbs sampling iteration scheme requires to draw power-spectrum samples from the conditional probability distribution P(S|{si},{di}). Since in this Gibbs sampling step the perfect sky signal amplitudes si are known, the power-spectrum is conditionally independent of the data amplitudes di. Hence, in this Gibbs sampling step, we can sample the power-spectrum from the probability distribution P(S|{si}). In the following we will show that the power-spectrum can easily be drawn from an inverse gamma distribution.

According to Bayes’ theorem, we can rewrite the conditional probability P(S|{si}) as:

P(S|{si})=P(S)P({si})P({si}|S),(42)

where P(S) is the prior for the signal covariance, P({si}|S) is given by equation (Equation 9) and P({si}) is a normalization constant in this Gibbs sampling step.

More specifically, we are interested in the set of matrix coefficients {Sij} of the covariance matrix S. As already pointed out in Section 4.1 the signal covariance matrix of an homogeneous and isotropic universe, has an especially appealing form in Fourier space, where it takes a diagonal form. In our application the real space covariance matrix coefficients {Sij} are related to their Fourier representation via the fast Fourier transform, as defined in Appendix A. We can therefore write:

Then we can express the conditional distribution for the Fourier signal covariance coefficients ^^Skl as:

P({^^Skl}|{si})=P({Sij}|{si})∣∣
∣∣∂{Sij}∂{^^Skl}∣∣
∣∣,(44)

where

∣∣
∣∣∂{Sij}∂{^^Skl}∣∣
∣∣=∣∣det(J(ij)(kl))∣∣,(45)

is the Jacobian determinant for this coordinate transformation. As the discrete Fourier transform is proportional to a unitary matrix, this Jacobian determinant only amounts to a normalization constant, as has been demonstrated in Appendix B.

With this definition we can rewrite the conditional probability in equation (Equation 42), by replacing all the real space covariance matrix coefficients Sij by their Fourier representation ^^Skl, and normalizing it with the constant obtained from the coordinate transformation. We can therefore write:

where we used the discrete Fourier transform definition, given in Appendix A, to replace the real space signal amplitudes si by their Fourier counterparts ^sk. Introducing equation (Equation 10) then allows us to rewrite equation (Equation 46) in terms of the power-spectrum coefficients Pk as:

where the determinant factorizes due to the diagonal form of the signal covariance matrix in Fourier space.

Note, that due to isotropy the power-spectrum is independent of direction in Fourier space, meaning the power-spectrum coefficients only depend on the modulus of the mode vector →kk:

Pk=P(→kk)=P(|→kk|).

For this reason, the angular dependence in Fourier space can be summed over.

To do so we remark that the mode vector →kk, as a geometrical object, will not change if we express it in the basis of cartesian coordinates →kk=→kk(k1k,k2k,k3k), or if we describe it in the basis of spherical coordinates →kk=→kk(|→kk|,φk,ϑk). We can therefore split the multi index summation into the summation over the three spherical coordinates as:

where we introduced σ(|→kk|)=∑φk∑ϑkC2/^C2∣∣^s(|→kk|,φk,ϑk)∣∣2, which is the summed signal power on spherical shells around the origin in Fourier space, and the index m labels each of the M shells belonging to the different mode vector modulus |→kk| in the Fourier box.

Several different mode vectors →kk may have the same vector modulus |→kk|, and therefore belong to the same shell. To account for this we introduce the number nm, which counts the number of different mode vectors →kk, belonging to the mth shell in Fourier space. This number nm, therefore counts the degrees of freedom for each of the M modes. We can then express the product in equation (Equation 47) in terms of m as:

When ignoring the power-spectrum prior P({Pk}) in the above equation (Equation 50), we see that the probability distribution factorizes in the different Pm, meaning they could be sampled independently.

If also the prior for the different Pm would factorize as:

P({Pk})=M−1∏m=0P(Pm),(51)

then it is possible to sample each mode of the power-spectrum independently.

On large scales, or in the linear regime, the theory of gravitational structure formation tells us that the different Fourier modes evolve independent of each other. In these regimes the proposed power-spectrum prior would be the adequate choice. However, we also know that nonlinear structure formation couples the different Fourier modes, the stronger the deeper we reach into the nonlinear regime. In these regimes another prior would be more adequate, but also harder to sample.

Anyhow, as already described in Section 3 the entire power-spectrum sampling method requires two steps. While the different power-spectrum modes are assumed to be independent in the power-spectrum sampling step, they are not in the signal sampling step. There the different modes are coupled via the observation mask and selection function, and furthermore, the physical coupling of the different modes is represented in the data.

Therefore, in the following we assume a power-spectrum prior, as proposed in equation (Equation 51), and defer a more thorough investigation of adequate prior choices in the nonlinear regime to future work.

With this prior choice each mode can be sampled independently from the following probability density distribution:

P(Pm|{si})=P(Pm)(CN2P({si}))1/M(2πPm)−nm/2e−12σmPm.(52)

Further, we will assume a power-law behavior for the individual mode prior P(Pm)∝P−αm where α is a power law index. Note, that a power-law index α=0 describes the flat prior, while α=1 amounts to the Jeffrey’s prior. The Jeffrey’s prior is a solution to a measure invariant scale transformation of the form P(Pm)dPm=P(γPm)γdPm[61], and therefore is a scale independent prior, as different scales have the same probability.

Inserting this power law prior in equation (Equation 52) and imposing the correct normalization, reveals that the power-spectrum coefficients have to be sampled from an inverse gamma distribution given as:

P(Pm|{si})=(σm2)(α−1)+nm/2Γ((α−1)+nm2)1P(α+nm/2)me−12σmPm.(53)

By introducing the new variable xm=σm/Pm and performing the change of variables we yield the χ2-distribution as:

P(xm|{si})=xβm/2−1mΓ(βm/2)(2)βm/2e−xm2,(54)

where βm=2(α+nm/2−1). Sampling the power-spectrum coefficients is now an easy task, as it reduces to drawing random samples from the χ2-distribution. A random sample from the χ2-distribution for an integer βm can be drawn as follows.

is χ2-distributed, and →zm is a βm element vector, with each element being normally distributed. The power-spectrum coefficient sample is then obtained by:

Pm=σm|→zm|2.(56)

It is easy to see that each spectrum coefficient sample is a positive quantity, this ensures that the signal covariance matrix is positive definite as it has to be by definition.

To summarize, we provide an optimal estimator for the power-spectrum coefficients, and their uncertainties.

It is also worth mentioning, that the inverse gamma distribution is a highly non-Gaussian distribution, and that for this reason, the joint estimates of signal amplitudes si and power-spectrum coefficients Pm are drawn from a non-Gaussian distribution.

where {si}j are the signal Gibbs samples, and NGibbs is the total number of Gibbs samples.

This result is known as the Blackwell-Rao estimator of P({Pm}|{di}) which is guaranteed to have a lower variance than a binned estimator [61].

It is worth noting, that P({Pm}|{si}) has a very simple analytic form, and therefore equation (Equation 61) provides an analytic approximation to P({Pm}|{di}) based on the Gibbs samples. All the information on P({Pm}|{di}) is therefore contained in the σm of the individual Gibbs steps, which generate a data set of size \O(mmaxNGibbs), where mmax is the maximal number of independent modes. In addition, to being a faithful representation of P({Pm}|{di}) the Blackwell-Rao estimator is also a computationally efficient representation, which allows to calculate any moment of P({Pm}|{di}) as:

The Blackwell-Rao estimator also allows us to demonstrate another remarkable property of the Gibbs sampling approach. Although a specific power-spectrum prior has to be employed during the Gibbs analysis of the data, a post processing analysis of the power-spectrum can be performed with any desired power-spectrum prior. Lets assume one prefers to perform a post processing analysis with a power-spectrum prior P′({Pm}), rather than with the prior P({Pm}), which was employed during Gibbs sampling. We therefore want to estimate the power-spectrum from the following posterior:

where we simply made use of the Bayes theorem. Since P({si}|{Pm}) is a simple Gaussian distribution, and therefore given analytically, the posterior P′({Pm}|{di}) can be calculated with any desired power-spectrum prior in a post-processing step.

The Gibbs sampling procedure consists of the two basic steps, of first sampling perfect noise-less full sky signal samples si and then sampling the power-spectrum coefficients Pm given si.

Therefore, the probability distribution P(Pm|{si}), given in equation (Equation 53), encodes our knowledge on the power-spectrum coefficients Pm, if we had perfect knowledge of the true signal amplitudes si.

It is clear, that in the case of perfect observations the full posterior distribution for the power-spectrum coefficients P(Pm|{di}) would reduce to that one given in equation (Equation 53).

This is, because in the case of perfect full-sky and noise-less observations, the signal posterior would collapse to a Dirac delta distribution, due to the vanishing noise covariance matrix. This means, that in this case the signal amplitudes si can be estimated with zero variance.

However, measuring the Pm to arbitrary precision will never be possible. The power-spectrum coefficients depend on the data through the σm, which measure the actual fluctuation power in the observed Universe. It is clear that the probability distribution function (Equation 53) for the Pm will not reduce to a Dirac delta distribution, even though the σm have been measured perfectly.

Owing to this fact, there will always remain some uncertainty in the power-spectrum estimation, even in the case of perfect measurements. This residual uncertainty is well known as cosmic variance, which is the direct consequence of only observing just one specific matter field realization.

The Gibbs sampling approach, as proposed here, takes this cosmic variance into account, by drawing samples from the probability distribution P(Pm|{si}), which obey the correct statistical properties.