Comments on: An alternative to MCMC?
http://hea-www.harvard.edu/AstroStat/slog/2007/an-alternative-to-mcmc/
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

By: vlk (Fri, 07 Dec 2007 20:08:34 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2007/an-alternative-to-mcmc/#comment-134

Thanks for those very illuminating comments, Tom. You are of course spot on that what MCMC does is get error bars -- in fact, my favorite game with astronomers is to shock them by pointing out that you can't ever really get a definitive "best-fit" with MCMC, and yet you will not lack for a realistic characterization of parameter uncertainty. Also, thanks for clarifying the Lebesgue integral approach -- that was what I was trying to say while flailing about with the theta(L) business.
By the way, during one of our recent CHASC meetings, Xiao-li described a fairly general method for getting comparative goodness-of-fit measures out of MCMC. So even though MCMC doesn't give you a normalization constant, it is possible to work around that to compare nested models. More on that later.

By: TomLoredo (Thu, 06 Dec 2007 18:57:13 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2007/an-alternative-to-mcmc/#comment-133

Vinay: First, thanks for posting about the Feroz & Hobson paper; I hadn't seen it, and I certainly
will give it a look.

Second, I'm not sure you have correctly characterized MCMC or nested sampling. MCMC does not
hunt for the mode; in fact, the MCMC sample with the highest prior*likelihood may not be
a very good estimator for the mode, especially as dimension grows (but it may be a good
starting point for input into an optimizer). MCMC doesn't give you error bars "for free"---that's
what it's aiming for in the first place! MCMC correctly wanders around the posterior
via an algorithm that does not require one to normalize prior*likelihood; an
unfortunate consequence is that it can't tell you what the normalization constant
is (though there are tricky ways to fudge around this, with various degrees of
complexity and success depending on the problem). Nested sampling targets that
normalization constant; it gives you the posterior samples (the "error bars") "for free."

A good way to understand nested sampling is to think of it as building an approximation
to the *Lebesgue* integral of prior*likelihood, as opposed to its Riemann integral. For
the familiar Riemann integral of f(x), you divide the x axis into "bins" and add up the
areas of the resulting narrow vertical "bars" of height f(x). For the Lebesgue version,
you divide the *ordinate* into bins, and for each bin, find the size ("measure") of the x
region corresponding to the intersection of the f-axis bins with f(x). This builds
the integral out of a bunch of wide but vertically short *horizontal* bars. Nested
sampling does this by using an algorithm for finding the bins on the ordinate (defined by
likelihood values) that has a cute statistical property: the associated x measures are,
statistically, distributed as order statistics. So even though you can't expect to find those
measures exactly (they would be the areas/volumes between neighboring likelihood contours,
over the *whole* space), by construction, the algorithm lets you "guess" those measures
reasonably accurately. It's truly ingenious.
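That order-statistics property is easy to check numerically. In the sketch below (my own illustration, not code from any paper), each nested-sampling step shrinks the constrained prior mass by the largest of N uniform draws, so -ln(t) is exponentially distributed with mean 1/N -- which is exactly the "guess" the algorithm uses for the shrinkage:

```python
import math
import random

rng = random.Random(0)
N = 50          # number of live points
trials = 20000  # Monte Carlo repetitions

# At each step, the shrinkage factor t = X_new / X_old is distributed as
# the maximum of N uniforms on [0, 1] (i.e., Beta(N, 1)), because the new
# point is the worst of N draws from the constrained prior.
log_t = [math.log(max(rng.random() for _ in range(N))) for _ in range(trials)]
mean_log_t = sum(log_t) / trials

# Nested sampling "guesses" E[ln t] = -1/N; mean_log_t should be close to -0.02.
```

Since -ln(t) is exactly Exponential(N) here, the sample mean converges quickly to 1/N, which is why the algorithm's deterministic shrinkage estimate works so well.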

There is no theta(L) step involved; indeed, there can't be---there is no unique solution
(it's a contour or surface or hypersurface). The algorithm works in theta space directly,
and you can use the resulting thetas as posterior samples.

But you are right about this hard part: "sampling to discard the tail of L(theta)." There
is no general way to do it (well, maybe the new paper has one!). The few problems that
have had good success with NS have had a special structure that allowed this sampling to
be done cleverly and accurately. The one previous cosmological example used an approximate
method, and to my mind (I only quickly read it) did not make a compelling case that
the results were accurate; and to the extent they were, it's because the problem was
not that challenging. We (my collaboration with some statisticians at Duke on
exoplanet problems) had a grad student spend a semester trying to get NS to work on a
simple exoplanet problem (with a complex, multimodal posterior), and it's the sampling
"above the tail" where we got hung up. He (Floyd Bullard) could
get it to work, but it was not as efficient as more common, less clever methods
(e.g., importance sampling).
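To make the whole scheme concrete, here is a minimal nested-sampling sketch on a hypothetical toy problem (a narrow Gaussian likelihood with a uniform prior on [0,1]; the function names and parameters are mine, and this is nobody's production algorithm). The constrained draw is done by naive rejection from the prior, which works for this toy but is precisely the step that fails to scale:

```python
import math
import random

def log_likelihood(theta):
    # Toy 1-D likelihood: a narrow Gaussian bump centered at 0.5
    # (peak value 1, width 0.1), paired with a uniform prior on [0, 1].
    return -((theta - 0.5) ** 2) / (2 * 0.1 ** 2)

def nested_sampling(n_live=100, n_iter=600, seed=42):
    """Estimate the evidence Z = integral of prior*likelihood by summing
    horizontal (Lebesgue) slabs L_i * (X_{i-1} - X_i), where the enclosed
    prior mass X_i is 'guessed' as exp(-i / n_live)."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]   # draws from the prior
    logls = [log_likelihood(t) for t in live]
    evidence, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda k: logls[k])
        x_i = math.exp(-i / n_live)                # expected shrinkage
        evidence += math.exp(logls[worst]) * (x_prev - x_i)
        x_prev = x_i
        # Replace the worst point with a new prior draw subject to
        # L > L_worst -- by naive rejection from the full prior here,
        # which is exactly the step that is hard in general.
        threshold = logls[worst]
        while True:
            cand = rng.random()
            cand_logl = log_likelihood(cand)
            if cand_logl > threshold:
                live[worst], logls[worst] = cand, cand_logl
                break
    # Final correction: remaining prior mass times the mean live likelihood.
    evidence += x_prev * sum(math.exp(l) for l in logls) / n_live
    return evidence

z = nested_sampling()
# The analytic evidence is about 0.1 * sqrt(2*pi) ~ 0.25, since nearly
# all of the Gaussian's mass lies inside [0, 1].
```

Note how the rejection loop's acceptance rate decays like the remaining prior mass exp(-i/n_live): for a multimodal or high-dimensional posterior, that naive step becomes hopeless, which is the difficulty described above.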

Maybe the new paper will force us to re-examine NS.

-Tom

By: David van Dyk (Sun, 18 Nov 2007 16:51:56 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2007/an-alternative-to-mcmc/#comment-130

You can definitely use the burn-in to set the Metropolis-Hastings jumping
rules used after burn-in. You can learn about the posterior correlations
and variances to optimize the jumping rule variances and blocking schemes.
The hitch is that you have to take it with a grain of salt, because the
burn-in is just burn-in---it is likely more dispersed than the actual
posterior---at least if you pick good starting values!
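As a hypothetical illustration of this tuning idea (a sketch of my own, on a toy 1-D posterior), the code below runs a random-walk Metropolis chain with a deliberately poor jumping scale during burn-in, estimates the posterior spread from those samples, and then rescales the jumping rule with the familiar ~2.4*sigma rule of thumb for the main run:

```python
import math
import random

def log_post(theta):
    # Hypothetical 1-D log-posterior: a standard normal, for illustration.
    return -0.5 * theta * theta

def metropolis(n_steps, scale, start=0.0, seed=1):
    """Random-walk Metropolis with a Gaussian jumping rule of the given
    scale; returns the chain of sampled theta values."""
    rng = random.Random(seed)
    chain, theta, lp = [], start, log_post(start)
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, scale)
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject step
            theta, lp = prop, lp_prop
        chain.append(theta)
    return chain

# Burn-in phase with a deliberately poor jumping scale...
burn = metropolis(n_steps=2000, scale=10.0)

# ...then use the burn-in sample's spread to set the post-burn-in jumping
# rule (the usual ~2.4 * sigma rule of thumb in one dimension).
mean = sum(burn) / len(burn)
var = sum((t - mean) ** 2 for t in burn) / len(burn)
tuned_scale = 2.4 * math.sqrt(var)
chain = metropolis(n_steps=5000, scale=tuned_scale, start=burn[-1], seed=2)
```

Because the burn-in variance may be inflated relative to the true posterior (the grain-of-salt caveat above), the tuned scale is a starting point to be monitored, not a final answer.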
<blockquote><b>[Response:</b> Yeah, that is what I had understood. But nested sampling seems to get around that by somehow weighting the likelihoods. Not quite sure how it works or how well it works in real situations.
-vinay<b>]</b></blockquote>