Comments on: Should we worry about rigged priors? A long discussion.
http://andrewgelman.com/2017/10/04/worry-rigged-priors/
By: Georgette Asherman (Tue, 10 Oct 2017 12:42:06 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-583127

In industrial settings this can be the case: there is a small effect, essentially zero, around a relatively large known mean with low variance. However, that ‘essentially zero’ effect can still be meaningful in terms of production cost or other concerns. That is why equivalence and non-inferiority testing is commonly used.
By: Anoneuoid (Sun, 08 Oct 2017 04:55:56 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-580856

I’m not sure that is the same point I am making, but I don’t disagree with it. I’m saying the NHST paradigm is based on the principle that correlations/effects are rare, so that finding one is treated as somehow exceptional, and studies are designed for that purpose. Instead the principle should be that everything is correlated with everything else, and studies should be designed based on that principle.
By: Jorge (Fri, 06 Oct 2017 17:30:59 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-579562

Hi Bob,

But shouldn’t it also be informative to run the same model with different priors based on previous findings? That is, use only the effects (summary statistics, not posterior draws) from similar studies and see how much the model’s conclusions change, as a check on robustness and reliability.

By: Carlos Ungil (Fri, 06 Oct 2017 12:30:00 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-579382

“For a fully informative prior for δ, we might choose normal with mean 0 because we see no prior reason to expect the population difference to be positive or negative and standard deviation 0.001 because we expect any differences in the population to be small, given the general stability of sex ratios and the noisiness of the measure of attractiveness.”

To be fair, I see that narrowing the prior can be justified from a purely probabilistic point of view. If you have the “correct” prior for the “clean” case, for example that the effect of true beauty on sex ratio is effectively sampled from a N(0,0.002) distribution, then knowing that there is a certain level of attenuation you can easily derive the effect of measured beauty on sex ratio. At least if the “measured beauty” is only partially correlated with the “true beauty” and is not correlated at all with any other factors that could affect the sex ratio. If it is partially measuring beauty and partially measuring something else, the net effect is not trivial to determine. If the “noise” is completely random, then in the extreme case (measured beauty uncorrelated with true beauty) the implied prior collapses to a point mass at zero.

In summary, it’s not impossible that you chose your prior by first assuming a precise prior for the effect of true beauty and then a precise amount of classification error. I guess I cannot accuse you of over-precision, given that you said that’s a “fully informative prior”.

Sure, if there is an effect it will be smaller. The attenuation will result in weaker data and the likelihood will move towards zero. Even if you don’t change the prior, the posterior will change as expected. I guess that if you had a prior centered at some value other than zero it would make sense to move the prior accordingly (to reflect the attenuation in the expected effect). I’m not so sure about changing the variance of the prior.

> In answer to your second point: No, I don’t know there’s no difference.

Ok, let me rephrase it. You know that the difference is small (much lower than 1%) and even the most extreme outcome wouldn’t provide enough evidence to suggest otherwise.

Why would the prior depend on the noisiness of the measure of attractiveness? Say I have a prior for some experimental setting. If I had a similar setting with more noise I think I would still use the same prior for the parameter of interest (but maybe there would be a nuisance parameter related to the noise).

I also find that prior very strong. If the beautiful parents had *only girls*, you would estimate the population difference to be just 0.1%. Maybe that’s your point, that the whole study makes no sense because you know that there is no difference and even in the most extreme outcome you wouldn’t really change your mind?

In answer to your first point: noise in x will attenuate the correlation between x and y. Suppose, for example, that there’s some precisely measured “beauty” variable x for which the more beautiful parents are 0.1% more likely to have girls. Now suppose you don’t observe x, instead you observe z, a noisy measure of x, and then you compare the proportion of girls among parents who have high and low values of z. This difference will then be less than 0.1%. It’s called attenuation in econometrics and it’s easy to show analytically or by simulation.
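A minimal simulation sketch of that attenuation (all numbers invented, and the effect exaggerated to 2% so it is visible above sampling noise):

import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.normal(0, 1, n)              # precisely measured "beauty"
z = x + rng.normal(0, 1, n)          # noisy measure of the same trait
# exaggerated 2% effect so the attenuation shows above simulation noise
girl = rng.random(n) < 0.48 + 0.02 * (x > 0)

print(girl[x > 0].mean() - girl[x <= 0].mean())  # ~0.020: the true gap
print(girl[z > 0].mean() - girl[z <= 0].mean())  # ~0.010: attenuated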

In answer to your second point: No, I don’t know there’s no difference. There is a difference, it’s not zero. Older mothers and younger mothers have (small) differences in Pr(girl), white mothers and black mothers have differences in Pr(girl), etc. Take any two groups and you’ll get different probabilities. But, given all the empirical research on sex ratios (and there’s a lot, because N is huge and the data are just out there for free in birth records), we know that these differences are small. Not zero. Small.

By: Martha (Smith) (Thu, 05 Oct 2017 22:00:22 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578954

+1
By: Huw Llewelyn (Thu, 05 Oct 2017 21:39:05 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578944

I was assuming that each study used in the ‘meta-analysis’ was based on random selection from the same population with a single unknown true mean or proportion. Each study would be performed separately and would have a different observed mean or proportion, but the studies could be regarded as parts of one large study and their data pooled to give a better estimate of the true mean or proportion. If this could not be assumed (at least to be roughly true) then I agree it would not work.
By: Daniel Lakeland (Thu, 05 Oct 2017 19:47:00 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578887

Blog post discussing the accuracy of said video:

You could suppose, for illustration purposes, that the “feet” of this protein absorb microwaves selectively because they “walk” at some 1000 MHz or whatever (or the microwave energy is a 1st, 2nd, or 3rd harmonic of whatever they do). If you add microwave energy, perhaps they vibrate back and forth rather than moving forward, hence a certain thing doesn’t get transported to its appropriate place as quickly, and so some chemical reaction does or does not occur fast enough to prevent some naturally occurring damage. This is more of a heuristic than anything else; obviously I have no particular candidate process in mind, just the idea that the intricate mechanical processes that large bio-molecules undergo could be selectively disrupted by resonance at microwave frequencies. The more I learn about biology the more impressed I am at how complex it is, but also at how robust.

Nevertheless, I agree with you about your skepticism that economics will begin to do this. I just don’t think this is because doing it is hard, or wrong, or anything like that, it’s because of politics etc.

By: Huw Llewelyn (Thu, 05 Oct 2017 19:00:15 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578864

A fresh unbiased study performed meticulously will continue to converge on the true mean as the number of observations increases. However, unless the prior probability distribution shares the same mean, it will bias the fresh study and delay its convergence on the true mean, and thus be counter-productive. It would have a biasing effect similar to ‘P-hacking’. Ideally, the prior data should have been a pilot study for the ‘fresh’ study so that it could be regarded as part of it. In other words, the ‘prior data’ would have to be chosen very carefully. Others reading the study might prefer the fresh data to be ‘normalized’ on its own to create a ‘fresh’ posterior probability distribution, and to use the author’s prior probability only as a guide when forming their own for personal use, e.g. when deciding whether to perform another study to replicate or contradict it.
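A conjugate-normal sketch of that biasing effect, with all numbers invented for illustration: a prior centred away from the true mean pulls the posterior mean off target, and the pull fades only as n grows.

import numpy as np

rng = np.random.default_rng(42)
true_mean, sigma = 0.0, 1.0          # data-generating truth
prior_mean, prior_sd = 1.0, 0.5      # prior centred in the wrong place

for n in (10, 100, 1000):
    x = rng.normal(true_mean, sigma, n)
    w_data, w_prior = n / sigma**2, 1 / prior_sd**2
    post_mean = (w_data * x.mean() + w_prior * prior_mean) / (w_data + w_prior)
    print(n, round(post_mean, 3))    # bias toward 1.0 fades as n increases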
By: Daniel Lakeland (Thu, 05 Oct 2017 18:59:21 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578863

ack, of course the blog ate the angle brackets Stan uses for bounds on p0… sigh.
By: Daniel Lakeland (Thu, 05 Oct 2017 18:58:32 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578862

Bob, a useful construct when you have a region where you really are pretty indifferent, say ±1000, but you want to include some weight on the whole real line, is something like

parameters {
  real<lower=-1000, upper=1000> p0;  // implicitly uniform over the plateau
  real dp;
}

transformed parameters {
  real p;
  p = p0 + dp;  // a convolution of a uniform with a normal
}

model {
  dp ~ normal(0, some_scale);
}

thereby giving you a nice flat plateau on (-1000, 1000), convolved with a Gaussian to give an infinitely smooth prior over the whole real line.
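For what it’s worth, the implied density of p has a closed form; here is a sketch that checks it by simulation, assuming a = 1000 and, arbitrarily, some_scale = 100:

import numpy as np
from scipy.stats import norm

# p = p0 + dp with p0 ~ Uniform(-a, a) and dp ~ Normal(0, s):
# the convolution integral evaluates to the expression below.
def plateau_density(x, a=1000.0, s=100.0):
    return (norm.cdf((x + a) / s) - norm.cdf((x - a) / s)) / (2 * a)

rng = np.random.default_rng(1)
p = rng.uniform(-1000, 1000, 10**6) + rng.normal(0, 100, 10**6)

# Monte Carlo check: mass in (-100, 100) vs. the (locally flat) closed form
print(np.mean(np.abs(p) < 100))    # ~0.10
print(200 * plateau_density(0.0))  # ~0.10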

Taking their abstract at face value (possibly not a good idea, but a starting point since I don’t have access to the full text), they suggest that resonant absorption of microwaves can certainly affect proteins selectively. It’s at least plausible. Yet, I fully agree with the basic point you’re making, that policy is being made by people with strong but uninformed priors.

When it comes to power lines at 60 Hz I think the story is completely different: objects as small as proteins are likely to see 60 Hz as essentially DC, and resonant absorption should be up in the range of microwave ovens, certainly above 500 MHz.

By: Daniel Lakeland (Thu, 05 Oct 2017 18:35:33 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578854

Yes, though I think Huw is imagining “making up” a dataset that you a priori think might be representative of the range of stuff you expect to see, doing Bayesian inference on this fake dataset, and seeing which parameter values are consistent with this fake data, thereby backing out a prior for a parameter from what you think the data ought to look like. I like this idea a lot as a way to get informative priors, and since it’s not a weighted average of crappy studies it might be more reasonable; it certainly doesn’t suffer from the file drawer, poor research practices, etc.
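A minimal grid sketch of the idea, with the “representative” observations and the measurement scale invented purely for illustration:

import numpy as np
from scipy.stats import norm

theta = np.linspace(-2, 2, 2001)
fake = np.array([0.1, -0.2, 0.3, 0.0, 0.2])   # invented "typical" observations

# Likelihood of the fake data on a grid, normalised into a density:
# this backed-out distribution then serves as the prior for the real study.
lik = np.prod(norm.pdf(fake[:, None], loc=theta, scale=0.5), axis=0)
prior = lik / np.trapz(lik, theta)
print(theta[np.argmax(prior)])                # centred near the fake-data mean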
By: Oren Cheyette (Thu, 05 Oct 2017 18:28:11 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578850

Hi Daniel – I knew I recognized your name from somewhere.

Regarding the effect of EM on cells: the problem is that, not only is the radiation non-ionizing, it’s not even comparable to thermal energy. So any effect involving some activation barrier being surmounted by the radiation would already be blown past by ambient thermal noise. Robert Adair (Yale physicist) treated this issue at great length in the early ’90s (back when there were scares about power lines), albeit focusing on lower frequencies where the issue is even more clear cut. (My physics chops are a little rusty, and I don’t have a strong intuition about the resonance idea, except that it seems unlikely at those energies. Cell phone frequencies are below the blackbody peak at room temperature and I’m pretty sure there are a gazillion energy levels accessible to pretty much any large molecule in those ranges, particularly in a liquid environment.)

But at any rate, this is just to emphasize my mechanistic prior, which is evidently different from that of the Chronicle’s health writer and the Berkeley city council, who seem ready to use the uninformed (ha!) prior that every modern technology is carcinogenic unless proven otherwise, and that studies showing otherwise should be ignored (because they disagree with said prior too strongly).

By: Dale Lehman (Thu, 05 Oct 2017 18:06:26 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578843

I’ll put it in more traditional economic terms. The textbook Keynesian model says that if the economy is not at full employment, then the multiplier (the effect on GDP of increasing the gov’t budget deficit by $1) = 1/(1-MPC), where MPC is the marginal propensity to consume (the derivative of total consumption spending with respect to income). Since the MPC is around .8, the multiplier would be around 5. Somewhat more sophisticated models incorporate taxes and imports, and these will reduce the size of the multiplier somewhat. This model prevailed until the 1970s. Since then, a portion of the economics discipline would claim that the multiplier is 0: any increase in government spending will squeeze out private investment dollar for dollar. Some other extremists would go so far as to make it negative, claiming nefarious influences on the private sector and worrying about what the government spends money on. And, of course, there are some ideas that the multiplier is not at all stable and depends on many other things, such as consumers going on strike, etc.
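A quick arithmetic aside (illustrative MPC values only): the textbook formula is very sensitive to the assumed MPC, which is part of why priors from different schools diverge so sharply.

# Textbook Keynesian multiplier 1/(1 - MPC) for a few assumed MPC values
for mpc in (0.5, 0.8, 0.9):
    print(f"MPC = {mpc}: multiplier = {1 / (1 - mpc):.1f}")  # 2.0, 5.0, 10.0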

But the point is that these differences are deeply rooted in philosophical differences in how people believe the economy works. There is no consensus. We could establish several priors corresponding to different schools of thought and then examine the same evidence in each case. That would be instructive and I would support that. But I don’t think you will see that any time soon – it makes these schools of thought less “scientific” and more “subjective.” If you want to claim these beliefs are wrong, I’m in agreement with you. But I think it is part of the fundamental reason why economists, at least, would resist the advice in Andrew’s post (of course, I could be wrong, since I can’t really speak for most economists).

Multiplying the different likelihood functions together gives approximately a weighted average, and if the log-likelihoods are quadratic (i.e., the likelihoods are normal) it is exactly the inverse-variance weighted average.

Something more thoughtful is advisable, and if nothing better can be discerned, flatten the multiplied-together likelihood to reflect more uncertainty, e.g. raise it to some power less than one (this is called something like a fractional likelihood).
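A small sketch of that equivalence with two hypothetical study estimates: multiplying their normal likelihoods gives exactly the inverse-variance weighted pooled estimate.

import numpy as np

est = np.array([1.2, 0.4])   # hypothetical study estimates
se = np.array([0.5, 0.2])    # their standard errors

# Product of normal likelihoods = normal with precision-weighted mean:
w = 1 / se**2
pooled = np.sum(w * est) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
print(pooled, pooled_se)     # ~0.51 and ~0.19

# The fractional-likelihood idea: raising the product to a power k < 1
# inflates the pooled standard error by 1/sqrt(k) without moving the mean.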

There also will be a related post later this afternoon.

By: Bob Carpenter (Thu, 05 Oct 2017 17:37:29 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578835

How about beauticians, dieticians, and musicians vs. physicists, hypnotists, and barist(a)s?
By: Bob Carpenter (Thu, 05 Oct 2017 17:18:47 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578826

The usual problem is that you only get summary statistics, not Bayesian posteriors. It’d be great if you could just include data from other studies in one big meta-analysis, but that’s rarely possible.

If you did get some kind of Bayesian posterior downstream, there’s the problem of how to compute with it if it’s not conjugate. That’s one of the reasons working directly with other data is easier.

By: Bob Carpenter (Thu, 05 Oct 2017 17:15:30 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578825

If we take a uniform prior over the range plus or minus one million, what does it say probabilistically?

1. The probability the parameter is in (-1000, 1000) is only 0.1%

2. The probability the parameter is outside of (-1000, 1000) is 99.9%.

That’s probably not the information you want to provide to your Bayesian model if you don’t expect the parameter to have values outside of (-1000, 1000). I keep meaning to write a case study that shows how this works (along with the truncation Daniel Lakeland describes above if you err on the other side and make the boundaries too tight). Andrew’s already written papers showing how the diffuse inverse-gamma priors suggested in the original BUGS examples led to overinflated variance estimates.
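The two probabilities above are just ratios of interval lengths; a one-line check:

# Uniform(-1e6, 1e6): mass assigned to the interval (-1000, 1000)
inside = 2_000 / 2_000_000
print(inside, 1 - inside)  # 0.001 and 0.999, i.e., 0.1% vs. 99.9%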

If the point estimate obtained from a flat prior is ok but the posterior distribution is too wide, maybe the problem is with the likelihood function and not with the flat prior. In any case, I don’t think the problem is that the prior specifies a 99.9% chance that the effect size has absolute value greater than 10^305.

Andrew,

I agree and I think I said something similar myself (“What matters is what may be the effect on the inference when this prior is used in the context of the model once we include the data.”). Regarding the paper you link to:

“For a fully informative prior for δ, we might choose normal with mean 0 because we see no prior reason to expect the population difference to be positive or negative and standard deviation 0.001 because we expect any differences in the population to be small, given the general stability of sex ratios and the noisiness of the measure of attractiveness.”

Why would the prior depend on the noisiness of the measure of attractiveness? Say I have a prior for some experimental setting. If I had a similar setting with more noise I think I would still use the same prior for the parameter of interest (but maybe there would be a nuisance parameter related to the noise).

I also find that prior very strong. If the beautiful parents had *only girls*, you would estimate the population difference to be just 0.1%. Maybe that’s your point, that the whole study makes no sense because you know that there is no difference and even in the most extreme outcome you wouldn’t really change your mind?

By: Daniel Lakeland (Thu, 05 Oct 2017 17:05:28 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578817

Also Carlos: from a probabilistic perspective, the flat prior assumes the value *is* enormous. Flat on ±10^308 has 99.9% probability outside the ±10^305 region. As soon as you add in an assumption of non-probabilistic estimation (i.e. point estimation) the prior has a different effect, which is to not change the location of the maximum, so you might argue that pure maximization-based point estimation has no real probabilistic content (in the Bayesian sense of probability on the parameter space). The question I have is: if the flat prior results in a Bayesian posterior that makes no sense, why would you think it a good idea to take the maximum a posteriori value from this posterior and call it a good estimate? Later we can get into James-Stein estimation and the inadmissibility of this flat-prior point estimator.
By: Anonymous (Thu, 05 Oct 2017 17:04:13 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578815

Just breaking the math down a bit here in the Bayesian case may help. Suppose you have a prior on a parameter θ:

θ ~ uniform(-10^308, 10^308).

Just a simple uniform prior on an interval. That prior says it’s very unlikely that the value of θ is small, because

Pr(-10^305 < θ < 10^305) = 10^305 / 10^308 = 0.001.

By: Daniel Lakeland (Thu, 05 Oct 2017 16:54:27 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578811

Oren, re the cell phone stuff, I basically agree with you, but the idea that non-ionizing radiation could cause cancer is, I think, a little more nuanced. Any enzyme associated with DNA repair or oxidative stress or whatnot that could be activated or inactivated by selective absorption of microwave-type radiation could cause cancer over time through this indirect method, basically inhibiting the ability of the cells to cope with naturally occurring processes, or increasing the rate at which those naturally occurring processes occur. If I wanted to study such things I’d be looking at molecular resonances of the proteins to see if their chemical kinetics or protein-folding configurations could be affected by absorption of certain wavelengths…

Also, hi from an ex colleague, assuming there aren’t too many Oren Cheyettes in the SF Bay area.

By: Daniel Lakeland (Thu, 05 Oct 2017 16:48:36 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578805

This is a really useful way to get nuanced priors.
By: Oren Cheyette (Thu, 05 Oct 2017 16:27:46 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578793

Perhaps I’ve missed earlier posts on this, but it seems to me that there is ever more attention in the media to dubious health studies afflicted with unstated priors and forking paths. Two recent instances that come to mind are the IARC designation of a common herbicide as a “probable carcinogen,” based on a handful of cases of one type of rare cancer in one study of ag workers (forking paths), and the recent NIH/NTP assessment that cell phone radiation may be carcinogenic, based on a rat study with low single-digit counts of two rare cancers and unexplained differences between male & female rates and between signal types (GSM vs. CDMA). (So that’s forking paths plus priors – at least for people with physical science backgrounds, there’s a pretty strong prior against the idea that non-ionizing radiation at levels too low to cause measurable heating could cause any genetic damage.)

Both studies continue to get media attention – out here in the Bay Area, we were just treated to an alarmist story by the SF Chronicle’s health writer on the risk of smart watches, quoting heavily from two go-to figures in the “cell phones will give us all cancer” community and mentioning the NIH/NTP report. Particularly at the local level, a lot of questionable policy gets made based on these sorts of reports – e.g., Berkeley on cell phone warnings and Petaluma on herbicides used by public maintenance staff.

By: Huw Llewelyn (Thu, 05 Oct 2017 15:47:12 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578765

PS. If there are no real prior data sets, then a pseudo-data set could be ‘imagined’ subjectively based on informal experience or theories, its subjective likelihood distribution arrived at, and the result normalized to give a non-baseline prior probability distribution.
By: Huw Llewelyn (Thu, 05 Oct 2017 15:28:09 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578758

I agree with Andrew about taking prior evidence into account in a measured and carefully reasoned way. My understanding is that it is the same as doing a pre-study meta-analysis in a Bayesian way of thinking to arrive at a Bayesian prior probability distribution (and including it in a paper’s introduction). The new data (i.e. its likelihood distribution) is then interpreted against this prior background and used to update the meta-analysis to create the Bayesian posterior probability distribution (for the discussion).

The prior probability can be based on a series of data sets, each assumed to share the same ‘true’ mean but each with its own likelihood distribution. The likelihood densities of the different likelihood distributions can be multiplied together to form a joint likelihood distribution, and the latter then ‘normalised’ so that all the posterior probabilities sum to 1 (normalisation always assumes that the ‘baseline prior’ is uniform or flat for random sampling, which is correct – see my blog: https://blog.oup.com/2017/06/suspected-fake-results-in-science/). The resulting posterior probability becomes the prior probability distribution for the new study. This is multiplied by the likelihood distribution of the new study data and normalised again to give the latest updated posterior probability distribution (to be discussed in the ‘discussion’ section of the paper).
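A minimal grid sketch of this procedure, with invented study estimates and standard errors standing in for the real data sets:

import numpy as np
from scipy.stats import norm

theta = np.linspace(-2, 2, 2001)

# Step 1: multiply the likelihoods of the prior studies, then normalise.
prior_studies = [(0.3, 0.4), (0.1, 0.5)]       # (estimate, se), invented
lik = np.ones_like(theta)
for m, s in prior_studies:
    lik *= norm.pdf(m, loc=theta, scale=s)
prior = lik / np.trapz(lik, theta)             # prior for the new study

# Step 2: multiply by the new study's likelihood and normalise again.
new_lik = norm.pdf(0.2, loc=theta, scale=0.3)  # new study, invented numbers
post = prior * new_lik
post /= np.trapz(post, theta)
print(theta[np.argmax(post)])                  # updated posterior mode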

Talking about resistance, I spent the morning trying to figure out how to convince an action editor that a bunch of big effects from low-powered studies is not as convincing as a small effect from a large-sample study. First I have to demonstrate how Type M error arises… the news has apparently not reached psychology.
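A quick simulation of the Type M (exaggeration) problem in question, assuming a small true effect of 0.1 measured with standard error 0.5:

import numpy as np

rng = np.random.default_rng(7)
true_effect, se = 0.1, 0.5                      # assumed truth and study noise
est = rng.normal(true_effect, se, 10**5)        # replicated study estimates

sig = np.abs(est) > 1.96 * se                   # keep only "significant" results
print(np.mean(np.abs(est[sig])) / true_effect)  # exaggeration factor, ~12x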

Following up on some thoughts on priors for economic “multiplier effects,” since we’d run out of reply room above.

Let t be defined in years, and let the “one year future total consumption per capita” function be

C1(t) = integral from s = 0 to 1 of C(t+s) ds

where C(t) is the sum of all transactions that occur on a given day, divided by the population N, divided by 1/365 to put C(t) in units of dollars per person per year. C(t) is a piecewise constant function over each day.

Now, I take the 1-year multiplier effect to be

C1(t) if the government spends G dollars per capita (call this C1_G(t)), where G dollars is any number between 0.001 times GDP/capita and 0.01 times GDP/capita (we assume an intermediate asymptotic stability of the effect for these moderately small spending levels)

minus

C1(t) if we don’t spend the G dollars per capita

divided by G

M = (C1_G(t) – C1(t))/G

Now clearly, this quantity depends on our choice of 1 year as the time period of interest, but we might expect that we’d get a similar effect for a range of window lengths from say 1/2 year to 2 years and so it’s *not extremely sensitive* to the window length. This is partly due to the fact that we average over 320 million people, and that we integrate our function over a full year or so, thereby smoothing out short term fluctuations quite a bit.

Next we note that logically we can in fact get quite large negative values: as I say, if everyone in the country goes on strike because the Nazi party comes into power and whatnot, then C1_G(t) could go to zero, while C1(t), the counterfactual, would have been something like $57,000/person. But that’s extremely unlikely.

In fact, for the most part, we’d expect this number to be something like 1, as the increase in GDP caused by spending G dollars per person would be something like G dollars per person; divided by G, we’d get 1. So probably the peak of the prior density should be at 1.

Furthermore, it also seems like we could easily get 0, where each dollar spent by the gov’t causes someone to withhold a dollar of spending. This would be the case where we’re pretty much just doing a straight transfer from one group of people to another. So the prior should be wide enough that 0 has density not much lower than the density at 1. Finally, it’s reasonable that you might activate a lot of activity with your government spending if it’s targeted properly (maybe you stimulate the economy of a depressed region, where lots of labor is available but little free cash, for example). So you should be considering quantities out into the range of 2 or 3.

With all this in mind, an initial prior of normal(1.0, 2.0) seems like a good place to start, including values well into the negative range and well above 1.0, but giving 1.0 the peak.
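As a sketch of what that proposal implies (not a fitted model), the tail probabilities of normal(1.0, 2.0) line up with the reasoning above:

from scipy.stats import norm

prior = norm(loc=1.0, scale=2.0)  # proposed prior on the 1-year multiplier
print(prior.cdf(0.0))             # Pr(M < 0) ~ 0.31: negatives stay possible
print(1 - prior.cdf(3.0))         # Pr(M > 3) ~ 0.16: strong stimulus possible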

By: Robert Krause (Thu, 05 Oct 2017 13:17:40 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578691

I am a regular reader of the blog, but I am neither a mathematician, statistician, nor econometrician (nor any other -ician); maybe I am best described as an (aspiring) methodologist*. Priors are quite important in my work, but I am not sure how to apply this discussion. My work is about multiple imputation procedures for missing data in network models (exponential random graph models – ERGM, and stochastic actor-based models – SAOM), and I started to use Bayesian estimation methods because they allow one to create “proper” imputations in the sense of Rubin. My initial models had very vague priors, Normal(0,100), as that was the default. Obviously these were “bad” priors, because these models are comparable to logistic regression, and given the statistics that are multiplied with the parameters, values outside say +/-10 are absolutely unrealistic. I went with it, because I did not know enough about Bayesian statistics.
However, once you create enough missing data you cannot estimate the models anymore, because some of the statistics are no longer observed often enough to estimate the parameters (e.g. I had posteriors from -100 to +150). Luckily I came across a YouTube video of one of Andrew’s presentations about weakly informative priors, where he discussed a similar issue in which a parameter could not be estimated because there was (nearly?) no data for it. Now, using these priors, Normal(0,4), the models converge nicely with 50% of the data missing (which in networks means that for many statistics you have 75% of the data missing). My point is that my main reason to choose this prior is pragmatic: you cannot run the model with a flat prior. I therefore wonder how much of this discussion applies directly to my choice of prior.
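One way to see why the vague default misbehaves on a logit-type scale (a sketch, reading Normal(0, 100) as a standard deviation of 100): nearly all of its prior mass lands on probabilities indistinguishable from 0 or 1, whereas Normal(0, 4) does not.

import numpy as np
from scipy.special import expit  # inverse logit

rng = np.random.default_rng(0)
for sd in (100, 4):
    p = expit(rng.normal(0, sd, 10**5))         # implied probabilities
    extreme = np.mean((p < 0.01) | (p > 0.99))  # mass pushed to the corners
    print(f"sd = {sd}: {extreme:.0%} of prior mass below 1% or above 99%")
    # ~96% for sd = 100, ~25% for sd = 4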

*I am not a native English speaker (as you might have guessed), but is there a difference between -icians and -ists? It seems to me the -icians (statist-, econometr-, psychometr-, mathemat-, …) have a better understanding of what they are doing compared to the -ists (psycholog-, sociolog-, biolog-, …).

By: Daniel Lakeland (Thu, 05 Oct 2017 12:59:28 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578683

* the basic idea makes sense… Editing on phone, sentence fragments…
By: Daniel Lakeland (Thu, 05 Oct 2017 12:51:14 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578681

Aha, I see we are secretly pointing out the same thing. The truth is that although the basic idea that spending money can induce growth in the economy makes sense, “the multiplier” really doesn’t exist as a well-defined thing. So it’s not surprising that the numerical value is controversial ;-)
By: Wayne (Thu, 05 Oct 2017 12:44:07 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578680

Andrew,

Actually, I phrased that last sentence poorly. I hear “let the data speak for itself” a lot, and like you I disagree with it, in two ways:

In a Bayesian/frequentist context, I prefer the Bayesian approach, which says that we need to make prior knowledge (common wisdom, our assumptions, etc.) explicitly part of the model and then let the data push things around, speaking more loudly or more softly depending on how much there is and how strong it is.

In a general Data Science context, the methods and models we use will find a signal, if that’s possible. But the signal may not be what we hope it is. It could be a “leak from the future” in the data, which is very common. It could be a bot “clicking” on links rather than a potential customer. Heck, almost every engagement I go into doesn’t have a data dictionary and that data doesn’t speak for itself. (In fact, when I make the mistake of thinking I hear it talking based on the name of a field, I’m often deceived because that name doesn’t mean what I think it means.) So the data doesn’t actually speak for itself in this context either.

Only in the narrow sense of “don’t necessarily believe what ‘experts’ say about the data” does “let the data speak for itself” make sense to me.

By: Dale Lehman (Thu, 05 Oct 2017 12:41:49 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578678

You are trying to build a dynamic model – a worthy goal, but not what the multiplier was designed to represent. It is a comparative statics result: if we increase government spending by $1 (without increasing taxes), what is the final increase in GDP after the system equilibrates? There is still a time dimension, as many economists will give different answers if you ask what the change in GDP will be after 6 months or after 1 year, etc. Also, the answer will vary depending on the initial state of the economy (extent of unemployment, etc.). We can put more detail in, and I have no doubt you can provide a prior distribution that will be defensible (as well as open to criticism). But this misses my point. I don’t believe you can provide a prior that can be said to represent “consensus” because there is none. And, while I do think the effort would be worthwhile, I think you will find great resistance to this approach for the very reasons I am trying to convey. I think the resistance to specifying priors is, in large part, a resistance to revealing that the emperor has no clothes. After all, economists pride themselves on being more scientific than the other social sciences.
By: Keith O'Rourke (Thu, 05 Oct 2017 12:10:39 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578666

I think so – and in general that is well argued by Calestous Juma in “From coffee to tractors: Why fear of loss inspires resistance to new technology.” And today’s pragmatic Bayesian approaches are new technology – just the theorem is old.

By: Daniel Lakeland (Thu, 05 Oct 2017 12:07:54 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578661

I get the basic / intuitive idea behind the “multiplier” effect; my big issue is that I don’t see how it can be defined *precisely* to give a universal way of calculating it. Let me explain.

Suppose we take C(t) to be the total consumption by all members of the US at time t, a continuous function of time. Well, of course we know, as in the stock market, that consumption is not continuous. When I buy a sandwich, a few dollars is transferred all at once. This is not the same thing as saying that all day long I spend a few pennies each hour…

You might think this is pedantic, but it seems to me the “multiplier” effect is some kind of derivative, how much total consumption changes when some particular amount of consumption by a certain party occurs. d something / d something

But the derivative is an unbounded operator, and it doesn’t even exist for a discrete series of transactions… and so we can really only discuss this in terms of taking the real series of discrete transactions, smoothing them in some way, and then defining our derivative of this smoothed thing… Fine, but then the result we get is dependent on the way in which we do the smoothing… Is there a way to define all of this in such a way that the result is largely independent of our choice of smoothing method for a wide range of smoothing methods? If so, we’re in the same situation as we get when trying to represent a steel bar using continuum mechanics, sure it’s atoms, but if we smooth the atoms by a smoothing kernel of width greater than 100 atomic distances and less than 1mm which is quite a few orders of magnitude… the results are nearly the same.

It’s less obvious to me how this would work for consumption. First off, consumption clearly has a very strong daily oscillation: I buy very little at midnight and quite a bit more at noon. So any smoothing we do must be over a timescale large with respect to a day. But there are also clearly seasonal effects in consumption: Christmas is big for retail, summer is big for travel… so the smoothing seems to need to be large with respect to a year! But over decades, technology and policy and things all change a lot. So I don’t think we’re ever in any regime where a smoothing-based view of what’s going on really applies very well.

Now of course we’re interested in a causal effect: spending G government dollars causes some change in something, over some time period, relative to what it would have been if the G event hadn’t occurred. So it’s not a simple derivative in time; it’s a counterfactual about how much consumption would occur in some time period after the G event compared to what would have happened in the absence of G. But defining this in a way that is insensitive to the choice of time period still seems impossible. You could, for example, do a truncated Laplace transform (i.e. discount all future consumption out to some window according to some discount rate), but then you’ll wind up with a result that’s very sensitive to the discount rate and the truncation window.

So, if you want to do a particular analysis, and you want to choose a particular way of doing the calculation, then I can give some particulars of the appropriate prior. All this is to back-up the assertion that Andrew made in a recent paper: The choice of prior is intimately connected to the choice of likelihood / data model.

By: Keith O'Rourke (Thu, 05 Oct 2017 12:07:19 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578660

Strange that this is not commonly done – the technical challenges are not that hard: http://andrewgelman.com/wp-content/uploads/2011/05/plot13.pdf

(Actually, that was the reason the journal editor gave for rejecting the paper – not enough technical innovation to justify publication in my prestigious journal.)

By: Keith O'Rourke (Thu, 05 Oct 2017 12:00:54 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578656

> Constructing a prior is work.

It was the original motivation for the work I did in meta-analysis (to get priors for cost/benefit analysis of funding for clinical trials).

A little bit of thought about this soon suggests you don’t want some weighted average of the (mostly crappy) studies that happened to get published. Or maybe it takes more than a little thought…

As we discuss in this paper, the prior can often only be understood in the context of the likelihood. In particular, a sample average or maximum likelihood estimate can be “quite reasonable” in some contexts but not in others. In a setting where measurements are accurate and plentiful and the goal is an estimate of a simple parameter whose value is not near the boundary of parameter space, then, sure, the flat prior can work. In a setting where measurements are noisy, sample size is not huge, and the goal is something more specific, then maximum likelihood or Bayesian inference with a flat prior can give bad answers: estimates with bad frequency properties, with high bias, high variance, high type M errors, high type S errors, the whole deal.

By: Daniel Lakeland (Thu, 05 Oct 2017 11:27:14 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578647

Carlos, this is intimately tied up in the insistence on a point estimate though. The behavior of a point estimate of course is far less affected by the clearly wrong tails of the prior because the location of the point estimate is determined by the location of the optimum which is totally insensitive to the tails.

This is of course by design for the person who distrusts priors, nevertheless as soon as you want to construct a measure of uncertainty or a risk and utility based decision you have a different story.

The risks associated with point estimation when outcomes and their consequences can vary widely are significant. If a posterior distribution is tightly peaked near your point estimate then things are ok; if there is nontrivial width then that flat prior can be deadly for your decisions, as you wind up considering possibilities well outside what anyone actually thinks might happen, simply because no one wants to be in charge of justifying a prior choice. Wald’s theorem applies whether the user of statistics likes it or not.

By: Dale Lehman (Thu, 05 Oct 2017 11:07:51 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578640

Daniel,
You can start here (http://marginalrevolution.com/?s=multiplier). Of course, that is not an authoritative source and it represents the more right-wing side of economics – Krugman would have a somewhat different take. But I have no doubt you can generate a prior – or even two or three. And I believe doing that would be superior to conducting a new study using some data and declaring a confidence interval for the *true* size of the multiplier from that single study. I am not disagreeing with the post or your comments here – I am providing my view of much of the underlying resistance to change and clinging to these frequentist methods. If our estimates for the size of the multiplier shift depending on which prior you choose – and I believe they would – then it exposes the entire enterprise as a sort of mathematical trick, a way to couch a subjective belief as “scientific.” And who wants to do that? (Only real scientists, perhaps.)
By: a reader (Thu, 05 Oct 2017 10:34:54 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578611

+1
By: Carlos Ungil (Thu, 05 Oct 2017 10:23:09 +0000)
http://andrewgelman.com/2017/10/04/worry-rigged-priors/#comment-578603

> it automatically assumes the value in question is ridiculously enormous.

A flat prior doesn’t assume that it *is* enormous, it assumes that it *could be* enormous. An informative prior may be better, but an uninformative prior is not obviously stupid. What matters is what may be the effect on the inference when this prior is used in the context of the model once we include the data.

If you say that the flat prior means that you expect the value of interest to be greater than 10^305 you make it look stupid.

If you say that the flat prior means that you will take the mean of the data to estimate the value of interest it looks much less stupid, actually it looks quite reasonable.

Let’s say you measure the height of a sample of people to estimate the average height in the population and you get mean=170cm. Maybe you have reasons to think you should correct it a bit in either direction, but taking the 170cm at face value is not obviously stupid. If you get mean=512km there are issues with your model or experimental setup much worse than the fact that the prior doesn’t rule out that value.

Of course nothing is normal, all models are wrong, etc. Everyone understands that if we say that the height in a population is normally distributed with such and such mean and standard deviation this is just an approximation. The median and the mode might be different from the mean, the shape of the distribution around the mean might be far from normal, and surely there are no negative heights or heights larger than 10^305.