Thursday, August 8, 2013

How much of a Bayesian posterior distribution falls inside a region of practical equivalence (ROPE)

The posterior distribution of a parameter shows explicitly the relative credibility of the parameter values, given the data. But sometimes people want to make a yes/no decision about whether a particular parameter value is credible, such as the "null" values of 0.0 difference or 0.50 chance probability. For making decisions about null values, I advocate using a region of practical equivalence (ROPE) along with a posterior highest density interval (HDI). The null value is declared to be rejected if the (95%, say) HDI falls completely outside the ROPE, and the null value is declared to be accepted (for practical purposes) if the 95% HDI falls completely inside the ROPE. This decision rule accepts the null value only when the posterior estimate is precise enough to fall within a ROPE. The decision rule rejects the null only when the posterior exceeds the buffer provided by the ROPE, which protects against hyper-inflated false alarms in sequential testing. And the decision rule is intuitive: You can graph the posterior, its HDI, its ROPE, and literally see what it all means in terms of the meaningful parameter being estimated. (For more details about the decision rule, its predecessors in the literature, and alternative approaches, see the article on this linked web page.)

But how do we set the limits of the ROPE? How big is "practically equivalent to the null value"? There is typically no uniquely correct answer to this question, because it depends on the particular domain of application and the current practice in that domain. But the rule does not need a uniquely correct ROPE; it merely needs a reasonable ROPE that is justifiable to the audience of the analysis. As a field matures, the limits of practical equivalence may change, and therefore what may be most useful to an audience is a description of how much of the posterior falls inside or outside the ROPE as a function of the width of the ROPE. The purpose of this blog post is to provide some examples and an R program for doing that.

Consider a situation in which we want to estimate an effect size parameter, which I'll call δ. Suppose we collect a lot of data so that we have a very precise estimate, and the posterior distribution shows that the effect size is very nearly zero, as plotted in the left panel below:

We see that zero is among the 95% most credible values of the effect size, but we can ask whether we should decide to "accept" the value zero (for practical purposes). If we establish a ROPE around zero, from -0.1 to +0.1 as shown above, then we see that the 95% HDI falls entirely within the ROPE and we accept zero. What if we think a different ROPE is appropriate, either now or in the future? Simply display how much of the posterior distribution falls inside the ROPE, as a function of the ROPE size, as shown in the right panel above. The curve plots how much of the posterior distribution falls inside a ROPE centered at the null value, as a function of the radius (i.e., half width) of the ROPE. As a landmark, the plot shows dashed lines at the 95% HDI limit farthest from the null value. From the plot, readers can decide for themselves.
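The curve in the right panel is straightforward to compute from MCMC draws. Here is a minimal sketch, not the script that made the figures; the posterior sample and the names below are fabricated for illustration:

```r
# Proportion of posterior draws within ropeRadius of the null value.
# (Illustrative names; in practice use your actual MCMC sample.)
propInRope <- function( paramSample , nullValue , ropeRadius ) {
  mean( abs( paramSample - nullValue ) < ropeRadius )
}

set.seed(47405)
deltaSample <- rnorm( 50000 , mean=0.02 , sd=0.03 )  # precise, near-zero delta
radii <- seq( 0 , 0.2 , length=201 )
propIn <- sapply( radii , function(r) propInRope( deltaSample , 0.0 , r ) )
plot( radii , propIn , type="l" ,
      xlab="ROPE radius around 0" , ylab="Proportion of posterior in ROPE" )
```

Because the intervals are nested, the curve is necessarily non-decreasing in the ROPE radius.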

Here is another example. Suppose we are spinning a coin (or doing something more interesting that has dichotomous outcomes) and we want to estimate its underlying probability of heads, denoted θ. Suppose we collect a lot of data so that we have a precise estimate, as shown in the left panel below.

We see that the chance value of 0.50 is among the 95% most credible values of the underlying probability of heads, but we can ask whether we should decide to "accept" the value 0.50. If we establish a ROPE around 0.50, from 0.49 to 0.51 as shown above, we can see that the 95% HDI falls entirely within the ROPE and we would decide to accept 0.50. But we can provide more information by displaying the amount of the posterior distribution inside a ROPE centered on the null value as a function of the width of the ROPE. The right panel, above, allows the readers to decide for themselves.
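For this coin example, if the posterior for θ is (or is well approximated by) a beta distribution, the mass inside the ROPE needs no MCMC at all; it comes straight from the beta CDF. A sketch with invented counts:

```r
# Hypothetical data: z heads in N flips (made up for illustration).
z <- 5010 ; N <- 10000
# Under a uniform beta(1,1) prior, the posterior is Beta(z+1, N-z+1):
a <- z + 1 ; b <- N - z + 1
# Posterior mass inside the ROPE [0.49, 0.51], directly from the beta CDF:
massInRope <- pbeta( 0.51 , a , b ) - pbeta( 0.49 , a , b )
```

The same subtraction of CDF values, swept over a range of ROPE limits, produces the curve in the right panel analytically.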

The plots can be used for cases of rejecting the null, too. Suppose we spin a coin and the estimate shows a value apparently not at chance, as shown in the left panel below.

Should we reject the null value? The 95% HDI falls entirely outside the ROPE from 0.49 to 0.51, which means that the 95% most credible values are all not practically equivalent to 0.5. But what if we used a different ROPE? The posterior distribution, with its 95% HDI limits explicitly marked, is really all we need to see how big the ROPE could be while still letting us reject the null. If we want more detailed information, the panel on the right, above, reveals the answer, although we need to mentally subtract from 1.0 to get the posterior area not in the ROPE (on either side).

Here is the R script I used for creating the graphs above. Notice that it defines a function for plotting the area in the ROPE.

# From http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/ get
# openGraphSaveGraph.R, plotPost.R, and HDIofMCMC.R. Put them in the working
# directory of this script, or specify the path in the next lines' source()
# commands:
source("./openGraphSaveGraph.R")
source("./plotPost.R")
source("./HDIofMCMC.R")
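The plotting function itself is not reproduced above. The following is my reconstruction of what such an area-in-ROPE plotting function can look like, not the original code: the name `plotAreaInROPE` and its arguments are illustrative, and a minimal stand-in for the HDI computation from HDIofMCMC.R is inlined so the sketch is self-contained.

```r
# Minimal stand-in for the HDI function in HDIofMCMC.R: the shortest
# interval containing credMass of the sorted sample.
HDIofMCMC <- function( sampleVec , credMass=0.95 ) {
  sortedPts <- sort( sampleVec )
  ciIdxInc <- floor( credMass * length( sortedPts ) )
  nCIs <- length( sortedPts ) - ciIdxInc
  ciWidth <- sortedPts[ (1:nCIs) + ciIdxInc ] - sortedPts[ 1:nCIs ]
  c( sortedPts[ which.min( ciWidth ) ] ,
     sortedPts[ which.min( ciWidth ) + ciIdxInc ] )
}

# Plot the posterior area inside a ROPE centered at compVal, as a
# function of the ROPE radius, with dashed landmark lines at the HDI
# limit farthest from compVal.
plotAreaInROPE <- function( mcmcChain , maxROPEradius , compVal=0.0 ) {
  ropeRadVec <- seq( 0 , maxROPEradius , length=201 )
  areaInRope <- sapply( ropeRadVec , function(r) {
    mean( mcmcChain > (compVal - r) & mcmcChain < (compVal + r) )
  } )
  plot( ropeRadVec , areaInRope , type="l" , lwd=2 ,
        xlab="Radius of ROPE around comparison value" ,
        ylab="Posterior in ROPE" )
  HDIlim <- HDIofMCMC( mcmcChain )
  farRad <- max( abs( HDIlim - compVal ) )
  areaAtFarRad <- mean( mcmcChain > (compVal - farRad) &
                        mcmcChain < (compVal + farRad) )
  segments( farRad , 0 , farRad , areaAtFarRad , lty="dashed" )
  segments( 0 , areaAtFarRad , farRad , areaAtFarRad , lty="dashed" )
  invisible( areaInRope )
}
```

Usage is then a one-liner, e.g. `plotAreaInROPE( mcmcChain , maxROPEradius=0.2 , compVal=0.0 )` on a posterior sample of the effect size.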

13 comments:

Interesting to read more about ROPE! It does feel a little bit like a Bayesian twist on hypothesis testing. A large ROPE results in a higher chance of accepting the null while a tight ROPE (tightrope?) gives a higher chance of rejecting the null. It would be interesting to see a paper where the ROPE decision rule is compared to other possible decision rules (Neyman–Pearson, Bayes factor, etc.).

By the way, since you call it a ROPE, would it be ok to introduce some extra rope related nomenclature? For example, when the ROPE null can't be rejected nor accepted this could be a "tie", a reasonably tight ROPE could be a "tightrope", an unreasonable tight ROPE could be a "knot", etc.

1. Which do you think more appropriate as a decision criterion a) require the HDI interval to be within ROPE, or b) set a lower limit on the posterior probability contained within ROPE.

2. What about a confidence level? For instance consider the philosophy behind a Bayesian fixed-in-advance beta-content tolerance interval (e.g., see Wolfinger) in which both a content (proportion) and a confidence level are specified. Specifying a confidence level provides a level of assurance that the result is repeatable. I think this question may apply mostly to a posterior predictive situation.

> 1. Which do you think more appropriate as a decision criterion a) require the HDI interval to be within ROPE, or b) set a lower limit on the posterior probability contained within ROPE.

I take the posterior density seriously, which is why it makes sense to me to consider the relation of the HDI to the ROPE. The decision to "accept" the null value is based on the 95% most probable values all being practically equivalent to the null value. The decision to "reject" the null value is based on the 95% most probable values all not being practically equivalent to the null value.

Suppose instead that we accept a ROPEd value if (say) 95% of the posterior mass falls within the ROPE, and we reject a ROPEd value if (say) 95% of the posterior mass falls outside the ROPE. Then we would get weird conclusions when distributions are strongly skewed. Consider Figure 12.2, p.342, of DBDA2E, which shows a skewed distribution with its 95% HDI and its 95% equal-tailed interval (ETI). Suppose the ROPE boundaries are at the ETI boundaries. Then we would accept the ROPEd value despite the fact that there are parameter values with high posterior probability that are not practically equivalent to the ROPEd value. Or, suppose that the right side of the ROPE is at the left edge of the ETI in Figure 12.2. Then we would reject the ROPEd value despite the fact that there are parameter values with high posterior probability that are practically equivalent to the ROPEd value.
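The skewness point can be illustrated with a quick simulation; a generic right-skewed gamma sample stands in here for the Figure 12.2 distribution:

```r
# Right-skewed stand-in for a skewed posterior (not the DBDA2E example).
set.seed(1)
skewSample <- rgamma( 100000 , shape=1.5 , rate=2 )
# 95% equal-tailed interval (ETI): 2.5% of mass cut from each tail.
eti <- quantile( skewSample , c( 0.025 , 0.975 ) )
# 95% HDI: shortest interval containing 95% of the sorted sample.
sortedPts <- sort( skewSample )
ciIdxInc <- floor( 0.95 * length( sortedPts ) )
nCIs <- length( sortedPts ) - ciIdxInc
ciWidth <- sortedPts[ (1:nCIs) + ciIdxInc ] - sortedPts[ 1:nCIs ]
hdi <- c( sortedPts[ which.min( ciWidth ) ] ,
          sortedPts[ which.min( ciWidth ) + ciIdxInc ] )
# For a right-skewed distribution the HDI sits to the left of the ETI:
# the ETI excludes high-density values near the mode while including
# lower-density values out in the right tail.
```

A ROPE aligned with the ETI therefore contains 95% of the mass yet omits some of the highest-density values, which is exactly the mismatch described above.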

> 2. What about a confidence level? For instance consider the philosophy behind a Bayesian fixed-in-advance beta-content tolerance interval (e.g., see Wolfinger) in which both a content (proportion) and a confidence level are specified. Specifying a confidence level provides a level of assurance that the result is repeatable. I think this question may apply mostly to a posterior predictive situation.

I don't have much to say about this. I think, though, that you're leaning toward a frequentist criterion applied to a Bayesian posterior distribution. There's nothing inherently wrong with that, but it's a different animal than what's being pursued with the HDI and ROPE. With a frequentist criterion you need to make assumptions about your sampling intention --which you can do-- but then the criterion is linked to your specific sampling assumptions.

I think that question is already answered (perhaps implicitly) in my last comment. I like a decision rule that treats the densities seriously. To illustrate with an extreme contrived example, suppose that 98% of the distribution is outside the ROPE, but it is so spread out that its density is actually lower than the 2% of the distribution inside the ROPE. Then the most credible (highest density) values are inside the ROPE, even if most of the distribution is outside the ROPE.
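That contrived case is easy to construct explicitly. Here is one made-up mixture with those properties:

```r
# Made-up mixture: 2% of the mass is a narrow spike inside the ROPE
# [-0.1, 0.1], and 98% is spread thinly over [-100, 100], so the density
# inside the ROPE exceeds the density outside even though most of the
# mass is outside.
mixDensity <- function( x ) {
  0.02 * dnorm( x , mean=0 , sd=0.02 ) + 0.98 * dunif( x , -100 , 100 )
}
densityAtNull  <- mixDensity( 0 )   # high density, inside the ROPE
densityOutside <- mixDensity( 50 )  # low density, outside the ROPE
# Total mass inside the ROPE is only about 2%:
massInRope <- 0.02 * ( pnorm( 0.1 , 0 , 0.02 ) - pnorm( -0.1 , 0 , 0.02 ) ) +
              0.98 * ( 0.2 / 200 )
```

A mass-based rule would reject the null here, even though the null value itself sits at the single most credible (highest-density) point.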

That's why I like a decision rule that actually uses the highest-density values. This isn't a uniquely correct decision rule, it's just one that makes a lot of sense if you treat densities seriously.

Do you think that it would be informative if you defined the ROPE as the 95% HDI of the same data randomized? You could include it as another possibly interesting way point in your plot of the proportion of the posterior inside ROPE as a function of ROPE width.

The ROPE should be defined by considering what it means to be a negligible effect size, what it means for an effect to be "practically equivalent" to the landmark value. That's separate from whatever data you happen to have in the current study.

The decision rules that you mentioned are: accept the null hypothesis if the HDI is a subset of the ROPE, and reject it if the HDI and ROPE are non-overlapping. But what happens if they only partially overlap? Do you think it would be meaningful to condense these three cases into two, as in frequentist testing: (i) the null hypothesis can be rejected, and (ii) the null hypothesis cannot be rejected (the intersection of HDI and ROPE is nonempty)? In words, the null hypothesis could then *not* be rejected if the 95% most credible values do *not* all fall outside of the ROPE. My final goal is to compare several means and derive a letter coding, similar in spirit to Fisher's LSD, to indicate which pairs are not "significantly different".