Finally, a way to do easy randomization inference in Stata!

Randomization inference has been increasingly recommended as a way of analyzing data from randomized experiments, especially in samples with a small number of observations, with clustered randomization, or with high leverage (see for example Alwyn Young’s paper, and the books by Imbens and Rubin, and Gerber and Green). However, one of the barriers to widespread usage in development economics has been that, to date, no simple commands for implementing this in Stata have been available, requiring authors to program from scratch.

This has now changed with a new command ritest written by Simon Hess, a PhD student who I met just over a week ago at Goethe University in Frankfurt. This command is extremely simple to use, so I thought I would introduce it and share some tips after playing around with it a little. The Stata Journal article is also now out.

How do I get this command?
Simply type findit ritest in Stata.
[edit: that will get the version from the Stata Journal. However, to get the most recent version, with a couple of bug fixes noted below, follow the installation instructions in Simon's GitHub README: https://github.com/simonheb/ritest/blob/master/README.md]

1. Simple individual-level randomization
One of the outcomes I consider in my Nigeria business plan competition experiment is truncated profits in the second-round follow-up, for firms that were still in existence at the time. My Stata command for doing this in the paper was:

areg s_prof_trunc assigntreat, a(strata) robust

where s_prof_trunc was the profit outcome, and assigntreat my treatment assignment variable, and I am controlling for randomization strata. The p-value from this regression is 0.051.

The code for using this new command is then:

ritest assigntreat _b[assigntreat], reps(1000) seed(125) strata(strata): areg s_prof_trunc assigntreat, a(strata) robust

where here I am telling it what the treatment variable is (assigntreat), which coefficient I want the p-value for, telling it to do randomization inference within the randomization strata, and to use 1000 replications or draws. The output looks like this:

You can see the p-value is 0.047 here, very similar to what I got from the regression. This is individual-level randomization, with a sample size of 497, so we might expect randomization inference to give similar results to regression.
[edit: after the latest update to ritest which now enables replicability, I get 0.058 as the p-value using the seed of 125 in my code.]

2. Individual-level randomization, but with multiple time periods and clustered standard errors
Here I have re-shaped the Nigeria data so that I am estimating a pooled effect over the second and third rounds of follow-up data together. The p-value when I use regression to do this is 0.096. The command is then very similar to that above, except now telling it also to take into account the clustered nature when doing the permutations, by using the cluster option (with uid being my individual identifier):
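The command itself appears to have dropped out of the post here; based on the description, it would be along these lines (a sketch: the outcome name and the exact right-hand side are my guesses from example 1, and uid is the individual identifier as noted):

```stata
* As in example 1, but permuting whole individuals (clusters of panel
* observations) rather than rows, via the cluster() option:
ritest assigntreat _b[assigntreat], reps(1000) seed(125) ///
    strata(strata) cluster(uid): ///
    areg s_prof_trunc assigntreat, a(strata) cluster(uid)
```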

The p-value I get is 0.100, which is again very similar to the regression value I estimate. In contrast, if I had not specified the cluster() option, it would give me the artificially low p-value of 0.06.

3. Clustered randomization with stratification
A third example I tried comes from ongoing work in Colombia on shortening supply chains. Here we have firms in 63 market blocks, which were formed into 31 strata (pairs and one triplet), with clustered randomization occurring at this market block level. An outcome of interest here is how many days a week firms shop at the central market. The p-value I get in the regression with clustered standard errors is 0.024. Randomization inference is meant to make more of a difference with clustered randomizations with relatively few clusters, so I was curious to see what difference it makes here:

ritest b_treat _b[b_treat], reps(5000) cluster(b_block) strata(b_pair) seed(546): areg dayscorab b_treat b_dayscorab miss_b_dayscorab round2 round3, cluster(b_block) a(b_pair)

So here randomization inference does make a big difference – increasing my p-value from 0.024 to 0.106. So if you are doing clustered randomization, randomization inference will be particularly important to use.

Some observations and caveats
This is fantastically easy to use for the simple randomization cases, and should lead to many more people using randomization inference, at least as a check. Here are some observations from my initial uses to keep in mind:

1. Despite the ability to set the seed for randomization, it currently does NOT seem to give replicable results. For example, in case 1 above, I ran the code with 1000 replications and the same seed 5 times in a row, and got different p-values each time: 0.047, 0.054, 0.039, 0.053, and 0.057. These are all close, but obviously it is not ideal that the same code will produce different p-values each time it is run. [edit: Simon has now fixed this in the latest version available on his github https://github.com/simonheb/ritest/blob/master/README.md]

2. This raises the second issue of how many replications you should use. The default is 100 replications. This is likely to be way too low in many settings. Alwyn Young uses 10,000 draws, but says he finds little difference once he goes beyond 2000 draws. Once you are doing enough draws, point 1 will become less of an issue.

3. Time taken: in my simple case above, it takes my laptop about 35 seconds to run this with 1000 draws, and 3 minutes to do 5000 draws. However, this is after I have dropped all other variables from my dataset. When I had my full dataset, it took more than 2 minutes to do 1000 draws. So if you want this to run relatively fast, drop all the other data from your dataset apart from what is needed for your regression.

4. The command doesn’t allow for xi to be used in the regression you are running – so if your regression had been using this, you should remove it from your command. Jason Kerwin notes that you can still just put e.g. i.groupvariable to add group fixed effects, without using xi.
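For example (a sketch with hypothetical variable names), instead of prefixing the regression with xi, one would use factor-variable syntax directly:

```stata
* i.group expands to group dummies inside the regression itself,
* so no xi prefix is needed in the command ritest calls:
ritest treat _b[treat], reps(1000) seed(125): reg y treat i.group, robust
```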

5. Jason Kerwin notes to me that if your strata variable is a string rather than numeric, then the code ignores your strata and will give incorrect standard errors. So make sure any strata variables are numeric. [edit: Simon has now fixed this in his latest version, available on his github https://github.com/simonheb/ritest/blob/master/README.md]

6. More complicated randomizations, such as two-level clustered randomization

When the randomization becomes more complicated, you need to do a little more work. As an example, consider my Kenya business training paper, in which there is a double randomization: first, markets are randomized into treatment or control markets within market strata, and then, within the treated markets, firms are individually randomized to training or not within individual-level strata. The command handles this by either calling on a program you write that shows how this randomization is done, or by using a file of valid permutations that you create. This is still on my to-do list to play around with.
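The call would then take roughly this shape (a sketch: myshuffle stands for a user-written program, which you would have to supply, that re-draws treatment following the original two-step procedure; the saveresampling() option mentioned in the comments below lets you inspect the draws):

```stata
* ritest calls myshuffle before each replication to re-randomize treatment,
* then re-runs the regression on the re-randomized data:
ritest treatment _b[treatment], reps(1000) seed(125) ///
    samplingprogram(myshuffle) saveresampling(draws.dta): ///
    areg y treatment, a(strata) robust
```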

7. Multiple treatments:
The standard textbook explanation of randomization inference always seems to involve just a single binary treatment, so that a unit is either treatment or control. Then, under the sharp null that all treatment effects are zero, the outcomes for each unit should be the same whether they are assigned to treatment or control, and so we can just estimate the treatment effect under all the possible permutations and see in how many of these permutations we get a coefficient at least as large as the one in the actual data.
However, suppose now that we assign firms to control (T=0), traditional business training (T=1), or personal initiative training (T=2) as in this recent paper. Then we have at least three hypotheses we might want to test: no effect of traditional training, no effect of personal initiative training, and no effect of either training. It seems to me that we then need three separate sets of permutations to test these, and that you will need to program up each of these as a file of permutations for the program to then call on. For example, if I had 10 observations and my treatment assignment variable had values (0, 1, 0, 2, 1, 2, 0, 1, 2, 0), then to test no effect of traditional training I would want to permute only the 0s and 1s with one another, without changing the assignments of those getting personal initiative training. I have yet to program this.
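If I read Simon's help file correctly, ritest's fixlevels() option may handle this pairwise test without a custom permutation file: it holds observations with the listed values of the treatment variable fixed and permutes only the rest. A sketch (treat and y are hypothetical names; do verify this against the help file before relying on it):

```stata
* Test no effect of traditional training (T=1) against control (T=0),
* holding the personal initiative group (T=2) fixed in every permutation:
ritest treat _b[1.treat], reps(1000) seed(125) fixlevels(2): reg y i.treat
```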

If anyone has examples they want to share of using this command to do randomization inference with multiple treatments or multi-level randomizations, please link to them in the comments.

I was running code as I was reading through and only now saw your comment about replicability... I can at least now replicate the p = 0.058 with repeated commands using the same seed. Changing the seed (to 550) changed the p-value to 0.059 with 1000 reps, so that's encouraging. Also, your Colombia example replicated perfectly...

-ritest- and -permute- are similar. In fact, the code is based on -permute-. Unfortunately -permute- is very restricted in terms of how it permutes the variables of interest, which limits the hypotheses you can test with it. For example if your treatment assignment was clustered, the treatment itself continuous, or the assignment procedure in any other way non-standard (e.g. repeated rerandomizations, branching treatment arms), you will soon hit the limits of what permute can do.

If you use -ritest- with nothing but the strata()-option, you could just as well use -permute-.

Hi David, thanks for this useful blog post. One overall question which I don't think you have discussed before (apologies if I missed it): what do you think about how randomization inference should be operationalized, so to speak, in the randomized experiments literature? Should we start reporting two sets of p-values in results tables, as we did with robust and cluster(-bootstrapped) standard errors after that approach gained prominence a decade ago? And if there are significant differences, then interpret the result as not robust?

I think the norm is still evolving here. If sample sizes are small (either in terms of units or clusters for clustered trials), then just reporting the randomization inference p-values seems a reasonable approach. But I've also seen both sets reported (because they are testing different nulls).

Thanks for posting this. It is really helpful and super important!
One thing that currently keeps me busy is randomization inference with a difference-in-differences estimator.
Ideally I would like to present the single difference estimator, the single difference estimator controlling for previous outcomes (ANCOVA), and the DiD estimator. However, for the latter, the treatment group variable of course changes from 0 to 1 during the treatment period. This then leads to the error: "b_treat doesnt seem to be constant within clusters".
Any ideas?

It depends a bit on how you specify your regression exactly and I have to guess here because I can't see your code. But your problem seems to be that you're instructing -ritest- to permute the treatment-period-interaction---instead of the treatment-assignment---which you shouldn't do. This interaction is of course not constant within clusters. If that's the case reformulating your Stata-statement following this example should solve your problem:
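The example Simon linked is not preserved here; from his description, the idea is to permute the time-invariant assignment variable and let the interaction be rebuilt from it on every draw, roughly along these lines (variable names are illustrative):

```stata
* Permute b_treat (constant within cluster), not the treatment-x-post
* interaction; i.b_treat##i.post is recomputed from the permuted
* assignment each time the regression is re-run:
ritest b_treat _b[1.b_treat#1.post], reps(1000) seed(125) cluster(cid): ///
    reg y i.b_treat##i.post, cluster(cid)
```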

Hey Simon,
Thanks for putting this together as it is really helpful!
I have a similar question. I'm working with school data that is structured at the student x year level for ten years. I have a treatment that was introduced to a number of schools at different time points (ex: some got it in 2010, some got it in 2015), with the treatment variable as dichotomous (1 during treatment years, 0 during non-treatment years and schools that never received the treatment). I'm currently trying to run an ritest with clustering at the school level. The code looks something like:

This is an excellent resource and write-up. Thanks David and Jason for blogging, and thanks Simon for writing the code.

I have also been struggling with how to test equality of coefficients with two treatments, using randomization inference.

I think the p-value should be constructed by calculating the proportion of times where the *difference* in observed (re-randomized) treatment impacts is larger than the difference in the true treatment impacts.

For example, take a dataset where “b1_estimated” and “b2_estimated” are the stored coefficients of the two treatment dummies, after running the main regression thousands of times, each time randomly re-assigning clusters to treatment. The code to calculate the randomization inference p-value would be:
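The code block seems to have been lost from this comment; the calculation described would be something like the following (using the names the comment defines):

```stata
* Share of re-randomization draws in which the absolute difference between
* the two estimated treatment effects is at least as large as the true one:
count if abs(b1_estimated - b2_estimated) >= abs(true_difference)
display "Randomization inference p-value = " r(N)/obs
```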

where “true_difference” is a constant of the true difference in treatment impacts; and "obs" is the number of repetitions. With this code I get a very similar p-value to the simple t-test of equality of coefficients.

But I am not sure if this can be done with the ritest command. At first I thought I could take a short-cut by setting one of the treatment arms as the omitted category and throwing in a dummy for the control (e.g. gen T0 = treat == 0).

But this under-estimates the p-value. In my data, p2 > p1. (Intuitively, the variance of the difference between two normally distributed random variables is higher than the variance of each of the random variables.)

What the right approach is depends very much on your original randomization, about which you did not provide all details. I can however give you some general hints which might help you compute what you want.

(1) if the two treatments (and control) are mutually exclusive, i.e. they can be described by a single treatment variable taking on the values 0, 1, or 2, you might be able to run:
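Simon's command was lost from this comment; given his remark in the next paragraph about specifying "(_b[1.treat]-_b[2.treat])" as the statistic of interest, it was presumably along these lines (cid and y are hypothetical names; whether you need cluster() depends on your design):

```stata
* Permute the single 0/1/2 assignment variable and test equality of the
* two treatment effects by using their difference as the test statistic:
ritest treat (_b[1.treat]-_b[2.treat]), reps(1000) cluster(cid) seed(1): ///
    reg y i.treat, cluster(cid)
```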

(2) If your treatments are overlapping, it is only slightly more complex to let the re-randomization follow the original procedure. For this you can use the samplingprogram()-option. A user-defined rerandomization program is allowed to alter more than just a single treatment-indicator variable and can thus rerandomize your data in the way you want. You can then still specify "(_b[1.treat]-_b[2.treat])" as the statistic of interest. You can always double-check what ritest does using the saveresampling()-option.

Please let me know if I misunderstood your question or if you have any further questions.

I nevertheless want to allow for correlations at the regional level (or even higher) because two close municipalities cannot be considered as having independent weather conditions, i.e. I would like ritest to assign to each commune in one cluster the same year of observation.

Is there a simple way of doing it using your package or should I create my own permutation programme?

You used "ritest ..., ... cluster(municipality)", which tells ritest that treatment (weather) was assigned at that level. For two reasons this is not what you seem to want to be doing: (1) you argued for dependence of weather across municipalities, and (2) your data has a panel structure, so you have multiple observations per municipality that have different [by assumption independent] weather conditions.

I assume you want to resample the weather variables, ignoring that there might be inter-temporal dependence but leaving the spatial correlation structure as is. That could be achieved by shuffling the weather data across years. The easiest way to do so would be via your own permutation program. What the easiest approach for that would be depends a lot on your data structure and on exactly how you want the resampling to work, but it could be something like this:

program randomyearmerge
    syntax, * // this "swallows" whatever ritest passes on to the program
    capture drop randomyear // clear leftovers from earlier draws
    capture drop mean_temp  // drop the weather vars before re-merging (adapt names)
    sort year municipality, stable // to ensure reproducibility
    // draw a random year (here: uniform over 1980-2000)
    by year: gen randomyear = round(1980 + runiform()*20) if _n == 1
    // use the same random year for all municipalities
    by year: replace randomyear = randomyear[1]
    // attach the weather of the randomly drawn year, not the actual one
    rename year actualyear
    rename randomyear year
    merge m:1 year municipality using weatherdata.dta, nogen // load the weatherdata
    rename year randomyear
    rename actualyear year
end
. ritest mean_temp _b[mean_temp], samplingprogram(randomyearmerge): xtreg ...
(obviously you'll have to adapt the code to your needs and I can give no guarantee that it does what you want it to do)