Do sampling weights matter for RCT design?

One of the most important tasks when designing an evaluation of an intervention is ensuring that your study will have enough statistical power to test the hypotheses you're interested in. Picking a large enough sample is one of several ways to increase power. Another is block stratified randomization, of which paired randomization is the extreme case. Here is some advice from Guido Imbens in a recent paper on experimental design:

It is particularly helpful to stratify or pair based on cluster size if (i) cluster size varies substantially, (ii) interest is in average effects for the entire population and (iii) cluster size is potentially correlated with the average effect of the intervention by cluster.

David linked to this blog post by Cyrus Samii a couple of weeks ago. It provides an excellent, albeit technical, discussion of some of the important points from Imbens’ paper and presentations at the 3ie conference in Cuernavaca. I recommend it strongly: he provides a particularly good discussion of Imbens’ other advice that, between the extremes of pair matching on the one hand and complete randomization on the other, you should choose the middle ground of block stratification (because estimators in pair-matched randomization have to use a conservative estimate of the variance, which eliminates the gains from pair matching, particularly in the presence of heterogeneous impacts). There is also some interesting discussion of re-randomization, etc. Like I said, better to click on the link than linger here on these other points.

However, there was less discussion of the issue of blocking on cluster size. For those of us who design (or help others design) clustered experiments, this is a very pertinent issue, not at all an academic one. Suppose that a colleague asked you to help evaluate a school-based intervention, in which the stakeholders are all amenable to randomization at the school level. Let’s go through the three conditions Imbens has laid out for the desirability of using cluster size as a blocking variable.

First, schools do come in all sorts of sizes: check. Second, even if you drew a (hopefully representative) sample of schools in Karnataka, your colleague is presumably interested in the effect on all students in Karnataka, and not just the sample: check. Third, and finally, it is entirely plausible that the effect will be different in small schools, compared with larger ones. This could be because school size is correlated with baseline characteristics prognostic of the outcomes, say some test score, or because the propensity to benefit from a certain intervention really varies by school size: check. So, in many cases, all three criteria are likely to be satisfied.

Imbens suggests that, in that case, you should consider block stratification using cluster size. In some sense, criteria (i) and (iii) are ones we have been using before: stratify on baseline characteristics that are known to predict outcomes well (and this will only be the case if there is some decent variation in that variable). So, in the case of the school-based intervention, you might block on baseline test scores, or anthropometrics, or mother’s characteristics, etc. It’s just that I, personally, usually did not consider using cluster size alongside these variables: as long as it is not perfectly correlated with the other variables used for block stratification, it seems that your study might gain some power from blocking on it.
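To make the idea concrete, here is a minimal sketch of block stratified randomization on cluster size. It is only an illustration under my own assumptions, not a procedure taken from Imbens' paper: the function name `assign_by_size_blocks`, the school enrollments, and the choice of three equal-sized blocks are all made up for the example.

```python
import random

def assign_by_size_blocks(cluster_sizes, n_blocks, seed=0):
    """Block-stratified randomization on cluster size (illustrative sketch).

    cluster_sizes: dict mapping cluster id -> size (e.g. school enrollment).
    Returns a dict mapping cluster id -> (block index, treatment indicator).
    """
    rng = random.Random(seed)
    # Order clusters from smallest to largest so each block groups similar sizes.
    ordered = sorted(cluster_sizes, key=lambda c: cluster_sizes[c])
    n = len(ordered)
    assignment = {}
    for b in range(n_blocks):
        # A contiguous slice of the size-ordered list forms one block.
        block = ordered[b * n // n_blocks:(b + 1) * n // n_blocks]
        rng.shuffle(block)
        n_treat = len(block) // 2  # odd-sized blocks get one extra control
        for i, c in enumerate(block):
            assignment[c] = (b, 1 if i < n_treat else 0)
    return assignment

# Hypothetical enrollments for 12 schools (made-up numbers).
enrollments = [40, 55, 60, 90, 120, 150, 300, 320, 410, 800, 950, 1200]
sizes = {f"school_{i}": s for i, s in enumerate(enrollments)}
plan = assign_by_size_blocks(sizes, n_blocks=3)
```

With 12 schools and three blocks, each block holds four schools of similar size, two treated and two control, which sits between pair matching (blocks of two) and complete randomization (one block).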

Furthermore, I hadn’t thought much about criterion (ii). Usually, I try to design experiments where the sample is representative of a larger target population, then use a proper sampling frame to draw a representative sample and utilize sampling weights in the analysis (there are, obviously, research questions for which this is not necessary and the estimand of interest is the average effect in the sample). Use of sampling weights in analysis is fairly straightforward. But I had not thought hard enough about the implication that, if you will be using sampling weights to calculate average population impacts, that might actually have an effect on your initial experimental design. Of course, it makes sense now...
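A toy calculation shows why the weighting matters when effects vary with cluster size. This is a sketch under simplifying assumptions of my own: the weights are simply cluster sizes (as if schools were sampled with equal probability and the target is the average effect over students), the function names are hypothetical, and the numbers are invented.

```python
def weighted_effect(clusters):
    """Size-weighted difference in means across clusters.

    clusters: list of (size, treated, mean_outcome) tuples, one per cluster.
    Weighting each cluster mean by its size targets the average effect
    over individuals rather than over clusters.
    """
    def wmean(arm):
        rows = [(n, y) for n, t, y in clusters if t == arm]
        total = sum(n for n, _ in rows)
        return sum(n * y for n, y in rows) / total
    return wmean(1) - wmean(0)

def unweighted_effect(clusters):
    """Simple difference in cluster means, every cluster counted equally."""
    def mean(arm):
        ys = [y for _, t, y in clusters if t == arm]
        return sum(ys) / len(ys)
    return mean(1) - mean(0)

# Made-up data: the effect is 10 points in small schools, 2 in large ones.
data = [
    (50, 1, 60.0), (50, 0, 50.0),      # small treated / small control
    (1000, 1, 52.0), (1000, 0, 50.0),  # large treated / large control
]
per_school = unweighted_effect(data)   # average effect across schools
per_student = weighted_effect(data)    # average effect across students
```

In this example the per-school effect is 6 points while the per-student effect is about 2.4, because the large schools dominate once weights are applied. Since the weighted estimate leans so heavily on the large clusters, balancing treatment and control within size blocks at the design stage is what keeps that estimate from being driven by a chance imbalance among a few big schools.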

A final point is the possible trade-off from using cluster size as a blocking variable if it means that (due to sample size constraints) I am unable to block on some other variable that I think is more important -- like baseline test scores. So, you may have all three of Imbens' criteria satisfied, but still decide to block on other variables, meaning that satisfying these criteria is not a sufficient condition to block on cluster size.