We review the statistical models used to test for heterogeneous treatment effects in the recent empirical literature, with a particular focus on data from randomized field experiments. We show that tests for heterogeneous treatment effects are very common and are likely to produce a large number of false discoveries when conventional standard errors are applied. We demonstrate that correction procedures developed in the statistics literature can fully address this issue, and we discuss the implications of multiple testing adjustments for power calculations and experimental design.