Clustered standard errors are popular and very easy to compute in some popular packages such as Stata, but how to compute them in R?

With panel data it's generally wise to cluster on the dimension of the individual effect as both heteroskedasticity and autocorrellation are almost certain to exist in the residuals at the individual level. In fact, Stock and Watson (2008) have shown that the White robust errors are inconsistent in the case of the panel fixed-effects regression model. Interestingly, the problem is due to the incidental parameters and does not occur if T=2. Stata has since changed its default setting to always compute clustered error in panel FE with the robust option.

There have been several posts about computing cluster-robust standard errors in R equivalently to how Stata does it, for example (here, here and here). However, the bloggers make the issue a bit more complicated than it really is.

We can very easily get the clustered VCE with the plm package and only need to make the same degrees of freedom adjustment that Stata does. In Stata, the t-tests and F-tests use G-1 degrees of freedom (where G is the number of groups/clusters in the data). The plm package does not make this adjustment automatically.

The advantage is that only standard packages are required provided we calculate the correct DF manually . One could easily wrap the DF computation into a convenience function. We probably should also check for missing values on the cluster variable. Extending this example to two-dimensional clustering is easy and will be the next post.