Here I want to focus on a common but subtle problem when trying to estimate these models and how to solve it with a simple trick. Although I was somewhat aware of this trick for quite some time it just recently clicked for me. We will use the same hierarchical linear regression model on the Radon data set from the previous blog post, so if you are not familiar, I recommend to start there.

I will then use the intuitions we've built up to highlight a subtle point about expectations vs modes (i.e. the MAP). Several talks by Michael Betancourt have really expanded my thinking here.

Usually, hierachical models are specified in a centered way. In a regression model, individual slopes would be centered around a group mean with a certain group variance, which controls the shrinkage:

In [2]:

withpm.Model()ashierarchical_model_centered:# Hyperpriors for group nodesmu_a=pm.Normal('mu_a',mu=0.,sd=100**2)sigma_a=pm.HalfCauchy('sigma_a',5)mu_b=pm.Normal('mu_b',mu=0.,sd=100**2)sigma_b=pm.HalfCauchy('sigma_b',5)# Intercept for each county, distributed around group mean mu_a# Above we just set mu and sd to a fixed value while here we# plug in a common group distribution for all a and b (which are# vectors of length n_counties).a=pm.Normal('a',mu=mu_a,sd=sigma_a,shape=n_counties)# Intercept for each county, distributed around group mean mu_ab=pm.Normal('b',mu=mu_b,sd=sigma_b,shape=n_counties)# Model erroreps=pm.HalfCauchy('eps',5)# Linear regressionradon_est=a[county_idx]+b[county_idx]*data.floor.values# Data likelihoodradon_like=pm.Normal('radon_like',mu=radon_est,sd=eps,observed=data.log_radon)