Four model intercomparisons were run and evaluated using the TWP-ICE field campaign, each involving a different type of atmospheric model. Here we highlight what can be learnt from having single-column model (SCM), cloud-resolving model (CRM), global atmosphere model (GAM) and limited-area model (LAM) intercomparisons all based around the same field campaign. We also make recommendations for anyone planning further large multi-model intercomparisons, to ensure they are of maximum value to the model development community. CRMs tended to match observations better than other model types, although there were exceptions such as outgoing long-wave radiation. All SCMs grew large temperature and moisture biases and performed worse than other model types for many diagnostics. The GAMs produced a delayed and significantly reduced peak in domain-average rain rate when compared to the observations. While it was shown that this was in part due to the analysis used to drive these models, the LAMs were also driven by this analysis and did not have the problem to the same extent. Based on differences between the models with parametrized convection (SCMs and GAMs) and those without (CRMs and LAMs), we speculate that having explicit convection helps to constrain liquid water, whereas the ice contents are controlled more by the representation of the microphysics.