Title:
Cumulo: A Dataset for Learning Cloud Classes

Abstract: One of the greatest sources of uncertainty in future climate projections
comes from limitations in modelling clouds and in understanding how different
cloud types interact with the climate system. A key first step in reducing this
uncertainty is to accurately classify cloud types at high spatial and temporal
resolution. In this paper, we introduce Cumulo, a benchmark dataset for
training and evaluating global cloud classification models. It consists of one
year of 1 km resolution MODIS hyperspectral imagery merged with pixel-width
'tracks' of CloudSat cloud labels. Bringing these complementary datasets
together is a crucial first step, enabling the machine learning community to
develop new techniques that could greatly benefit the climate
community. To showcase Cumulo, we provide baseline performance analysis using
an invertible flow generative model (IResNet), which further allows us to
discover new sub-classes for a given cloud class by exploring the latent space.
To compare methods, we introduce a set of evaluation criteria for identifying
models that are not only accurate but also physically realistic. Cumulo can be
downloaded from
this https URL.
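
To make the "pixel-width tracks" idea concrete, here is a minimal sketch of how such sparsely labeled imagery might be handled. It is not the actual Cumulo file layout or the authors' code: the function names, the `UNLABELED` sentinel, the toy dimensions, and the class ids are all assumptions chosen for illustration. The sketch builds a toy swath whose labels exist only along one pixel-wide column and scores predictions on those labeled pixels alone.

```python
# Hypothetical toy example: names, shapes, and values are illustrative only
# and do not reflect the actual Cumulo file layout.
UNLABELED = -1  # assumed sentinel for pixels with no CloudSat label


def make_toy_swath(height, width, n_channels, track_col, track_labels):
    """Build a toy swath: per-pixel channel values plus a label grid that is
    UNLABELED everywhere except along one pixel-wide column (the 'track')."""
    if len(track_labels) > height:
        raise ValueError("track has more labels than the swath has rows")
    radiances = [[[0.0] * n_channels for _ in range(width)] for _ in range(height)]
    labels = [[UNLABELED] * width for _ in range(height)]
    for row, label in enumerate(track_labels):
        labels[row][track_col] = label
    return radiances, labels


def labeled_pixels(radiances, labels):
    """Return (row, col, channel_vector, label) for every labeled pixel."""
    return [
        (r, c, radiances[r][c], label)
        for r, row in enumerate(labels)
        for c, label in enumerate(row)
        if label != UNLABELED
    ]


def track_accuracy(predictions, labels):
    """Accuracy computed only over labeled pixels; unlabeled pixels are ignored."""
    pairs = [
        (predictions[r][c], label)
        for r, row in enumerate(labels)
        for c, label in enumerate(row)
        if label != UNLABELED
    ]
    if not pairs:
        raise ValueError("no labeled pixels to evaluate")
    return sum(p == t for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    radiances, labels = make_toy_swath(4, 5, 3, track_col=2, track_labels=[0, 1, 1, 3])
    constant_prediction = [[1] * 5 for _ in range(4)]
    print(len(labeled_pixels(radiances, labels)), "labeled pixels")
    print("accuracy on track:", track_accuracy(constant_prediction, labels))
```

Because most pixels carry no label, any supervised metric or loss in this setting has to be restricted to the track, as `track_accuracy` does here; the remaining pixels are available only as unlabeled context.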