WEMC Tech Blog #4: Calculating EU Country Averages With ERA5 and NUTS

WEMC Tech Blog #4: Calculating EU Country Averages With ERA5 and NUTS

As in previous blogs, this focuses on the Python programming language. For information on installing Python and packages, please see previous technical blogs #1 and #2.

This example will look at how to mask ERA5 data from the European region, downloaded from ECMWF Climate Data Store (CDS). The procedure heavily relies on functions provided by the scitools Iris package, although similar results could be obtained using a combination of packages like xarray and regionmask.

Read in the NUTS (Nomenclature of Territorial Units for Statistics) shapefile using geopandas. NUTS shapefiles are obtained from eurostat, who provide various resolutions and formats to use in your code. For this example, a scale 1:20 Million is used, in the SHP (shapefile) format. Level 0 is the lowest resolution and is a collection of geometric polygons at country level.

# read the NUTS shapefile and extract the polygons for a individual countries
nuts=gpd.read_file('ref-nuts-2016-20m.shp/NUTS_RG_20M_2016_4258_LEVL_0.shp')

A quick check of the NUTS shapefile reveals the geometry values, with some associated identifiers.

nuts

Select a country by its abbreviation (aka ‘NUTS_ID’). For example, ‘FR’ will select the France shapefile.

# set country by NUTS_ID
country = nuts[nuts['NUTS_ID'] == 'FR']

Load in a NetCDF file using Iris. Iris has many built in functions for dealing with NetCDF. As discussed in previous blogs, NetCDF files are a file format for storing multidimensional data. Iris represents this data as ‘cubes’, a convenient way to manipulate NetCDF data, whilst keeping things cf compliant. Here, a NetCDF containing 1 year (2000) of hourly 2 metre temperature, covering the EU region, is loaded.

Once loaded, a quick check of the cube shows it has three dimensions; time, latitude and longitude. With the variable 2 metre temperature (K).

cube

Next, create a function to check whether the data is within the boundaries of the shapefile. Surprisingly, Iris does not have a function to do this out-of-the-box, so I borrowed this good example from a post on stack overflow (credit to user DPeterK).

# function to check whether data is within shapefile - source: http://bit.ly/2pKXnWa
def geom_to_masked_cube(cube, geometry, x_coord, y_coord,
mask_excludes=False):
"""
Convert a shapefile geometry into a mask for a cube's data.
Args:
* cube:
The cube to mask.
* geometry:
A geometry from a shapefile to define a mask.
* x_coord: (str or coord)
A reference to a coord describing the cube's x-axis.
* y_coord: (str or coord)
A reference to a coord describing the cube's y-axis.
Kwargs:
* mask_excludes: (bool, default False)
If False, the mask will exclude the area of the geometry from the
cube's data. If True, the mask will include *only* the area of the
geometry in the cube's data.
.. note::
This function does *not* preserve lazy cube data.
"""
# Get horizontal coords for masking purposes.
lats = cube.coord(y_coord).points
lons = cube.coord(x_coord).points
lon2d, lat2d = np.meshgrid(lons,lats)
# Reshape to 1D for easier iteration.
lon2 = lon2d.reshape(-1)
lat2 = lat2d.reshape(-1)
mask = []
# Iterate through all horizontal points in cube, and
# check for containment within the specified geometry.
for lat, lon in zip(lat2, lon2):
this_point = gpd.geoseries.Point(lon, lat)
res = geometry.contains(this_point)
mask.append(res.values[0])
mask = np.array(mask).reshape(lon2d.shape)
if mask_excludes:
# Invert the mask if we want to include the geometry's area.
mask = ~mask
# Make sure the mask is the same shape as the cube.
dim_map = (cube.coord_dims(y_coord)[0],
cube.coord_dims(x_coord)[0])
cube_mask = iris.util.broadcast_to_shape(mask, cube.shape, dim_map)
# Apply the mask to the cube's data.
data = cube.data
masked_data = np.ma.masked_array(data, cube_mask)
cube.data = masked_data
return cube

The function can now be applied to the cube, in this case checking whether the data is within the France polygon.

Before getting the area average, it’s important to note the current grid is a euclidean space, a flat grid of equal dimensions. To account for the curvature of the earth, create area weights utilising the Iris ‘cosine_latitude_weights’ function.

# use iris function to get area weights
grid_areas = iris.analysis.cartography.cosine_latitude_weights(masked_cube)

These weights can be used in conjunction with the ‘iris_analysis_MEAN’ function to collapse the cube into a 1-dimensional array.

Opening the .csv file should produce a dataset containing the time series and the area averages for that country.

In the future I will revisit this topic with a follow up blog looking at applying a land-sea mask to the dataset. This improves accuracy where land sea boundaries occupy grid areas and applies weights based on this.