data (sequence of ndarrays or 2-D ndarray) – The vectors of functions to create a functional boxplot from. If a
sequence of 1-D arrays, these should all be the same size.
The first axis is the function index, the second axis the one along
which the function is defined. So data[0,:] is the first
functional curve.
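As a minimal sketch of the expected layout (the curve values below are made up for illustration), a sequence of equal-length 1-D arrays can be stacked into the 2-D form described above:

```python
import numpy as np

# Three hypothetical curves, each sampled at the same 5 points.
curves = [
    np.array([0.0, 1.0, 2.0, 1.0, 0.0]),
    np.array([0.1, 1.2, 2.1, 0.9, 0.1]),
    np.array([0.0, 0.8, 1.9, 1.1, 0.2]),
]

# Axis 0 indexes the curve, axis 1 runs along the sample points.
data = np.stack(curves)

# data[0, :] is the first functional curve.
first_curve = data[0, :]
print(data.shape)
```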

ncomp (int, optional) – Number of components to use. If None, uses as many as the
smaller of the number of rows or columns in data.

If an array, it is a fixed user-specified bandwidth. If None, set to
normal_reference. If a string, should be one of:

normal_reference: normal reference rule of thumb (default)

cv_ml: cross validation maximum likelihood

cv_ls: cross validation least squares

xdata (ndarray, optional) – The independent variable for the data. If not given, it is assumed to
be an array of integers 0..N-1 with N the length of the vectors in
data.
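The default described above can be reproduced as follows. This is only an illustration of the stated behaviour, with a placeholder `data` array:

```python
import numpy as np

# Placeholder data: 4 curves, each of length N = 12.
data = np.zeros((4, 12))

# Default independent variable: the integers 0..N-1.
n_points = data.shape[1]
xdata = np.arange(n_points)
```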

labels (sequence of scalar or str, optional) – The labels or identifiers of the curves in data. If not given,
outliers are labeled in the plot with array indices.

ax (Matplotlib AxesSubplot instance, optional) – If given, this subplot is used to plot in instead of a new figure being
created.

use_brute (bool) – Use the brute force optimizer instead of the default differential
evolution to find the curves. Default is False.

seed ({None, int, np.random.RandomState}) – Seed value to pass to scipy.optimize.differential_evolution. Can be an
integer or RandomState instance. If None, then the default RandomState
provided by np.random is used.

Returns:

fig (Matplotlib figure instance) – If ax is None, the created figure. Otherwise the figure to which
ax is connected.

hdr_res (HdrResults instance) –

An HdrResults instance with the following attributes:

'median', array. Median curve.

'hdr_50', array. 50% quantile band. [sup, inf] curves.

'hdr_90', list of array. 90% quantile band. [sup, inf] curves.

'extra_quantiles', list of array. Extra quantile band. [sup, inf] curves.

'outliers', ndarray. Outlier curves.

Notes

The median curve is the curve with the highest probability on the reduced
space of a Principal Component Analysis (PCA).

Outliers are defined as curves that fall outside the band corresponding
to the quantile given by threshold.

The non-outlying region is defined as the band made up of all the
non-outlying curves.
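One way to picture such a band is as the pointwise envelope of the non-outlying curves. The sketch below assumes an outlier mask is already available (here it is hard-coded and hypothetical) and is not necessarily how the band is computed internally:

```python
import numpy as np

# Four hypothetical curves sampled at 3 points; the third is flagged as an outlier.
data = np.array([
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 2.5],
    [9.0, 9.0, 9.0],
    [0.5, 2.2, 3.1],
])
is_outlier = np.array([False, False, True, False])  # assumed, for illustration

# Pointwise envelope of the remaining curves, stored as [sup, inf].
inliers = data[~is_outlier]
band = [inliers.max(axis=0), inliers.min(axis=0)]
```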

Behind the scenes, the dataset is represented as a matrix, with each row
corresponding to a 1D curve. This matrix is then decomposed using Principal
Component Analysis (PCA), which makes it possible to represent the data using
a finite number of modes, or components. This compression turns the
functional representation into a scalar representation of the matrix. In
other words, each curve can be visualized from its components, and each curve
is thus a point in this reduced space. With 2 components, this is called a
bivariate plot (2D plot).
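The reduction step can be sketched with a plain SVD-based PCA in NumPy. The synthetic curves and variable names are assumptions made for illustration, and the actual implementation may use a different PCA routine:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)

# 20 hypothetical curves: sine waves with varying amplitude plus small noise.
amplitudes = rng.normal(1.0, 0.2, size=20)
data = amplitudes[:, None] * np.sin(2 * np.pi * x) + rng.normal(0, 0.05, (20, 50))

ncomp = 2

# Center the curves, then decompose with a thin SVD.
mean_curve = data.mean(axis=0)
centered = data - mean_curve
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Each row of `scores` is one curve expressed as a point in the reduced space.
scores = U[:, :ncomp] * s[:ncomp]

# Approximate reconstruction of the curves from their ncomp components.
approx = mean_curve + scores @ Vt[:ncomp]
```

With `ncomp = 2`, the rows of `scores` are the points that would appear in the bivariate plot.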

In this plot, if some points are adjacent (similar components), then
back in the original space the corresponding curves are similar. Finding the
median curve therefore means finding the highest density region (HDR) in the
reduced space. Moreover, the farther a point lies from this HDR, the less
likely the curve is to be similar to the other curves.

Using a kernel smoothing technique, the probability density function (PDF)
of the multivariate space can be estimated. From this PDF, it is possible
to compute the probability density associated with the cluster of points and
plot its contours.
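To make the kernel smoothing step concrete, here is a hand-rolled product-Gaussian kernel density estimate in NumPy. The helper `gaussian_kde_pdf`, the synthetic points, and the rule-of-thumb constant 1.06 are assumptions for this sketch. The estimator actually used may differ.

```python
import numpy as np

def gaussian_kde_pdf(points, queries):
    """Product-Gaussian KDE with a rule-of-thumb bandwidth per dimension."""
    n, d = points.shape
    # Assumed normal-reference style bandwidth: 1.06 * std * n^(-1/(4+d)).
    bw = 1.06 * points.std(axis=0, ddof=1) * n ** (-1.0 / (4 + d))
    # Standardized differences, shape (n_queries, n_points, d).
    z = (queries[:, None, :] - points[None, :, :]) / bw
    kernels = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * bw)
    # Multiply over dimensions, average over the sample points.
    return kernels.prod(axis=2).mean(axis=1)

rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 2))   # hypothetical points in the reduced space
scores[0] = [6.0, 6.0]               # one point placed far from the cluster

# Estimated density at each point; the isolated point gets a low value.
density = gaussian_kde_pdf(scores, scores)
```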

Finally, using these contours, the different quantiles can be extracted
along with the median curve and the outliers.
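A simplified sketch of this last step, working only from density values evaluated at the sample points: the helper `hdr_level` and the synthetic densities are hypothetical. Taking the median as the sample point with the highest density is a simplification (the `use_brute` option suggests that an optimizer is used for the curve search).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical density value for each curve's point in the reduced space.
density = rng.uniform(0.01, 1.0, size=200)
density[5] = 1e-6  # an isolated point with very low density

# Simplification: take the sample point with the highest density as the median.
median_idx = int(np.argmax(density))

def hdr_level(density, coverage):
    """Density level whose region contains `coverage` of the sample points."""
    return np.quantile(density, 1.0 - coverage)

level_50 = hdr_level(density, 0.50)
level_90 = hdr_level(density, 0.90)

# Points below the level for the chosen threshold are flagged as outliers.
threshold = 0.95
outlier_idx = np.flatnonzero(density < hdr_level(density, threshold))
```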

Create a functional boxplot. We see that the years 1982-83 and 1997-98 are
outliers; these are the years where El Nino (a climate pattern
characterized by warming up of the sea surface and higher air pressures)
occurred with unusual intensity.