DataArray.values and .data now always return a NumPy array-like
object, even for 0-dimensional arrays with object dtype (GH867).
Previously, .values returned native Python objects in such cases. To
convert the values of scalar arrays to Python objects, use the .item()
method.
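A minimal sketch of the new behavior, with an illustrative object value:

```python
import numpy as np
import xarray as xr

# A 0-dimensional DataArray wrapping a Python object
scalar = xr.DataArray(np.array({'a': 1}, dtype=object))
# .values now returns a 0-d NumPy array rather than the dict itself
print(type(scalar.values))
# .item() unwraps the underlying Python object
print(scalar.item())
```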

DataArray and Dataset method where() now supports a drop=True
option that clips coordinate elements that are fully masked. By
Phillip J. Wolfram.
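A sketch of drop=True with illustrative values:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(4.0), dims='x', coords={'x': [0, 1, 2, 3]})
# Without drop, masked elements become NaN; with drop=True, coordinate
# labels whose elements are fully masked are removed entirely
clipped = arr.where(arr > 1, drop=True)
print(clipped.x.values)  # [2 3]
```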

New top level merge() function allows for combining variables from
any number of Dataset and/or DataArray variables. See Merge
for more details. By Stephan Hoyer.
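For instance, merging a Dataset with a named DataArray (values here are illustrative):

```python
import xarray as xr

ds = xr.Dataset({'a': ('x', [1, 2])}, coords={'x': [0, 1]})
da = xr.DataArray([3, 4], dims='x', coords={'x': [0, 1]}, name='b')
# merge combines any mix of Dataset and named DataArray objects
merged = xr.merge([ds, da])
print(sorted(merged.data_vars))  # ['a', 'b']
```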

DataArray and Dataset method resample() now supports the
keep_attrs=False option that determines whether variable and dataset
attributes are retained in the resampled object. By
Jeremy McGibbon.
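The resample call signature has since evolved; in present-day xarray the dimension is passed as a keyword and keep_attrs goes to the reduction method. A sketch under that assumption:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=4, freq='6h')
da = xr.DataArray(np.arange(4.0), dims='time',
                  coords={'time': times}, attrs={'units': 'K'})
# keep_attrs controls whether attributes survive the resampling reduction
daily = da.resample(time='1D').mean(keep_attrs=True)
print(daily.attrs)  # {'units': 'K'}
```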

Better multi-index support in DataArray and Dataset sel() and
loc() methods, which now behave more closely to pandas and which
also accept dictionaries for indexing based on given level names and labels
(see Multi-level indexing). By
Benoit Bovy.
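A sketch of level-based selection on a MultiIndex (index contents are illustrative):

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([['a', 'b'], [0, 1]],
                                  names=['letter', 'num'])
da = xr.DataArray([1, 2, 3, 4], dims='x', coords={'x': midx})
# select on a named level directly, or pass a dict of level labels
print(da.sel(letter='a').values)                    # [1 2]
print(da.sel(x={'letter': 'b', 'num': 1}).item())   # 4
```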

Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF
formats would raise an error since netCDF does not have a bool datatype.
This feature reads/writes a dtype attribute to boolean variables in netCDF
files. By Joe Hamman.

2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs),
allowing more control on the colorbar (GH872).
By Fabien Maussion.

Attributes were being retained by default for some resampling operations
when they should not have been. With the keep_attrs=False option, they will
no longer be retained by default. This may be backwards-incompatible with
some scripts, but the attributes may be kept by passing the
keep_attrs=True option. By
Jeremy McGibbon.

Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex
preserves the nature of the index (GH875). By
Stephan Hoyer.

Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted
by the inference of the axis interval breaks. This change chooses not to modify
the coordinate variables when the axes have the attribute projection, allowing
Cartopy to handle the extent of pcolormesh plots (GH781). By
Joe Hamman.

2D plots now better handle additional coordinates which are not DataArray
dimensions (GH788). By Fabien Maussion.

This major release includes redesign of DataArray
internals, as well as new methods for reshaping, rolling and shifting
data. It includes preliminary support for pandas.MultiIndex,
as well as a number of other features and bug fixes, several of which
offer improved compatibility with pandas.

The project formerly known as “xray” is now “xarray”, pronounced “x-array”!
This avoids a namespace conflict with the entire field of x-ray science. Renaming
our project seemed like the right thing to do, especially because some
scientists who work with actual x-rays are interested in using this project in
their work. Thanks for your understanding and patience in this transition. You
can now find our documentation and code repository at new URLs:

To ease the transition, we have simultaneously released v0.7.0 of both
xray and xarray on the Python Package Index. These packages are
identical. For now, import xray still works, except it issues a
deprecation warning. This will be the last xray release. Going forward, we
recommend switching your import statements to import xarray as xr.

The internal data model used by DataArray has been
rewritten to fix several outstanding issues (GH367, GH634,
this stackoverflow report). Internally, DataArray is now implemented
in terms of ._variable and ._coords attributes instead of holding
variables in a Dataset object.

This refactor ensures that if a DataArray has the
same name as one of its coordinates, the array and the coordinate no longer
share the same data.

In practice, this means that creating a DataArray with the same name as
one of its dimensions no longer automatically uses that array to label the
corresponding coordinate. You will now need to provide coordinate labels
explicitly.
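A sketch of the new behavior, with illustrative coordinate labels:

```python
import xarray as xr

# Naming the array after its dimension no longer labels the dimension
# automatically; coordinate labels must be passed explicitly
da = xr.DataArray(range(2), dims='x', name='x', coords={'x': [10, 20]})
print(da.coords['x'].values)  # [10 20]
```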

It is no longer possible to convert a DataArray to a Dataset with
xray.DataArray.to_dataset() if it is unnamed. This will now
raise ValueError. If the array is unnamed, you need to supply the
name argument.
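For example (variable name here is illustrative):

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims='x')  # unnamed array
# da.to_dataset() would raise ValueError; pass the name explicitly
ds = da.to_dataset(name='foo')
print(list(ds.data_vars))  # ['foo']
```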

xray’s MultiIndex support is still experimental, and we have a long
to-do list of desired additions (GH719), including better display of
multi-index levels when printing a Dataset, and support for saving
datasets with a MultiIndex to a netCDF file. User contributions in this
area would be greatly appreciated.

The handling of colormaps and discrete color lists for 2D plots in
plot() was changed to provide more compatibility
with matplotlib’s contour and contourf functions (GH538).
Now discrete lists of colors should be specified using colors keyword,
rather than cmap.

More informative error message with from_dataframe()
if the frame has duplicate columns.

xray now uses deterministic names for dask arrays it creates or opens from
disk. This allows xray users to take advantage of dask’s nascent support for
caching intermediate computation results. See GH555 for an example.

This release includes numerous bug fixes and enhancements. Highlights
include the introduction of a plotting module and the new Dataset and DataArray
methods isel_points(), sel_points(),
where() and diff(). There are no
breaking changes from v0.5.2.

The optional arguments concat_over and mode in concat() have
been removed and replaced by data_vars and coords. The new arguments are both
more easily understood and more robustly implemented, and allowed us to fix a
bug where concat accidentally loaded data into memory. If you set values for
these optional arguments manually, you will need to update your code. The
default behavior should be unchanged.
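A sketch of the new data_vars argument, with illustrative values:

```python
import xarray as xr

ds1 = xr.Dataset({'a': ('x', [1, 2]), 'b': 3})
ds2 = xr.Dataset({'a': ('x', [3, 4]), 'b': 3})
# data_vars='minimal' concatenates only variables that already contain
# the concat dimension; 'b' stays a scalar instead of being stacked
combined = xr.concat([ds1, ds2], dim='x', data_vars='minimal')
print(combined['a'].values)  # [1 2 3 4]
print(combined['b'].dims)    # ()
```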

open_mfdataset() now supports a preprocess argument for
preprocessing datasets prior to concatenation. This is useful if datasets
cannot be otherwise merged automatically, e.g., if the original datasets
have conflicting index coordinates (GH443).

open_dataset() and open_mfdataset() now use a
global thread lock by default for reading from netCDF files with dask. This
avoids possible segmentation faults for reading from netCDF4 files when HDF5
is not configured properly for concurrent access (GH444).

Added support for serializing arrays of complex numbers with engine=’h5netcdf’.

The new save_mfdataset() function allows for saving multiple
datasets to disk simultaneously. This is useful when processing large datasets
with dask.array. For example, a dataset too big to fit into memory can be
saved to one file per year.
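A sketch of that pattern with synthetic data; the write call is shown but commented out so the snippet needs no netCDF backend:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=730, freq='D')
ds = xr.Dataset({'t': ('time', np.random.randn(730))},
                coords={'time': times})
# split into one dataset per year and build matching output paths
years, datasets = zip(*ds.groupby('time.year'))
paths = ['%s.nc' % y for y in years]
# xr.save_mfdataset(datasets, paths)  # writes all files in one call
```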

The headline feature in this release is experimental support for out-of-core
computing (data that doesn’t fit into memory) with dask. This includes a new
top-level function open_mfdataset() that makes it easy to open
a collection of netCDF files (using dask) as a single xray.Dataset object. For
more on dask, read the blog post introducing xray + dask and the new
documentation section Out of core computation with dask.

Dask makes it possible to harness parallelism and manipulate gigantic datasets
with xray. It is currently an optional dependency, but it may become required
in the future.

The logic used for choosing which variables are concatenated with
concat() has changed. Previously, by default any variables
which were equal across a dimension were not concatenated. This led to
surprising behavior, where the result of groupby and concat operations
could depend on runtime values (GH268).
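Under the new rules, data variables are concatenated unconditionally by default; a sketch:

```python
import xarray as xr

ds = xr.Dataset({'x': 0})
# 'x' is stacked along the new dimension even though its values are
# equal in both inputs; the result no longer depends on runtime values
combined = xr.concat([ds, ds], dim='y')
print(combined['x'].values)  # [0 0]
```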

fillna works on both Dataset and DataArray objects, and uses
index based alignment and broadcasting like standard binary operations. It
also can be applied by group, as illustrated in
Fill missing values with climatology.
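A minimal sketch of the index-based alignment, with illustrative fill values:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([np.nan, 2.0, np.nan], dims='x', coords={'x': [0, 1, 2]})
fill = xr.DataArray([10.0, 20.0, 30.0], dims='x', coords={'x': [0, 1, 2]})
# fill values are matched up by index label, not by position
print(da.fillna(fill).values)  # [10.  2. 30.]
```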

New assign() and assign_coords()
methods patterned after the new DataFrame.assign
method in pandas.
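For example (variable names here are illustrative):

```python
import xarray as xr

ds = xr.Dataset({'a': ('x', [1, 2, 3])})
# like DataFrame.assign, this returns a new object; callables
# receive the dataset itself as their argument
ds2 = ds.assign(b=lambda d: d['a'] * 2)
print(ds2['b'].values)  # [2 4 6]
```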

You can now control the underlying backend used for accessing remote
datasets (via OPeNDAP) by specifying engine='netcdf4' or
engine='pydap'.

xray now provides experimental support for reading and writing netCDF4 files directly
via h5py with the h5netcdf package, avoiding the netCDF4-Python package. You
will need to install h5netcdf and specify engine='h5netcdf' to try this
feature.

Accessing data from remote datasets now has retrying logic (with exponential
backoff) that should make it robust to occasional bad responses from DAP
servers.

You can control the width of the Dataset repr with xray.set_options.
It can be used either as a context manager, in which case the default is
restored outside the context.
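A sketch, assuming the display_width option name:

```python
import xarray as xr

ds = xr.Dataset({'a': 1})
# as a context manager, the previous setting is restored on exit
with xr.set_options(display_width=40):
    print(ds)   # repr wrapped to 40 columns
print(ds)       # default width again outside the context
```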

Fixed a bug where data netCDF variables read from disk with
engine='scipy' could still be associated with the file on disk, even
after closing the file (GH341). This manifested itself in warnings
about mmapped arrays and segmentation faults (if the data was accessed).

This is one of the biggest releases yet for xray: it includes some major
changes that may break existing code, along with the usual collection of minor
enhancements and bug fixes. On the plus side, this release includes all
hitherto planned breaking changes, so the upgrade path for xray should be
smoother going forward.

We now automatically align index labels in arithmetic, dataset construction,
merging and updating. This means the need for manually invoking methods like
align() and reindex_like() should be
vastly reduced.
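For example, arithmetic now performs an inner join on index labels automatically (values here are illustrative):

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims='x', coords={'x': [1, 2, 3]})
# labels are aligned automatically; only the shared labels survive
total = a + b
print(total.x.values)  # [1 2]
print(total.values)    # [12 23]
```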

You can turn this behavior off by supplying the keyword argument
skipna=False.
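For example:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0])
print(da.mean().item())              # 2.0 -- NaN skipped by default
print(da.mean(skipna=False).item())  # nan
```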

These operations are lightning fast thanks to integration with bottleneck,
which is a new optional dependency for xray (numpy is used if bottleneck is
not installed).

Scalar coordinates no longer conflict with constant arrays with the same
value (e.g., in arithmetic, merging datasets and concat), even if they have
different shape (GH243). For example, the constant coordinate c
persists through arithmetic, even though it has a different shape on each
DataArray.
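A sketch of that behavior with a scalar coordinate (values are illustrative):

```python
import xarray as xr

a = xr.DataArray([1, 2], dims='x', coords={'c': -999})
b = xr.DataArray([[3, 4], [5, 6]], dims=['y', 'x'], coords={'c': -999})
# the constant coordinate c survives the broadcasted operation
total = a + b
print(total.coords['c'].item())  # -999
```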

We have updated our use of the terms of “coordinates” and “variables”. What
were known in previous versions of xray as “coordinates” and “variables” are
now referred to throughout the documentation as “coordinate variables” and
“data variables”. This brings xray in closer alignment to CF Conventions.
The only visible change besides the documentation is that Dataset.vars
has been renamed Dataset.data_vars.

You will need to update your code if you have been ignoring deprecation
warnings: methods and attributes that were deprecated in xray v0.3 or earlier
(e.g., dimensions, attributes) have gone away.

The biggest feature I’m excited about working toward in the immediate future
is supporting out-of-core operations in xray using Dask, a part of the Blaze
project. For a preview of using Dask with weather data, read
this blog post by Matthew Rocklin. See GH328 for more details.

Tab-completion for these variables should work in editors such as IPython.
However, setting variables or attributes in this fashion is not yet
supported because there are some unresolved ambiguities (GH300).

You can now use a dictionary for indexing with labeled dimensions. This
provides a safe way to do assignment with labeled dimensions.
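For example (labels here are illustrative):

```python
import xarray as xr

da = xr.DataArray([[1, 2], [3, 4]], dims=['x', 'y'],
                  coords={'x': ['a', 'b'], 'y': [10, 20]})
print(da.loc[{'x': 'a', 'y': 20}].item())  # 2
# the dictionary form makes label-based assignment unambiguous
da.loc[{'x': 'b'}] = 0
print(da.sel(x='b').values)  # [0 0]
```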

Non-index coordinates can now be faithfully written to and restored from
netCDF files. This is done according to CF conventions when possible by
using the coordinates attribute on a data variable. When not possible,
xray defines a global coordinates attribute.

Preliminary support for converting xray.DataArray objects to and from
CDAT cdms2 variables.

We sped up any operation that involves creating a new Dataset or DataArray
(e.g., indexing, aggregation, arithmetic) by 30 to 50%. The full
speedup requires cyordereddict to be installed.

I am contemplating switching to the terms “coordinate variables” and “data
variables” instead of the (currently used) “coordinates” and “variables”,
following their use in CF Conventions (GH293). This would mostly
have implications for the documentation, but I would also change the
Dataset attribute vars to data.

I am no longer certain that automatic label alignment for arithmetic would be
a good idea for xray – it is a feature from pandas that I have not missed
(GH186).

The main API breakage that I do anticipate in the next release is finally
making all aggregation operations skip missing values by default
(GH130). I’m pretty sick of writing ds.reduce(np.nanmean, 'time').

The next version of xray (0.4) will remove deprecated features and aliases
whose use currently raises a warning.

If you have opinions about any of these anticipated changes, I would love to
hear them – please add a note to any of the referenced GitHub issues.

This is mostly a bug-fix release to make xray compatible with the latest
release of pandas (v0.15).

We added several features to better support working with missing values and
exporting xray objects to pandas. We also reorganized the internal API for
serializing and deserializing datasets, but this change should be almost
entirely transparent to users.

Other than breaking the experimental DataStore API, there should be no
backwards incompatible changes.

Revamped coordinates: “coordinates” now refer to all arrays that are not
used to index a dimension. Coordinates are intended to allow for keeping track
of arrays of metadata that describe the grid on which the points in “variable”
arrays lie. They are preserved (when unambiguous) even through mathematical
operations.

Dataset math: Dataset objects now support all arithmetic
operations directly. Dataset-array operations map across all dataset
variables; dataset-dataset operations act on each pair of variables with the
same name.
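A sketch of both cases, with illustrative values:

```python
import xarray as xr

ds = xr.Dataset({'a': ('x', [1, 2]), 'b': ('x', [3, 4])})
# a dataset-scalar operation maps over every data variable
doubled = ds * 2
print(doubled['a'].values)  # [2 4]
# a dataset-dataset operation pairs up variables by name
diff = ds - ds
print(diff['b'].values)  # [0 0]
```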

GroupBy math: This provides a convenient shortcut for normalizing by the
average value of a group.
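A sketch of that shortcut, with illustrative group labels:

```python
import xarray as xr

ds = xr.Dataset({'a': ('x', [1.0, 2.0, 3.0, 4.0])},
                coords={'g': ('x', ['u', 'u', 'v', 'v'])})
# subtracting each group's mean centers the data within each group
anomaly = ds.groupby('g') - ds.groupby('g').mean()
print(anomaly['a'].values)  # [-0.5  0.5 -0.5  0.5]
```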

The dataset __repr__ method has been entirely overhauled; dataset
objects now show their values when printed.

You can now index a dataset with a list of variables to return a new dataset:
ds[['foo', 'bar']].