xbpch provides three main utilities for reading bpch files, all of which
are exposed as top-level package imports. For most purposes you should use
open_bpchdataset(); a lower-level interface, BPCHFile(), is also provided
in case you prefer to process the bpch contents manually, and
open_mfbpchdataset() handles multi-file output.
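
A minimal usage sketch of the high-level interface is below. The filename
"ND49_output.bpch" is a placeholder, not a file shipped with xbpch, and the
whole call is guarded so the sketch degrades gracefully when the library or
file is unavailable:

```python
# Hypothetical usage sketch; "ND49_output.bpch" is a placeholder filename.
try:
    import xbpch

    # High-level interface: returns an xarray.Dataset
    ds = xbpch.open_bpchdataset("ND49_output.bpch")
    # Lower-level alternative: bf = xbpch.BPCHFile("ND49_output.bpch")
except Exception:
    # xbpch (or the sample file) may be unavailable in this environment
    ds = None
```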

Path to the metadata "info" .dat files used to decipher the metadata
corresponding to each variable in the output dataset. If not provided,
xbpch will look for them in the current directory or fall back on a
generic set.

fields : list, optional

List of a subset of variable names to return. This can substantially
improve read performance. Note that the field here is just the tracer
name, not the category; e.g. 'O3' instead of 'IJ-AVG-$_O3'.

categories : list, optional

List of a subset of variable categories to look through. This can
substantially improve read performance.

endian : {'=', '>', '<'}, optional

Endianness of the file on disk. By default, big-endian ('>') is assumed.
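
As a quick illustration of why the byte order matters, the same four bytes
decode to very different values depending on endianness. This sketch uses
only the standard library's struct module; the bytes are hand-picked for
the example, not taken from a real bpch file:

```python
import struct

# GEOS-Chem writes bpch output as Fortran unformatted records, typically
# big-endian, which is why ">" is the default interpretation.
raw = bytes([0x3F, 0x80, 0x00, 0x00])  # IEEE-754 single for 1.0, big-endian

big, = struct.unpack(">f", raw)     # big-endian read: 1.0
little, = struct.unpack("<f", raw)  # wrong byte order: a tiny denormal
```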

decode_cf : bool

Enforce CF conventions for variable names, units, and other metadata.

default_dtype : numpy.dtype, optional

Default datatype for variables encoded in file on disk (single-precision
float by default).

memmap : bool

Flag indicating that data should be memory-mapped from disk instead of
eagerly loaded into memory
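
A toy illustration of the memory-mapping idea, using a temporary file as a
stand-in for a bpch file (the filename and data are invented for the
example):

```python
import os
import tempfile

import numpy as np

# Write ten big-endian floats to a scratch file standing in for bpch data.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
np.arange(10, dtype=">f4").tofile(path)

# Memory-map instead of eagerly loading: the bytes stay on disk and pages
# are faulted in lazily when the array is actually used.
mm = np.memmap(path, dtype=">f4", mode="r")
total = float(mm.sum())  # pages are read on demand here
```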

dask : bool

Flag indicating that data reading should be deferred (delayed) to
construct a task-graph for later execution

You must have dask installed for this to work; deferring the reads through
dask greatly simplifies issues relating to multi-file I/O.

Also, please note that this is not a very performant routine: I/O is
limited by the fact that we need to manually scan through each bpch file
to figure out its contents, since that metadata isn't saved anywhere in
the file. This routine therefore sequentially loads a Dataset for each
bpch file, then concatenates them along the "time" axis. You may instead
wish to process each file individually, coerce it to NetCDF, and then
ingest it through xarray as normal.
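
The suggested alternative workflow could be sketched as below. The
filenames are hypothetical, and the block is guarded since xbpch, xarray,
and the sample files may not be available:

```python
# Sketch only: convert each bpch file to NetCDF once, then use xarray's
# normal multi-file machinery. Filenames here are placeholders.
try:
    import xarray as xr
    import xbpch

    nc_paths = []
    for path in ["ctm.bpch.20060101", "ctm.bpch.20060201"]:
        ds = xbpch.open_bpchdataset(path)
        nc_path = path + ".nc"
        ds.to_netcdf(nc_path)   # coerce each file to NetCDF once...
        nc_paths.append(nc_path)

    # ...then ingest through xarray as normal
    combined = xr.open_mfdataset(nc_paths, concat_dim="time")
except Exception:
    combined = None  # dependencies or files unavailable
```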

Parameters:

paths : list of str

Filenames to load; order doesn't matter, as they will be
lexicographically sorted before the data is read in
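
Lexicographic sorting works here because GEOS-Chem output filenames embed
dates as YYYYMMDD, so string order coincides with chronological order. A
small sketch with invented filenames:

```python
# Hypothetical output filenames with embedded YYYYMMDD dates.
paths = ["ctm.bpch.20060301", "ctm.bpch.20060101", "ctm.bpch.20060201"]

# Plain string sort is also a chronological sort for this naming scheme.
ordered = sorted(paths)
```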

concat_dim : str, default='time'

Dimension to concatenate Datasets over. We default to "time" since this
is how GEOS-Chem splits its output files

compat : str (optional)

String indicating how to compare variables of the same name for
potential conflicts when merging:

'broadcast_equals': all values must be equal when variables are
broadcast against each other to ensure common dimensions.

'equals': all values and dimensions must be the same.

'identical': all values, dimensions and attributes must be the
same.

'no_conflicts': only values which are not null in both datasets
must be equal. The returned dataset then contains the combination
of all non-null values.
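
The 'no_conflicts' rule can be sketched in pure Python, using None as the
null value (the lists below are invented for illustration): values may
disagree only where at least one side is null, and the merge keeps
whichever value is non-null.

```python
# Two "variables" with nulls represented as None.
a = [1.0, None, 3.0]
b = [1.0, 2.0, None]

# A conflict exists only where both sides are non-null and disagree.
conflict = any(
    x is not None and y is not None and x != y
    for x, y in zip(a, b)
)

# The merged result combines all non-null values.
merged = [y if x is None else x for x, y in zip(a, b)]
```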

preprocess : callable (optional)

A pre-processing function to apply to each Dataset prior to
concatenation

lock : False, True, or threading.Lock (optional)

Passed to dask.array.from_array(). By default, xarray employs a
per-variable lock when reading data from NetCDF files, but this model has
not yet been extended to bpch files, so the argument is not actually used
here. However, it will likely be necessary before dask's multi-threaded
backend can be used.