Step 2: Which feature type?

Time series (station) data: data located at named locations, called stations. There can be many stations, and usually each
station has many observations at different times. Each station has a unique identifier. Examples: weather station data, fixed
buoys.

Profile data: A series of connected observations along a vertical line. Each profile has only one lat, lon coordinate (possibly nominal),
so the points along the profile differ only in z coordinate and possibly time coordinate. There can be multiple profiles in the same file, and each
profile has a unique identifier. If you have many profiles at the same lat, lon location, use the Time series Profile type. Examples:
atmospheric profiles from satellites, moving profilers.

Time series (station) Profile data: Profile data at fixed locations. This is a combination of the Time series and Profile types, giving a
time series of profiles at each fixed location. A file can contain many stations and many time series at each station. Examples: profilers,
balloon soundings.

Trajectory data: A series of connected observations along a 1D curve in time and space. There can be multiple trajectories in
the same file, each with a unique identifier. Examples: aircraft data, drifting buoys.

Trajectory of Profiles: a collection of profile features that originate along a trajectory. These are trajectories
that have profile data (varying with z) at each (lat, lon) location. Examples: ship soundings.
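
As a rough summary of the choices above, here is a hypothetical decision helper. The function and its three yes/no questions are illustrative assumptions, not part of the CDM; the strings it returns are the CF featureType attribute values ("point" covers unconnected observations, described under H.1 below).

```python
def choose_feature_type(along_track: bool, repeated_at_stations: bool, vertical: bool) -> str:
    """Hypothetical helper (not a CDM API): suggest a CF featureType value.

    along_track: observations follow a path in time and space (ship, aircraft, drifter)
    repeated_at_stations: repeated observations at fixed, named locations
    vertical: each observation is a vertical profile (varies with z)
    """
    if along_track:
        return "trajectoryProfile" if vertical else "trajectory"
    if repeated_at_stations:
        return "timeSeriesProfile" if vertical else "timeSeries"
    if vertical:
        return "profile"
    # No connection between the observations
    return "point"
```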

Miscellaneous questions and advice

Should I use the unlimited dimension? This can have a huge impact on performance for large files, because it affects the data
layout on disk. The answer is: it depends.

If you have lots of variables at each observation, and you want to optimize the case of getting one or a few variables at all the points, then
don't use the unlimited dimension. This is called column oriented storage.

If you want to optimize the case of getting all or most of the variables at each point, then use the unlimited dimension. This is called row
oriented storage.
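
To see why the layout matters, here is a simplified model of the two layouts. It assumes n_vars variables that all use the same 4-byte type, and it ignores headers, padding and the details of the real netCDF format. It measures how much of the file lies between the first and last value you touch when reading one variable at every observation.

```python
def value_offset(layout: str, var: int, obs: int, n_vars: int, n_obs: int, item_size: int = 4) -> int:
    """Byte offset of variable `var` at observation `obs` in a simplified file (not real netCDF)."""
    if layout == "row":
        # Row oriented: each observation (record) stores one value of every variable
        return (obs * n_vars + var) * item_size
    if layout == "column":
        # Column oriented: all values of one variable are stored together
        return (var * n_obs + obs) * item_size
    raise ValueError(f"unknown layout: {layout!r}")


def bytes_spanned(layout: str, var: int, n_vars: int, n_obs: int, item_size: int = 4) -> int:
    """Size of the file region touched when reading one variable at every observation."""
    offsets = [value_offset(layout, var, i, n_vars, n_obs, item_size) for i in range(n_obs)]
    return max(offsets) - min(offsets) + item_size


print(bytes_spanned("column", var=3, n_vars=10, n_obs=1000))  # 4000
print(bytes_spanned("row", var=3, n_vars=10, n_obs=1000))     # 39964
```

With 10 variables and 1000 observations, the column-oriented read covers 4,000 contiguous bytes, while the row-oriented read is spread over 39,964 bytes and uses only one value in ten. Reading all variables at one observation reverses the picture.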

For important, long-lived archives, you should test the performance of each case using the read access pattern that you want to optimize.

If you don't know, then my prejudice is to use the unlimited dimension. For small datasets (perhaps under 10 Mbytes) it probably doesn't matter much.

Should I use coordinate variables or auxiliary coordinate variables?

A coordinate variable is 1D, and has the same name as its dimension, e.g. float time(time). The coordinate values must be
monotonically increasing or decreasing. There can be no missing values. Use a coordinate variable if those conditions are true.

An auxiliary coordinate variable may have missing values, and is not required to have monotonic, or even unique, values. In that situation you
must use an auxiliary coordinate variable, e.g. float time(sample).
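
The two rules can be written as a quick check. This sketch treats None or NaN as a missing value, which is an assumption (in a real file, missing values are usually marked with _FillValue or missing_value), and it reads "monotonic" as strictly monotonic, since coordinate values must also be unique.

```python
import math
from typing import Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    # Assumption: None or NaN marks a missing value
    return value is None or (isinstance(value, float) and math.isnan(value))


def can_be_coordinate_variable(values: Sequence[Optional[float]]) -> bool:
    """True if a 1D array could be a coordinate variable: no missing values, strictly monotonic."""
    if any(is_missing(v) for v in values):
        return False
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing
```

If this returns False, the values belong in an auxiliary coordinate variable such as time(sample).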

What's the reason to include ids for things like trajectories or profiles?

The "instance" ids allow software like the CDM to efficiently fetch just the data for a named feature, using the id.
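
The idea is the same as a dictionary lookup from id to position along the instance dimension. A minimal sketch, assuming the data are held in memory as data(instance, obs) nested lists; this is not how the CDM itself is implemented, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence


def build_id_index(ids: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: map each instance id to its index along the instance dimension."""
    index: Dict[str, int] = {}
    for i, feature_id in enumerate(ids):
        if feature_id in index:
            # Ids must be unique, otherwise a lookup is ambiguous
            raise ValueError(f"duplicate instance id: {feature_id!r}")
        index[feature_id] = i
    return index


def fetch_feature(data: Sequence[Sequence[float]], id_index: Dict[str, int], feature_id: str) -> List[float]:
    """Return only the row of data(instance, obs) that belongs to feature_id."""
    return list(data[id_index[feature_id]])
```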

How big should I make my files? How should I divide the data between files?

If you have the choice, a small number of large files is better than zillions of small files. I would aim for files in the range of 50 Mbytes to 2 Gbytes.

More important is to divide your files into distinct time ranges, called time partitioned files. This is a natural way to divide earth
science data. It allows the CDM to serve many files as a single dataset using CDM feature collections. For time partitioned files, if possible,
put the partitioning date in the filename.
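
For example, one file per month with its start date in the name. The naming pattern prefix_YYYYMMDD.nc and both helper functions are hypothetical; any pattern works as long as the date can be parsed back out of the name.

```python
import re
from datetime import date


def partition_filename(prefix: str, start: date) -> str:
    """Hypothetical naming scheme: put the partition start date in the file name."""
    return f"{prefix}_{start:%Y%m%d}.nc"


def partition_date(filename: str) -> date:
    """Recover the partition start date from a name built by partition_filename."""
    match = re.search(r"_(\d{4})(\d{2})(\d{2})\.nc$", filename)
    if match is None:
        raise ValueError(f"no partition date in {filename!r}")
    year, month, day = (int(g) for g in match.groups())
    return date(year, month, day)


print(partition_filename("buoys", date(2011, 4, 1)))  # buoys_20110401.nc
```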

Why should I bother to do all this extra work?

If you are publicly funded, you should make your data as accessible to others as possible. This is the minimum "extra work" your peers think
is needed for them to be able to use your data. And they will sincerely thank you!

Differences from CF

9.1 Limits on coordinate types

Horizontal coordinates:

CF: "In Table 9.1 the spatial coordinates x and y typically refer to longitude and latitude but other horizontal coordinates could also be used (see
sections 4 and 5.6)."

CDM: only latitude and longitude are supported.

Vertical coordinates:

CDM: vertical coordinate may be height or pressure. Dimensionless Vertical Coordinates are not supported.

9.3 Limits on dimension ordering

CF: "In the multidimensional array representations, data variables have both an instance dimension and an element dimension. The dimensions may be
given in any order"

CDM: the instance dimension must be the outer (slowest varying) dimension.
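
If a file was written with the element dimension outermost, e.g. data(obs, station), it has to be reordered to data(station, obs) before the CDM will read it. A minimal in-memory sketch with nested lists; the function name is hypothetical.

```python
from typing import List, Sequence


def to_instance_outer(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: reorder data(element, instance) into data(instance, element)."""
    if not data:
        return []
    # zip(*data) walks the instance dimension, collecting one value per element
    return [list(column) for column in zip(*data)]
```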

9.4 Attribute featureType is required

CF: "A global attribute, featureType, is required for all Discrete Geometry representations except the orthogonal multidimensional
array representation, for which it is highly recommended".

9.5 Feature instance id variable is required

CF: "Where feasible a variable with the attribute cf_role should be included. The only acceptable values of cf_role for Discrete
Geometry CF data sets are timeseries_id, profile_id, and trajectory_id. The variable carrying the cf_role attribute may have any data type. When a
variable is assigned this attribute, it must provide a unique identifier for each feature instance."

CDM: A variable representing the instance id is required, indicated by an attribute named cf_role, which follows all the CF rules above.
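
A reader might locate the instance id variable as follows. The sketch assumes the file's attributes have already been read into a dict of variable name to attribute dict; the function is hypothetical, and it returns a list rather than a single name, since a file combining two feature types may use more than one cf_role.

```python
from typing import List, Mapping

# The only cf_role values CF allows for Discrete Geometry data
DSG_CF_ROLES = {"timeseries_id", "profile_id", "trajectory_id"}


def instance_id_variables(attributes: Mapping[str, Mapping[str, str]]) -> List[str]:
    """Hypothetical helper: names of variables whose cf_role marks them as instance ids."""
    names = [name for name, attrs in attributes.items()
             if attrs.get("cf_role") in DSG_CF_ROLES]
    if not names:
        # CF only recommends cf_role; the CDM requires it
        raise ValueError("no variable with a Discrete Geometry cf_role attribute")
    return names
```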

Notes on representations

In all cases, latitude, longitude, altitude and time coordinates must be recognized in the usual CF way. The altitude coordinate is optional in some of the
forms.

H.1 Point Data

In the CDM, point data is recognized by the featureType = "point" global attribute. The altitude coordinate is optional. All
coordinates must have the same dimension, called the obs or sample dimension. All variables with the obs dimension as outer dimension are
data variables.
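
The rule for finding data variables in point data can be sketched as follows. It assumes, for illustration, that variable shapes are available as a dict of name to dimension-name tuple and that the coordinates have already been identified; the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def point_data_variables(dims: Dict[str, Tuple[str, ...]], coordinates: Sequence[str],
                         obs_dim: str = "obs") -> List[str]:
    """Hypothetical helper: return the data variables of a featureType="point" dataset."""
    for name in coordinates:
        # Every coordinate must use the single obs (sample) dimension
        if dims[name] != (obs_dim,):
            raise ValueError(f"coordinate {name!r} must have dimension ({obs_dim})")
    # Any other variable whose outer dimension is obs is a data variable
    return [name for name, shape in dims.items()
            if name not in coordinates and shape[:1] == (obs_dim,)]
```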

H.2 Time Series Data

In the CDM, this form is recognized by the featureType = "timeSeries" global attribute. The altitude coordinate is optional.

Special station variables are recognized by standard names as given below. For backwards compatibility, the given aliases are allowed.

standard_name           alias
"timeseries_id"         "station_id"
"platform_name"         "station_description"
"surface_altitude"      "station_altitude"
"platform_id"           "station_WMO_id"

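Recognizing station variables with the aliases allowed can be sketched as a lookup that first maps an alias to its standard name. The attribute dict layout and the function name are assumptions for illustration; the alias pairs are the ones in the table above.

```python
from typing import Dict, Mapping

# Alias -> standard name, from the table above
STATION_ALIASES = {
    "station_id": "timeseries_id",
    "station_description": "platform_name",
    "station_altitude": "surface_altitude",
    "station_WMO_id": "platform_id",
}
STATION_NAMES = set(STATION_ALIASES.values())


def find_station_variables(attributes: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: map each recognized standard name to the variable carrying it."""
    found: Dict[str, str] = {}
    for name, attrs in attributes.items():
        std = attrs.get("standard_name")
        if std is None:
            continue
        std = STATION_ALIASES.get(std, std)  # accept the older alias
        if std in STATION_NAMES:
            found[std] = name
    return found
```
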
H.2.1 / H.2.2 Multidimensional Time Series Representation

The lat, lon and altitude coordinates must have the same dimension, called the station or instance dimension. All variables with the
station dimension as outer dimension are station variables. The time coordinate must have the form time(time) or
time(station, time), where time is the obs or sample dimension. All data variables must have the form data(station,
time).
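
The two allowed forms of the time coordinate differ only in how a value's time is looked up. A minimal sketch with nested lists; the function name is hypothetical, and it tells the forms apart by whether time holds one list per station.

```python
from typing import List, Sequence, Tuple


def station_series(data: Sequence[Sequence[float]], time: Sequence, station: int) -> List[Tuple[float, float]]:
    """Hypothetical helper: pair each value of data(station, time) with its time coordinate.

    `time` is either time(time), one list shared by all stations,
    or time(station, time), one list per station.
    """
    row = data[station]
    times = time[station] if isinstance(time[0], (list, tuple)) else time
    return list(zip(times, row))
```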

For compatibility with earlier versions:

ragged_row_count is an alias for the sample_dimension standard name.

ragged_row_index is an alias for the feature_dimension standard name.

All attributes can optionally be prefixed by "CF:".
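
A reader that honors these rules might normalize names before comparing them. A minimal sketch; the function name is hypothetical, and the alias pairs are the two listed above.

```python
# Legacy names and the names that replaced them
LEGACY_ALIASES = {
    "ragged_row_count": "sample_dimension",
    "ragged_row_index": "feature_dimension",
}


def normalize_attribute_name(name: str) -> str:
    """Hypothetical helper: drop an optional "CF:" prefix, then map legacy names to current ones."""
    if name.startswith("CF:"):
        name = name[len("CF:"):]
    return LEGACY_ALIASES.get(name, name)
```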

H.2.3. Single time series, including deviations from a nominal fixed spatial location

The CDM uses the axis attribute to choose the correct coordinate. However, it provides no special handling for the precise coordinates.

H.2.4. Contiguous ragged array representation of time series

standard              alias
sample_dimension      CF:ragged_row_count
instance_dimension    CF:ragged_parent_index
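
In the contiguous representation, each instance's observations are stored together and in order, and the count variable (the one with the sample_dimension attribute, or its alias) gives how many belong to each instance. Start offsets are therefore running sums of the counts. A minimal sketch, with row_size as a hypothetical name for the count values:

```python
from itertools import accumulate
from typing import List, Sequence


def ragged_slices(row_size: Sequence[int]) -> List[slice]:
    """Turn per-instance counts of a contiguous ragged array into slices of the obs dimension."""
    ends = list(accumulate(row_size))
    starts = [0] + ends[:-1]
    return [slice(s, e) for s, e in zip(starts, ends)]


def instance_values(values: Sequence[float], row_size: Sequence[int], instance: int) -> List[float]:
    """Hypothetical helper: the observations belonging to one instance, e.g. one station."""
    return list(values[ragged_slices(row_size)[instance]])
```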

H.3.5. Indexed ragged array representation of profiles

The CF example shows only double time(profile), but double time(obs) is also possible, when each observation
has its own time.
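
In the indexed representation, each observation carries the index of the profile it belongs to (the index variable has the instance_dimension attribute, or its alias), and time can be given per profile or per observation. A minimal sketch, with parent_index as a hypothetical name for the values of the index variable:

```python
from typing import List, Sequence


def profile_obs(values: Sequence[float], parent_index: Sequence[int], profile: int) -> List[float]:
    """Hypothetical helper: observations whose parent index points at `profile`, in file order."""
    return [v for v, p in zip(values, parent_index) if p == profile]


def obs_times(time: Sequence[float], parent_index: Sequence[int], per_obs: bool) -> List[float]:
    """Time of each observation: time(obs) directly, or time(profile) looked up via the parent index."""
    return list(time) if per_obs else [time[p] for p in parent_index]
```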

H.5.1. Multidimensional array representations of time series profiles

Specification says "The pressure(i,p,o), temperature(i,p,o), and humidity(i,p,o) data for element o of profile p at station i are associated with the
coordinate values time(i,p), z(i,p,o), lat(i), and lon(i). Any of the three dimensions could be the netCDF unlimited dimension, if it might be useful to be
able enlarge it."

Since the CDM currently allows dimensions only in the order (station, profile, z), only the station dimension can be unlimited in the
multidimensional representation.
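
One way to see why the outer dimension is the natural one to grow: in a row-major (C order) layout of data(station, profile, z), the position of each value depends on the sizes of the inner dimensions but not on the number of stations. The sketch below is a simplified model, not the netCDF file format, and flat_index is a hypothetical name.

```python
def flat_index(station: int, profile: int, z: int, n_profile: int, n_z: int) -> int:
    """Position of data[station][profile][z] in a row-major array of shape (n_station, n_profile, n_z).

    n_station does not appear, so adding stations only appends values at the end;
    changing n_profile or n_z moves values that are already stored.
    """
    return (station * n_profile + profile) * n_z + z


before = flat_index(1, 0, 0, n_profile=3, n_z=4)
after = flat_index(1, 0, 0, n_profile=4, n_z=4)
print(before, after)  # 12 16
```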

This document is maintained by John Caron and was last updated in April 2011.