This document outlines an abstract model for data and metadata corresponding to the CF metadata standard (version 1.5). CF is primarily a convention for storing data in netCDF, and up to now has not presented a data model. However, the design of CF implies a data model to some extent, and this document is proposed to make it explicit. If adopted as an element of CF, the CF data model description will be updated in line with the CF standard. The data model avoids prescribing more than is needed for interpreting CF as it stands, in order to avoid inconsistency with future developments of CF. This document is illustrated by the accompanying UML diagram of the data model.

In addition to describing the CF data model, this document comments on how it is implemented in netCDF. Since the CF data model could be implemented in file formats other than netCDF, it would be logically better to put the information about CF-netCDF in a separate document. However, as this is the first presentation of the data model, we feel that the document would be harder to understand if it omitted the netCDF information. We propose that these functions should be separated in a later version of the data model. Some parts of the CF standard arise specifically from the requirements or restrictions of the netCDF file format, or are concerned with efficient ways of storing data on disk; these parts are not logically part of the data model and are only briefly mentioned in this document.

In this document, we use the word "construct" because we feel it to be a more language-neutral term than "object" or "structure". The constructs of this data model might correspond to objects in an OO language.

This data model makes a central assumption that each space construct is independent. Data variables stored in CF-netCDF files are often not independent, because they share coordinate variables. However, we view this solely as a means of saving disk space, and we assume that software will be able to alter any space construct in memory without affecting other space constructs. For instance, if the coordinates of one space construct are modified, it will not affect any other space construct. Explicit tests of equality will be required to establish whether two data variables have the same coordinates. Such tests are necessary in general if CF is applied to a dataset comprising more than one file, because different variables may then reside in different files, with their own coordinate variables.
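
The independence assumption can be illustrated with a minimal Python sketch. The `SpaceConstruct` class and the `same_coordinates` helper are hypothetical names invented for this illustration; they are not part of CF or of any existing library.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SpaceConstruct:
    """Hypothetical, heavily simplified space construct."""
    name: str
    # Maps a dimension name to its list of coordinate values.
    coords: dict = field(default_factory=dict)


def same_coordinates(a, b):
    """Explicit equality test: compares coordinate values, not object identity."""
    return a.coords == b.coords


# In a CF-netCDF file, both data variables might share one "lat" variable.
lat_on_disk = [-45.0, 0.0, 45.0]

# In memory, each space construct receives its own copy of the coordinates.
temperature = SpaceConstruct("air_temperature", {"lat": copy.deepcopy(lat_on_disk)})
pressure = SpaceConstruct("air_pressure", {"lat": copy.deepcopy(lat_on_disk)})

equal_before = same_coordinates(temperature, pressure)

# Altering the coordinates of one construct leaves the other untouched.
temperature.coords["lat"][0] = -60.0
equal_after = same_coordinates(temperature, pressure)
```

Here the two constructs compare equal until one of them is modified, after which the explicit test reports that their coordinates differ.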

Each space construct may have

- An ordered list of one or more **dimension coordinate constructs** (or "dimensions" for short).
- A **data array** whose shape is determined by the dimensions in the order listed, excluding any dimensions of size one. If there are no dimensions of greater size than one, the data array is a scalar. Dimensions of size one are omitted because their position in the order of dimensions makes no difference to the order of data elements in the array. The elements of the data array must all be of the same data type, which may be numeric, character or string.
- An unordered collection of **auxiliary coordinate constructs**.
- An unordered collection of **cell measure constructs**.
- A **cell methods construct**, which refers to the dimensions (but not their sizes).
- An unordered collection of **transforms**.
- Other **properties**, which are metadata that do not refer to the dimensions, and serve to describe the data the space contains. Properties may be of any data type (numeric, character or string) and can be scalars or arrays. They are attributes in the netCDF file, but we use the term "property" instead because not all CF-netCDF attributes are properties in this sense.
- A list of **ancillary spaces**. This corresponds to the CF-netCDF `ancillary_variables` attribute, which identifies other spaces that provide metadata.
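
The rule for the shape of the data array can be sketched as follows. The function name `data_array_shape` is an assumption made for this example, and an empty tuple is used here to stand for a scalar.

```python
def data_array_shape(dimension_sizes):
    """Return the data array shape for an ordered list of dimension sizes.

    Size-one dimensions are dropped; an empty tuple denotes a scalar.
    """
    if any(size < 1 for size in dimension_sizes):
        raise ValueError("dimension sizes must be greater than zero")
    return tuple(size for size in dimension_sizes if size > 1)


# Hypothetical space: time (12), height at 1.5 m (1), latitude (73), longitude (96).
shape_4d = data_array_shape([12, 1, 73, 96])

# A space whose dimensions all have size one holds a scalar.
shape_scalar = data_array_shape([1, 1])
```

In this sketch the size-one height dimension does not contribute to the shape, so `shape_4d` has three elements and `shape_scalar` is empty.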

The CF-netCDF `formula_terms` (see also **Transforms**) and
`ancillary_variables` attributes make links between space constructs.
These links are fragile.
If a space construct is written to a file, it is not required that any
other space constructs to which it is linked are also written to the file.
If an operation alters one space
construct in a way which could invalidate a relationship with another space
construct, the link should be broken. The user of software will have to be
aware of these relationships and remake them if applicable and useful.

- A **size** (an integer greater than zero), which can be equal to one. In CF-netCDF, there is a formal distinction between scalar coordinate variables and size-one coordinate variables, but they are logically the same; CF-netCDF supports scalar coordinate variables for simplicity and convenience in the netCDF file. An example of a size-one dimension is a vertical dimension for 1.5 m height. In this data model, a CF-netCDF scalar coordinate variable is regarded as a dimension coordinate construct with a size of unity.

- A one-dimensional numerical **coordinate array** of the size specified for the dimension. If the size of the dimension is greater than one, the elements of the coordinate array must all be of the same numeric data type, they must all have different non-missing values, and they must be monotonically increasing or decreasing. Dimension coordinate constructs cannot have string-valued coordinates. In this data model, a CF-netCDF string-valued coordinate variable or string-valued scalar coordinate variable corresponds to an auxiliary coordinate construct (not a dimension coordinate construct), with a dimension whose coordinate construct has no coordinate array.
- A two-dimensional **boundary coordinate array**, whose slow-varying (second in Fortran) dimension equals the size specified by the dimension coordinate construct, and whose fast-varying dimension is two, indicating the extent of the cell. For climatological time dimensions, the bounds are interpreted in a special way indicated by the cell methods.
- Properties (in the same sense as for the space construct) serving to describe the coordinates.

In this data model we permit a dimension not to have a coordinate array
if there is no appropriate numeric monotonic coordinate.
That is the case for a dimension that runs over ocean basins or area
types, for example, or for a dimension that indexes timeseries at
scattered points. Such dimensions do not correspond to a continuous
physical quantity.
(They will be called **index dimensions** in CF version 1.6.)
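
The constraints on a dimension coordinate construct can be expressed as a small validation routine. This is a sketch only: the function name `check_dimension_coordinate` is hypothetical, coordinate values are assumed to be plain Python numbers with no missing values, and bounds are represented as a list of two-element cells (the Python, row-major equivalent of a fast-varying dimension of two).

```python
def check_dimension_coordinate(size, coords=None, bounds=None):
    """Return True if the arguments satisfy the rules described above.

    Raises ValueError otherwise. `coords` may be None, which models a
    dimension without a coordinate array.
    """
    if size < 1:
        raise ValueError("size must be an integer greater than zero")
    if coords is not None:
        if len(coords) != size:
            raise ValueError("coordinate array must match the dimension size")
        steps = [b - a for a, b in zip(coords, coords[1:])]
        # Strictly increasing or strictly decreasing values are also distinct.
        if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
            raise ValueError("coordinates must be distinct and monotonic")
    if bounds is not None:
        if len(bounds) != size or any(len(cell) != 2 for cell in bounds):
            raise ValueError("bounds must have shape (size, 2)")
    return True


# A latitude-like dimension with cell bounds.
ok_numeric = check_dimension_coordinate(
    3, [0.0, 1.0, 2.0], [[-0.5, 0.5], [0.5, 1.5], [1.5, 2.5]]
)

# A dimension with no coordinate array, e.g. one running over ocean basins.
ok_index = check_dimension_coordinate(4)
```

For a dimension of size one the monotonicity test is trivially satisfied, which matches the text: the constraints on the values apply only when the size is greater than one.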

- A list of some (at least one) of the dimensions of the space construct in any order.

- A coordinate array with dimension sizes corresponding to the list of dimensions of the auxiliary coordinate construct. If there is a dimension with size greater than one, the elements of the coordinate array must all be of the same data type (numeric, character or string), but they do not have to be distinct or monotonic. Missing values are not allowed (in CF version 1.5).
- A boundary coordinate array with all the dimensions of the coordinate array, in the same order, plus a fastest-varying dimension (first dimension in Fortran) equal to the number of vertices of each cell.
- Properties serving to describe the coordinates.
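
The relationship between the dimension list, the coordinate array and the boundary coordinate array can be sketched as a shape calculation. The function `aux_shapes` and the dimension names are hypothetical; shapes are given in Python (row-major) order, in which the fastest-varying dimension comes last.

```python
def aux_shapes(dimension_sizes, aux_dims, n_vertices):
    """Return (coordinate array shape, boundary array shape).

    dimension_sizes maps dimension names of the space to their sizes;
    aux_dims lists, in order, the dimensions used by the auxiliary coordinate.
    """
    if not aux_dims:
        raise ValueError("at least one dimension is required")
    coord_shape = tuple(dimension_sizes[name] for name in aux_dims)
    # The vertex dimension is appended last, i.e. it varies fastest.
    bounds_shape = coord_shape + (n_vertices,)
    return coord_shape, bounds_shape


# Hypothetical two-dimensional latitude on a (y, x) grid of quadrilateral cells.
sizes = {"time": 12, "y": 5, "x": 4}
coord_shape, bounds_shape = aux_shapes(sizes, ["y", "x"], n_vertices=4)
```

In this example the auxiliary coordinate uses only two of the three dimensions of the space, and its bounds carry one extra trailing dimension for the four vertices of each cell.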

- A list of some of the dimensions of the space construct in any order.
- Properties to describe itself.

- A **measure property**, which indicates which metric of the grid it supplies e.g. cell areas.
- A **units property** consistent with the measure property e.g. m2.
- A numeric array of metric values having the dimensions listed, excluding any dimensions of size one, or a scalar metric value if no dimensions of size greater than one are given. If there is a dimension with size greater than one, the elements of the array must all be of the same data type. It is assumed that the metric does not depend on any of the dimensions of the space which are not specified, and the values are implicitly propagated along these dimensions.
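
The implicit propagation of metric values along unspecified dimensions can be illustrated with a lookup that ignores those dimensions. The function `measure_at`, the dimension names and the area values are all invented for this sketch.

```python
def measure_at(measure_dims, measure_values, space_dims, index):
    """Return the cell measure value for a full index into the space.

    measure_values is a nested list indexed in the order of measure_dims.
    Dimensions of the space that are absent from measure_dims are ignored,
    so the same value applies at every position along them.
    """
    position = dict(zip(space_dims, index))
    value = measure_values
    for name in measure_dims:
        value = value[position[name]]
    return value


space_dims = ["time", "lat", "lon"]
area_dims = ["lat", "lon"]
area = [[10.0, 11.0], [20.0, 21.0]]  # hypothetical cell areas in m2

area_first_time = measure_at(area_dims, area, space_dims, (0, 1, 0))
area_later_time = measure_at(area_dims, area, space_dims, (5, 1, 0))
```

Both lookups return the same area, because the cell measure does not list the time dimension and is therefore assumed not to depend on it.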

- A **transform name** which indicates the nature of the transformation and implies the formulae to be used. A CF-netCDF file does not explicitly record the formulae; it depends on the application software knowing what to do.
- An unordered collection of **terms**, which are scalar parameters, pointers to dimension or auxiliary coordinate constructs of the space construct, and pointers to other space constructs. Each member of the collection has a particular role in the formulae.

Transforms correspond to the functions of the CF-netCDF attributes
`formula_terms`, which describes how to compute a vertical coordinate
variable from components (CF Appendix D),
and `grid_mapping`, which describes how to transform between
longitude-latitude space and the horizontal coordinates of the space construct
(CF Appendix F).
The transform name is the `standard_name` of a vertical coordinate
variable with `formula_terms`, and the `grid_mapping_name`
of a `grid_mapping` variable.
The scalar parameters are scalar data variables (which should
have `units` if dimensional) named by `formula_terms`,
and attributes of `grid_mapping` variables
(for which the units are specified by the transform).
The role of each term in the formulae of the transform is
identified by its keyword in a `formula_terms` attribute,
or its attribute name in a `grid_mapping` variable.
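
As an illustration of how the terms of a transform might be recovered from a CF-netCDF file, the sketch below parses a `formula_terms` attribute string into a mapping from keyword to variable name. The function `parse_formula_terms` is a hypothetical helper; the example string follows the `keyword: variable` pattern used by CF for the `atmosphere_sigma_coordinate` formula, and the variable names in it are placeholders.

```python
def parse_formula_terms(attribute):
    """Parse 'key1: var1 key2: var2 ...' into a {key: variable name} dict."""
    tokens = attribute.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating 'keyword:' and variable names")
    terms = {}
    for key, variable in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"keyword {key!r} must end with a colon")
        # The keyword identifies the role of the term in the formulae.
        terms[key[:-1]] = variable
    return terms


# Transform name taken from the standard_name of the vertical coordinate variable.
transform_name = "atmosphere_sigma_coordinate"
terms = parse_formula_terms("sigma: lev ps: PS ptop: PTOP")
```

The resulting dictionary is unordered in the sense of the data model: each term is identified by its keyword, not by its position in the attribute string.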

The attributes
`valid_max`,
`valid_min` and
`valid_range`
of data variables and coordinate variables are checks on the validity of
the values, which could be verified on input and written on output.
In this CF data model we assume they do not constrain any manipulations
which might be done on the data in memory,
and they are not part of the data model.

The attributes
`_FillValue` and
`missing_value`
of data variables specify how missing data is indicated in the data array.
This data model supports the idea of missing data, but does not depend on
any particular method of indicating it, so these attributes
are not part of the model.
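
One way to keep the in-memory representation independent of the file-level fill value is to translate it to a neutral marker on input and back on output. The sketch below uses `None` as that marker; this choice, and the helper names `decode_missing` and `encode_missing`, are assumptions made for illustration only.

```python
def decode_missing(raw_values, fill_value):
    """On input, replace the file's fill value by None."""
    return [None if value == fill_value else value for value in raw_values]


def encode_missing(values, fill_value):
    """On output, replace None by whichever fill value the file uses."""
    return [fill_value if value is None else value for value in values]


raw = [271.5, -999.0, 280.0]             # hypothetical data with _FillValue = -999.0
in_memory = decode_missing(raw, -999.0)  # missing element is now None

# The same in-memory data can be written with a different fill value.
rewritten = encode_missing(in_memory, 1.0e20)
```

The data in memory carry no trace of the particular fill value that was used in the file they were read from.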

The attributes
`add_offset`,
`compress`,
`flag_masks`,
`flag_meanings`,
`flag_values` and
`scale_factor`
are all used in methods of compressing the data to save space
in CF-netCDF files,
with or without loss of information.
They are not part of this data model because these operations do not
logically alter the data,
except that the `compress` attribute implies two alternative
interpretations of coordinates (compressed or uncompressed).
The "feature type" attribute and associated new conventions,
to be introduced in CF version 1.6,
will provide a way of packing multiple data
spaces of the same kind of discrete sampling geometry
(timeseries, trajectories, etc.) into a single CF-netCDF data variable,
in order to save space, since a multidimensional representation with
common coordinate variables is typically very wasteful in such cases.
This is a kind of compression. The data model would regard each instance
of the feature type as an independent space construct.
However, the "feature type" attribute itself is also a metadata property
that would be a property of the space construct and part of the data model.
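
For the packing attributes mentioned above, the netCDF convention is that stored values are multiplied by `scale_factor` and then `add_offset` is added to recover the data. The following sketch shows why these attributes need not appear in the data model: after unpacking, only the logical values remain. The function name `unpack` and the numbers are illustrative.

```python
def unpack(packed_values, scale_factor=1.0, add_offset=0.0):
    """Recover logical data values from packed values stored in a file."""
    return [value * scale_factor + add_offset for value in packed_values]


packed = [0, 2, 5]  # hypothetical small integers stored on disk
unpacked = unpack(packed, scale_factor=0.5, add_offset=10.0)
```

Software implementing the data model would work with the unpacked values and would be free to choose different packing parameters, or none, when writing the data out again.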

The attributes
`bounds`,
`cell_measures`,
`cell_methods`,
`climatology`,
`Conventions`,
`coordinates`,
`formula_terms` and
`grid_mapping`
have various special or structural functions in the CF-netCDF file format.
Their functions and
the relationships they indicate are reflected in the structure
of this data model,
and these attributes do not correspond directly to
properties in the data model.

1st August 2011

Original version 0.1 of 10th January 2011

Jonathan Gregory and David Hassell