Xarray engine: extra dimensions

[1]:
import earthkit.data as ekd

2D wave spectra example

We analyse a 2D wave spectra product at 2025-12-15 00 UTC and 03 UTC. A specific feature of this dataset is that the fields are additionally indexed by direction and frequency, on top of the standard temporal dimension.

[2]:
ds_fl = ekd.from_source("sample", "2d-wave-spectra_an.grib").to_fieldlist()

[3]:
ds_fl.ls(keys=["metadata.directionNumber", "metadata.frequencyNumber"])
[3]:
metadata.directionNumber metadata.frequencyNumber
0 1 1
1 2 1
2 3 1
3 4 1
4 5 1
... ... ...
2083 32 29
2084 33 29
2085 34 29
2086 35 29
2087 36 29

2088 rows × 2 columns

To represent this structure in Xarray, the predefined dimensions of the Xarray engine must be complemented with dimensions derived from the metadata keys "directionNumber" and "frequencyNumber" when calling to_xarray().

[4]:
ds = ds_fl.to_xarray(
    extra_dims=["metadata.directionNumber", "metadata.frequencyNumber"],
    add_earthkit_attrs=False,
)
ds
[4]:
<xarray.Dataset> Size: 1MB
Dimensions:                  (directionNumber: 36, frequencyNumber: 29,
                              forecast_reference_time: 2, latitude: 7,
                              longitude: 12)
Coordinates:
  * directionNumber          (directionNumber) int64 288B 1 2 3 4 ... 34 35 36
  * frequencyNumber          (frequencyNumber) int64 232B 1 2 3 4 ... 27 28 29
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 16B 202...
  * latitude                 (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
  * longitude                (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
    2dfd                     (directionNumber, frequencyNumber, forecast_reference_time, latitude, longitude) float64 1MB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
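As a cross-check of the dimension sizes shown above, the following minimal sketch uses plain Python dictionaries as synthetic stand-ins for the GRIB fields (they are not earthkit-data objects, and the `base_time` key name is an assumption made for illustration). It shows that one field per combination of direction, frequency and base time gives exactly the 2088 fields listed earlier.

```python
from itertools import product

# Synthetic stand-ins for the GRIB fields: each record holds only the values
# that vary from field to field. These are NOT earthkit-data objects.
directions = range(1, 37)    # directionNumber 1..36
frequencies = range(1, 30)   # frequencyNumber 1..29
base_times = ["2025-12-15T00", "2025-12-15T03"]  # illustrative labels

# The iteration order is arbitrary; only the set of combinations matters here.
fields = [
    {"directionNumber": d, "frequencyNumber": f, "base_time": t}
    for t, f, d in product(base_times, frequencies, directions)
]

# Each dimension corresponds to the distinct values of one key.
dim_sizes = {
    key: len({field[key] for field in fields})
    for key in ("directionNumber", "frequencyNumber", "base_time")
}

# A complete hypercube needs one field per combination of coordinates.
expected = 1
for size in dim_sizes.values():
    expected *= size

print(dim_sizes)
print(len(fields), expected)
```

If the number of fields did not equal the product of the dimension sizes, the fields could not fill a complete hypercube with these dimensions.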

The extra_dims option also supports defining an explicit mapping between the name of an extra dimension and the corresponding metadata key, in a way that is conceptually similar to dimension roles.

[5]:
ds2 = ds_fl.to_xarray(
    extra_dims=[{"d": "metadata.directionNumber"}, {"f": "metadata.frequencyNumber"}],
    add_earthkit_attrs=False,
)
ds2
[5]:
<xarray.Dataset> Size: 1MB
Dimensions:                  (d: 36, f: 29, forecast_reference_time: 2,
                              latitude: 7, longitude: 12)
Coordinates:
  * d                        (d) int64 288B 1 2 3 4 5 6 7 ... 31 32 33 34 35 36
  * f                        (f) int64 232B 1 2 3 4 5 6 7 ... 24 25 26 27 28 29
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 16B 202...
  * latitude                 (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
  * longitude                (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
    2dfd                     (d, f, forecast_reference_time, latitude, longitude) float64 1MB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
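The two ways of specifying extra_dims can be compared with a small sketch. The helper `normalise_extra_dims` below is hypothetical and is not part of earthkit-data. It only mirrors the naming seen in the two outputs above, under the assumption that a plain key such as "metadata.directionNumber" yields a dimension named after the part following the last dot, while a mapping supplies the name explicitly.

```python
def normalise_extra_dims(extra_dims):
    """Return (dimension_name, metadata_key) pairs for an extra_dims-style spec.

    Hypothetical helper for illustration only; not earthkit-data's code.
    """
    # Accept a single item as well as a list of items.
    if isinstance(extra_dims, (str, dict)):
        extra_dims = [extra_dims]

    pairs = []
    for item in extra_dims:
        if isinstance(item, dict):
            # Explicit mapping: {dimension name: metadata key}
            pairs.extend(item.items())
        else:
            # Plain key: assume the name is the part after the last dot.
            pairs.append((item.rsplit(".", 1)[-1], item))
    return pairs


plain = normalise_extra_dims(
    ["metadata.directionNumber", "metadata.frequencyNumber"]
)
mapped = normalise_extra_dims(
    [{"d": "metadata.directionNumber"}, {"f": "metadata.frequencyNumber"}]
)

print(plain)
print(mapped)
```

Both specifications refer to the same metadata keys; only the resulting dimension names differ.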

Quantiles in a probabilistic forecast

Let us now consider a probabilistic forecast of 2-metre temperature.

[6]:
ds_fl2 = ekd.from_source("sample", "quantiles_pd.grib").to_fieldlist()

In this dataset, the fields are indexed by the GRIB metadata key "quantile".

[7]:
ds_fl2.ls(
    keys=[
        "parameter.variable",
        "time.base_datetime",
        "time.step",
        "ensemble.member",
        "metadata.number",
        "metadata.numberOfForecastsInEnsemble",
        "metadata.quantile",
    ]
)
[7]:
parameter.variable time.base_datetime time.step ensemble.member metadata.number metadata.numberOfForecastsInEnsemble metadata.quantile
0 2tp 2025-12-09 7 days 1 1 3 1:3
1 2tp 2025-12-09 7 days 1 1 5 1:5
2 2tp 2025-12-09 7 days 1 1 10 1:10
3 2tp 2025-12-09 7 days 2 2 3 2:3
4 2tp 2025-12-09 7 days 2 2 5 2:5
5 2tp 2025-12-09 7 days 2 2 10 2:10
6 2tp 2025-12-09 7 days 3 3 3 3:3
7 2tp 2025-12-09 7 days 3 3 5 3:5
8 2tp 2025-12-09 7 days 3 3 10 3:10
9 2tp 2025-12-09 7 days 4 4 5 4:5
10 2tp 2025-12-09 7 days 4 4 10 4:10
11 2tp 2025-12-09 7 days 5 5 5 5:5
12 2tp 2025-12-09 7 days 5 5 10 5:10
13 2tp 2025-12-09 7 days 6 6 10 6:10
14 2tp 2025-12-09 7 days 7 7 10 7:10
15 2tp 2025-12-09 7 days 8 8 10 8:10
16 2tp 2025-12-09 7 days 9 9 10 9:10
17 2tp 2025-12-09 7 days 10 10 10 10:10

By default, the ensemble dimension "member" is derived from the "ensemble.member" key. This key itself is extracted from the "number" GRIB key.

In the GRIB listing above, we can see that "number" and "numberOfForecastsInEnsemble" no longer carry their usual ensemble meaning: they hold the two components of "quantile" (for example, 1 and 3 for "1:3"). As a result, the ensemble dimension "member" is no longer applicable.

For this reason, we must:

  • declare "quantile" as an extra dimension, and

  • remove the predefined ensemble dimension "member", since it would otherwise conflict with the "quantile" dimension.

[8]:
ds3 = ds_fl2.to_xarray(
    squeeze=False,
    extra_dims="metadata.quantile",
    drop_dims="member",
    add_earthkit_attrs=False,
)
ds3
[8]:
<xarray.Dataset> Size: 13kB
Dimensions:                  (quantile: 18, forecast_reference_time: 1,
                              step: 1, level: 1, level_type: 1, latitude: 7,
                              longitude: 12)
Coordinates:
  * quantile                 (quantile) <U5 360B '10:10' '1:10' ... '9:10'
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 8B 2025...
  * step                     (step) timedelta64[ns] 8B 7 days
  * level                    (level) int64 8B 0
  * level_type               (level_type) <U7 28B 'surface'
  * latitude                 (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
  * longitude                (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
    2tp                      (quantile, forecast_reference_time, step, level, level_type, latitude, longitude) float64 12kB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
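The quantile coordinate above holds strings, and the order shown ('10:10' first, '9:10' last) matches a plain string sort. The sketch below is only an illustration in pure Python and does not use earthkit-data. It assumes each label reads as "rank:total", as suggested by the "number" and "numberOfForecastsInEnsemble" columns in the listing, and shows one possible way to order the labels numerically.

```python
# The 18 quantile labels from the listing above.
labels = [
    "1:3", "1:5", "1:10", "2:3", "2:5", "2:10", "3:3", "3:5", "3:10",
    "4:5", "4:10", "5:5", "5:10", "6:10", "7:10", "8:10", "9:10", "10:10",
]


def parse_quantile(label):
    """Split a 'rank:total' label into a (rank, total) pair of integers."""
    rank, total = label.split(":")
    return int(rank), int(total)


def numeric_key(label):
    """Sort key: group by total number of quantiles, then by rank."""
    rank, total = parse_quantile(label)
    return total, rank


lexical = sorted(labels)                   # plain string ordering
numeric = sorted(labels, key=numeric_key)  # grouped by total, then by rank

print(lexical[0], lexical[-1])
print(numeric)
```

Sorting on (total, rank) rather than on the ratio keeps labels such as "1:5" and "2:10" distinct, even though they denote the same fraction.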

ensure_dims vs extra_dims

The extra_dims and ensure_dims options partially overlap in their usage. When introducing a new dimension that must not be squeezed, it is sufficient to list it in ensure_dims; there is no need to repeat the same dimension in extra_dims.

[9]:
ds4 = ds_fl2.sel({"metadata.quantile": "2:3"}).to_xarray(
    squeeze=True,
    ensure_dims="metadata.quantile",
    drop_dims="member",
    add_earthkit_attrs=False,
)
ds4
[9]:
<xarray.Dataset> Size: 836B
Dimensions:    (quantile: 1, latitude: 7, longitude: 12)
Coordinates:
  * quantile   (quantile) <U3 12B '2:3'
  * latitude   (latitude) float64 56B 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0
  * longitude  (longitude) float64 96B 0.0 30.0 60.0 90.0 ... 270.0 300.0 330.0
Data variables:
    2tp        (quantile, latitude, longitude) float64 672B ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
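The effect seen above can be mimicked with a toy model of squeezing. The function `squeezed_dims` is a hypothetical helper, not earthkit-data's implementation. It assumes that squeezing drops every dimension of length 1 unless that dimension is explicitly protected, and it is applied to the dimension sizes that remain after selecting the single quantile "2:3".

```python
def squeezed_dims(sizes, ensure_dims=()):
    """Drop dimensions of length 1 unless they are listed in ensure_dims.

    Toy model for illustration only; not earthkit-data's code.
    """
    return {
        name: n for name, n in sizes.items() if n > 1 or name in ensure_dims
    }


# Dimension sizes after selecting a single quantile.
sizes = {
    "quantile": 1,
    "forecast_reference_time": 1,
    "step": 1,
    "level": 1,
    "level_type": 1,
    "latitude": 7,
    "longitude": 12,
}

protected = squeezed_dims(sizes, ensure_dims=("quantile",))
unprotected = squeezed_dims(sizes)

print(protected)
print(unprotected)
```

In this model, protecting "quantile" reproduces the three dimensions of the dataset above, whereas without protection only latitude and longitude would remain.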