Xarray engine: extra dimensions
[1]:
import earthkit.data as ekd
2D wave spectra example
We analyse a 2D wave spectra product at 2025-12-15 00 UTC and 03 UTC. A specific feature of this dataset is that, on top of the standard temporal dimension, the fields are additionally indexed by wave direction and frequency, which are stored in the GRIB keys "directionNumber" and "frequencyNumber".
[2]:
ds_fl = ekd.from_source("sample", "2d-wave-spectra_an.grib").to_fieldlist()
[3]:
ds_fl.ls(keys=["metadata.directionNumber", "metadata.frequencyNumber"])
[3]:
| | metadata.directionNumber | metadata.frequencyNumber |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 1 |
| 2 | 3 | 1 |
| 3 | 4 | 1 |
| 4 | 5 | 1 |
| ... | ... | ... |
| 2083 | 32 | 29 |
| 2084 | 33 | 29 |
| 2085 | 34 | 29 |
| 2086 | 35 | 29 |
| 2087 | 36 | 29 |
2088 rows × 2 columns
To represent this structure in Xarray, the predefined dimensions of the Xarray engine must be complemented with dimensions derived from the metadata keys "directionNumber" and "frequencyNumber". This is done with the extra_dims option of to_xarray().
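Conceptually, adding extra dimensions means collecting the distinct values of each indexing key and checking that every combination of values corresponds to exactly one field. The plain-Python sketch below illustrates this idea on synthetic metadata shaped like the listing above (36 directions × 29 frequencies × 2 reference times = 2088 fields). The helper build_hypercube_index is hypothetical and is not part of earthkit-data, whose actual implementation may differ.

```python
from datetime import datetime
from itertools import product


def build_hypercube_index(records, keys):
    """Hypothetical helper: map each field to its position in a hypercube.

    records: list of dicts holding the metadata of each field
    keys:    metadata keys that become dimensions, in order
    Returns (coords, index): coords maps key -> sorted unique values,
    index maps a tuple of coordinate positions -> field number.
    """
    # The coordinate values of each dimension are the sorted unique key values
    coords = {k: sorted({r[k] for r in records}) for k in keys}
    position = {k: {v: i for i, v in enumerate(vals)} for k, vals in coords.items()}

    index = {}
    for field_no, r in enumerate(records):
        pos = tuple(position[k][r[k]] for k in keys)
        if pos in index:
            raise ValueError(f"duplicate field at position {pos}")
        index[pos] = field_no

    # A complete hypercube has exactly one field per combination of values
    expected = 1
    for vals in coords.values():
        expected *= len(vals)
    if len(index) != expected:
        raise ValueError(f"incomplete hypercube: {len(index)} of {expected} fields")
    return coords, index


# Synthetic metadata (assumed ordering): 2 times, 29 frequencies, 36 directions
times = [datetime(2025, 12, 15, 0), datetime(2025, 12, 15, 3)]
records = [
    {"time": t, "directionNumber": d, "frequencyNumber": f}
    for t, f, d in product(times, range(1, 30), range(1, 37))
]

coords, index = build_hypercube_index(
    records, ["directionNumber", "frequencyNumber", "time"]
)
print({k: len(v) for k, v in coords.items()})
```

If a key combination occurred twice, or some combination were missing, the fields could not be arranged into a regular array, which is why the sketch raises an error in both cases.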
[4]:
ds = ds_fl.to_xarray(
    extra_dims=["metadata.directionNumber", "metadata.frequencyNumber"],
    add_earthkit_attrs=False,
)
ds
[4]:
<xarray.Dataset> Size: 1MB
Dimensions: (directionNumber: 36, frequencyNumber: 29,
forecast_reference_time: 2, latitude: 7,
longitude: 12)
Coordinates:
* directionNumber (directionNumber) int64 288B 1 2 3 4 ... 34 35 36
* frequencyNumber (frequencyNumber) int64 232B 1 2 3 4 ... 27 28 29
* forecast_reference_time (forecast_reference_time) datetime64[ns] 16B 202...
* latitude (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
2dfd (directionNumber, frequencyNumber, forecast_reference_time, latitude, longitude) float64 1MB ...
Attributes:
Conventions: CF-1.8
institution: ECMWF
The extra_dims option also supports defining an explicit mapping between the name of an extra dimension and the corresponding metadata key, in a way that is conceptually similar to dimension roles.
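The outputs in this notebook suggest how the dimension name is chosen: a plain key such as "metadata.directionNumber" yields a dimension named "directionNumber", while a dict gives the name explicitly. The hypothetical helper below normalises the forms of extra_dims used in this notebook (a single string, a list of strings, or a list of single-entry dicts) into (name, key) pairs. Taking the part after the last "." as the default name is an assumption consistent with the outputs shown here; this is an illustration only, not the earthkit-data code.

```python
def normalise_extra_dims(extra_dims):
    """Hypothetical helper: turn an extra_dims value into (dim_name, key) pairs."""
    if extra_dims is None:
        return []
    # A single string is treated like a one-element list
    if isinstance(extra_dims, str):
        extra_dims = [extra_dims]

    pairs = []
    for item in extra_dims:
        if isinstance(item, str):
            # Assumption: the dimension is named after the last component of the key
            pairs.append((item.rsplit(".", 1)[-1], item))
        elif isinstance(item, dict):
            # Explicit mapping {dim_name: key}
            pairs.extend(item.items())
        else:
            raise TypeError(f"unsupported extra_dims item: {item!r}")

    # Dimension names must be unique within a dataset
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate dimension names: {names}")
    return pairs


print(normalise_extra_dims(["metadata.directionNumber", "metadata.frequencyNumber"]))
print(normalise_extra_dims([{"d": "metadata.directionNumber"}, {"f": "metadata.frequencyNumber"}]))
print(normalise_extra_dims("metadata.quantile"))
```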
[5]:
ds2 = ds_fl.to_xarray(
    extra_dims=[{"d": "metadata.directionNumber"}, {"f": "metadata.frequencyNumber"}],
    add_earthkit_attrs=False,
)
ds2
[5]:
<xarray.Dataset> Size: 1MB
Dimensions: (d: 36, f: 29, forecast_reference_time: 2,
latitude: 7, longitude: 12)
Coordinates:
* d (d) int64 288B 1 2 3 4 5 6 7 ... 31 32 33 34 35 36
* f (f) int64 232B 1 2 3 4 5 6 7 ... 24 25 26 27 28 29
* forecast_reference_time (forecast_reference_time) datetime64[ns] 16B 202...
* latitude (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
2dfd (d, f, forecast_reference_time, latitude, longitude) float64 1MB ...
Attributes:
Conventions: CF-1.8
institution: ECMWF
Quantiles in a probabilistic forecast
Let us now consider a probabilistic forecast of 2-metre temperature.
[6]:
ds_fl2 = ekd.from_source("sample", "quantiles_pd.grib").to_fieldlist()
In this dataset, the fields are indexed by the GRIB metadata key "quantile".
[7]:
ds_fl2.ls(
    keys=[
        "parameter.variable",
        "time.base_datetime",
        "time.step",
        "ensemble.member",
        "metadata.number",
        "metadata.numberOfForecastsInEnsemble",
        "metadata.quantile",
    ]
)
[7]:
| | parameter.variable | time.base_datetime | time.step | ensemble.member | metadata.number | metadata.numberOfForecastsInEnsemble | metadata.quantile |
|---|---|---|---|---|---|---|---|
| 0 | 2tp | 2025-12-09 | 7 days | 1 | 1 | 3 | 1:3 |
| 1 | 2tp | 2025-12-09 | 7 days | 1 | 1 | 5 | 1:5 |
| 2 | 2tp | 2025-12-09 | 7 days | 1 | 1 | 10 | 1:10 |
| 3 | 2tp | 2025-12-09 | 7 days | 2 | 2 | 3 | 2:3 |
| 4 | 2tp | 2025-12-09 | 7 days | 2 | 2 | 5 | 2:5 |
| 5 | 2tp | 2025-12-09 | 7 days | 2 | 2 | 10 | 2:10 |
| 6 | 2tp | 2025-12-09 | 7 days | 3 | 3 | 3 | 3:3 |
| 7 | 2tp | 2025-12-09 | 7 days | 3 | 3 | 5 | 3:5 |
| 8 | 2tp | 2025-12-09 | 7 days | 3 | 3 | 10 | 3:10 |
| 9 | 2tp | 2025-12-09 | 7 days | 4 | 4 | 5 | 4:5 |
| 10 | 2tp | 2025-12-09 | 7 days | 4 | 4 | 10 | 4:10 |
| 11 | 2tp | 2025-12-09 | 7 days | 5 | 5 | 5 | 5:5 |
| 12 | 2tp | 2025-12-09 | 7 days | 5 | 5 | 10 | 5:10 |
| 13 | 2tp | 2025-12-09 | 7 days | 6 | 6 | 10 | 6:10 |
| 14 | 2tp | 2025-12-09 | 7 days | 7 | 7 | 10 | 7:10 |
| 15 | 2tp | 2025-12-09 | 7 days | 8 | 8 | 10 | 8:10 |
| 16 | 2tp | 2025-12-09 | 7 days | 9 | 9 | 10 | 9:10 |
| 17 | 2tp | 2025-12-09 | 7 days | 10 | 10 | 10 | 10:10 |
By default, the ensemble dimension "member" is derived from the "ensemble.member" key, which is itself extracted from the GRIB key "number".
The listing above shows that in this dataset the usual meaning of the GRIB key "number" (and of the related "numberOfForecastsInEnsemble") is overridden by "quantile": each quantile label has the form "number:numberOfForecastsInEnsemble" (e.g. "2:3"), and the same "number" value occurs for several quantiles. As a result, the ensemble dimension "member" is no longer applicable.
For this reason, we must:
- declare "quantile" as an extra dimension (extra_dims), and
- remove the predefined ensemble dimension "member" (drop_dims), since it would otherwise conflict with the "quantile" dimension.
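The conflict can be checked directly on the metadata: the values of "number" (and hence of "ensemble.member") repeat across fields, whereas each "quantile" label occurs exactly once. The plain-Python sketch below uses the 18 (number, numberOfForecastsInEnsemble) pairs copied from the listing above, and assumes, as the listing suggests, that the quantile label is simply the two values joined by ":".

```python
from collections import Counter

# (number, numberOfForecastsInEnsemble) for the 18 fields in the listing above
pairs = [
    (1, 3), (1, 5), (1, 10),
    (2, 3), (2, 5), (2, 10),
    (3, 3), (3, 5), (3, 10),
    (4, 5), (4, 10),
    (5, 5), (5, 10),
    (6, 10), (7, 10), (8, 10), (9, 10), (10, 10),
]

# In this sample the quantile label combines the two keys as "number:total"
quantiles = [f"{n}:{total}" for n, total in pairs]
members = [n for n, _ in pairs]

# "member" values repeat, so they cannot index the fields on their own ...
member_counts = Counter(members)
print(member_counts[1], member_counts[10])
# ... whereas every quantile label is unique
print(len(quantiles), len(set(quantiles)))
# The labels are strings, so they sort lexicographically
print(sorted(quantiles)[:2], sorted(quantiles)[-1])
```

Because the labels are strings, sorting them gives '10:10', '1:10', ..., '9:10', the same order as the quantile coordinate in the output of the next cell.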
[8]:
ds3 = ds_fl2.to_xarray(
    squeeze=False,
    extra_dims="metadata.quantile",
    drop_dims="member",
    add_earthkit_attrs=False,
)
ds3
[8]:
<xarray.Dataset> Size: 13kB
Dimensions: (quantile: 18, forecast_reference_time: 1,
step: 1, level: 1, level_type: 1, latitude: 7,
longitude: 12)
Coordinates:
* quantile (quantile) <U5 360B '10:10' '1:10' ... '9:10'
* forecast_reference_time (forecast_reference_time) datetime64[ns] 8B 2025...
* step (step) timedelta64[ns] 8B 7 days
* level (level) int64 8B 0
* level_type (level_type) <U7 28B 'surface'
* latitude (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
2tp (quantile, forecast_reference_time, step, level, level_type, latitude, longitude) float64 12kB ...
Attributes:
Conventions: CF-1.8
institution: ECMWF
The ensure_dims vs extra_dims options
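The extra_dims and ensure_dims options partially overlap in their usage. The ensure_dims option lists dimensions that must be kept even when squeeze=True would otherwise remove them because they have size 1. Therefore, when introducing a new dimension that must not be squeezed, it is sufficient to list it in ensure_dims; there is no need to repeat the same dimension in extra_dims. In the example below, a single quantile is selected, so "quantile" has size 1 but is still kept as a dimension.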
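The combined effect of squeeze and ensure_dims can be pictured as a filter on dimension sizes: with squeeze=True, size-1 dimensions are dropped unless they are listed in ensure_dims. The sketch below applies this rule to the dimension sizes of the quantile example after selecting "2:3". The helper squeeze_dims is hypothetical, takes dimension names rather than metadata keys, and ignores coordinates and data, which the real engine also has to handle.

```python
def squeeze_dims(sizes, squeeze=True, ensure_dims=()):
    """Hypothetical helper: return the dimensions kept after squeezing.

    sizes:       dict mapping dimension name -> length, in order
    ensure_dims: dimension name(s) kept even when their length is 1
    """
    if isinstance(ensure_dims, str):
        ensure_dims = [ensure_dims]
    keep = set(ensure_dims)
    return {
        name: size
        for name, size in sizes.items()
        # Keep a dimension if squeezing is off, it is longer than 1, or it is ensured
        if not squeeze or size > 1 or name in keep
    }


# Dimension sizes after selecting the single quantile "2:3"
sizes = {
    "quantile": 1,
    "forecast_reference_time": 1,
    "step": 1,
    "level": 1,
    "level_type": 1,
    "latitude": 7,
    "longitude": 12,
}

print(squeeze_dims(sizes, squeeze=False))
print(squeeze_dims(sizes, squeeze=True))
print(squeeze_dims(sizes, squeeze=True, ensure_dims="quantile"))
```

The last call keeps quantile, latitude and longitude, matching the dimensions of ds4 in the output of the next cell.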
[9]:
ds4 = ds_fl2.sel({"metadata.quantile": "2:3"}).to_xarray(
    squeeze=True,
    ensure_dims="metadata.quantile",
    drop_dims="member",
    add_earthkit_attrs=False,
)
ds4
[9]:
<xarray.Dataset> Size: 836B
Dimensions: (quantile: 1, latitude: 7, longitude: 12)
Coordinates:
* quantile (quantile) <U3 12B '2:3'
* latitude (latitude) float64 56B 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 60.0 90.0 ... 270.0 300.0 330.0
Data variables:
2tp (quantile, latitude, longitude) float64 672B ...
Attributes:
Conventions: CF-1.8
institution: ECMWF