Xarray engine: auxiliary coordinates
[1]:
import earthkit.data as ekd
Basic example
First, we get some GRIB data containing control and perturbed forecasts.
[2]:
ds_fl = ekd.from_source("sample", "ens_cf_pf.grib").to_fieldlist()
ds_fl.ls(extra_keys=["metadata.dataType"])
[2]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type | metadata.dataType |
|---|---|---|---|---|---|---|---|---|---|
| 0 | t | 2024-06-03 00:00:00 | 2024-06-03 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll | cf |
| 1 | t | 2024-06-03 06:00:00 | 2024-06-03 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll | cf |
| 2 | t | 2024-06-03 00:00:00 | 2024-06-03 | 0 days 00:00:00 | 500 | pressure | 1 | regular_ll | pf |
| 3 | t | 2024-06-03 00:00:00 | 2024-06-03 | 0 days 00:00:00 | 500 | pressure | 2 | regular_ll | pf |
| 4 | t | 2024-06-03 06:00:00 | 2024-06-03 | 0 days 06:00:00 | 500 | pressure | 1 | regular_ll | pf |
| 5 | t | 2024-06-03 06:00:00 | 2024-06-03 | 0 days 06:00:00 | 500 | pressure | 2 | regular_ll | pf |
Using the aux_coords keyword of the Xarray engine, we can declare an auxiliary coordinate "forecast_type" whose values are derived from the GRIB metadata key "dataType" and which depends on a single dimension, "member".
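An auxiliary coordinate along "member" only makes sense if every field sharing a member value carries the same "dataType". The following plain-Python sketch illustrates that idea with the member and dataType values from the listing above. It is not how earthkit-data implements aux_coords, and the helper name aux_values_along is hypothetical.

```python
from collections import defaultdict

# (member, dataType) pairs copied from the field listing above.
fields = [
    (0, "cf"),
    (0, "cf"),
    (1, "pf"),
    (2, "pf"),
    (1, "pf"),
    (2, "pf"),
]


def aux_values_along(pairs):
    """Map each dimension value to its single metadata value.

    Hypothetical helper for illustration only: it raises ValueError when a
    dimension value is associated with more than one metadata value.
    """
    seen = defaultdict(set)
    for dim_value, meta_value in pairs:
        seen[dim_value].add(meta_value)

    result = {}
    for dim_value in sorted(seen):
        values = seen[dim_value]
        if len(values) != 1:
            raise ValueError(
                f"{dim_value!r} maps to several values: {sorted(values)}"
            )
        result[dim_value] = next(iter(values))
    return result


forecast_type = aux_values_along(fields)
print(forecast_type)
```

Each member maps to exactly one forecast type here, which is why "forecast_type" can depend on the "member" dimension alone.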
[3]:
ds = ds_fl.to_xarray(
    aux_coords={"forecast_type": ("metadata.dataType", ("member",))},
)
ds.load()
[3]:
<xarray.Dataset> Size: 33kB
Dimensions: (member: 3, step: 2, latitude: 19, longitude: 36)
Coordinates:
* member (member) <U1 12B '0' '1' '2'
forecast_type (member) <U2 24B 'cf' 'pf' 'pf'
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* latitude (latitude) float64 152B 90.0 80.0 70.0 ... -70.0 -80.0 -90.0
* longitude (longitude) float64 288B 0.0 10.0 20.0 ... 330.0 340.0 350.0
Data variables:
t (member, step, latitude, longitude) float64 33kB 250.2 ......
Attributes:
Conventions: CF-1.8
institution: ECMWF
More elaborate example: quantiles in a probabilistic forecast
Let us now consider a probabilistic forecast of 2-metre temperature.
[4]:
ds_fl2 = ekd.from_source("sample", "quantiles_pd.grib").to_fieldlist()
In this dataset, the fields are indexed by the GRIB metadata key "quantile", which is in turn composed of "number" and "numberOfForecastsInEnsemble".
[5]:
ds_fl2.ls(
    keys=[
        "metadata.shortName",
        "metadata.dataDate",
        "metadata.dataTime",
        "metadata.stepRange",
        "metadata.dataType",
        "metadata.quantile",
        "metadata.number",
        "metadata.numberOfForecastsInEnsemble",
    ]
)
[5]:
| | metadata.shortName | metadata.dataDate | metadata.dataTime | metadata.stepRange | metadata.dataType | metadata.quantile | metadata.number | metadata.numberOfForecastsInEnsemble |
|---|---|---|---|---|---|---|---|---|
| 0 | 2tp | 20251209 | 0 | 0-168 | pd | 1:3 | 1 | 3 |
| 1 | 2tp | 20251209 | 0 | 0-168 | pd | 1:5 | 1 | 5 |
| 2 | 2tp | 20251209 | 0 | 0-168 | pd | 1:10 | 1 | 10 |
| 3 | 2tp | 20251209 | 0 | 0-168 | pd | 2:3 | 2 | 3 |
| 4 | 2tp | 20251209 | 0 | 0-168 | pd | 2:5 | 2 | 5 |
| 5 | 2tp | 20251209 | 0 | 0-168 | pd | 2:10 | 2 | 10 |
| 6 | 2tp | 20251209 | 0 | 0-168 | pd | 3:3 | 3 | 3 |
| 7 | 2tp | 20251209 | 0 | 0-168 | pd | 3:5 | 3 | 5 |
| 8 | 2tp | 20251209 | 0 | 0-168 | pd | 3:10 | 3 | 10 |
| 9 | 2tp | 20251209 | 0 | 0-168 | pd | 4:5 | 4 | 5 |
| 10 | 2tp | 20251209 | 0 | 0-168 | pd | 4:10 | 4 | 10 |
| 11 | 2tp | 20251209 | 0 | 0-168 | pd | 5:5 | 5 | 5 |
| 12 | 2tp | 20251209 | 0 | 0-168 | pd | 5:10 | 5 | 10 |
| 13 | 2tp | 20251209 | 0 | 0-168 | pd | 6:10 | 6 | 10 |
| 14 | 2tp | 20251209 | 0 | 0-168 | pd | 7:10 | 7 | 10 |
| 15 | 2tp | 20251209 | 0 | 0-168 | pd | 8:10 | 8 | 10 |
| 16 | 2tp | 20251209 | 0 | 0-168 | pd | 9:10 | 9 | 10 |
| 17 | 2tp | 20251209 | 0 | 0-168 | pd | 10:10 | 10 | 10 |
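The listing shows that each "quantile" label has the form "number:numberOfForecastsInEnsemble". A minimal plain-Python sketch of that relationship follows. The helper names compose_quantile and split_quantile are hypothetical, and the label format is inferred from the table above rather than from the GRIB definitions.

```python
def compose_quantile(number, total):
    """Build a label such as '2:5' from the two integer components."""
    return f"{number}:{total}"


def split_quantile(label):
    """Split a label such as '2:5' back into (number, total) integers."""
    number_text, total_text = label.split(":")
    return int(number_text), int(total_text)


# (quantile, number, numberOfForecastsInEnsemble) rows copied from the table.
rows = [
    ("1:3", 1, 3),
    ("2:5", 2, 5),
    ("4:10", 4, 10),
    ("10:10", 10, 10),
]

# Check that composing and splitting agree with every copied row.
round_trip_ok = all(
    compose_quantile(number, total) == label
    and split_quantile(label) == (number, total)
    for label, number, total in rows
)
print(round_trip_ok)
```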
Note that, in this context, the usual meaning of the GRIB metadata key "number" (and the related "numberOfForecastsInEnsemble") is overridden by "quantile". As a result, the ensemble dimension normally derived from "number" is no longer applicable.
For this reason, we must:
- declare the GRIB metadata key "quantile" as an extra dimension, and
- remove the predefined ensemble dimension "member" (normally derived from "number"), since it would otherwise conflict with the "quantile" dimension.
Still, it might be useful to keep the information carried by "number" and "numberOfForecastsInEnsemble" as auxiliary coordinates.
[6]:
ds2 = ds_fl2.to_xarray(
    drop_dims="member",
    extra_dims="metadata.quantile",
    aux_coords={
        "quantile_rank": ("ensemble.member", "metadata.quantile"),
        "nquantiles": ("metadata.numberOfForecastsInEnsemble", "metadata.quantile"),
    },
)
ds2.load()
[6]:
<xarray.Dataset> Size: 13kB
Dimensions: (quantile: 18, latitude: 7, longitude: 12)
Coordinates:
* quantile (quantile) <U5 360B '10:10' '1:10' '1:3' ... '8:10' '9:10'
quantile_rank (quantile) <U2 144B '10' '1' '1' '1' '2' ... '6' '7' '8' '9'
nquantiles (quantile) int64 144B 10 10 3 5 10 3 5 ... 5 10 5 10 10 10 10
* latitude (latitude) float64 56B 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 60.0 ... 270.0 300.0 330.0
Data variables:
2tp (quantile, latitude, longitude) float64 12kB 13.37 ... 0.0
Attributes:
Conventions: CF-1.8
institution: ECMWF
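In the output above, the "quantile" coordinate holds strings, and its values appear in string order ('10:10' comes before '1:10'). The plain-Python sketch below contrasts that ordering with a numeric one, grouped by the number of quantiles and then by rank. The labels are copied from the field listing, and the helper numeric_key is hypothetical. The auxiliary coordinates "nquantiles" and "quantile_rank" carry the same two components, which is one reason they are useful to keep.

```python
# All quantile labels, copied from the field listing.
labels = [
    "1:3", "1:5", "1:10", "2:3", "2:5", "2:10",
    "3:3", "3:5", "3:10", "4:5", "4:10", "5:5",
    "5:10", "6:10", "7:10", "8:10", "9:10", "10:10",
]


def numeric_key(label):
    """Sort key: (number of quantiles, rank), both as integers."""
    rank_text, total_text = label.split(":")
    return int(total_text), int(rank_text)


# Plain string ordering, character by character.
string_order = sorted(labels)

# Numeric ordering: terciles first, then quintiles, then deciles.
numeric_order = sorted(labels, key=numeric_key)

print(string_order[:3])
print(numeric_order[:4])
```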