Xarray engine: auxiliary coordinates

[1]:
import earthkit.data as ekd

Basic example

First, we get some GRIB data containing control and perturbed forecasts.

[2]:
ds_fl = ekd.from_source("sample", "ens_cf_pf.grib").to_fieldlist()
ds_fl.ls(extra_keys=["metadata.dataType"])

[2]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type metadata.dataType
0 t 2024-06-03 00:00:00 2024-06-03 0 days 00:00:00 500 pressure 0 regular_ll cf
1 t 2024-06-03 06:00:00 2024-06-03 0 days 06:00:00 500 pressure 0 regular_ll cf
2 t 2024-06-03 00:00:00 2024-06-03 0 days 00:00:00 500 pressure 1 regular_ll pf
3 t 2024-06-03 00:00:00 2024-06-03 0 days 00:00:00 500 pressure 2 regular_ll pf
4 t 2024-06-03 06:00:00 2024-06-03 0 days 06:00:00 500 pressure 1 regular_ll pf
5 t 2024-06-03 06:00:00 2024-06-03 0 days 06:00:00 500 pressure 2 regular_ll pf

Using the Xarray engine keyword aux_coords, we can declare an auxiliary coordinate "forecast_type" whose values are taken from the GRIB metadata key "dataType" and which varies along the single dimension "member".

[3]:
ds = ds_fl.to_xarray(
    aux_coords={"forecast_type": ("metadata.dataType", ("member",))},
)
ds.load()
[3]:
<xarray.Dataset> Size: 33kB
Dimensions:        (member: 3, step: 2, latitude: 19, longitude: 36)
Coordinates:
  * member         (member) <U1 12B '0' '1' '2'
    forecast_type  (member) <U2 24B 'cf' 'pf' 'pf'
  * step           (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude       (latitude) float64 152B 90.0 80.0 70.0 ... -70.0 -80.0 -90.0
  * longitude      (longitude) float64 288B 0.0 10.0 20.0 ... 330.0 340.0 350.0
Data variables:
    t              (member, step, latitude, longitude) float64 33kB 250.2 ......
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
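
Once the auxiliary coordinate is attached, it can be used with ordinary Xarray operations. The following is a minimal sketch rather than part of the original notebook: ds_demo is a small synthetic dataset (hypothetical name, made-up values) that only mimics the "member" and "forecast_type" layout shown above, so that the snippet runs without the GRIB sample.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset above: same coordinate layout,
# but made-up values and no latitude/longitude dimensions.
ds_demo = xr.Dataset(
    {"t": (("member", "step"), np.arange(6.0).reshape(3, 2))},
    coords={
        "member": ["0", "1", "2"],
        "forecast_type": ("member", ["cf", "pf", "pf"]),
        "step": np.array([0, 6], dtype="timedelta64[h]").astype("timedelta64[ns]"),
    },
)

# Select the perturbed members by testing the auxiliary coordinate.
pf_index = np.flatnonzero((ds_demo["forecast_type"] == "pf").values)
perturbed = ds_demo.isel(member=pf_index)
print(perturbed["member"].values.tolist())

# Average over the members of each forecast type.
means = ds_demo.groupby("forecast_type").mean()
print(means["t"].sel(forecast_type="pf").values.tolist())
```

The same calls should apply to the real ds, since "forecast_type" is attached to the "member" dimension in the same way.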

More elaborate example: quantiles in a probabilistic forecast

Let us now consider a probabilistic forecast of 2-metre temperature.

[4]:
ds_fl2 = ekd.from_source("sample", "quantiles_pd.grib").to_fieldlist()

In this dataset, the fields are indexed by the GRIB metadata key "quantile", which is in turn composed of "number" and "numberOfForecastsInEnsemble".

[5]:
ds_fl2.ls(
    keys=[
        "metadata.shortName",
        "metadata.dataDate",
        "metadata.dataTime",
        "metadata.stepRange",
        "metadata.dataType",
        "metadata.quantile",
        "metadata.number",
        "metadata.numberOfForecastsInEnsemble",
    ]
)
[5]:
metadata.shortName metadata.dataDate metadata.dataTime metadata.stepRange metadata.dataType metadata.quantile metadata.number metadata.numberOfForecastsInEnsemble
0 2tp 20251209 0 0-168 pd 1:3 1 3
1 2tp 20251209 0 0-168 pd 1:5 1 5
2 2tp 20251209 0 0-168 pd 1:10 1 10
3 2tp 20251209 0 0-168 pd 2:3 2 3
4 2tp 20251209 0 0-168 pd 2:5 2 5
5 2tp 20251209 0 0-168 pd 2:10 2 10
6 2tp 20251209 0 0-168 pd 3:3 3 3
7 2tp 20251209 0 0-168 pd 3:5 3 5
8 2tp 20251209 0 0-168 pd 3:10 3 10
9 2tp 20251209 0 0-168 pd 4:5 4 5
10 2tp 20251209 0 0-168 pd 4:10 4 10
11 2tp 20251209 0 0-168 pd 5:5 5 5
12 2tp 20251209 0 0-168 pd 5:10 5 10
13 2tp 20251209 0 0-168 pd 6:10 6 10
14 2tp 20251209 0 0-168 pd 7:10 7 10
15 2tp 20251209 0 0-168 pd 8:10 8 10
16 2tp 20251209 0 0-168 pd 9:10 9 10
17 2tp 20251209 0 0-168 pd 10:10 10 10

Note that, in this context, the usual meaning of the GRIB metadata key "number" (and the related "numberOfForecastsInEnsemble") is overridden by "quantile". As a result, the ensemble dimension normally derived from "number" is no longer applicable.

For this reason, we must:

  • declare the GRIB metadata key "quantile" as an extra dimension, and

  • remove the predefined ensemble dimension "member" (derived from "number"), since it would otherwise conflict with the "quantile" dimension.

Still, it might be useful to keep the information carried by "number" and "numberOfForecastsInEnsemble" as auxiliary coordinates.

[6]:
ds2 = ds_fl2.to_xarray(
    drop_dims="member",
    extra_dims="metadata.quantile",
    aux_coords={
        "quantile_rank": ("ensemble.member", "metadata.quantile"),
        "nquantiles": ("metadata.numberOfForecastsInEnsemble", "metadata.quantile"),
    },
)
ds2.load()
[6]:
<xarray.Dataset> Size: 13kB
Dimensions:        (quantile: 18, latitude: 7, longitude: 12)
Coordinates:
  * quantile       (quantile) <U5 360B '10:10' '1:10' '1:3' ... '8:10' '9:10'
    quantile_rank  (quantile) <U2 144B '10' '1' '1' '1' '2' ... '6' '7' '8' '9'
    nquantiles     (quantile) int64 144B 10 10 3 5 10 3 5 ... 5 10 5 10 10 10 10
  * latitude       (latitude) float64 56B 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0
  * longitude      (longitude) float64 96B 0.0 30.0 60.0 ... 270.0 300.0 330.0
Data variables:
    2tp            (quantile, latitude, longitude) float64 12kB 13.37 ... 0.0
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
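
Note that the "quantile" coordinate holds strings, so its values appear in lexicographic order ('10:10' comes before '1:10'). The sketch below shows one way to turn such labels into numeric levels and order them. It is an illustration only: quantile_level is a hypothetical helper, and it assumes that a label "rank:count" denotes the level rank/count, as suggested by the listing of "number" and "numberOfForecastsInEnsemble" above.

```python
from fractions import Fraction


def quantile_level(label: str) -> Fraction:
    """Convert a "rank:count" label such as "3:10" into the fraction 3/10.

    Assumption: the part before the colon is the quantile rank and the
    part after it is the total number of quantiles.
    """
    rank, count = (int(part) for part in label.split(":"))
    if count <= 0 or not 0 <= rank <= count:
        raise ValueError(f"unexpected quantile label: {label!r}")
    return Fraction(rank, count)


# A few labels in the lexicographic order used by the coordinate above.
labels = ["10:10", "1:10", "1:3", "2:5", "9:10"]

# Sort by numeric level instead of by string.
ordered = sorted(labels, key=quantile_level)
print(ordered)  # ['1:10', '1:3', '2:5', '9:10', '10:10']
```

Levels computed this way could then be attached to ds2 with assign_coords and used with sortby if a numerically ordered "quantile" dimension is needed.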