Xarray engine: variable key

The variable_key option in to_xarray() controls which metadata key is used to form the data variables of the resulting dataset. By default it is “param”, which in earthkit-data is an alias for “shortName” in GRIB.

Note that it is also possible to generate an Xarray dataset with a single DataArray containing all the parameters from a GRIB fieldlist. See, for example, the Xarray engine: mono variable notebook for details.

[1]:
import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "pl.grib")
ds = ds_fl.to_xarray()
ds
[1]:
<xarray.Dataset> Size: 176kB
Dimensions:                  (forecast_reference_time: 4, step: 2, level: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * level                    (level) int64 16B 500 700
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    r                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
    t                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
    class:        od
    stream:       oper
    levtype:      pl
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF

The param_level key

The built-in “param_level” metadata key combines the values of the “param” and “level” metadata keys into a single str. We can use it as variable_key to build our Xarray dataset.

[2]:
ds = ds_fl.to_xarray(variable_key="param_level")
ds
[2]:
<xarray.Dataset> Size: 176kB
Dimensions:                  (forecast_reference_time: 4, step: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    r500                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r700                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t500                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t700                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
Attributes:
    class:        od
    stream:       oper
    levtype:      pl
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF
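
The variable names in the output above can be mimicked in plain Python. The sketch below is only an illustration of the naming, not the earthkit-data implementation: the metadata records and the param_level() helper are hypothetical and simply join the two values into a string.

```python
# Toy metadata records standing in for GRIB fields (hypothetical values).
fields = [
    {"param": "r", "level": 500},
    {"param": "r", "level": 700},
    {"param": "t", "level": 500},
    {"param": "t", "level": 700},
]


def param_level(meta):
    """Join the "param" and "level" values into one string, e.g. "t500"."""
    return f"{meta['param']}{meta['level']}"


# Each distinct combined key corresponds to one data variable.
variable_names = sorted({param_level(f) for f in fields})
print(variable_names)  # ['r500', 'r700', 't500', 't700']
```

Because the level is folded into the variable name, the level dimension no longer appears in the dataset.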

Using “param_level” as the variable key comes in handy when the input data contains parameters with different level types.

[3]:
ds_fl = ekd.from_source("sample", "mixed_pl_sfc.grib")
ds = ds_fl.to_xarray(variable_key="param_level")
ds
[3]:
<xarray.Dataset> Size: 1MB
Dimensions:                  (forecast_reference_time: 4, step: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables: (12/32)
    2t0                      (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    msl0                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r1000                    (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r300                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r400                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r500                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    ...                       ...
    z1000                    (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z300                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z400                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z500                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z700                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z850                     (forecast_reference_time, step, latitude, longitude) float64 44kB ...
Attributes:
    class:        od
    stream:       oper
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF

Using remapping

We can take this one step further and define a metadata key that combines the param, the level and the level type into a single key. We can achieve this with the remapping option.

[4]:
ds_fl = ekd.from_source("sample", "mixed_pl_sfc.grib")
ds = ds_fl.to_xarray(variable_key="p_l_t", remapping={"p_l_t": "{param}_{levelist}_{levtype}"})
ds
[4]:
<xarray.Dataset> Size: 1MB
Dimensions:                  (forecast_reference_time: 4, step: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables: (12/32)
    2t_sfc                   (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    msl_sfc                  (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r_1000_pl                (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r_300_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r_400_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    r_500_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    ...                       ...
    z_1000_pl                (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z_300_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z_400_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z_500_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z_700_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    z_850_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
Attributes:
    class:        od
    stream:       oper
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF
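
The remapping pattern reads like a Python format string. As a rough sketch of the idea, assuming the pattern is filled in per field from its metadata, the expansion can be reproduced with str.format. The records and the expand() helper are hypothetical; fields that lack one of the keys (such as surface fields without a level) are left out, since how earthkit-data handles them is not covered here.

```python
# Toy metadata records standing in for GRIB fields (hypothetical values).
fields = [
    {"param": "t", "levelist": 500, "levtype": "pl"},
    {"param": "t", "levelist": 137, "levtype": "ml"},
    {"param": "u", "levelist": 700, "levtype": "pl"},
]

# Same pattern as passed in the remapping option above.
template = "{param}_{levelist}_{levtype}"


def expand(pattern, meta):
    """Fill the placeholders in the pattern with the field's metadata values."""
    return pattern.format(**meta)


names = [expand(template, f) for f in fields]
print(names)  # ['t_500_pl', 't_137_ml', 'u_700_pl']
```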

This technique is particularly useful when the same parameter is available on multiple level types in the input data. In this case, using “param_level” does not result in a full hypercube, but the remapping used above does.

[5]:
ds_fl = ekd.from_source("sample", "mixed_pl_ml.grib")
ds = ds_fl.to_xarray(variable_key="p_l_t", remapping={"p_l_t": "{param}_{levelist}_{levtype}"})
ds
[5]:
<xarray.Dataset> Size: 351kB
Dimensions:                  (forecast_reference_time: 4, step: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    t_137_ml                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t_500_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t_700_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t_90_ml                  (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_137_ml                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_500_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_700_pl                 (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_90_ml                  (forecast_reference_time, step, latitude, longitude) float64 44kB ...
Attributes:
    class:        od
    stream:       oper
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    Conventions:  CF-1.8
    institution:  ECMWF
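
The term “full hypercube” can be read as: every combination of the dimension values is present exactly once. The toy check below illustrates that reading on made-up records. It is a conceptual sketch only; the is_full_hypercube() helper and the record layout are hypothetical and do not reflect how earthkit-data validates its input.

```python
from itertools import product


def is_full_hypercube(records, dims):
    """Return True if every combination of the dims' values occurs exactly once."""
    coords = {d: sorted({r[d] for r in records}) for d in dims}
    expected = set(product(*(coords[d] for d in dims)))
    seen = [tuple(r[d] for d in dims) for r in records]
    # No duplicates, and nothing missing from the Cartesian product.
    return len(seen) == len(set(seen)) and set(seen) == expected


dims = ["variable", "step"]

# Two variables, each available at both steps: complete.
complete = [
    {"variable": v, "step": s}
    for v in ("t_500_pl", "t_137_ml")
    for s in (0, 6)
]

# Dropping one field leaves a hole in the cube.
incomplete = complete[:-1]

print(is_full_hypercube(complete, dims))    # True
print(is_full_hypercube(incomplete, dims))  # False
```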