Xarray engine: remapping

[1]:
import earthkit.data as ekd

Remapping used to define a custom dimension

Let us consider 3 ensemble members, 1 control forecast (cf) and 2 perturbed members (pf), each available at 2 forecast steps.

[2]:
ds_fl = ekd.from_source("sample", "ens_cf_pf.grib").to_fieldlist()
ds_fl.ls(extra_keys="metadata.dataType")

[2]:
   parameter.variable  time.valid_datetime  time.base_datetime  time.step        vertical.level  vertical.level_type  ensemble.member  geography.grid_type  metadata.dataType
0  t                   2024-06-03 00:00:00  2024-06-03          0 days 00:00:00  500             pressure             0                regular_ll           cf
1  t                   2024-06-03 06:00:00  2024-06-03          0 days 06:00:00  500             pressure             0                regular_ll           cf
2  t                   2024-06-03 00:00:00  2024-06-03          0 days 00:00:00  500             pressure             1                regular_ll           pf
3  t                   2024-06-03 00:00:00  2024-06-03          0 days 00:00:00  500             pressure             2                regular_ll           pf
4  t                   2024-06-03 06:00:00  2024-06-03          0 days 06:00:00  500             pressure             1                regular_ll           pf
5  t                   2024-06-03 06:00:00  2024-06-03          0 days 06:00:00  500             pressure             2                regular_ll           pf

Suppose we want to organise this field list along a custom dimension called "custom_member", whose coordinates are constructed by combining the metadata keys "metadata.dataType" and "ensemble.member", for example: ["cf_0", "pf_1", "pf_2"].

To achieve this, we

  • use the remapping option to define a virtual key "custom_member", and

  • declare "custom_member" as a new dimension.
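Conceptually, the remapping value is a template whose {...} placeholders are metadata keys, filled in separately for each field. The plain-Python sketch below illustrates only that idea, using metadata copied by hand from the listing above. The helper render_template is hypothetical and is not part of the earthkit-data API; the engine's actual implementation may differ.

```python
import re

# Metadata copied by hand from the ds_fl.ls() listing above (one dict per field).
fields = [
    {"metadata.dataType": "cf", "ensemble.member": 0},
    {"metadata.dataType": "cf", "ensemble.member": 0},
    {"metadata.dataType": "pf", "ensemble.member": 1},
    {"metadata.dataType": "pf", "ensemble.member": 2},
    {"metadata.dataType": "pf", "ensemble.member": 1},
    {"metadata.dataType": "pf", "ensemble.member": 2},
]

# Matches a {key} placeholder and captures the key between the braces.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_template(template, metadata):
    """Hypothetical helper: replace each {key} with str(metadata[key])."""
    return PLACEHOLDER.sub(lambda m: str(metadata[m.group(1)]), template)


template = "{metadata.dataType}_{ensemble.member}"
# Value of the virtual key "custom_member" for every field.
values = [render_template(template, f) for f in fields]
# The distinct values (sorted here for readability) form the new dimension.
coords = sorted(set(values))
print(coords)  # ['cf_0', 'pf_1', 'pf_2']
```

Six fields collapse onto three coordinate values because the two forecast steps of each member share the same label; the step is kept as its own dimension.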

[3]:
ds = ds_fl.to_xarray(
    remapping={"custom_member": "{metadata.dataType}_{ensemble.member}"},
    extra_dims="custom_member",
    add_earthkit_attrs=False,
)
ds
[3]:
<xarray.Dataset> Size: 33kB
Dimensions:        (custom_member: 3, step: 2, latitude: 19, longitude: 36)
Coordinates:
  * custom_member  (custom_member) <U4 48B 'cf_0' 'pf_1' 'pf_2'
  * step           (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude       (latitude) float64 152B 90.0 80.0 70.0 ... -70.0 -80.0 -90.0
  * longitude      (longitude) float64 288B 0.0 10.0 20.0 ... 330.0 340.0 350.0
Data variables:
    t              (custom_member, step, latitude, longitude) float64 33kB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

Note that it is not necessary to explicitly remove the predefined dimension "member" with the drop_dims option. The Xarray engine drops it automatically because it is already incorporated into another dimension, in this case "custom_member".
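One way to see why "member" is redundant here: every "custom_member" label corresponds to exactly one member number, so a separate "member" axis would carry no extra information. The sketch below checks this on the (dataType, member) pairs copied from the listing above. It illustrates the reasoning only and makes no claim about how the engine itself decides which dimensions to drop.

```python
# (dataType, member) pairs copied from the ds_fl.ls() listing above.
pairs = [("cf", 0), ("cf", 0), ("pf", 1), ("pf", 2), ("pf", 1), ("pf", 2)]

# Map each custom_member label to the set of member numbers it occurs with.
member_of = {}
for data_type, member in pairs:
    member_of.setdefault(f"{data_type}_{member}", set()).add(member)

# True when each label determines a single member number.
redundant = all(len(members) == 1 for members in member_of.values())
print(member_of)  # {'cf_0': {0}, 'pf_1': {1}, 'pf_2': {2}}
print(redundant)  # True
```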

Below, we present a more elaborate example illustrating how remapping can be used in conjunction with the extra_dims and dims_as_attrs options.

[4]:
ds2 = ds_fl.to_xarray(
    squeeze=True,
    remapping={
        "custom_member": "{metadata.dataType}_{ensemble.member}",
        "mars": "{metadata.class}_{metadata.stream}",
    },
    extra_dims=["custom_member", "mars"],
    dims_as_attrs="mars",
    add_earthkit_attrs=False,
)
ds2
[4]:
<xarray.Dataset> Size: 33kB
Dimensions:        (custom_member: 3, step: 2, latitude: 19, longitude: 36)
Coordinates:
  * custom_member  (custom_member) <U4 48B 'cf_0' 'pf_1' 'pf_2'
  * step           (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude       (latitude) float64 152B 90.0 80.0 70.0 ... -70.0 -80.0 -90.0
  * longitude      (longitude) float64 288B 0.0 10.0 20.0 ... 330.0 340.0 350.0
Data variables:
    t              (custom_member, step, latitude, longitude) float64 33kB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

Above, we declared "mars" as a new dimension whose coordinates combine the "metadata.class" and "metadata.stream" keys. Because this dimension has size 1, it is squeezed out, which is the default behaviour (squeeze=True is also set explicitly above). However, the dims_as_attrs option preserves its single coordinate value as a variable attribute:
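The combined effect of squeezing and dims_as_attrs can be mimicked with plain dictionaries. This is a simplified sketch under the assumption that squeezing removes exactly the size-1 dimensions; the dims dictionary and the "0h"/"6h" step labels are illustrative and do not reflect the engine's internal data structures.

```python
# Illustrative dimension -> coordinate values, mimicking ds2 before squeezing.
dims = {
    "custom_member": ["cf_0", "pf_1", "pf_2"],
    "mars": ["od_enfo"],
    "step": ["0h", "6h"],
}
dims_as_attrs = ["mars"]

kept, attrs = {}, {}
for name, values in dims.items():
    if len(values) == 1:
        # Squeeze: a size-1 dimension is dropped from the hypercube...
        if name in dims_as_attrs:
            # ...but its single value survives as a variable attribute.
            attrs[name] = values[0]
    else:
        kept[name] = values

print(list(kept))  # ['custom_member', 'step']
print(attrs)       # {'mars': 'od_enfo'}
```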

[5]:
ds2["t"].attrs
[5]:
{'standard_name': 'air_temperature',
 'long_name': 'Temperature',
 'units': 'kelvin',
 'level_type': 'pressure',
 'mars': 'od_enfo'}

Remapping used to define a custom variable name

The following GRIB dataset contains the parameters t and u on both pressure levels (500 and 700) and hybrid (model) levels (90 and 137).

[6]:
ds_fl2 = ekd.from_source("sample", "mixed_pl_ml.grib").to_fieldlist()
ds_fl2.ls()

[6]:
    parameter.variable  time.valid_datetime  time.base_datetime   time.step        vertical.level  vertical.level_type  ensemble.member  geography.grid_type
0   t                   2024-06-03 00:00:00  2024-06-03 00:00:00  0 days 00:00:00  700             pressure             0                regular_ll
1   u                   2024-06-03 00:00:00  2024-06-03 00:00:00  0 days 00:00:00  700             pressure             0                regular_ll
2   t                   2024-06-03 00:00:00  2024-06-03 00:00:00  0 days 00:00:00  500             pressure             0                regular_ll
3   u                   2024-06-03 00:00:00  2024-06-03 00:00:00  0 days 00:00:00  500             pressure             0                regular_ll
4   t                   2024-06-03 06:00:00  2024-06-03 00:00:00  0 days 06:00:00  700             pressure             0                regular_ll
... ...                 ...                  ...                  ...              ...             ...                  ...              ...
59  u                   2024-06-04 12:00:00  2024-06-04 12:00:00  0 days 00:00:00  137             hybrid               0                regular_ll
60  t                   2024-06-04 18:00:00  2024-06-04 12:00:00  0 days 06:00:00  90              hybrid               0                regular_ll
61  u                   2024-06-04 18:00:00  2024-06-04 12:00:00  0 days 06:00:00  90              hybrid               0                regular_ll
62  t                   2024-06-04 18:00:00  2024-06-04 12:00:00  0 days 06:00:00  137             hybrid               0                regular_ll
63  u                   2024-06-04 18:00:00  2024-06-04 12:00:00  0 days 06:00:00  137             hybrid               0                regular_ll

64 rows × 8 columns

When converting this field list into an Xarray dataset, we must deal with the fact that the same parameters appear on two incompatible level types. One possible approach is to create a separate variable for each combination of parameter, level type and level, for example: "t_hybrid_90", "t_hybrid_137", "t_pressure_500", "t_pressure_700", and similarly for u.
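The naming scheme can be sketched in plain Python: each distinct (parameter, level type, level) triple yields one variable name, so 2 parameters × 4 levels give 8 variables. The field list below is a hypothetical, time-free summary of the sample file, and render_template is a hypothetical helper, not an earthkit-data function.

```python
import re

# Hypothetical time-free summary of the sample: t and u on pressure
# levels 500/700 and hybrid levels 90/137.
fields = [
    {"parameter.variable": p, "vertical.level_type": lt, "vertical.level": lev}
    for p in ("t", "u")
    for lt, levels in (("pressure", (500, 700)), ("hybrid", (90, 137)))
    for lev in levels
]

# Matches a {key} placeholder and captures the key between the braces.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_template(template, metadata):
    """Hypothetical helper: replace each {key} with str(metadata[key])."""
    return PLACEHOLDER.sub(lambda m: str(metadata[m.group(1)]), template)


template = "{parameter.variable}_{vertical.level_type}_{vertical.level}"
# One variable per distinct rendered name, sorted as plain strings.
names = sorted({render_template(template, f) for f in fields})
print(names)
# ['t_hybrid_137', 't_hybrid_90', 't_pressure_500', 't_pressure_700',
#  'u_hybrid_137', 'u_hybrid_90', 'u_pressure_500', 'u_pressure_700']
```

Because the names are sorted as strings, "t_hybrid_137" comes before "t_hybrid_90", the same order in which the variables are listed in the dataset below.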

[7]:
ds3 = ds_fl2.to_xarray(
    remapping={"my_custom_var_key": "{parameter.variable}_{vertical.level_type}_{vertical.level}"},
    variable_key="my_custom_var_key",
    add_earthkit_attrs=False,
)
ds3
[7]:
<xarray.Dataset> Size: 351kB
Dimensions:                  (forecast_reference_time: 4, step: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    t_hybrid_137             (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t_hybrid_90              (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t_pressure_500           (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    t_pressure_700           (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_hybrid_137             (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_hybrid_90              (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_pressure_500           (forecast_reference_time, step, latitude, longitude) float64 44kB ...
    u_pressure_700           (forecast_reference_time, step, latitude, longitude) float64 44kB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

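An alternative approach, which results in a more compact hypercube, combines only the parameter and the level type in the variable name and uses level_dim_mode="level_per_type", so that each level type gets its own level dimension (here "hybrid" and "pressure"):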
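The grouping behind this layout can be sketched in plain Python: the variable name no longer contains the level, and each level type contributes its own level dimension holding that type's levels. This is an illustration of the resulting structure (4 variables, 2 level dimensions of size 2) under the same hypothetical field summary as before, not the engine's implementation.

```python
# Hypothetical (parameter, level type, level) combinations in the sample file.
combos = [
    (param, level_type, level)
    for param in ("t", "u")
    for level_type, levels in (("pressure", (500, 700)), ("hybrid", (90, 137)))
    for level in levels
]

# Variable name from "{parameter.variable}_{vertical.level_type}"; the level
# type also names the level dimension, which collects that type's levels.
layout = {}
for param, level_type, level in combos:
    dim, levels = layout.setdefault(f"{param}_{level_type}", (level_type, set()))
    levels.add(level)

result = {name: (dim, sorted(levels)) for name, (dim, levels) in sorted(layout.items())}
for name, (dim, levels) in result.items():
    print(name, dim, levels)
# t_hybrid hybrid [90, 137]
# t_pressure pressure [500, 700]
# u_hybrid hybrid [90, 137]
# u_pressure pressure [500, 700]
```

Each variable is twice as large as in the previous layout (one extra level axis of size 2), but there are half as many variables, so the total size is unchanged.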

[8]:
ds4 = ds_fl2.to_xarray(
    level_dim_mode="level_per_type",
    remapping={"my_custom_var_key": "{parameter.variable}_{vertical.level_type}"},
    variable_key="my_custom_var_key",
    add_earthkit_attrs=False,
)
ds4
[8]:
<xarray.Dataset> Size: 351kB
Dimensions:                  (forecast_reference_time: 4, step: 2, hybrid: 2,
                              latitude: 19, longitude: 36, pressure: 2)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * hybrid                   (hybrid) int64 16B 90 137
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
  * pressure                 (pressure) int64 16B 500 700
Data variables:
    t_hybrid                 (forecast_reference_time, step, hybrid, latitude, longitude) float64 88kB ...
    t_pressure               (forecast_reference_time, step, pressure, latitude, longitude) float64 88kB ...
    u_hybrid                 (forecast_reference_time, step, hybrid, latitude, longitude) float64 88kB ...
    u_pressure               (forecast_reference_time, step, pressure, latitude, longitude) float64 88kB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF