Xarray engine: size-1 dimension as a variable attribute

First, we get some GRIB forecast data on single levels and read it into a GRIB fieldlist.

[1]:
import earthkit.data as ekd
[2]:
ds_fl = ekd.from_source("sample", "aifs-sfc.grib").to_fieldlist()

Examine the metadata of the fieldlist (for readability, we select a single forecast start time and step).

[3]:
ds_fl.sel({"time.forecast_reference_time": "2025-12-12T00", "time.step": 0}).ls()
[3]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 tcw 2025-12-12 2025-12-12 0 days 0 entire_atmosphere 0 regular_ll
1 msl 2025-12-12 2025-12-12 0 days 0 mean_sea 0 regular_ll
2 10u 2025-12-12 2025-12-12 0 days 10 height_above_ground_level 0 regular_ll
3 2t 2025-12-12 2025-12-12 0 days 2 height_above_ground_level 0 regular_ll
4 tp 2025-12-12 2025-12-12 0 days 0 surface 0 regular_ll

We see that each variable has a different level and/or level type. This would be an obstacle to forming an xarray Dataset with an explicit level dimension (in any of the forms provided by level_dim_mode). However, we can use the dims_as_attrs option to turn size-1 level dimensions into variables’ attributes instead of just squeezing them.

Below, we use the default level_dim_mode="level", which builds the "level" and "level_type" dimensions, so these are the two dimensions that are turned into attributes.

[4]:
ds = ds_fl.to_xarray(
    profile=None,
    dims_as_attrs=["level", "level_type"],
    add_earthkit_attrs=False,
    # The last option disables adding the special "_earthkit" attribute.
    # It is used here only to keep the attributes we inspect next readable.
)
ds
[4]:
<xarray.Dataset> Size: 14kB
Dimensions:                  (forecast_reference_time: 2, step: 2, latitude: 7,
                              longitude: 12)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 16B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
  * longitude                (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
    10u                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    2t                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    msl                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tcw                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tp                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...

Inspecting the variables’ attributes gives:

[5]:
{v: ds[v].attrs for v in ds}
[5]:
{'10u': {'level': 10, 'level_type': 'height_above_ground_level'},
 '2t': {'level': 2, 'level_type': 'height_above_ground_level'},
 'msl': {'level': 0, 'level_type': 'mean_sea'},
 'tcw': {'level': 0, 'level_type': 'entire_atmosphere'},
 'tp': {'level': 0, 'level_type': 'surface'}}
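
Since the level information now lives in plain attribute dictionaries, it can be used to organise the variables. The following is a minimal sketch that groups variable names by their level type. It uses only the standard library, the attrs dictionary is copied by hand from the output above, and group_by_level_type is a hypothetical helper, not part of earthkit-data.

```python
from collections import defaultdict

# Attributes copied from the output above (level_dim_mode="level").
attrs = {
    "10u": {"level": 10, "level_type": "height_above_ground_level"},
    "2t": {"level": 2, "level_type": "height_above_ground_level"},
    "msl": {"level": 0, "level_type": "mean_sea"},
    "tcw": {"level": 0, "level_type": "entire_atmosphere"},
    "tp": {"level": 0, "level_type": "surface"},
}


def group_by_level_type(attrs):
    """Map each level type to the sorted names of the variables carrying it.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for name, var_attrs in attrs.items():
        # Variables without a "level_type" attribute are skipped.
        if "level_type" in var_attrs:
            groups[var_attrs["level_type"]].append(name)
    return {level_type: sorted(names) for level_type, names in groups.items()}


groups = group_by_level_type(attrs)
print(groups["height_above_ground_level"])  # ['10u', '2t']
```

With a real dataset, the same dictionary can be built with {v: ds[v].attrs for v in ds}, as in the cell above.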

With level_dim_mode="level_per_type", we can pass the template dimension "<level_per_type>" to dims_as_attrs. This turns the actual dimensions, such as "surface", "height_above_ground_level", etc., into variables’ attributes, since all of them are of size 1.

[6]:
ds2 = ds_fl.to_xarray(
    profile=None,
    level_dim_mode="level_per_type",
    dims_as_attrs="<level_per_type>",
    add_earthkit_attrs=False,
)
ds2
[6]:
<xarray.Dataset> Size: 14kB
Dimensions:                  (forecast_reference_time: 2, step: 2, latitude: 7,
                              longitude: 12)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 16B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
  * longitude                (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
    10u                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    2t                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    msl                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tcw                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tp                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...
[7]:
{v: ds2[v].attrs for v in ds2}
[7]:
{'10u': {'height_above_ground_level': 10},
 '2t': {'height_above_ground_level': 2},
 'msl': {'mean_sea': 0},
 'tcw': {'entire_atmosphere': 0},
 'tp': {'surface': 0}}
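
The two attribute layouts carry the same information. As a rough illustration, the sketch below converts the per-type form shown above into the "level" / "level_type" form. It assumes each variable has exactly one level-like attribute (true here because add_earthkit_attrs=False leaves no other attributes); to_level_pair is a hypothetical helper, and the input dictionary is copied by hand from the output above.

```python
# Attributes copied from the output above (level_dim_mode="level_per_type").
per_type_attrs = {
    "10u": {"height_above_ground_level": 10},
    "2t": {"height_above_ground_level": 2},
    "msl": {"mean_sea": 0},
    "tcw": {"entire_atmosphere": 0},
    "tp": {"surface": 0},
}


def to_level_pair(var_attrs):
    """Convert {"<level_type>": level} into {"level": ..., "level_type": ...}.

    Hypothetical helper: assumes the variable has exactly one attribute,
    and that this attribute is the level-like one.
    """
    if len(var_attrs) != 1:
        raise ValueError("expected exactly one level-like attribute")
    level_type, level = next(iter(var_attrs.items()))
    return {"level": level, "level_type": level_type}


converted = {name: to_level_pair(a) for name, a in per_type_attrs.items()}
print(converted["2t"])  # {'level': 2, 'level_type': 'height_above_ground_level'}
```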

A more elaborate example

Similarly, we can handle a dataset in which some variables are single-level and others are multi-level (but with consistent vertical coordinates).

[8]:
ds_fl2 = ekd.from_source("sample", "aifs-pl_sfc.grib").to_fieldlist()

[9]:
ds_fl2.sel({"time.forecast_reference_time": "2025-12-12T00", "time.step": 0}).ls()
[9]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 z 2025-12-12 2025-12-12 0 days 500 pressure 0 regular_ll
1 z 2025-12-12 2025-12-12 0 days 850 pressure 0 regular_ll
2 z 2025-12-12 2025-12-12 0 days 1000 pressure 0 regular_ll
3 tcw 2025-12-12 2025-12-12 0 days 0 entire_atmosphere 0 regular_ll
4 msl 2025-12-12 2025-12-12 0 days 0 mean_sea 0 regular_ll
5 10u 2025-12-12 2025-12-12 0 days 10 height_above_ground_level 0 regular_ll
6 2t 2025-12-12 2025-12-12 0 days 2 height_above_ground_level 0 regular_ll
7 tp 2025-12-12 2025-12-12 0 days 0 surface 0 regular_ll
[10]:
ds3 = ds_fl2.to_xarray(
    profile=None,
    level_dim_mode="level_per_type",
    dims_as_attrs=["<level_per_type>"],
    add_earthkit_attrs=False,
)
ds3
[10]:
<xarray.Dataset> Size: 22kB
Dimensions:                  (forecast_reference_time: 2, step: 2, latitude: 7,
                              longitude: 12, pressure: 3)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 16B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
  * longitude                (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
  * pressure                 (pressure) int64 24B 500 850 1000
Data variables:
    10u                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    2t                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    msl                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tcw                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tp                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    z                        (forecast_reference_time, step, pressure, latitude, longitude) float64 8kB ...

In this example, all the level-like dimensions were turned into variables’ attributes except for the "pressure" dimension of the variable "z", since it is of size 3.

[11]:
{v: ds3[v].attrs for v in ds3}
[11]:
{'10u': {'height_above_ground_level': 10},
 '2t': {'height_above_ground_level': 2},
 'msl': {'mean_sea': 0},
 'tcw': {'entire_atmosphere': 0},
 'tp': {'surface': 0},
 'z': {}}

It works similarly for the default level_dim_mode="level":

[12]:
ds4 = ds_fl2.to_xarray(
    profile=None,
    dims_as_attrs=["level", "level_type"],
    add_earthkit_attrs=False,
)
ds4
[12]:
<xarray.Dataset> Size: 22kB
Dimensions:                  (forecast_reference_time: 2, step: 2, latitude: 7,
                              longitude: 12, level: 3)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 16B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * latitude                 (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
  * longitude                (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
  * level                    (level) int64 24B 500 850 1000
Data variables:
    10u                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    2t                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    msl                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tcw                      (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    tp                       (forecast_reference_time, step, latitude, longitude) float64 3kB ...
    z                        (forecast_reference_time, step, level, latitude, longitude) float64 8kB ...
[13]:
{v: ds4[v].attrs for v in ds4}
[13]:
{'10u': {'level': 10, 'level_type': 'height_above_ground_level'},
 '2t': {'level': 2, 'level_type': 'height_above_ground_level'},
 'msl': {'level': 0, 'level_type': 'mean_sea'},
 'tcw': {'level': 0, 'level_type': 'entire_atmosphere'},
 'tp': {'level': 0, 'level_type': 'surface'},
 'z': {'level_type': 'pressure'}}
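
The behaviour seen in both modes follows one rule: a dimension listed in dims_as_attrs becomes an attribute only if it has a single value for that variable, otherwise it stays a dimension. The sketch below mimics this rule in plain Python. It is an illustration of the observed outcome, not earthkit-data's implementation; split_level_dims and its input layout are assumptions made for this example.

```python
def split_level_dims(level_values):
    """Split one variable's level-like dimensions into (dims, attrs).

    level_values maps a dimension name to the list of distinct coordinate
    values that the variable has along it. Hypothetical helper.
    """
    dims, attrs = {}, {}
    for dim, values in level_values.items():
        if len(values) == 1:
            # Size-1 dimension: keep the single value as an attribute.
            attrs[dim] = values[0]
        else:
            # Larger dimension: keep it as a real dimension.
            dims[dim] = list(values)
    return dims, attrs


# level_dim_mode="level_per_type": "pressure" has size 3 for "z".
z_per_type = split_level_dims({"pressure": [500, 850, 1000]})
t2_per_type = split_level_dims({"height_above_ground_level": [2]})

# level_dim_mode="level": "level" has size 3, "level_type" has size 1.
z_level = split_level_dims({"level": [500, 850, 1000], "level_type": ["pressure"]})

print(z_per_type)   # ({'pressure': [500, 850, 1000]}, {})
print(t2_per_type)  # ({}, {'height_above_ground_level': 2})
print(z_level)      # ({'level': [500, 850, 1000]}, {'level_type': 'pressure'})
```

This is why "z" ends up with empty attributes in the "level_per_type" example but with a "level_type" attribute in the "level" example: its "level_type" dimension has the single value "pressure", while its "level" dimension has three values.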