Xarray engine: size-1 dimension as a variable attribute
First, we get some GRIB forecast data on single levels and read it into a GRIB fieldlist.
[1]:
import earthkit.data as ekd
[2]:
ds_fl = ekd.from_source("sample", "aifs-sfc.grib").to_fieldlist()
Examine the metadata of the fieldlist (for readability, we select a single forecast start time and step).
[3]:
ds_fl.sel({"time.forecast_reference_time": "2025-12-12T00", "time.step": 0}).ls()
[3]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | tcw | 2025-12-12 | 2025-12-12 | 0 days | 0 | entire_atmosphere | 0 | regular_ll |
| 1 | msl | 2025-12-12 | 2025-12-12 | 0 days | 0 | mean_sea | 0 | regular_ll |
| 2 | 10u | 2025-12-12 | 2025-12-12 | 0 days | 10 | height_above_ground_level | 0 | regular_ll |
| 3 | 2t | 2025-12-12 | 2025-12-12 | 0 days | 2 | height_above_ground_level | 0 | regular_ll |
| 4 | tp | 2025-12-12 | 2025-12-12 | 0 days | 0 | surface | 0 | regular_ll |
We see that each variable has a different combination of level and level type. This would be an obstacle to forming an xarray Dataset with an explicit level dimension (in any of the forms provided by level_dim_mode). However, the dims_as_attrs option turns size-1 level dimensions into variables’ attributes, instead of simply squeezing them out.
Below, we use the default level_dim_mode="level", which builds the "level" and "level_type" dimensions; listing both in dims_as_attrs turns them into attributes.
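To see why a shared level dimension is awkward here, the plain-Python sketch below uses only the (variable, level, level_type) values from the listing above. It assumes, for illustration only, that explicit level and level_type dimensions would have to span every value present in the dataset; it does not describe what to_xarray does internally.

```python
from itertools import product

# (variable, level, level_type) triples copied from the listing above
fields = [
    ("tcw", 0, "entire_atmosphere"),
    ("msl", 0, "mean_sea"),
    ("10u", 10, "height_above_ground_level"),
    ("2t", 2, "height_above_ground_level"),
    ("tp", 0, "surface"),
]

levels = sorted({level for _, level, _ in fields})
level_types = sorted({level_type for _, _, level_type in fields})

# Every (level, level_type) cell a shared pair of dimensions would have to span
grid = list(product(levels, level_types))

# Cells each variable actually has data for
filled = {}
for var, level, level_type in fields:
    filled.setdefault(var, set()).add((level, level_type))

for var, cells in filled.items():
    print(f"{var}: {len(cells)} of {len(grid)} cells filled")
# each line reports "1 of 12 cells filled"
```

Each variable would occupy a single cell of a 3 × 4 grid, with the remaining cells holding missing values.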
[4]:
ds = ds_fl.to_xarray(
profile=None,
dims_as_attrs=["level", "level_type"],
add_earthkit_attrs=False,
# The last option disables adding the special "_earthkit" attribute.
# It is used here only to make the attributes we inspect next easier to read.
)
ds
[4]:
<xarray.Dataset> Size: 14kB
Dimensions: (forecast_reference_time: 2, step: 2, latitude: 7,
longitude: 12)
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 16B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* latitude (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
10u (forecast_reference_time, step, latitude, longitude) float64 3kB ...
2t (forecast_reference_time, step, latitude, longitude) float64 3kB ...
msl (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tcw (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tp (forecast_reference_time, step, latitude, longitude) float64 3kB ...
Inspecting the variables’ attributes gives:
[5]:
{v: ds[v].attrs for v in ds}
[5]:
{'10u': {'level': 10, 'level_type': 'height_above_ground_level'},
'2t': {'level': 2, 'level_type': 'height_above_ground_level'},
'msl': {'level': 0, 'level_type': 'mean_sea'},
'tcw': {'level': 0, 'level_type': 'entire_atmosphere'},
'tp': {'level': 0, 'level_type': 'surface'}}
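The per-variable rule visible in this output can be modelled in a few lines of plain Python. The function dims_as_attrs_sketch is hypothetical and only mirrors the observed behaviour: size-1 dimensions named in dims_as_attrs become attributes, other size-1 dimensions are squeezed away, and larger dimensions are kept. It is not earthkit-data's implementation.

```python
def dims_as_attrs_sketch(dims, dims_as_attrs):
    """Hypothetical helper: split one variable's dimensions into (kept, attrs).

    ``dims`` maps each dimension name to its list of coordinate values.
    """
    kept, attrs = {}, {}
    for name, values in dims.items():
        if len(values) > 1:
            kept[name] = values  # a genuine dimension: keep it
        elif name in dims_as_attrs:
            attrs[name] = values[0]  # size 1 and requested: store as an attribute
        # else: size 1 and not requested, so the dimension is squeezed away
    return kept, attrs


# Level-related values from the listing above; "step" (in hours) stands in
# for a dimension with more than one value.
variables = {
    "10u": {"step": [0, 6], "level": [10], "level_type": ["height_above_ground_level"]},
    "2t": {"step": [0, 6], "level": [2], "level_type": ["height_above_ground_level"]},
    "msl": {"step": [0, 6], "level": [0], "level_type": ["mean_sea"]},
    "tcw": {"step": [0, 6], "level": [0], "level_type": ["entire_atmosphere"]},
    "tp": {"step": [0, 6], "level": [0], "level_type": ["surface"]},
}

for var, dims in variables.items():
    kept, attrs = dims_as_attrs_sketch(dims, ["level", "level_type"])
    print(var, list(kept), attrs)
# first line: 10u ['step'] {'level': 10, 'level_type': 'height_above_ground_level'}
```

Passing an empty list instead reproduces plain squeezing: the level information is dropped and the attributes stay empty.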
With level_dim_mode="level_per_type", we can pass the template dimension name "<level_per_type>" to dims_as_attrs. This turns the actual level dimensions, such as "surface" and "height_above_ground_level", into variables’ attributes, since each of them has size 1.
[6]:
ds2 = ds_fl.to_xarray(
profile=None,
level_dim_mode="level_per_type",
dims_as_attrs="<level_per_type>",
add_earthkit_attrs=False,
)
ds2
[6]:
<xarray.Dataset> Size: 14kB
Dimensions: (forecast_reference_time: 2, step: 2, latitude: 7,
longitude: 12)
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 16B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* latitude (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
Data variables:
10u (forecast_reference_time, step, latitude, longitude) float64 3kB ...
2t (forecast_reference_time, step, latitude, longitude) float64 3kB ...
msl (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tcw (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tp (forecast_reference_time, step, latitude, longitude) float64 3kB ...
[7]:
{v: ds2[v].attrs for v in ds2}
[7]:
{'10u': {'height_above_ground_level': 10},
'2t': {'height_above_ground_level': 2},
'msl': {'mean_sea': 0},
'tcw': {'entire_atmosphere': 0},
'tp': {'surface': 0}}
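A plain-Python sketch of the same idea: in this mode each level type gets its own dimension holding that variable's levels, and the "<level_per_type>" placeholder stands for all of those dimension names at once. The helper expand_template is hypothetical and only models the behaviour seen in the output above. It accepts a string or a list, since the notebook passes dims_as_attrs in both forms.

```python
# (variable, level, level_type) triples from the single-level listing above
fields = [
    ("tcw", 0, "entire_atmosphere"),
    ("msl", 0, "mean_sea"),
    ("10u", 10, "height_above_ground_level"),
    ("2t", 2, "height_above_ground_level"),
    ("tp", 0, "surface"),
]


def expand_template(dims_as_attrs, level_types):
    """Hypothetical: replace "<level_per_type>" with the actual level-type names."""
    if isinstance(dims_as_attrs, str):  # a single name may be passed as a string
        dims_as_attrs = [dims_as_attrs]
    names = set()
    for name in dims_as_attrs:
        if name == "<level_per_type>":
            names.update(level_types)
        else:
            names.add(name)
    return names


# One dimension per level type, holding the distinct levels of each variable
dims = {}
for var, level, level_type in fields:
    values = dims.setdefault(var, {}).setdefault(level_type, [])
    if level not in values:
        values.append(level)

names = expand_template("<level_per_type>", {lt for _, _, lt in fields})

# Size-1 dimensions whose names were requested become attributes
attrs = {
    var: {n: v[0] for n, v in d.items() if n in names and len(v) == 1}
    for var, d in dims.items()
}
print(attrs["2t"])  # {'height_above_ground_level': 2}
```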
A more elaborate example
Similarly, we can handle a dataset in which some variables are single-level and others are multi-level (provided the vertical coordinates are consistent).
[8]:
ds_fl2 = ekd.from_source("sample", "aifs-pl_sfc.grib").to_fieldlist()
[9]:
ds_fl2.sel({"time.forecast_reference_time": "2025-12-12T00", "time.step": 0}).ls()
[9]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | z | 2025-12-12 | 2025-12-12 | 0 days | 500 | pressure | 0 | regular_ll |
| 1 | z | 2025-12-12 | 2025-12-12 | 0 days | 850 | pressure | 0 | regular_ll |
| 2 | z | 2025-12-12 | 2025-12-12 | 0 days | 1000 | pressure | 0 | regular_ll |
| 3 | tcw | 2025-12-12 | 2025-12-12 | 0 days | 0 | entire_atmosphere | 0 | regular_ll |
| 4 | msl | 2025-12-12 | 2025-12-12 | 0 days | 0 | mean_sea | 0 | regular_ll |
| 5 | 10u | 2025-12-12 | 2025-12-12 | 0 days | 10 | height_above_ground_level | 0 | regular_ll |
| 6 | 2t | 2025-12-12 | 2025-12-12 | 0 days | 2 | height_above_ground_level | 0 | regular_ll |
| 7 | tp | 2025-12-12 | 2025-12-12 | 0 days | 0 | surface | 0 | regular_ll |
[10]:
ds3 = ds_fl2.to_xarray(
profile=None,
level_dim_mode="level_per_type",
dims_as_attrs=["<level_per_type>"],
add_earthkit_attrs=False,
)
ds3
[10]:
<xarray.Dataset> Size: 22kB
Dimensions: (forecast_reference_time: 2, step: 2, latitude: 7,
longitude: 12, pressure: 3)
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 16B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* latitude (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
* pressure (pressure) int64 24B 500 850 1000
Data variables:
10u (forecast_reference_time, step, latitude, longitude) float64 3kB ...
2t (forecast_reference_time, step, latitude, longitude) float64 3kB ...
msl (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tcw (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tp (forecast_reference_time, step, latitude, longitude) float64 3kB ...
z (forecast_reference_time, step, pressure, latitude, longitude) float64 8kB ...
In this example, all the level-like dimensions were transformed into variables’ attributes except for the "pressure" dimension of "z", which has size 3.
[11]:
{v: ds3[v].attrs for v in ds3}
[11]:
{'10u': {'height_above_ground_level': 10},
'2t': {'height_above_ground_level': 2},
'msl': {'mean_sea': 0},
'tcw': {'entire_atmosphere': 0},
'tp': {'surface': 0},
'z': {}}
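It works similarly for the default level_dim_mode="level". Here "z" keeps its "level" dimension of size 3, while its size-1 "level_type" dimension becomes an attribute: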
[12]:
ds4 = ds_fl2.to_xarray(
profile=None,
dims_as_attrs=["level", "level_type"],
add_earthkit_attrs=False,
)
ds4
[12]:
<xarray.Dataset> Size: 22kB
Dimensions: (forecast_reference_time: 2, step: 2, latitude: 7,
longitude: 12, level: 3)
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 16B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* latitude (latitude) float64 56B 90.0 60.0 ... -60.0 -90.0
* longitude (longitude) float64 96B 0.0 30.0 ... 300.0 330.0
* level (level) int64 24B 500 850 1000
Data variables:
10u (forecast_reference_time, step, latitude, longitude) float64 3kB ...
2t (forecast_reference_time, step, latitude, longitude) float64 3kB ...
msl (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tcw (forecast_reference_time, step, latitude, longitude) float64 3kB ...
tp (forecast_reference_time, step, latitude, longitude) float64 3kB ...
z (forecast_reference_time, step, level, latitude, longitude) float64 8kB ...
[13]:
{v: ds4[v].attrs for v in ds4}
[13]:
{'10u': {'level': 10, 'level_type': 'height_above_ground_level'},
'2t': {'level': 2, 'level_type': 'height_above_ground_level'},
'msl': {'level': 0, 'level_type': 'mean_sea'},
'tcw': {'level': 0, 'level_type': 'entire_atmosphere'},
'tp': {'level': 0, 'level_type': 'surface'},
'z': {'level_type': 'pressure'}}
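The two modes differ for "z" only because of how its levels are split across dimensions. The plain-Python sketch below models both modes on the mixed listing above. The helpers level_dims and to_attrs are hypothetical and reproduce only the attribute dictionaries shown in cells [11] and [13], not earthkit-data's actual code.

```python
# (variable, level, level_type) triples from the mixed listing above
fields = [
    ("z", 500, "pressure"), ("z", 850, "pressure"), ("z", 1000, "pressure"),
    ("tcw", 0, "entire_atmosphere"), ("msl", 0, "mean_sea"),
    ("10u", 10, "height_above_ground_level"), ("2t", 2, "height_above_ground_level"),
    ("tp", 0, "surface"),
]


def level_dims(fields, mode):
    """Hypothetical: per-variable level-like dimensions, name -> distinct values."""
    dims = {}
    for var, level, level_type in fields:
        var_dims = dims.setdefault(var, {})
        if mode == "level":
            pairs = [("level", level), ("level_type", level_type)]
        else:  # "level_per_type": one dimension named after each level type
            pairs = [(level_type, level)]
        for name, value in pairs:
            values = var_dims.setdefault(name, [])
            if value not in values:
                values.append(value)
    return dims


def to_attrs(dims, names):
    """Size-1 dimensions listed in ``names`` become attributes; others stay dimensions."""
    return {
        var: {n: vals[0] for n, vals in d.items() if n in names and len(vals) == 1}
        for var, d in dims.items()
    }


level_types = {lt for _, _, lt in fields}
attrs_per_type = to_attrs(level_dims(fields, "level_per_type"), level_types)
attrs_level = to_attrs(level_dims(fields, "level"), {"level", "level_type"})
print(attrs_per_type["z"], attrs_level["z"])  # {} {'level_type': 'pressure'}
```

In "level_per_type" mode the single "pressure" dimension of "z" has three values, so nothing is moved; in "level" mode the level values and the level type live in separate dimensions, and only the size-1 "level_type" becomes an attribute.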