Xarray engine: writing back to GRIB

Warning: converting back Xarray to GRIB is an experimental feature and is not yet fully supported.

First, we get some example GRIB data and convert it into Xarray.

[1]:
import earthkit.data as ekd

ds_fl = ekd.from_source("sample", "pl.grib").to_fieldlist()
ds_xr = ds_fl.to_xarray()
ds_xr
[1]:
<xarray.Dataset> Size: 176kB
Dimensions:                  (forecast_reference_time: 4, step: 2, level: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * level                    (level) int64 16B 500 700
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    r                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
    t                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

By default, add_earthkit_attrs=True in to_xarray() and some special earthkit attributes are added to the dataset. This is needed for the Xarray to GRIB conversion. For this reason, if the Xarray is modified we must ensure the variable attributes are copied to the new Xarray dataset. By default, variable attributes are not kept in Xarray computations so we need to set the global Xarray keep_attrs option to enable it.

[2]:
# ensure earthkit attributes are set
import xarray as xr

xr.set_options(keep_attrs=True)

# modify values
ds_xr += 1

Using to_target()

It is possible to directly write the Xarray dataset created with the earthkit engine into a GRIB file with to_target(). This is a memory efficient way to write GRIB to disk since only one field is loaded into memory at a time. We can call to_target() either on the earthkit accessor or as a top level function.

First, we write a datarray into a GRIB file.

[3]:
# option1: writing to GRIB file using the accessor
ds_xr["t"].earthkit.to_target("file", "_from_xr_1.grib")

# option2: writing to GRIB file using the top level function
ekd.to_target("file", "_from_xr_1a.grib", data=ds_xr["t"])

# check the results
ds_tmp1 = ekd.from_source("file", "_from_xr_1.grib").to_fieldlist()
ds_tmp1[0]
[3]:
Field
number_of_values684
array_typendarray
array_dtypefloat64
variablet
standard_nameair_temperature
long_nameTemperature
unitskelvin
chem_variableNone
valid_datetime2024-06-03 00:00:00
base_datetime2024-06-03 00:00:00
step0:00:00
level500
layerNone
level_typepressure
member0
grid_spec{'grid': [10, 10]}
grid_typeregular_ll
shape(19, 36)
area(90.0, 0.0, -90.0, 350.0)

Next, we write the whole dataset into a GRIB file.

[4]:
# option1: writing to GRIB file using the accessor
ds_xr.earthkit.to_target("file", "_from_xr_2.grib")

# option2: writing to GRIB file using the top level function
ekd.to_target("file", "_from_xr_2a.grib", data=ds_xr)

# check the results
ds_tmp2 = ekd.from_source("file", "_from_xr_2.grib").to_fieldlist()
ds_tmp2.head()
[4]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 500 pressure 0 regular_ll
1 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 700 pressure 0 regular_ll
2 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 500 pressure 0 regular_ll
3 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 700 pressure 0 regular_ll
4 r 2024-06-03 12:00:00 2024-06-03 12:00:00 0 days 00:00:00 500 pressure 0 regular_ll

We check if the computation results were correctly written to the generated GRIB data.

[5]:
# original GRIB data
print(ds_fl.sel({"parameter.variable": "t", "time.step": 0, "vertical.level": 500})[0].values[0])
# GRIB data converted from the modified xarray object
print(ds_tmp1.sel({"parameter.variable": "t", "time.step": 0, "vertical.level": 500})[0].values[0])
250.22500610351562
251.22500610351562

Using to_fieldlist()

We can also convert the Xarray dataset into a GRIB fieldlist by using to_fieldlist() on the earthkit accessor of the Xarray object. Please note that this will generate a fieldlist entirely stored in memory.

First, we convert a dataarray to a GRIB fieldlist.

[6]:
ds_fl1 = ds_xr["t"].earthkit.to_fieldlist()
ds_fl1.head()
[6]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 500 pressure 0 regular_ll
1 t 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 700 pressure 0 regular_ll
2 t 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 500 pressure 0 regular_ll
3 t 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 700 pressure 0 regular_ll
4 t 2024-06-03 12:00:00 2024-06-03 12:00:00 0 days 00:00:00 500 pressure 0 regular_ll

Next, we convert back the whole dataset into a GRIB fieldlist.

[7]:
ds_fl2 = ds_xr.earthkit.to_fieldlist()
ds_fl2.head()
[7]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 500 pressure 0 regular_ll
1 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 700 pressure 0 regular_ll
2 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 500 pressure 0 regular_ll
3 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 700 pressure 0 regular_ll
4 r 2024-06-03 12:00:00 2024-06-03 12:00:00 0 days 00:00:00 500 pressure 0 regular_ll

The generated GRIB fieldlist can be saved to disk using the to_target() method.

[8]:
out_name = "_from_xr_3.grib"
ds_fl1.to_target("file", out_name)
# read back and check the saved GRIB
ds_tmp = ekd.from_source("file", out_name).to_fieldlist()
ds_tmp[0]
[8]:
Field
number_of_values684
array_typendarray
array_dtypefloat64
variablet
standard_nameair_temperature
long_nameTemperature
unitskelvin
chem_variableNone
valid_datetime2024-06-03 00:00:00
base_datetime2024-06-03 00:00:00
step0:00:00
level500
layerNone
level_typepressure
member0
grid_spec{'grid': [10, 10]}
grid_typeregular_ll
shape(19, 36)
area(90.0, 0.0, -90.0, 350.0)
[ ]: