Xarray engine: overview

Earthkit-data comes with its own Xarray engine called “earthkit” to convert fieldlists to Xarray.

To start with, we get the example data we will use in this notebook and read it into a fieldlist.

[1]:
import earthkit.data as ekd

ds_fl = ekd.from_source("sample", "pl.grib").to_fieldlist()

Creating Xarray

To convert a fieldlist to Xarray we need to use to_xarray().

[2]:
ds = ds_fl.to_xarray()
ds
[2]:
<xarray.Dataset> Size: 176kB
Dimensions:                  (forecast_reference_time: 4, step: 2, level: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * level                    (level) int64 16B 500 700
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    r                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
    t                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

to_xarray() has a large number of keyword arguments to control how the Xarray dataset is generated. To simplify usage, we can define profiles that provide custom defaults for most of the keyword arguments. At the moment, there are 3 predefined profiles available: "earthkit" (the default) and 2 legacy profiles, "mars" and "grib". We can pass them to to_xarray() via the profile kwarg.

Writing back to GRIB

This is an experimental feature!

To write the Xarray dataset back to GRIB, it has to keep the original variable attributes that the earthkit engine generated. By default, variable attributes are not kept in Xarray computations, so we need to set the global Xarray keep_attrs option to enable this, as shown in the following cell:

[3]:
import xarray as xr

xr.set_options(keep_attrs=True)

ds = ds_fl.to_xarray()
ds += 1
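To illustrate what the keep_attrs option changes, here is a small sketch on a toy DataArray that does not involve earthkit at all. The array, its "units" attribute and the variable names are assumptions made purely for illustration.

```python
import numpy as np
import xarray as xr

# Toy array with a single attribute; names and values are illustrative only
da = xr.DataArray(np.zeros(3), dims="x", attrs={"units": "K"})

# Option explicitly off: the result of the arithmetic has no attributes
with xr.set_options(keep_attrs=False):
    attrs_dropped = dict((da + 1).attrs)

# Option on: the attributes of the input are carried over to the result
with xr.set_options(keep_attrs=True):
    attrs_kept = dict((da + 1).attrs)

print(attrs_dropped, attrs_kept)
```

Used as a context manager, xr.set_options() only applies inside the with block, whereas the plain call in the cell above changes the option globally for the rest of the session.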

Generating a fieldlist

To create a GRIB fieldlist we need to call to_fieldlist() on the earthkit accessor. The result is an array fieldlist holding all the data in memory.

[4]:
ds_fl1 = ds.earthkit.to_fieldlist()
ds_fl1.head()
[4]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 500 pressure 0 regular_ll
1 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 700 pressure 0 regular_ll
2 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 500 pressure 0 regular_ll
3 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 700 pressure 0 regular_ll
4 r 2024-06-03 12:00:00 2024-06-03 12:00:00 0 days 00:00:00 500 pressure 0 regular_ll

We can see that the GRIB field values changed as expected if we compare the original and resulting fieldlists.

[5]:
m_0 = ds_fl.sel({"parameter.variable": "t", "time.step": 6, "vertical.level": 500})[0].values.mean()
m_1 = ds_fl1.sel({"parameter.variable": "t", "time.step": 6, "vertical.level": 500})[0].values.mean()
m_0, m_1
[5]:
(np.float64(254.25649845948692), np.float64(255.25649845948692))

Generating a GRIB file

Once we have the GRIB fieldlist it can be saved to disk using to_target().

[6]:
ds_fl1.to_target("file", "_from_xr_1.grib")
ekd.from_source("file", "_from_xr_1.grib").to_fieldlist().head()
[6]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 500 pressure 0 regular_ll
1 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 700 pressure 0 regular_ll
2 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 500 pressure 0 regular_ll
3 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 700 pressure 0 regular_ll
4 r 2024-06-03 12:00:00 2024-06-03 12:00:00 0 days 00:00:00 500 pressure 0 regular_ll

It is also possible to write the Xarray dataset directly into a GRIB file by calling to_target() on the earthkit accessor. This is more memory-efficient than generating a fieldlist first.

[7]:
ds.earthkit.to_target("file", "_from_xr_2.grib")
ekd.from_source("file", "_from_xr_2.grib").to_fieldlist().head()
[7]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 500 pressure 0 regular_ll
1 r 2024-06-03 00:00:00 2024-06-03 00:00:00 0 days 00:00:00 700 pressure 0 regular_ll
2 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 500 pressure 0 regular_ll
3 r 2024-06-03 06:00:00 2024-06-03 00:00:00 0 days 06:00:00 700 pressure 0 regular_ll
4 r 2024-06-03 12:00:00 2024-06-03 12:00:00 0 days 00:00:00 500 pressure 0 regular_ll