Xarray engine: overview

Earthkit-data comes with its own Xarray engine called “earthkit” to perform conversions between GRIB and Xarray data.

To start with, we get the example data we will use in this notebook and read it into a GRIB fieldlist.

[1]:
import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "pl.grib")

Creating Xarray

To convert a GRIB fieldlist to Xarray, we call to_xarray() on it.

[2]:
ds = ds_fl.to_xarray()
ds
[2]:
<xarray.Dataset> Size: 176kB
Dimensions:                  (forecast_reference_time: 4, step: 2, level: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * level                    (level) int64 16B 500 700
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    r                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
    t                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
    class:        od
    stream:       oper
    levtype:      pl
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF

to_xarray() has a large number of keyword arguments to control how the Xarray dataset is generated. To simplify usage, we can define profiles that provide custom defaults for most of the keyword arguments. At the moment, there are 2 pre-defined profiles available: “mars” (the default) and “grib”. We can pass a profile name to to_xarray() via the profile keyword argument.
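As a minimal sketch of how the profile keyword argument might be used, the helper below converts the same fieldlist once per profile and records the dimension names each profile produces. The function name dims_per_profile is hypothetical and not part of earthkit-data; only to_xarray() and its profile argument come from the text above.

```python
def dims_per_profile(fieldlist, profiles=("mars", "grib")):
    """Convert `fieldlist` once per profile and collect the dimension names.

    Hypothetical helper for illustration: it only relies on
    fieldlist.to_xarray(profile=...) and on the resulting dataset's `dims`.
    """
    result = {}
    for name in profiles:
        ds = fieldlist.to_xarray(profile=name)
        # Iterating over Dataset.dims yields the dimension names.
        result[name] = tuple(ds.dims)
    return result
```

With the fieldlist from the first cell, calling dims_per_profile(ds_fl) would show side by side how the two pre-defined profiles lay out the dataset.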

Writing back to GRIB

This is an experimental feature!

To write an Xarray dataset back to GRIB, it has to keep the original variable attributes that the earthkit engine generated. By default, variable attributes are not kept in Xarray computations, so we need to set the global Xarray keep_attrs option to enable this, as shown in the following cell:

[3]:
import xarray as xr
xr.set_options(keep_attrs=True)

ds = ds_fl.to_xarray()
ds += 1
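Since writing back depends on the variable attributes surviving the computation, it can be useful to check this before going further. The sketch below compares the attributes of each data variable before and after a computation. The function name variables_with_changed_attrs is hypothetical, and comparing the whole attribute dictionary is an assumption: the text does not say which attributes the engine actually needs.

```python
def variables_with_changed_attrs(before, after):
    """Return the names of data variables whose attrs differ in `after`.

    Hypothetical helper for illustration. Assumes attribute values are
    plain scalars or strings, so that dictionaries compare with ==.
    """
    changed = []
    for name in before.data_vars:
        if name not in after.data_vars:
            # The variable disappeared entirely, so its attrs are gone too.
            changed.append(name)
        elif dict(after[name].attrs) != dict(before[name].attrs):
            changed.append(name)
    return changed
```

For example, with ds0 = ds_fl.to_xarray(), the call variables_with_changed_attrs(ds0, ds0 + 1) should return an empty list when keep_attrs is in effect.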

Generating a fieldlist

To create a GRIB fieldlist, we need to call to_fieldlist() on the “earthkit” accessor. The result is an array fieldlist holding all the data in memory.

[4]:
ds_fl1 = ds.earthkit.to_fieldlist()
ds_fl1.head()
[4]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf r isobaricInhPa 500 20240603 0 0 fc 0 regular_ll
1 ecmf r isobaricInhPa 700 20240603 0 0 fc 0 regular_ll
2 ecmf r isobaricInhPa 500 20240603 0 6 fc 0 regular_ll
3 ecmf r isobaricInhPa 700 20240603 0 6 fc 0 regular_ll
4 ecmf r isobaricInhPa 500 20240603 1200 0 fc 0 regular_ll

We can see that the GRIB field values changed as expected if we compare the original and resulting fieldlists.

[5]:
m_0 = ds_fl.sel(param="t", step=6, level=500)[0].values.mean()
m_1 = ds_fl1.sel(param="t", step=6, level=500)[0].values.mean()
m_0, m_1
[5]:
(254.25649845948692, 255.25649845948692)
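The comparison above can be wrapped in a small reusable helper. This is only a sketch: the name mean_shift is hypothetical, and it uses nothing beyond the sel(), indexing and values calls already shown in the cell.

```python
def mean_shift(fl_before, fl_after, **selection):
    """Difference between the mean values of the first matching field.

    Hypothetical helper for illustration: `selection` is passed unchanged
    to sel() on both fieldlists, and only the first match is compared.
    """
    before = fl_before.sel(**selection)[0].values.mean()
    after = fl_after.sel(**selection)[0].values.mean()
    return float(after - before)
```

For the two fieldlists in this notebook, mean_shift(ds_fl, ds_fl1, param="t", step=6, level=500) should come out close to 1.0, matching the ds += 1 step.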

Generating a GRIB file

Once we have the GRIB fieldlist, it can be saved to disk using the to_target() method.

[6]:
ds_fl1.to_target("file", "_from_xr_1.grib")
ekd.from_source("file", "_from_xr_1.grib").head()
[6]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf r isobaricInhPa 500 20240603 0 0 fc 0 regular_ll
1 ecmf r isobaricInhPa 700 20240603 0 0 fc 0 regular_ll
2 ecmf r isobaricInhPa 500 20240603 0 6 fc 0 regular_ll
3 ecmf r isobaricInhPa 700 20240603 0 6 fc 0 regular_ll
4 ecmf r isobaricInhPa 500 20240603 1200 0 fc 0 regular_ll

It is also possible to write the Xarray dataset directly into a GRIB file by calling to_grib() on the earthkit accessor. This is more memory-efficient than generating a fieldlist first.

[7]:
ds.earthkit.to_grib("_from_xr_2.grib")
ekd.from_source("file", "_from_xr_2.grib").head()
[7]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf r isobaricInhPa 500 20240603 0 0 fc 0 regular_ll
1 ecmf r isobaricInhPa 700 20240603 0 0 fc 0 regular_ll
2 ecmf r isobaricInhPa 500 20240603 0 6 fc 0 regular_ll
3 ecmf r isobaricInhPa 700 20240603 0 6 fc 0 regular_ll
4 ecmf r isobaricInhPa 500 20240603 1200 0 fc 0 regular_ll
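The two routes to a GRIB file shown in this notebook can be put side by side in one function. This is a sketch only: the name dataset_to_grib and the via_fieldlist flag are hypothetical, while the accessor and method calls are the ones used in the cells above.

```python
def dataset_to_grib(ds, path, via_fieldlist=False):
    """Write an Xarray dataset created by the earthkit engine to a GRIB file.

    Hypothetical helper for illustration. Returns the in-memory fieldlist
    when one was built, otherwise None.
    """
    if via_fieldlist:
        # Build an in-memory fieldlist first, e.g. to inspect it before saving.
        fieldlist = ds.earthkit.to_fieldlist()
        fieldlist.to_target("file", path)
        return fieldlist
    # Write directly, without holding a full fieldlist in memory.
    ds.earthkit.to_grib(path)
    return None
```

The direct route is the natural default when only the file is needed. The fieldlist route is useful when the fields are to be checked, for example with head(), before anything is written.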