Xarray engine: overview
Earthkit-data comes with its own Xarray engine called “earthkit” to convert fieldlists to Xarray.
To start with, we get the example data we will use in this notebook and read it into a fieldlist.
[1]:
import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "pl.grib").to_fieldlist()
Creating Xarray
To convert a fieldlist to Xarray we need to use to_xarray().
[2]:
ds = ds_fl.to_xarray()
ds
[2]:
<xarray.Dataset> Size: 176kB
Dimensions: (forecast_reference_time: 4, step: 2, level: 2,
latitude: 19, longitude: 36)
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 32B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* level (level) int64 16B 500 700
* latitude (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
* longitude (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
r (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
t (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
Conventions: CF-1.8
institution: ECMWF
to_xarray() has a large number of keyword arguments that control how the Xarray dataset is generated. To simplify usage, we can define profiles that provide custom defaults for most of the keyword arguments. At the moment, there are 3 pre-defined profiles available: "earthkit" (the default) and 2 legacy profiles: "mars" and "grib". We can pass them to to_xarray() via the profile kwarg.
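As a minimal sketch of selecting a profile, the helper below forwards a profile name to to_xarray() via the profile kwarg described above. The helper name `to_xarray_with_profile` and the `KNOWN_PROFILES` tuple are hypothetical and only serve the illustration. Restricting the choice to the three names listed in the text is an assumption of this sketch, not a rule of the library.

```python
# Profile names listed in the text above.
KNOWN_PROFILES = ("earthkit", "mars", "grib")


def to_xarray_with_profile(fieldlist, profile="earthkit", **kwargs):
    """Convert ``fieldlist`` to Xarray using one of the pre-defined profiles.

    Hypothetical convenience wrapper: it only checks the profile name and
    forwards everything else to ``fieldlist.to_xarray()``.
    """
    if profile not in KNOWN_PROFILES:
        raise ValueError(
            f"unknown profile {profile!r}; expected one of {KNOWN_PROFILES}"
        )
    return fieldlist.to_xarray(profile=profile, **kwargs)
```

With the fieldlist from the first cell, `to_xarray_with_profile(ds_fl, "mars")` would then be equivalent to `ds_fl.to_xarray(profile="mars")`.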
Writing back to GRIB
This is an experimental feature!
To write the Xarray back to GRIB, it has to keep the original variable attributes that the earthkit engine generated. By default, variable attributes are not kept in Xarray computations, so we need to set the global Xarray keep_attrs option to enable this, as shown in the following cell:
[3]:
import xarray as xr
xr.set_options(keep_attrs=True)
ds = ds_fl.to_xarray()
ds += 1
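The effect of keep_attrs can be checked on a small synthetic dataset that does not depend on the sample GRIB data. This is only a sketch: the dataset `ds_demo` and its `units` attribute are made up for illustration and are not what the earthkit engine generates. It also uses xr.set_options() as a context manager, so the option applies only inside the `with` block instead of globally.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an engine-generated variable; the attribute name
# and value are made up for illustration.
ds_demo = xr.Dataset(
    {"t": (("latitude", "longitude"), np.zeros((2, 3)), {"units": "K"})}
)

# Enable attribute propagation only inside this block.
with xr.set_options(keep_attrs=True):
    shifted = ds_demo + 1

# The variable attributes survive the arithmetic.
print(shifted["t"].attrs)
```

The scoped form is convenient when only part of a workflow has to preserve attributes for a later conversion back to GRIB.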
Generating a fieldlist
To create a GRIB fieldlist we need to call to_fieldlist() on the earthkit accessor. The result is an array fieldlist holding all the data in memory.
[4]:
ds_fl1 = ds.earthkit.to_fieldlist()
ds_fl1.head()
[4]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | r | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 1 | r | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 2 | r | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
| 3 | r | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 700 | pressure | 0 | regular_ll |
| 4 | r | 2024-06-03 12:00:00 | 2024-06-03 12:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
We can see that the GRIB field values changed as expected if we compare the original and resulting fieldlists.
[5]:
m_0 = ds_fl.sel({"parameter.variable": "t", "time.step": 6, "vertical.level": 500})[0].values.mean()
m_1 = ds_fl1.sel({"parameter.variable": "t", "time.step": 6, "vertical.level": 500})[0].values.mean()
m_0, m_1
[5]:
(np.float64(254.25649845948692), np.float64(255.25649845948692))
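The comparison can be made explicit by checking that the two means differ by the constant added to the dataset. The sketch below copies the two values from the output above and uses a tolerance, since floating-point means are not guaranteed to differ by exactly 1. The tolerance value is an arbitrary choice for this illustration.

```python
import math

# Mean values copied from the cell output above.
m_0 = 254.25649845948692  # original fieldlist
m_1 = 255.25649845948692  # fieldlist generated from the modified Xarray

# The dataset was incremented by 1, so the field mean should shift by 1.
shift = m_1 - m_0
print(math.isclose(shift, 1.0, rel_tol=0.0, abs_tol=1e-9))
```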
Generating a GRIB file
Once we have the GRIB fieldlist it can be saved to disk using to_target().
[6]:
ds_fl1.to_target("file", "_from_xr_1.grib")
ekd.from_source("file", "_from_xr_1.grib").to_fieldlist().head()
[6]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | r | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 1 | r | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 2 | r | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
| 3 | r | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 700 | pressure | 0 | regular_ll |
| 4 | r | 2024-06-03 12:00:00 | 2024-06-03 12:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
It is also possible to write the Xarray directly into a GRIB file by calling to_target() on the earthkit accessor. This is more memory-efficient than generating a fieldlist first.
[7]:
ds.earthkit.to_target("file", "_from_xr_2.grib")
ekd.from_source("file", "_from_xr_2.grib").to_fieldlist().head()
[7]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | r | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 1 | r | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 2 | r | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
| 3 | r | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 700 | pressure | 0 | regular_ll |
| 4 | r | 2024-06-03 12:00:00 | 2024-06-03 12:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |