Xarray engine: GRIB related workflow¶
This notebook demonstrates how to work with Xarray in a GRIB related workflow consisting of the following steps:
read input GRIB data
convert it to Xarray
perform computations in Xarray changing both the values and metadata
write the resulting Xarray into GRIB
Getting the data¶
First, we read the input GRIB data and convert it into Xarray.
[1]:
import earthkit.data as ekd
fl = ekd.from_source("sample", "pl.grib").to_fieldlist()
ds = fl.to_xarray()
ds
[1]:
<xarray.Dataset> Size: 176kB
Dimensions: (forecast_reference_time: 4, step: 2, level: 2,
latitude: 19, longitude: 36)
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 32B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* level (level) int64 16B 500 700
* latitude (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
* longitude (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
r (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
t (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
Conventions: CF-1.8
institution: ECMWF- forecast_reference_time: 4
- step: 2
- level: 2
- latitude: 19
- longitude: 36
- forecast_reference_time(forecast_reference_time)datetime64[ns]2024-06-03 ... 2024-06-04T12:00:00
array(['2024-06-03T00:00:00.000000000', '2024-06-03T12:00:00.000000000', '2024-06-04T00:00:00.000000000', '2024-06-04T12:00:00.000000000'], dtype='datetime64[ns]') - step(step)timedelta64[ns]00:00:00 06:00:00
- standard_name :
- forecast_period
- long_name :
- time since forecast_reference_time
array([ 0, 21600000000000], dtype='timedelta64[ns]')
- level(level)int64500 700
- standard_name :
- air_pressure
- long_name :
- pressure
- units :
- hectopascal
- positive :
- down
array([500, 700])
- latitude(latitude)float6490.0 80.0 70.0 ... -80.0 -90.0
- units :
- degrees_north
- standard_name :
- latitude
- long_name :
- latitude
array([ 90., 80., 70., 60., 50., 40., 30., 20., 10., 0., -10., -20., -30., -40., -50., -60., -70., -80., -90.]) - longitude(longitude)float640.0 10.0 20.0 ... 330.0 340.0 350.0
- units :
- degrees_east
- standard_name :
- longitude
- long_name :
- longitude
array([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., 110., 120., 130., 140., 150., 160., 170., 180., 190., 200., 210., 220., 230., 240., 250., 260., 270., 280., 290., 300., 310., 320., 330., 340., 350.])
- r(forecast_reference_time, step, level, latitude, longitude)float64...
- standard_name :
- relative_humidity
- long_name :
- Relative humidity
- units :
- percent
- level_type :
- pressure
- _earthkit :
- {'message': b"GRIB\x00\x00l\x01\x00\x004\x80b\x9a\xff\x80\x9dd\x01\xf4\x18\x06\x03\x00\x00\x01\x00\x00\x01\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\t\x04\x010001\x00\x00\x00\x00\x00 \x00\xff\x00\x00$\x00\x13\x01_\x90\x00\x00\x00\x80\x81_\x90\x05W0'\x10'\x10\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x80\x02D\xb9}n\x00\x007777", 'bitsPerValue': 16, 'grid_spec': '{"grid": [10, 10]}'}
[10944 values with dtype=float64]
- t(forecast_reference_time, step, level, latitude, longitude)float64...
- standard_name :
- air_temperature
- long_name :
- Temperature
- units :
- kelvin
- level_type :
- pressure
- _earthkit :
- {'message': b"GRIB\x00\x00l\x01\x00\x004\x80b\x9a\xff\x80\x82d\x01\xf4\x18\x06\x03\x00\x00\x01\x00\x00\x01\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\t\x04\x010001\x00\x00\x00\x00\x00 \x00\xff\x00\x00$\x00\x13\x01_\x90\x00\x00\x00\x80\x81_\x90\x05W0'\x10'\x10\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x80\x02D\xb9}n\x00\x007777", 'bitsPerValue': 16, 'grid_spec': '{"grid": [10, 10]}'}
[10944 values with dtype=float64]
- Conventions :
- CF-1.8
- institution :
- ECMWF
The reference field¶
By default, add_earthkit_attrs=True in to_xarray() and some special earthkit attributes are added to each datarray in the dataset. These are used for converting the Xarray to fieldlist or writing it into GRIB.
We can inspect this extra data via the earthkit accessor. The accessor’s reference_field property returns a field representing the input data the dataarray was built from. This provides all the original metadata that is otherwise not present in the datarray, but it does not actually have any valid data values. Please note that this field is not stored in the datarray but the earthkit attributes provide enough information to build it when needed.
For example, for the temperature variable the reference field looks like this:
[2]:
ds.t.earthkit.reference_field
[2]:
| number_of_values | 684 |
| array_type | ndarray |
| array_dtype | float64 |
| variable | t |
| standard_name | air_temperature |
| long_name | Temperature |
| units | kelvin |
| chem_variable | None |
| valid_datetime | 2024-06-03 00:00:00 |
| base_datetime | 2024-06-03 00:00:00 |
| step | 0:00:00 |
| level | 500 |
| layer | None |
| level_type | pressure |
| member | 0 |
| grid_spec | {'grid': [10, 10]} |
| grid_type | regular_ll |
| shape | (19, 36) |
| area | (90.0, 0.0, -90.0, 350.0) |
Modifying values and metadata¶
We want to carry on these earthkit specific attributes in the computations. By default, variable attributes are not kept in Xarray computations so we need to set the global Xarray keep_attrs option to enable it.
[3]:
import xarray as xr
xr.set_options(keep_attrs=True)
[3]:
<xarray.core.options.set_options at 0x10784c440>
With this we do a very simple computation with the temperature dataarray and also change the variable metadata.
[4]:
da = ds["t"]
# modify values to imitate computations
da = da + 10.0
# change metadata to potential temperature for the sake of this example
da.attrs["standard_name"] = "air_potential_temperature"
da.attrs["long_name"] = "Potential temperature"
# we intentionally set a variable name that is not GRIB related.
# In GRIB the shortName for potential temperature is "pt".
da = da.rename("theta")
[5]:
da
[5]:
<xarray.DataArray 'theta' (forecast_reference_time: 4, step: 2, level: 2,
latitude: 19, longitude: 36)> Size: 88kB
array([[[[[260.2250061 , 260.2250061 , 260.2250061 , ...,
260.2250061 , 260.2250061 , 260.2250061 ],
[257.90567017, 259.53652954, 260.77090454, ...,
253.89785767, 254.7484436 , 256.18789673],
[258.80117798, 261.88809204, 264.14297485, ...,
251.49746704, 252.6078186 , 255.38320923],
...,
[240.8031311 , 242.75527954, 244.98672485, ...,
243.67520142, 241.49746704, 240.21817017],
[240.48672485, 241.10488892, 241.92422485, ...,
240.8031311 , 240.40176392, 240.24258423],
[241.2250061 , 241.2250061 , 241.2250061 , ...,
241.2250061 , 241.2250061 , 241.2250061 ]],
[[275.46455383, 275.46455383, 275.46455383, ...,
275.46455383, 275.46455383, 275.46455383],
[272.76533508, 274.58760071, 275.90400696, ...,
267.53779602, 268.86103821, 270.72920227],
[273.94697571, 277.06513977, 279.22724915, ...,
265.10810852, 266.98310852, 270.27705383],
...
240.10939026, 240.76173401, 241.96095276],
[240.64649963, 239.96388245, 239.50978088, ...,
243.2490387 , 242.22364807, 241.40040588],
[240.47755432, 240.47755432, 240.47755432, ...,
240.47755432, 240.47755432, 240.47755432]],
[[274.55142212, 274.55142212, 274.55142212, ...,
274.55142212, 274.55142212, 274.55142212],
[273.12466431, 274.57095337, 275.74966431, ...,
270.73892212, 271.04458618, 271.84732056],
[272.19497681, 276.46939087, 280.02896118, ...,
270.50845337, 268.87857056, 269.24185181],
...,
[256.77310181, 256.09439087, 254.94595337, ...,
256.50650024, 256.39907837, 256.69985962],
[247.95864868, 246.73501587, 245.77310181, ...,
252.94595337, 251.03384399, 249.38833618],
[244.21548462, 244.21548462, 244.21548462, ...,
244.21548462, 244.21548462, 244.21548462]]]]],
shape=(4, 2, 2, 19, 36))
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 32B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* level (level) int64 16B 500 700
* latitude (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
* longitude (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Attributes:
standard_name: air_potential_temperature
long_name: Potential temperature
units: kelvin
level_type: pressure
_earthkit: {'message': b"GRIB\x00\x00l\x01\x00\x004\x80b\x9a\xff\x80...- forecast_reference_time: 4
- step: 2
- level: 2
- latitude: 19
- longitude: 36
- 260.2 260.2 260.2 260.2 260.2 260.2 ... 244.2 244.2 244.2 244.2 244.2
array([[[[[260.2250061 , 260.2250061 , 260.2250061 , ..., 260.2250061 , 260.2250061 , 260.2250061 ], [257.90567017, 259.53652954, 260.77090454, ..., 253.89785767, 254.7484436 , 256.18789673], [258.80117798, 261.88809204, 264.14297485, ..., 251.49746704, 252.6078186 , 255.38320923], ..., [240.8031311 , 242.75527954, 244.98672485, ..., 243.67520142, 241.49746704, 240.21817017], [240.48672485, 241.10488892, 241.92422485, ..., 240.8031311 , 240.40176392, 240.24258423], [241.2250061 , 241.2250061 , 241.2250061 , ..., 241.2250061 , 241.2250061 , 241.2250061 ]], [[275.46455383, 275.46455383, 275.46455383, ..., 275.46455383, 275.46455383, 275.46455383], [272.76533508, 274.58760071, 275.90400696, ..., 267.53779602, 268.86103821, 270.72920227], [273.94697571, 277.06513977, 279.22724915, ..., 265.10810852, 266.98310852, 270.27705383], ... 240.10939026, 240.76173401, 241.96095276], [240.64649963, 239.96388245, 239.50978088, ..., 243.2490387 , 242.22364807, 241.40040588], [240.47755432, 240.47755432, 240.47755432, ..., 240.47755432, 240.47755432, 240.47755432]], [[274.55142212, 274.55142212, 274.55142212, ..., 274.55142212, 274.55142212, 274.55142212], [273.12466431, 274.57095337, 275.74966431, ..., 270.73892212, 271.04458618, 271.84732056], [272.19497681, 276.46939087, 280.02896118, ..., 270.50845337, 268.87857056, 269.24185181], ..., [256.77310181, 256.09439087, 254.94595337, ..., 256.50650024, 256.39907837, 256.69985962], [247.95864868, 246.73501587, 245.77310181, ..., 252.94595337, 251.03384399, 249.38833618], [244.21548462, 244.21548462, 244.21548462, ..., 244.21548462, 244.21548462, 244.21548462]]]]], shape=(4, 2, 2, 19, 36)) - forecast_reference_time(forecast_reference_time)datetime64[ns]2024-06-03 ... 2024-06-04T12:00:00
array(['2024-06-03T00:00:00.000000000', '2024-06-03T12:00:00.000000000', '2024-06-04T00:00:00.000000000', '2024-06-04T12:00:00.000000000'], dtype='datetime64[ns]') - step(step)timedelta64[ns]00:00:00 06:00:00
- standard_name :
- forecast_period
- long_name :
- time since forecast_reference_time
array([ 0, 21600000000000], dtype='timedelta64[ns]')
- level(level)int64500 700
- standard_name :
- air_pressure
- long_name :
- pressure
- units :
- hectopascal
- positive :
- down
array([500, 700])
- latitude(latitude)float6490.0 80.0 70.0 ... -80.0 -90.0
- units :
- degrees_north
- standard_name :
- latitude
- long_name :
- latitude
array([ 90., 80., 70., 60., 50., 40., 30., 20., 10., 0., -10., -20., -30., -40., -50., -60., -70., -80., -90.]) - longitude(longitude)float640.0 10.0 20.0 ... 330.0 340.0 350.0
- units :
- degrees_east
- standard_name :
- longitude
- long_name :
- longitude
array([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., 110., 120., 130., 140., 150., 160., 170., 180., 190., 200., 210., 220., 230., 240., 250., 260., 270., 280., 290., 300., 310., 320., 330., 340., 350.])
- standard_name :
- air_potential_temperature
- long_name :
- Potential temperature
- units :
- kelvin
- level_type :
- pressure
- _earthkit :
- {'message': b"GRIB\x00\x00l\x01\x00\x004\x80b\x9a\xff\x80\x82d\x01\xf4\x18\x06\x03\x00\x00\x01\x00\x00\x01\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\t\x04\x010001\x00\x00\x00\x00\x00 \x00\xff\x00\x00$\x00\x13\x01_\x90\x00\x00\x00\x80\x81_\x90\x05W0'\x10'\x10\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x80\x02D\xb9}n\x00\x007777", 'bitsPerValue': 16, 'grid_spec': '{"grid": [10, 10]}'}
The earthkit specific data is still available but it is unchanged so it is out of sync with the actual datarray metadata.
[6]:
da.earthkit.reference_field
[6]:
| number_of_values | 684 |
| array_type | ndarray |
| array_dtype | float64 |
| variable | t |
| standard_name | air_temperature |
| long_name | Temperature |
| units | kelvin |
| chem_variable | None |
| valid_datetime | 2024-06-03 00:00:00 |
| base_datetime | 2024-06-03 00:00:00 |
| step | 0:00:00 |
| level | 500 |
| layer | None |
| level_type | pressure |
| member | 0 |
| grid_spec | {'grid': [10, 10]} |
| grid_type | regular_ll |
| shape | (19, 36) |
| area | (90.0, 0.0, -90.0, 350.0) |
Currently, we need to update the earthkit specific information manually by using set() on the accessor.
[7]:
da = da.earthkit.set({"parameter.variable": "pt"})
da.earthkit.reference_field
[7]:
| number_of_values | 684 |
| array_type | ndarray |
| array_dtype | float64 |
| variable | pt |
| standard_name | unknown |
| long_name | Potential temperature |
| units | kelvin |
| chem_variable | None |
| valid_datetime | 2024-06-03 00:00:00 |
| base_datetime | 2024-06-03 00:00:00 |
| step | 0:00:00 |
| level | 500 |
| layer | None |
| level_type | pressure |
| member | 0 |
| grid_spec | {'grid': [10, 10]} |
| grid_type | regular_ll |
| shape | (19, 36) |
| area | (90.0, 0.0, -90.0, 350.0) |
Writing to GRIB¶
We finish the example by writing the Xarray into a GRIB file. This is done by forming a field for each combination of the coordinate values using the reference field as a template.
[8]:
ekd.to_target("file", "_theta.grib", data=da)
We read back the resulting GRIB data and check its content. We can see that the metadata is correctly set.
[9]:
fl_new = ekd.from_source("file", "_theta.grib").to_fieldlist()
fl_new[0]
[9]:
| number_of_values | 684 |
| array_type | ndarray |
| array_dtype | float64 |
| variable | pt |
| standard_name | unknown |
| long_name | Potential temperature |
| units | kelvin |
| chem_variable | None |
| valid_datetime | 2024-06-03 00:00:00 |
| base_datetime | 2024-06-03 00:00:00 |
| step | 0:00:00 |
| level | 500 |
| layer | None |
| level_type | pressure |
| member | 0 |
| grid_spec | {'grid': [10, 10]} |
| grid_type | regular_ll |
| shape | (19, 36) |
| area | (90.0, 0.0, -90.0, 350.0) |
[10]:
fl_new.ls()
[10]:
| parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type | |
|---|---|---|---|---|---|---|---|---|
| 0 | pt | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 1 | pt | 2024-06-03 00:00:00 | 2024-06-03 00:00:00 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 2 | pt | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
| 3 | pt | 2024-06-03 06:00:00 | 2024-06-03 00:00:00 | 0 days 06:00:00 | 700 | pressure | 0 | regular_ll |
| 4 | pt | 2024-06-03 12:00:00 | 2024-06-03 12:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 5 | pt | 2024-06-03 12:00:00 | 2024-06-03 12:00:00 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 6 | pt | 2024-06-03 18:00:00 | 2024-06-03 12:00:00 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
| 7 | pt | 2024-06-03 18:00:00 | 2024-06-03 12:00:00 | 0 days 06:00:00 | 700 | pressure | 0 | regular_ll |
| 8 | pt | 2024-06-04 00:00:00 | 2024-06-04 00:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 9 | pt | 2024-06-04 00:00:00 | 2024-06-04 00:00:00 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 10 | pt | 2024-06-04 06:00:00 | 2024-06-04 00:00:00 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
| 11 | pt | 2024-06-04 06:00:00 | 2024-06-04 00:00:00 | 0 days 06:00:00 | 700 | pressure | 0 | regular_ll |
| 12 | pt | 2024-06-04 12:00:00 | 2024-06-04 12:00:00 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 13 | pt | 2024-06-04 12:00:00 | 2024-06-04 12:00:00 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 14 | pt | 2024-06-04 18:00:00 | 2024-06-04 12:00:00 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
| 15 | pt | 2024-06-04 18:00:00 | 2024-06-04 12:00:00 | 0 days 06:00:00 | 700 | pressure | 0 | regular_ll |