GRIB: modifying metadata

This notebook demonstrates how to modify the metadata in GRIB fields.

First we read some GRIB data containing pressure level fields.

[1]:
import datetime

import earthkit.data as ekd

fl = ekd.from_source("sample", "tuv_pl.grib").to_fieldlist()

We will use the first field in the rest of the notebook.

[2]:
f = fl[0]
f.ls()
[2]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll

Using set()

A field can be modified by using set(). It will create a new field with updated metadata.

The preferred way is to use the high-level field metadata keys whenever possible.

[3]:
f1 = f.set({"parameter.variable": "u", "parameter.units": "m/s", "vertical.level": 500})
f1.ls()
[3]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 u 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 500 pressure 0 regular_ll

Raw (ecCodes GRIB) metadata keys can also be used when needed, but only for fields created from GRIB data.

[4]:
f1 = f.set({"metadata.shortName": "u", "metadata.level": 500})
f1.ls()
[4]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 u 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 500 pressure 0 regular_ll

The two types of keys can be mixed. In that case the high-level keys are applied first, followed by the raw keys (those prefixed with metadata).

[5]:
f1 = f.set({"parameter.variable": "u", "parameter.units": "m/s", "metadata.level": 500})
f1.ls()
[5]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 u 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 500 pressure 0 regular_ll
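The ordering rule for mixed keys can be pictured with a small standard-library sketch. It is only an illustration of "high-level keys first, raw keys last"; the helper name order_updates is hypothetical and is not part of earthkit-data.

```python
def order_updates(updates):
    """Return (key, value) pairs with high-level keys first and raw keys last.

    Hypothetical helper for illustration only: raw keys are recognised
    by the "metadata." prefix, as in the cells above.
    """
    high = [(k, v) for k, v in updates.items() if not k.startswith("metadata.")]
    raw = [(k, v) for k, v in updates.items() if k.startswith("metadata.")]
    return high + raw


# The raw key is listed first here, but it still ends up last.
updates = {"metadata.level": 500, "parameter.variable": "u", "parameter.units": "m/s"}
ordered = order_updates(updates)
for key, value in ordered:
    print(key, value)
```

Whatever order the keys are written in, the raw key is handled after the high-level ones in this sketch.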

Note that there is an important GRIB-related difference between using the high-level keys and the raw ecCodes GRIB keys in set(). See the “Modified fields and the associated GRIB message” section below for details.

Setting time

Keys of the “time” field component can be set using multiple formats. By default a “datetime” key takes a datetime.datetime object and a “step” key takes a datetime.timedelta object.

[6]:
f1 = f.set({"time.base_datetime": datetime.datetime(2000, 12, 18, 12), "time.step": datetime.timedelta(hours=6)})
f1.ls()
[6]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2000-12-18 18:00:00 2000-12-18 12:00:00 0 days 06:00:00 1000 pressure 0 regular_ll

On top of that, several other compatible formats are accepted, e.g.:

  • for datetime: ISO date strings, numpy datetime64 values, integers as yyyymmdd (the hour is assumed to be 0 in this case)

  • for timedelta: integers (as hours), strings like “6s”, “6m”, “6h” (for seconds, minutes or hours)

[7]:
f1 = f.set({"time.base_datetime": "2000-12-18T12", "time.step": 6})
f1.ls()
[7]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2000-12-18 18:00:00 2000-12-18 12:00:00 0 days 06:00:00 1000 pressure 0 regular_ll
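To make the conversions listed above concrete, here is a minimal standard-library sketch of how such values could be normalised. The functions to_datetime and to_timedelta are hypothetical and are not earthkit-data's implementation; numpy datetime64 input is left out to keep the sketch dependency-free.

```python
import datetime

# Suffix -> datetime.timedelta keyword, for strings like "6s", "6m", "6h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def to_timedelta(value):
    """Hypothetical helper: normalise a step value to datetime.timedelta."""
    if isinstance(value, datetime.timedelta):
        return value
    if isinstance(value, int):
        # Plain integers are interpreted as hours.
        return datetime.timedelta(hours=value)
    if isinstance(value, str) and value and value[-1] in _UNITS:
        return datetime.timedelta(**{_UNITS[value[-1]]: int(value[:-1])})
    raise ValueError(f"Unsupported step value: {value!r}")


def to_datetime(value):
    """Hypothetical helper: normalise a date value to datetime.datetime."""
    if isinstance(value, datetime.datetime):
        return value
    if isinstance(value, int):
        # yyyymmdd integer; the hour is assumed to be 0.
        year, rest = divmod(value, 10000)
        month, day = divmod(rest, 100)
        return datetime.datetime(year, month, day)
    if isinstance(value, str):
        return datetime.datetime.fromisoformat(value)
    raise ValueError(f"Unsupported datetime value: {value!r}")


print(to_datetime("2000-12-18T12"), to_timedelta(6))
print(to_datetime(20001218), to_timedelta("10s"))
```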

Setting the step will automatically update the valid time too.

[8]:
f1 = f.set({"time.step": "10s"})
f1.ls()
[8]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:10 2018-08-01 12:00:00 0 days 00:00:10 1000 pressure 0 regular_ll
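The output above is consistent with the valid time being the base time plus the step. The following standard-library check reproduces the values shown in the table; it does not call earthkit-data.

```python
import datetime

# Values taken from the cell output above.
base_datetime = datetime.datetime(2018, 8, 1, 12)
step = datetime.timedelta(seconds=10)

# Valid time = base time + step.
valid_datetime = base_datetime + step
print(valid_datetime)
```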

Setting components

Whole field components can also be set with set(). The simplest way is to specify them as a dict. For example, the following cell sets a new “time” component on the field.

[9]:
f1 = f.set(time={"base_datetime": "2000-12-18T12", "step": 6})
f1.ls()
[9]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2000-12-18 18:00:00 2000-12-18 12:00:00 0 days 06:00:00 1000 pressure 0 regular_ll

If the dict does not fully specify the component, an exception is raised. For example, “step” on its own does not define a time component.

[10]:
try:
    f.set(time={"step": 6})
except Exception as e:
    print(e)
Cannot create ForecastTime from keys: ['step'].

Saving the modified field to disk

We change the level and save the modified field into a GRIB file.

[11]:
f1 = f.set({"vertical.level": 500})
f1.to_target("file", "_res_lev.grib")

# read back the data and list its metadata to confirm the new level
f1_w = ekd.from_source("file", "_res_lev.grib").to_fieldlist()
f1_w.ls()
[11]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 500 pressure 0 regular_ll

Modified fields and the associated GRIB message

When a field is created from GRIB data, the associated GRIB message can be accessed via the field with message().

[12]:
f.message()[:10]
[12]:
b'GRIB\x00\x00\x96\x01\x00\x00'

After the high-level field metadata has been modified, the GRIB message is not updated and is therefore out of sync with the high-level field components. As a result, it can no longer be accessed in the new field. The same is true for any raw ecCodes GRIB metadata.

[13]:
f1 = f.set({"vertical.level": 500})
f1.message()  # returns None
[14]:
f1.get("metadata.level")  # returns None
[15]:
try:
    f1.metadata("level")
except KeyError as e:
    print(e)
'Key metadata.level not found in field'

If we want to keep a valid associated GRIB message in the modified field, we need to call sync(). This creates a new GRIB handle, updates the relevant metadata in it and creates a new field from it.

[16]:
f1 = f1.sync()
f1.get("metadata.level")
[16]:
500
[17]:
f1.metadata("level")
[17]:
500

Alternatively, we can use the sync=True kwarg in set() to execute the syncing as part of the setting process.

[18]:
f1 = f.set({"vertical.level": 500}, sync=True)
f1.message()[:10]
[18]:
b'GRIB\x00\x00\x96\x01\x00\x00'
[19]:
f1.get("metadata.level")
[19]:
500
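The behaviour described in this section can be summarised with a toy model. This is a sketch under stated assumptions, not earthkit-data's implementation: the ToyField class, its get_raw() method and the single "vertical.level" to "level" mapping are all hypothetical and only mimic what the cells above show.

```python
class ToyField:
    """Toy stand-in for a field: high-level components plus an optional raw message."""

    def __init__(self, components, raw=None):
        self.components = dict(components)
        self.raw = None if raw is None else dict(raw)

    def set(self, updates, sync=False):
        # A high-level update leaves the raw message stale, so the new
        # field is created without one.
        new = ToyField({**self.components, **updates}, raw=None)
        return new.sync() if sync else new

    def sync(self):
        # Rebuild the raw message from the high-level components.
        # Only one key is mapped in this toy model.
        raw = {"level": self.components["vertical.level"]}
        return ToyField(self.components, raw=raw)

    def get_raw(self, key):
        # Mirrors the cells above: None when there is no raw message.
        return None if self.raw is None else self.raw.get(key)


toy = ToyField({"vertical.level": 1000}, raw={"level": 1000})
toy1 = toy.set({"vertical.level": 500})             # raw message dropped
toy2 = toy1.sync()                                  # raw message rebuilt
toy3 = toy.set({"vertical.level": 500}, sync=True)  # both in one step
print(toy1.get_raw("level"), toy2.get_raw("level"), toy3.get_raw("level"))
```

In this toy model, calling set() with sync=True is equivalent to calling set() followed by sync().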

When only raw metadata keys are used in set(), there is no need for syncing.

[20]:
f1 = f.set({"metadata.level": 500})
f1.message()[:10]
[20]:
b'GRIB\x00\x00\x96\x01\x00\x00'
[21]:
f1.get("metadata.level")
[21]:
500