GRIB: retrieving FDB data lazily
This example reads GRIB data lazily from a local FDB.
FDB (Fields DataBase) is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data. For more information on FDB, please consult the following pages:
FDB support in earthkit-data requires both FDB and pyfdb to be installed.
Setting up the input FDB
We create an FDB in the current folder using the schema taken from the pyfdb test suite. First we make sure the FDB root directory exists, then we specify the configuration. Finally, we load sample GRIB data on pressure levels and write it to the FDB.
[1]:
import os
import earthkit.data as ekd
fdb_schema = "../default_fdb_schema"
fdb_dir = "./_fdb_lazy_demo"
os.makedirs(fdb_dir, exist_ok=True)
config = {
    "type": "local",
    "engine": "toc",
    "schema": fdb_schema,
    "spaces": [{"handler": "Default", "roots": [{"path": fdb_dir}]}],
}
# get GRIB data on pressure levels and load it into a fieldlist
fl_in = ekd.from_source("sample", "pl.grib").to_fieldlist()
# write GRIB data to our local FDB
fl_in.to_target("fdb", config=config)
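The configuration above can be wrapped in a small helper so that the schema path and the root directory are checked before anything is written. The following is a minimal sketch using only the standard library. `make_local_fdb_config` is a hypothetical name introduced here for illustration; it is not part of earthkit-data or pyfdb, and it only reproduces the dictionary layout shown in this example.

```python
import os
import tempfile


def make_local_fdb_config(schema_path, root_dir):
    """Build a local FDB config dict with the layout used in this example.

    Hypothetical helper for illustration only; not an earthkit-data or pyfdb API.
    """
    if not os.path.isfile(schema_path):
        raise FileNotFoundError(f"FDB schema not found: {schema_path}")
    # the root directory must exist before data is written to the FDB
    os.makedirs(root_dir, exist_ok=True)
    return {
        "type": "local",
        "engine": "toc",
        "schema": schema_path,
        "spaces": [{"handler": "Default", "roots": [{"path": root_dir}]}],
    }


# demonstrate with a throwaway schema file and root directory
with tempfile.TemporaryDirectory() as tmp:
    schema = os.path.join(tmp, "schema")
    open(schema, "w").close()  # empty placeholder standing in for a real schema
    demo_config = make_local_fdb_config(schema, os.path.join(tmp, "fdb_root"))
    demo_root_exists = os.path.isdir(demo_config["spaces"][0]["roots"][0]["path"])
```

In the notebook itself the returned dictionary would be passed as `config=` exactly as in the cell above.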
Reading data from the FDB
We retrieve data from the FDB source with from_source() using the lazy=True keyword argument. With this option the field/fieldlist structure is inferred from the request, and the actual GRIB data is only retrieved when the values are needed. The exception is a single reference field per parameter, which is used behind the scenes for metadata that cannot be inferred from the request.
[2]:
request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": [20240603, 20240604],
    "time": [0, 1200],
    "domain": "g",
    "type": "fc",
    "levtype": "pl",
    "levelist": [500, 700],
    "step": [0, 6],
    "param": [130, 157],
}
fl = ekd.from_source(
    "fdb",
    request,
    config=config,
    lazy=True,
).to_fieldlist()
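To get a feel for what "inferred from the request" means, the request can be expanded by hand: every combination of its list-valued keys corresponds to one expected field. The sketch below uses only the standard library. `expand_request` is a hypothetical helper, not an earthkit-data function, and the order of the expanded combinations is not claimed to match the order of the fields in the fieldlist.

```python
from itertools import product

request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": [20240603, 20240604],
    "time": [0, 1200],
    "domain": "g",
    "type": "fc",
    "levtype": "pl",
    "levelist": [500, 700],
    "step": [0, 6],
    "param": [130, 157],
}


def expand_request(req):
    """Return one single-valued request per combination of the list-valued keys.

    Hypothetical helper for illustration; not how earthkit-data is implemented.
    """
    list_keys = [k for k, v in req.items() if isinstance(v, (list, tuple))]
    scalars = {k: v for k, v in req.items() if k not in list_keys}
    combos = product(*(req[k] for k in list_keys))
    return [{**scalars, **dict(zip(list_keys, combo))} for combo in combos]


expanded = expand_request(request)
print(len(expanded))  # 2 dates x 2 times x 2 levels x 2 steps x 2 params = 32
```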
Standard field properties work.
[3]:
fl.head()
[3]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2024-06-03 00:00:00 | 2024-06-03 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 1 | r | 2024-06-03 00:00:00 | 2024-06-03 | 0 days 00:00:00 | 500 | pressure | 0 | regular_ll |
| 2 | t | 2024-06-03 00:00:00 | 2024-06-03 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 3 | r | 2024-06-03 00:00:00 | 2024-06-03 | 0 days 00:00:00 | 700 | pressure | 0 | regular_ll |
| 4 | t | 2024-06-03 06:00:00 | 2024-06-03 | 0 days 06:00:00 | 500 | pressure | 0 | regular_ll |
Getting high-level metadata works.
[4]:
fl[0].get("parameter.variable"), fl[1].get("parameter.variable")
[4]:
('t', 'r')
For a given field the data is retrieved from the FDB when the values are needed.
[5]:
fl[7].values[:4]
[5]:
array([57.20194507, 57.20194507, 57.20194507, 57.20194507])
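The deferred retrieval described above can be pictured with a toy class whose values are only fetched on access. This is a conceptual sketch, not earthkit-data code: `LazyField` and `fake_retrieve` are hypothetical names, and the returned numbers are placeholders rather than real GRIB values.

```python
class LazyField:
    """Toy field that defers loading its values until they are requested."""

    def __init__(self, request, loader):
        self.request = request
        self._loader = loader  # callable standing in for an FDB retrieval

    @property
    def values(self):
        # the retrieval happens here, on access, not at construction time
        return self._loader(self.request)


calls = []  # records every (fake) retrieval that is performed


def fake_retrieve(request):
    calls.append(request)
    return [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder data


field = LazyField({"param": 157, "levelist": 700}, fake_retrieve)
calls_before = len(calls)  # nothing retrieved yet
first_values = field.values[:4]  # accessing the values triggers the retrieval
calls_after = len(calls)
```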
Raw GRIB metadata
Please note that lazy loading from the FDB is an experimental feature, and accessing raw GRIB metadata might give wrong results. This is because the raw GRIB metadata is read from the reference field (one per parameter) rather than from the field itself. We use fields #0 and #2 to demonstrate this.
[6]:
fl[0], fl[2]
[6]:
(Field(t, 2024-06-03 00:00:00, 2024-06-03 00:00:00, 0:00:00, 500, pressure, 0, regular_ll),
Field(t, 2024-06-03 00:00:00, 2024-06-03 00:00:00, 0:00:00, 700, pressure, 0, regular_ll))
When we use raw GRIB keys that do not vary across the fields of a parameter (e.g. “shortName”), we get the correct values. However, for keys like “levelist” we get the wrong value for the second field (500 instead of 700).
[7]:
for k in ["metadata.shortName", "metadata.levelist"]:
    print(k, fl[0].get(k), fl[2].get(k))
metadata.shortName t t
metadata.levelist 500 500
High-level metadata should always work.
[8]:
for k in ["parameter.variable", "vertical.level"]:
    print(k, fl[0].get(k), fl[2].get(k))
parameter.variable t t
vertical.level 500 700
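The difference between the two kinds of keys can be reproduced with a toy model in which raw keys are answered from one shared reference header per parameter, while high-level keys come from each field's own request. This is only a sketch of the behaviour described in this section, under that assumption; `ToyLazyField` and `reference_raw` are hypothetical and do not reflect the actual earthkit-data implementation.

```python
# Toy model only: one raw GRIB "header" is shared by all fields of a parameter.
reference_raw = {"t": {"shortName": "t", "levelist": 500}}

# mapping from high-level keys to request keys, limited to this illustration
HIGH_LEVEL_TO_REQUEST = {"parameter.variable": "param", "vertical.level": "levelist"}


class ToyLazyField:
    def __init__(self, request):
        self.request = request

    def get(self, key):
        if key.startswith("metadata."):
            # raw GRIB keys are answered from the shared reference field
            raw_key = key.split(".", 1)[1]
            return reference_raw[self.request["param"]][raw_key]
        # high-level keys are derived from this field's own request
        return self.request[HIGH_LEVEL_TO_REQUEST[key]]


f0 = ToyLazyField({"param": "t", "levelist": 500})
f2 = ToyLazyField({"param": "t", "levelist": 700})

for k in ["metadata.levelist", "vertical.level"]:
    print(k, f0.get(k), f2.get(k))
```

In this model the raw key reports 500 for both fields, while the high-level key reports 500 and 700, mirroring the two outputs above.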
Xarray support
Conversion to Xarray works and does not require any extra data retrieval from the FDB.
[9]:
ds = fl.to_xarray()
ds
[9]:
<xarray.Dataset> Size: 176kB
Dimensions: (forecast_reference_time: 4, step: 2, level: 2,
latitude: 19, longitude: 36)
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 32B 202...
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* level (level) int64 16B 500 700
* latitude (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
* longitude (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
r (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
t (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
Conventions: CF-1.8
institution: ECMWF
When we carry out a computation, each GRIB field involved is retrieved from the FDB. Currently, this is done by performing a separate retrieval per field. The retrieved fields are not cached locally but discarded after the values are extracted.
[10]:
m = ds.t.mean(["step", "level"]).load()
m
[10]:
<xarray.DataArray 't' (forecast_reference_time: 4, latitude: 19, longitude: 36)> Size: 22kB
array([[[257.72924805, 257.72924805, 257.72924805, ..., 257.72924805,
257.72924805, 257.72924805],
[255.30249023, 256.99707031, 258.26367188, ..., 250.86230469,
251.89160156, 253.47998047],
[255.93823242, 259.12451172, 261.5078125 , ..., 248.36401367,
249.62573242, 252.46582031],
...,
[238.59375 , 239.85839844, 241.45727539, ..., 241.83569336,
239.90942383, 238.53637695],
[235.27368164, 235.31665039, 235.57080078, ..., 236.88452148,
236.1496582 , 235.55712891],
[234.2277832 , 234.2277832 , 234.2277832 , ..., 234.2277832 ,
234.2277832 , 234.2277832 ]],
[[257.02156448, 257.02156448, 257.02156448, ..., 257.02156448,
257.02156448, 257.02156448],
[255.38655472, 256.76448441, 257.82625198, ..., 252.01155472,
252.75764847, 253.95491409],
[254.80794144, 258.04402542, 260.85652542, ..., 249.80598831,
249.97371292, 251.84114456],
...
[238.72603226, 237.12837601, 237.04805374, ..., 240.22407913,
240.77290726, 240.33394241],
[234.35078812, 233.49507523, 233.03071976, ..., 237.3146553 ,
236.40669632, 235.39522171],
[232.53755569, 232.53755569, 232.53755569, ..., 232.53755569,
232.53755569, 232.53755569]],
[[256.76979065, 256.76979065, 256.76979065, ..., 256.76979065,
256.76979065, 256.76979065],
[255.4912262 , 256.78126526, 257.88331604, ..., 253.53248596,
253.75880432, 254.40187073],
[253.97413635, 257.88014221, 261.40919495, ..., 253.35157776,
251.49610901, 251.48902893],
...,
[239.75074768, 238.69850159, 237.39552307, ..., 238.52833557,
239.03321838, 239.75172424],
[234.31446838, 233.35670471, 232.66896057, ..., 237.8896637 ,
236.55738831, 235.39356995],
[232.35523987, 232.35523987, 232.35523987, ..., 232.35523987,
232.35523987, 232.35523987]]], shape=(4, 19, 36))
Coordinates:
* forecast_reference_time (forecast_reference_time) datetime64[ns] 32B 202...
* latitude (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
* longitude (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Attributes:
standard_name: air_temperature
long_name: Temperature
units: kelvin
level_type: pressure
_earthkit: {'message': b"GRIB\x00\x00l\x01\x00\x004\x80b\x9a\xff\x80...
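Because each field involved in a computation is retrieved separately, the cost of the mean computed above can be estimated from the dimension sizes of the dataset. The arithmetic below is a rough sketch that assumes one GRIB field per combination of forecast_reference_time, step and level for the variable `t`, with latitude and longitude stored inside each field; the variable names are introduced here for illustration.

```python
from math import prod

# non-horizontal dimension sizes of ds.t, taken from the dataset repr above
dims = {"forecast_reference_time": 4, "step": 2, "level": 2}
reduced = ["step", "level"]  # dimensions averaged over in the computation

# every field of t contributes to the mean, so all of them are retrieved once
n_retrievals = prod(dims.values())

# number of fields averaged into each 2D slice of the result
fields_per_output = prod(dims[d] for d in reduced)

print(n_retrievals, fields_per_output)  # 16 4
```

Since retrieved fields are not cached, repeating the computation would be expected to trigger the same number of retrievals again.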