Retrieving data from FDB

[1]:
import earthkit.data

FDB (Fields DataBase) is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data. For more information on FDB please consult the following pages:

This example requires FDB access, and the FDB_HOME environment variable must be set correctly.
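A quick pre-flight check on the environment can make a misconfiguration easier to spot before any retrieval is attempted. The sketch below only inspects environment variables. The helper name `fdb_home_is_usable` is hypothetical (it is not part of earthkit-data), and it assumes that a usable FDB_HOME points to an existing directory.

```python
import os


def fdb_home_is_usable(env=None):
    """Return True if FDB_HOME is set and points to an existing directory.

    Hypothetical helper for illustration; it does not verify that the
    directory actually contains a working FDB installation.
    """
    env = os.environ if env is None else env
    path = env.get("FDB_HOME", "")
    return bool(path) and os.path.isdir(path)


if not fdb_home_is_usable():
    print("FDB_HOME is not set or does not point to a directory")
```

Passing a dictionary instead of relying on `os.environ` keeps the check easy to exercise in isolation.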

The following request was written to retrieve data from the operational FDB at ECMWF. Please note that the date must be adjusted since FDB at ECMWF only stores the most recent dates.

[2]:
request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": "20260423",
    "time": [0, 12],
    "domain": "g",
    "type": "an",
    "levtype": "sfc",
    "step": 0,
    "param": [151, 167, 168],
}
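Because the date has to be adjusted for each run, it can be convenient to compute it instead of hard-coding it. The following is a minimal sketch using only the standard library. The helper `recent_date` is hypothetical, and the choice of one day back is an assumption about what is still available in the FDB, not a documented retention rule.

```python
import datetime


def recent_date(days_back=1, today=None):
    """Return a date `days_back` days before `today` as a YYYYMMDD string."""
    today = datetime.date.today() if today is None else today
    return (today - datetime.timedelta(days=days_back)).strftime("%Y%m%d")


# Illustrative fragment of a request; only the "date" key is replaced.
example_request = {"class": "od", "stream": "oper", "date": "20260423"}
example_request["date"] = recent_date()
print(example_request["date"])
```

The same assignment can be applied to the `request` dictionary above before it is passed to from_source().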

Reading as a stream

By default we retrieve data from an FDB source with from_source() as a stream.

Iteration with one field at a time in memory

When we use the default arguments in from_source() the resulting object can only be used for iteration and only one field is kept in memory at a time. Fields created in the iteration get deleted when going out of scope.

[3]:
ds = earthkit.data.from_source("fdb", request=request).to_fieldlist()
for f in ds:
    print(f)
Field(msl, 2026-04-23 00:00:00, 2026-04-23 00:00:00, 0:00:00, 0, surface, 0, reduced_gg)
Field(2t, 2026-04-23 00:00:00, 2026-04-23 00:00:00, 0:00:00, 0, surface, 0, reduced_gg)
Field(2d, 2026-04-23 00:00:00, 2026-04-23 00:00:00, 0:00:00, 0, surface, 0, reduced_gg)
Field(msl, 2026-04-23 12:00:00, 2026-04-23 12:00:00, 0:00:00, 0, surface, 0, reduced_gg)
Field(2t, 2026-04-23 12:00:00, 2026-04-23 12:00:00, 0:00:00, 0, surface, 0, reduced_gg)
Field(2d, 2026-04-23 12:00:00, 2026-04-23 12:00:00, 0:00:00, 0, surface, 0, reduced_gg)

Once the iteration is completed, there is nothing left in ds.

[4]:
sum([1 for _ in ds])
[4]:
0
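The one-pass behaviour is the same as that of a plain Python generator. The sketch below is an analogy only and does not use earthkit-data; `field_stream` is a hypothetical stand-in for the stream, and the strings stand in for fields.

```python
def field_stream(names):
    """Yield one item at a time and keep nothing once it has been consumed."""
    for name in names:
        yield name


stream = field_stream(["msl", "2t", "2d"])

first_pass = [name for name in stream]   # consumes the stream
second_pass = [name for name in stream]  # nothing is left to read

print(first_pass)   # ['msl', '2t', '2d']
print(second_pass)  # []
```

To iterate again over a streamed FDB source, the retrieval has to be repeated, or the fields have to be kept in memory or in a file as shown in the later sections.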

Iteration with group_by

When we use the group_by() method (earthkit.data.indexing.stream.StreamFieldList.group_by), we can iterate through the stream in groups defined by metadata keys. Each iteration step yields a SimpleFieldList object (earthkit.data.indexing.simple.SimpleFieldList), which is built by consuming GRIB messages from the stream until the values of the metadata keys change. The generated FieldList keeps its GRIB messages in memory and is deleted when it goes out of scope.

[5]:
ds = earthkit.data.from_source("fdb", request=request).to_fieldlist()
for f in ds.group_by("metadata.time"):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
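The "consume until the key changes" rule described above can be illustrated with `itertools.groupby` from the standard library. This is a sketch of the idea, not the earthkit-data implementation; the dictionaries are made-up stand-ins for GRIB messages and `group_consecutive` is a hypothetical helper.

```python
from itertools import groupby

# Made-up stand-ins for the six messages in the stream, in arrival order.
messages = [
    {"param": "msl", "time": 0},
    {"param": "2t", "time": 0},
    {"param": "2d", "time": 0},
    {"param": "msl", "time": 1200},
    {"param": "2t", "time": 1200},
    {"param": "2d", "time": 1200},
]


def group_consecutive(stream, key):
    """Yield lists of consecutive items that share the same value for `key`."""
    for _, group in groupby(stream, key=lambda m: m[key]):
        yield list(group)


groups = list(group_consecutive(iter(messages), "time"))
print([len(g) for g in groups])  # [3, 3]
```

In this sketch only consecutive items are grouped, so a stream that is not ordered by the key would produce several groups for the same key value.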

Iteration with batched

When we use the batched() method we can iterate through the stream in batches of fixed size. In this example we create a stream and read 2 fields from it at a time.

[6]:
ds = earthkit.data.from_source("fdb", request=request).to_fieldlist()
for f in ds.batched(2):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=2 [('msl', 0), ('2t', 0)]
len=2 [('2d', 0), ('msl', 0)]
len=2 [('2t', 0), ('2d', 0)]
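Fixed-size batching over a one-pass stream can be sketched with `itertools.islice`. This illustrates the general technique only and is not the earthkit-data implementation; in particular, the shorter final batch shown here is a property of this sketch.

```python
from itertools import islice


def batched(stream, n):
    """Yield lists of up to `n` items read in order from a one-pass iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(stream)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


# Six stand-in fields in the same order as the stream above.
batches = list(batched(iter(["msl", "2t", "2d", "msl", "2t", "2d"]), 2))
print(batches)  # [['msl', '2t'], ['2d', 'msl'], ['2t', '2d']]

# When the length is not a multiple of n, the last batch is shorter.
uneven = list(batched(range(5), 2))
print(uneven)  # [[0, 1], [2, 3], [4]]
```

Note that batches ignore metadata: the second batch mixes fields from the two analysis times, which matches the output of the cell above.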

Storing all the fields in memory

We can load the whole stream into memory by using read_all=True in to_fieldlist(). The resulting object will be a FieldList.

[7]:
ds = earthkit.data.from_source("fdb", request=request).to_fieldlist(read_all=True)
[8]:
len(ds)
[8]:
6
[9]:
ds.ls()
[9]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 msl 2026-04-23 00:00:00 2026-04-23 00:00:00 0 days 0 surface 0 reduced_gg
1 2t 2026-04-23 00:00:00 2026-04-23 00:00:00 0 days 0 surface 0 reduced_gg
2 2d 2026-04-23 00:00:00 2026-04-23 00:00:00 0 days 0 surface 0 reduced_gg
3 msl 2026-04-23 12:00:00 2026-04-23 12:00:00 0 days 0 surface 0 reduced_gg
4 2t 2026-04-23 12:00:00 2026-04-23 12:00:00 0 days 0 surface 0 reduced_gg
5 2d 2026-04-23 12:00:00 2026-04-23 12:00:00 0 days 0 surface 0 reduced_gg
[10]:
ds.sel({"parameter.variable": "2t"}).ls()
[10]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 2t 2026-04-23 00:00:00 2026-04-23 00:00:00 0 days 0 surface 0 reduced_gg
1 2t 2026-04-23 12:00:00 2026-04-23 12:00:00 0 days 0 surface 0 reduced_gg
[11]:
ds.to_xarray()
[11]:
<xarray.Dataset> Size: 422MB
Dimensions:                  (forecast_reference_time: 2, values: 6599680)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 16B 202...
    latitude                 (values) float64 53MB ...
    longitude                (values) float64 53MB ...
Dimensions without coordinates: values
Data variables:
    2d                       (forecast_reference_time, values) float64 106MB ...
    2t                       (forecast_reference_time, values) float64 106MB ...
    msl                      (forecast_reference_time, values) float64 106MB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF
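The reported size follows directly from the dimensions in the output above, which is useful when judging whether a request fits in memory. The arithmetic below is a sketch that takes the numbers from the printed dataset (2 reference times, 6599680 values, float64 data) and assumes nothing is stored beyond the arrays listed there.

```python
BYTES_PER_FLOAT64 = 8
n_times = 2          # forecast_reference_time
n_values = 6599680   # grid points per field

coord_bytes = n_values * BYTES_PER_FLOAT64             # latitude or longitude
variable_bytes = n_times * n_values * BYTES_PER_FLOAT64  # 2d, 2t or msl
time_bytes = n_times * 8                                # datetime64[ns] coordinate

total_bytes = 3 * variable_bytes + 2 * coord_bytes + time_bytes

print(round(coord_bytes / 1e6))     # 53
print(round(variable_bytes / 1e6))  # 106
print(round(total_bytes / 1e6))     # 422
```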

Reading into a file

We can retrieve data from FDB into a file, which is located in the cache:

[12]:
ds = earthkit.data.from_source("fdb", request=request, stream=False).to_fieldlist()
[13]:
ds.ls()
[13]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 msl 2026-04-23 00:00:00 2026-04-23 00:00:00 0 days 0 surface 0 reduced_gg
1 2t 2026-04-23 00:00:00 2026-04-23 00:00:00 0 days 0 surface 0 reduced_gg
2 2d 2026-04-23 00:00:00 2026-04-23 00:00:00 0 days 0 surface 0 reduced_gg
3 msl 2026-04-23 12:00:00 2026-04-23 12:00:00 0 days 0 surface 0 reduced_gg
4 2t 2026-04-23 12:00:00 2026-04-23 12:00:00 0 days 0 surface 0 reduced_gg
5 2d 2026-04-23 12:00:00 2026-04-23 12:00:00 0 days 0 surface 0 reduced_gg

The data is now cached. Subsequent retrievals will use the cached file directly.
