Retrieving data from FDB

[1]:
import earthkit.data

FDB (Fields DataBase) is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data. For more information on FDB please consult the following pages:

This example requires access to an FDB, and the FDB_HOME environment variable must be set correctly.
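
Before running the cells below, it can help to confirm that FDB_HOME is actually defined. The following is a minimal sketch using only the standard library; the helper name fdb_home_is_set is hypothetical and not part of earthkit-data, and the check assumes FDB_HOME should point to an existing directory.

```python
import os


def fdb_home_is_set(env=None):
    """Return True if FDB_HOME is defined and points to an existing directory.

    `env` is any mapping of environment variables; it defaults to os.environ.
    Assumption: FDB_HOME is expected to be a directory path.
    """
    env = os.environ if env is None else env
    path = env.get("FDB_HOME")
    return bool(path) and os.path.isdir(path)


if not fdb_home_is_set():
    print("FDB_HOME is not set or does not point to an existing directory")
```

This only checks that the variable is plausible; it does not verify that the FDB itself is reachable.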

The following request was written to retrieve data from the operational FDB at ECMWF. The date must be adjusted before running it, since the operational FDB at ECMWF only stores the most recent dates.

[2]:
request = {
    'class': 'od',
    'expver': '0001',
    'stream': 'oper',
    'date': '20240421',
    'time': [0, 12],
    'domain': 'g',
    'type': 'an',
    'levtype': 'sfc',
    'step': 0,
    'param': [151, 167, 168]
}
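
Because only recent dates are available, the 'date' value can be computed instead of hard-coded. The following is a minimal sketch using the standard library; the helper name recent_date and the one-day offset are illustrative assumptions, and how far back the operational FDB actually keeps data is not specified here.

```python
from datetime import date, timedelta


def recent_date(days_back=1, today=None):
    """Return a YYYYMMDD string for `days_back` days before `today`.

    `today` defaults to the current local date; it can be passed explicitly
    to make the result reproducible.
    """
    today = date.today() if today is None else today
    return (today - timedelta(days=days_back)).strftime("%Y%m%d")


# Hypothetical usage: a value that could replace the hard-coded 'date' entry
date_value = recent_date(days_back=1)
print(date_value)
```

The resulting string has the same YYYYMMDD form as the 'date' entry in the request above.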

Reading as a stream

By default, from_source() retrieves data from an FDB source as a stream.

Iteration with one field at a time in memory

When we use the default arguments in from_source() the resulting object can only be used for iteration and only one field is kept in memory at a time. Fields created in the iteration get deleted when going out of scope.

[3]:
ds = earthkit.data.from_source("fdb", request=request)
for f in ds:
    print(f)
GribField(msl,None,20240421,0,0,0)
GribField(2t,None,20240421,0,0,0)
GribField(2d,None,20240421,0,0,0)
GribField(msl,None,20240421,1200,0,0)
GribField(2t,None,20240421,1200,0,0)
GribField(2d,None,20240421,1200,0,0)

Once the iteration is completed, there is nothing left in ds.

[4]:
sum([1 for _ in ds])
[4]:
0
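
The one-pass behaviour seen above is the same as that of an ordinary Python generator. The sketch below uses a plain generator as a stand-in for the stream, so it runs without FDB access; field_stream is a hypothetical helper and says nothing about how earthkit-data implements its stream internally.

```python
def field_stream(names):
    """Stand-in for a one-pass stream: yields each item exactly once."""
    for name in names:
        yield name


stream = field_stream(["msl", "2t", "2d"])

first_pass = [f for f in stream]   # consumes every item
second_pass = [f for f in stream]  # nothing is left to iterate over

print(first_pass)
print(len(second_pass))
```

To iterate again, a new stream has to be created, which is why the later cells call from_source() afresh.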

Iteration with group_by

When we use the group_by method we can iterate through the stream in groups defined by metadata keys. Each iteration step results in a FieldList object, which is built by consuming GRIB messages from the stream until the values of the metadata keys change. The generated FieldList keeps its GRIB messages in memory and is deleted when it goes out of scope.

[5]:
ds = earthkit.data.from_source("fdb", request=request)
for f in ds.group_by("time"):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
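
Grouping "until the values of the metadata keys change" means that only consecutive messages are grouped together. The sketch below illustrates that rule with itertools.groupby on plain tuples, so it runs without FDB access; the tuples and the helper group_consecutive are hypothetical stand-ins, not the earthkit-data implementation.

```python
from itertools import groupby

# Hypothetical stand-ins for GRIB messages: (param, time) in stream order
messages = [
    ("msl", 0), ("2t", 0), ("2d", 0),
    ("msl", 1200), ("2t", 1200), ("2d", 1200),
]


def group_consecutive(stream, key):
    """Yield lists of consecutive items that share the same key value."""
    for _, members in groupby(stream, key=key):
        yield list(members)


# Group on the second tuple element, which plays the role of "time"
groups = list(group_consecutive(iter(messages), key=lambda m: m[1]))
for g in groups:
    print(f"len={len(g)} {[m[0] for m in g]}")
```

With this rule, a key value that reappears later in the stream starts a new group rather than joining the earlier one, so the order of the stream matters.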

Iteration with batched

When we use the batched method we can iterate through the stream in batches of fixed size. In this example we create a stream and read 2 fields from it at a time.

[6]:
ds = earthkit.data.from_source("fdb", request=request)
for f in ds.batched(2):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=2 [('msl', 0), ('2t', 0)]
len=2 [('2d', 0), ('msl', 0)]
len=2 [('2t', 0), ('2d', 0)]
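
Fixed-size batching over a one-pass stream can be sketched with itertools.islice. The example below works on plain strings so it runs without FDB access; batched_sketch is a hypothetical helper, and the shorter final batch it produces when the stream length is not a multiple of the batch size is a property of this sketch, assumed rather than confirmed for earthkit-data.

```python
from itertools import islice


def batched_sketch(stream, n):
    """Yield lists of up to `n` consecutive items from a one-pass iterable."""
    it = iter(stream)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


# Hypothetical stand-ins for the six fields, in stream order
params = ["msl", "2t", "2d", "msl", "2t", "2d"]
batches = list(batched_sketch(params, 2))
for b in batches:
    print(f"len={len(b)} {b}")
```

Note that batches ignore metadata: with a batch size of 2, the second batch here mixes fields from both times, as in the output above.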

Storing all the fields in memory

We can load the whole stream into memory by using read_all=True in from_source(). The resulting object will be a FieldList.

[7]:
ds = earthkit.data.from_source("fdb", request=request, read_all=True)
[8]:
len(ds)
[8]:
6
[9]:
ds.ls()
[9]:
   centre  shortName  typeOfLevel  level  dataDate  dataTime  stepRange  dataType  number    gridType
0    ecmf        msl      surface      0  20240421         0          0        an       0  reduced_gg
1    ecmf         2t      surface      0  20240421         0          0        an       0  reduced_gg
2    ecmf         2d      surface      0  20240421         0          0        an       0  reduced_gg
3    ecmf        msl      surface      0  20240421      1200          0        an       0  reduced_gg
4    ecmf         2t      surface      0  20240421      1200          0        an       0  reduced_gg
5    ecmf         2d      surface      0  20240421      1200          0        an       0  reduced_gg
[10]:
ds.sel(param="2t").ls()
[10]:
   centre  shortName  typeOfLevel  level  dataDate  dataTime  stepRange  dataType  number    gridType
0    ecmf         2t      surface      0  20240421         0          0        an       0  reduced_gg
1    ecmf         2t      surface      0  20240421      1200          0        an       0  reduced_gg
[11]:
ds.to_xarray()
[11]:
<xarray.Dataset>
Dimensions:     (number: 1, time: 2, step: 1, surface: 1, values: 6599680)
Coordinates:
  * number      (number) int64 0
  * time        (time) datetime64[ns] 2024-04-21 2024-04-21T12:00:00
  * step        (step) timedelta64[ns] 00:00:00
  * surface     (surface) float64 0.0
    latitude    (values) float64 ...
    longitude   (values) float64 ...
    valid_time  (time, step) datetime64[ns] ...
Dimensions without coordinates: values
Data variables:
    msl         (number, time, step, surface, values) float32 ...
    t2m         (number, time, step, surface, values) float32 ...
    d2m         (number, time, step, surface, values) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-04-22T11:01 GRIB to CDM+CF via cfgrib-0.9.1...

Reading into a file

We can retrieve data from FDB into a file, which is located in the cache:

[12]:
ds = earthkit.data.from_source("fdb", request=request, stream=False)
[13]:
ds.ls()
[13]:
   centre  shortName  typeOfLevel  level  dataDate  dataTime  stepRange  dataType  number    gridType
0    ecmf        msl      surface      0  20240421         0          0        an       0  reduced_gg
1    ecmf         2t      surface      0  20240421         0          0        an       0  reduced_gg
2    ecmf         2d      surface      0  20240421         0          0        an       0  reduced_gg
3    ecmf        msl      surface      0  20240421      1200          0        an       0  reduced_gg
4    ecmf         2t      surface      0  20240421      1200          0        an       0  reduced_gg
5    ecmf         2d      surface      0  20240421      1200          0        an       0  reduced_gg

The data is now cached. Subsequent retrievals will use the cached file directly.
