Retrieving data from FDB

[1]:

import earthkit.data

FDB (Fields DataBase) is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data. For more information on FDB, please consult the following pages:

FDB
pyfdb

This example requires FDB access, and the FDB_HOME environment variable must be set correctly.

The following request was written to retrieve data from the operational FDB at ECMWF. Note that the date must be adjusted, since the FDB at ECMWF only stores the most recent dates.
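The date in the request must point to a day that is still available. One option is to compute a recent date instead of hard-coding it. The sketch below builds a `YYYYMMDD` string for a day a given number of days in the past. The helper name `recent_date` is hypothetical. It also assumes that "yesterday" is still held in the operational FDB, and this notebook does not state how far back that retention goes.

```python
import datetime


def recent_date(days_back: int = 1) -> str:
    """Return a YYYYMMDD string, the same form as the 'date' value in the request.

    Hypothetical helper: assumes the chosen day is still held in the FDB.
    """
    day = datetime.date.today() - datetime.timedelta(days=days_back)
    return day.strftime("%Y%m%d")


# For example, use yesterday's date in place of the hard-coded '20240421'.
date = recent_date()
print(date)
```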

[2]:

request = {
    'class': 'od',
    'expver': '0001',
    'stream': 'oper',
    'date': '20240421',
    'time': [0, 12],
    'domain': 'g',
    'type': 'an',
    'levtype': 'sfc',
    'step': 0,
    'param': [151, 167, 168]
}
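You can estimate how many fields a request returns before retrieving it. Assuming every combination of the requested values is archived, the count is the product of the number of values given for each key. For the request above this is 2 times × 3 parameters = 6 fields, which matches the listings further down. The helper `expected_field_count` is hypothetical and is not part of earthkit-data. Only a trimmed copy of the request is used, because scalar keys each contribute a factor of 1.

```python
import math

# Trimmed copy of the request above: only list-valued keys change the count.
sample_request = {"date": "20240421", "time": [0, 12], "step": 0, "param": [151, 167, 168]}


def expected_field_count(req: dict) -> int:
    # Hypothetical helper: list/tuple values multiply the count, scalars count once.
    return math.prod(len(v) if isinstance(v, (list, tuple)) else 1 for v in req.values())


print(expected_field_count(sample_request))  # 6
```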

Reading as a stream

By default, from_source() retrieves data from an FDB source as a stream.

Iteration with one field at a time in memory

When we use the default arguments in from_source(), the resulting object can only be used for iteration, and only one field is kept in memory at a time. Fields created during the iteration are deleted when they go out of scope.

[3]:

ds = earthkit.data.from_source("fdb", request=request)
for f in ds:
    print(f)

GribField(msl,None,20240421,0,0,0)
GribField(2t,None,20240421,0,0,0)
GribField(2d,None,20240421,0,0,0)
GribField(msl,None,20240421,1200,0,0)
GribField(2t,None,20240421,1200,0,0)
GribField(2d,None,20240421,1200,0,0)

Once the iteration is completed, there is nothing left in ds.

[4]:

sum([1 for _ in ds])

[4]:
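The one-pass behaviour works much like a plain Python generator. The sketch below uses a hypothetical `fake_stream` generator as a stand-in for the FDB stream; it is only an analogy and does not reproduce earthkit-data internals. After the first loop has consumed every item, a second pass finds nothing.

```python
def fake_stream():
    # Hypothetical stand-in for an FDB stream: a generator yields each item once.
    for name in ["msl", "2t", "2d"]:
        yield name


ds_like = fake_stream()
first_pass = [f for f in ds_like]        # consumes every item
second_pass = sum([1 for _ in ds_like])  # nothing is left to iterate over
print(first_pass, second_pass)  # ['msl', '2t', '2d'] 0
```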

Iteration with group_by

When we use the group_by() method, we can iterate through the stream in groups defined by metadata keys. Each iteration step results in a FieldList object, which is built by consuming GRIB messages from the stream until the values of the metadata keys change. The generated FieldList keeps its GRIB messages in memory and is deleted when it goes out of scope.

[5]:

ds = earthkit.data.from_source("fdb", request=request)
for f in ds.group_by("time"):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")

len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
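The grouping rule described above ("consume messages until the key value changes") matches the semantics of Python's `itertools.groupby`. The sketch below applies it to hypothetical metadata records laid out in the same order as the stream above. It illustrates the idea only and is not how earthkit-data implements `group_by()`.

```python
from itertools import groupby

# Hypothetical metadata for the six messages, in the order the stream delivered them.
messages = [
    {"param": p, "time": t}
    for t in (0, 1200)
    for p in ("msl", "2t", "2d")
]

# groupby only merges *consecutive* items with equal keys, so a new group
# starts whenever the value of "time" changes.
groups = []
for time, group in groupby(messages, key=lambda m: m["time"]):
    group = list(group)
    groups.append(group)
    print(f"time={time} len={len(group)} {[m['param'] for m in group]}")
```

Because the grouping depends on consecutive order, a stream that interleaved the two times would produce more groups than there are distinct time values.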

Iteration with batched

When we use the batched() method, we can iterate through the stream in batches of fixed size. In this example we create a stream and read 2 fields from it at a time.

[6]:

ds = earthkit.data.from_source("fdb", request=request)
for f in ds.batched(2):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")

len=2 [('msl', 0), ('2t', 0)]
len=2 [('2d', 0), ('msl', 0)]
len=2 [('2t', 0), ('2d', 0)]
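Fixed-size batching of an iterator can be sketched with `itertools.islice`. The hypothetical `batched` helper below takes up to `n` items at a time, and the final batch may be shorter. With the same parameter order as the stream above, it reproduces the three batches of two. Python 3.12 also provides `itertools.batched`, which yields tuples rather than lists. This is a generic sketch, not earthkit-data's implementation.

```python
from itertools import islice


def batched(iterable, n):
    # Hypothetical helper: yield lists of up to n items; the last may be shorter.
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch


params = ["msl", "2t", "2d", "msl", "2t", "2d"]
batches = list(batched(params, 2))
print(batches)  # [['msl', '2t'], ['2d', 'msl'], ['2t', '2d']]
```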

Storing all the fields in memory

We can load the whole stream into memory by using read_all=True in from_source(). The resulting object will be a FieldList.

[7]:

ds = earthkit.data.from_source("fdb", request=request, read_all=True)
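Reading everything up front trades memory for flexibility. As a rough analogy, materializing a generator into a list makes `len()` available and allows repeated traversal, which a one-pass stream does not. The `fake_stream` generator below is a hypothetical stand-in and does not model how a FieldList stores GRIB data.

```python
def fake_stream():
    # Hypothetical stand-in for the FDB stream: yields one "field" at a time.
    yield from ["msl", "2t", "2d", "msl", "2t", "2d"]


# Materializing the stream keeps every item in memory at once...
fields = list(fake_stream())

# ...which makes len() available and allows the data to be traversed repeatedly.
first_count = sum(1 for _ in fields)
second_count = sum(1 for _ in fields)
print(len(fields), first_count, second_count)  # 6 6 6
```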

[8]:

len(ds)

[8]:

[9]:

ds.ls()

[9]:

	centre	shortName	typeOfLevel	dataDate	dataTime	dataType	gridType
0	ecmf	msl	surface	20240421	0	an	reduced_gg
1	ecmf	2t	surface	20240421	0	an	reduced_gg
2	ecmf	2d	surface	20240421	0	an	reduced_gg
3	ecmf	msl	surface	20240421	1200	an	reduced_gg
4	ecmf	2t	surface	20240421	1200	an	reduced_gg
5	ecmf	2d	surface	20240421	1200	an	reduced_gg

[10]:

ds.sel(param="2t").ls()

[10]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	stepRange	dataType	number	gridType
0	ecmf	2t	surface	0	20240421	0	0	an	0	reduced_gg
1	ecmf	2t	surface	0	20240421	1200	0	an	0	reduced_gg
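Conceptually, `sel()` keeps the fields whose metadata match the given key/value pairs. The sketch below applies that idea to hypothetical metadata records mirroring the `ls()` table above. The `sel` function here is a simplified stand-in that only handles single-value equality. It is not earthkit-data's implementation, which may support more forms of selection.

```python
# Hypothetical metadata records mirroring the ls() output above.
records = [
    {"param": p, "time": t}
    for t in (0, 1200)
    for p in ("msl", "2t", "2d")
]


def sel(recs, **criteria):
    # Keep records whose metadata match every key=value pair given.
    return [r for r in recs if all(r.get(k) == v for k, v in criteria.items())]


selected = sel(records, param="2t")
print(selected)  # [{'param': '2t', 'time': 0}, {'param': '2t', 'time': 1200}]
```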

[11]:

ds.to_xarray()

Reading into a file

By passing stream=False to from_source(), we can retrieve the data from FDB into a file located in the cache:

[12]:

ds = earthkit.data.from_source("fdb", request=request, stream=False)

[13]:

ds.ls()

[13]:

	centre	shortName	typeOfLevel	dataDate	dataTime	dataType	gridType
0	ecmf	msl	surface	20240421	0	an	reduced_gg
1	ecmf	2t	surface	20240421	0	an	reduced_gg
2	ecmf	2d	surface	20240421	0	an	reduced_gg
3	ecmf	msl	surface	20240421	1200	an	reduced_gg
4	ecmf	2t	surface	20240421	1200	an	reduced_gg
5	ecmf	2d	surface	20240421	1200	an	reduced_gg

The data is now cached. Subsequent retrievals will use the cached file directly.
