Retrieving data from FDB
[1]:
import earthkit.data
FDB (Fields DataBase) is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data. For more information on FDB please consult the following pages:
This example requires FDB access, and the FDB_HOME environment variable must be set correctly.
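Before running the notebook it can help to confirm that FDB_HOME is actually visible to the Python process. The following is a minimal sketch: the helper name `fdb_home_is_set` is hypothetical, and the check that FDB_HOME points to an existing directory is an assumption about a typical installation, not a rule taken from the FDB documentation.

```python
import os
from pathlib import Path


def fdb_home_is_set(environ=os.environ):
    """Return True if FDB_HOME is set and points to an existing directory.

    Hypothetical helper: the directory check is an assumption about how
    FDB_HOME is normally used.
    """
    fdb_home = environ.get("FDB_HOME")
    return bool(fdb_home) and Path(fdb_home).is_dir()


if not fdb_home_is_set():
    print("FDB_HOME is not set or does not point to a directory")
```

Passing the environment as an argument keeps the helper easy to test without modifying the real process environment.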
The following request was written to retrieve data from the operational FDB at ECMWF. Please note that the date must be adjusted since FDB at ECMWF only stores the most recent dates.
[2]:
request = {
    'class': 'od',
    'expver': '0001',
    'stream': 'oper',
    'date': '20240421',
    'time': [0, 12],
    'domain': 'g',
    'type': 'an',
    'levtype': 'sfc',
    'step': 0,
    'param': [151, 167, 168]
}
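Because the date in the request has to be recent, it may be convenient to compute it instead of hard-coding it. The sketch below builds a YYYYMMDD string relative to today. The helper name `recent_date` and the default of one day back are assumptions; how many days the operational FDB actually retains is not stated here.

```python
from datetime import date, timedelta


def recent_date(days_back=1, today=None):
    """Return the date `days_back` days before `today` as a YYYYMMDD string.

    Hypothetical helper: the default of one day back is an assumption.
    """
    today = today or date.today()
    return (today - timedelta(days=days_back)).strftime("%Y%m%d")


# Partial request used only for illustration; merge the key into the full request.
partial_request = {"date": recent_date()}
print(partial_request)
```

With the request defined above, this would be applied as `request["date"] = recent_date()` before calling from_source().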
Reading as a stream
By default, from_source() retrieves data from an FDB source as a stream.
Iteration with one field at a time in memory
When we use the default arguments in from_source(), the resulting object can only be used for iteration, and only one field is kept in memory at a time. Fields created during the iteration are deleted when they go out of scope.
[3]:
ds = earthkit.data.from_source("fdb", request=request)
for f in ds:
    print(f)
GribField(msl,None,20240421,0,0,0)
GribField(2t,None,20240421,0,0,0)
GribField(2d,None,20240421,0,0,0)
GribField(msl,None,20240421,1200,0,0)
GribField(2t,None,20240421,1200,0,0)
GribField(2d,None,20240421,1200,0,0)
Once the iteration is completed, there is nothing left in ds.
[4]:
sum([1 for _ in ds])
[4]:
0
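The single-pass behaviour shown above is the same as that of any Python generator. The following pure-Python analogy needs no FDB access; `field_stream` is a hypothetical stand-in for the stream object and is not how earthkit-data is implemented.

```python
def field_stream(names):
    """Yield one item at a time, like a one-pass stream (hypothetical stand-in)."""
    for name in names:
        yield name


stream = field_stream(["msl", "2t", "2d"])

# The first pass consumes every item in the stream.
first_pass = [name for name in stream]

# The second pass finds the stream already exhausted.
second_pass = sum(1 for _ in stream)

print(first_pass)
print(second_pass)
```

To iterate again, a new stream has to be created, which is why the later cells call from_source() again before each loop.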
Iteration with group_by
When we use the group_by method we can iterate through the stream in groups defined by metadata keys. Each iteration step results in a FieldList object, which is built by consuming GRIB messages from the stream until the values of the metadata keys change. The generated FieldList keeps its GRIB messages in memory and is deleted when it goes out of scope.
[5]:
ds = earthkit.data.from_source("fdb", request=request)
for f in ds.group_by("time"):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
len=3 [('msl', 0), ('2t', 0), ('2d', 0)]
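Grouping "until the key changes" resembles what `itertools.groupby` does on an unsorted iterable. The sketch below illustrates the idea on plain dictionaries. The record layout and the helper `group_consecutive` are hypothetical, and this is an analogy rather than the earthkit-data implementation.

```python
from itertools import groupby

# Hypothetical stand-ins for the metadata of six GRIB messages, in stream order.
records = [
    {"param": "msl", "time": 0},
    {"param": "2t", "time": 0},
    {"param": "2d", "time": 0},
    {"param": "msl", "time": 1200},
    {"param": "2t", "time": 1200},
    {"param": "2d", "time": 1200},
]


def group_consecutive(stream, key):
    """Yield lists of consecutive records that share the same value for `key`."""
    for _, group in groupby(stream, key=lambda record: record[key]):
        yield list(group)


groups = list(group_consecutive(iter(records), "time"))
for group in groups:
    print(f"len={len(group)} {[r['param'] for r in group]}")
```

In this analogy a key value that reappears later in the stream starts a new group, so the grouping depends on the order in which the messages arrive.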
Iteration with batched
When we use the batched method we can iterate through the stream in batches of a fixed size. In this example we create a stream and read 2 fields from it at a time.
[6]:
ds = earthkit.data.from_source("fdb", request=request)
for f in ds.batched(2):
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=2 [('msl', 0), ('2t', 0)]
len=2 [('2d', 0), ('msl', 0)]
len=2 [('2t', 0), ('2d', 0)]
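Fixed-size batching over a one-pass stream can be sketched in pure Python with `itertools.islice`. The function below is a hypothetical analogy, not the earthkit-data implementation. Its handling of a shorter final batch is a property of this sketch only.

```python
from itertools import islice


def batched(stream, n):
    """Yield lists of up to `n` items taken from `stream` in order."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, n))
        if not batch:
            return
        yield batch


# Hypothetical stand-ins for the six fields, in stream order.
names = ["msl", "2t", "2d", "msl", "2t", "2d"]
batches = list(batched(names, 2))
for batch in batches:
    print(f"len={len(batch)} {batch}")
```

Unlike grouping by a metadata key, batching ignores the metadata, so a batch can span two different times, as the second batch in the output above does.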
Storing all the fields in memory
We can load the whole stream into memory by using read_all=True in from_source(). The resulting object will be a FieldList.
[7]:
ds = earthkit.data.from_source("fdb", request=request, read_all=True)
[8]:
len(ds)
[8]:
6
[9]:
ds.ls()
[9]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | msl | surface | 0 | 20240421 | 0 | 0 | an | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20240421 | 0 | 0 | an | 0 | reduced_gg |
| 2 | ecmf | 2d | surface | 0 | 20240421 | 0 | 0 | an | 0 | reduced_gg |
| 3 | ecmf | msl | surface | 0 | 20240421 | 1200 | 0 | an | 0 | reduced_gg |
| 4 | ecmf | 2t | surface | 0 | 20240421 | 1200 | 0 | an | 0 | reduced_gg |
| 5 | ecmf | 2d | surface | 0 | 20240421 | 1200 | 0 | an | 0 | reduced_gg |
[10]:
ds.sel(param="2t").ls()
[10]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20240421 | 0 | 0 | an | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20240421 | 1200 | 0 | an | 0 | reduced_gg |
[11]:
ds.to_xarray()
[11]:
<xarray.Dataset>
Dimensions: (number: 1, time: 2, step: 1, surface: 1, values: 6599680)
Coordinates:
* number (number) int64 0
* time (time) datetime64[ns] 2024-04-21 2024-04-21T12:00:00
* step (step) timedelta64[ns] 00:00:00
* surface (surface) float64 0.0
latitude (values) float64 ...
longitude (values) float64 ...
valid_time (time, step) datetime64[ns] ...
Dimensions without coordinates: values
Data variables:
msl (number, time, step, surface, values) float32 ...
t2m (number, time, step, surface, values) float32 ...
d2m (number, time, step, surface, values) float32 ...
Attributes:
GRIB_edition: 1
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2024-04-22T11:01 GRIB to CDM+CF via cfgrib-0.9.1...
Reading into a file
We can retrieve data from FDB into a file, which is located in the cache:
[12]:
ds = earthkit.data.from_source("fdb", request=request, stream=False)
[13]:
ds.ls()
[13]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | msl | surface | 0 | 20240421 | 0 | 0 | an | 0 | reduced_gg |
| 1 | ecmf | 2t | surface | 0 | 20240421 | 0 | 0 | an | 0 | reduced_gg |
| 2 | ecmf | 2d | surface | 0 | 20240421 | 0 | 0 | an | 0 | reduced_gg |
| 3 | ecmf | msl | surface | 0 | 20240421 | 1200 | 0 | an | 0 | reduced_gg |
| 4 | ecmf | 2t | surface | 0 | 20240421 | 1200 | 0 | an | 0 | reduced_gg |
| 5 | ecmf | 2d | surface | 0 | 20240421 | 1200 | 0 | an | 0 | reduced_gg |
The data is now cached. Subsequent retrievals will use the cached file directly.