Reading data from URLs as a stream

[1]:
import earthkit.data as ekd

earthkit-data can read GRIB data from a URL as a stream without writing anything to disk. To enable this, pass stream=True to from_source().

[2]:
ds = ekd.from_source("url",
                     "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
                     stream=True)

The resulting object supports only a single iteration. Once the iteration has finished, the stream is consumed and no more data is available.

[3]:
for f in ds:
    # f is a GribField object; it is deleted when it goes out of scope
    print(f)
GribField(t,500,20070101,1200,0,0)
GribField(z,500,20070101,1200,0,0)
GribField(t,850,20070101,1200,0,0)
GribField(z,850,20070101,1200,0,0)
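
The single-pass behaviour is similar to that of an ordinary Python generator. The sketch below does not use earthkit-data at all: fake_stream is a hypothetical stand-in for a streamed source, and the item names are made up for illustration. It only shows what "consumed" means in practice.

```python
def fake_stream():
    # Hypothetical stand-in for a streamed source: yields items one at a time
    for name in ["t500", "z500", "t850", "z850"]:
        yield name


stream = fake_stream()

# The first pass sees every item
first_pass = [item for item in stream]

# The second pass over the same object sees nothing: the generator is exhausted
second_pass = [item for item in stream]

print(first_pass)
print(second_pass)
```

To iterate again, a new stream object has to be created, which is why the cells in this notebook call from_source() again before each loop.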

The iteration can also be done in batches by using batched() or group_by().

[4]:
ds = ekd.from_source("url",
                     "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
                     stream=True)

for f in ds.batched(2):
    # f is a FieldList holding the fields of the current batch
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=2 [('t', 500), ('z', 500)]
len=2 [('t', 850), ('z', 850)]
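
The cell above only demonstrates batched. As a rough illustration of grouping on a one-pass stream, here is a pure-Python sketch that does not call earthkit-data. It assumes that grouping collects consecutive items sharing the same key value, which is an assumption about how such grouping can work on a stream and not a statement about the library's implementation. The (param, level) tuples are copied from the output above, and group_consecutive is a hypothetical helper.

```python
from itertools import groupby

# (param, level) pairs in the order they appear in the output above
fields = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]


def group_consecutive(items, key):
    """Yield lists of consecutive items that share the same key value."""
    for _, group in groupby(items, key=key):
        yield list(group)


# Group by level (the second element of each tuple), reading the input once
groups = list(group_consecutive(iter(fields), key=lambda f: f[1]))

for g in groups:
    print(f"len={len(g)} {g}")
```

Because the input is read only once, a grouping of this kind never reorders items: a key value that reappears later in the stream would start a new group.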

Reading the whole stream into memory

We can load the whole stream into memory by using read_all=True in from_source(). The resulting object will be a FieldList storing all the GRIB messages in memory.

[5]:
ds = ekd.from_source("url",
                     "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
                     stream=True, read_all=True)

len(ds)
[5]:
4
[6]:
ds.ls()
[6]:
   centre  shortName    typeOfLevel  level  dataDate  dataTime  stepRange  dataType  number    gridType
0    ecmf          t  isobaricInhPa    500  20070101      1200          0        an       0  regular_ll
1    ecmf          z  isobaricInhPa    500  20070101      1200          0        an       0  regular_ll
2    ecmf          t  isobaricInhPa    850  20070101      1200          0        an       0  regular_ll
3    ecmf          z  isobaricInhPa    850  20070101      1200          0        an       0  regular_ll

Multiple URLs

The stream option works even when the input is a list of URLs.

[7]:
ds = ekd.from_source("url",
                     ["https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
                      "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib"],
                     stream=True)

for f in ds.batched(3):
    # f is a FieldList holding the fields of the current batch
    print(f"len={len(f)} {f.metadata(('param', 'level'))}")
len=3 [('t', 500), ('z', 500), ('t', 850)]
len=3 [('z', 850), ('t', 1000), ('u', 1000)]
len=3 [('v', 1000), ('t', 850), ('u', 850)]
len=1 [('v', 850)]
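
Note that the second batch above contains fields from both files. The following pure-Python sketch reproduces that pattern without earthkit-data, assuming the URLs are read one after the other as a single continuous stream. The two lists are copied from the output above, and the batched helper is a hypothetical stand-in written for this example, not the library's method.

```python
from itertools import chain, islice


def batched(iterable, n):
    """Yield lists of up to n items, reading the input only once."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


# (param, level) pairs of the two files, as shown in the output above
first = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]
second = [("t", 1000), ("u", 1000), ("v", 1000),
          ("t", 850), ("u", 850), ("v", 850)]

# Chaining the sources gives one stream, so a batch can span the file boundary
batches = list(batched(chain(first, second), 3))

for b in batches:
    print(f"len={len(b)} {b}")
```

The last batch is shorter than the requested size because the 10 fields in total do not divide evenly by 3.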