Reading data from URLs as a stream

[1]:
import earthkit.data as ekd

earthkit-data can read GRIB data from a URL as a stream, without writing anything to disk. To enable this, pass stream=True to from_source().

[2]:
ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist()

The resulting object supports only a single iteration. Once the iteration has finished, the stream is consumed and no more data is available.

[3]:
for f in ds:
    # f is a GribField object; it is deleted when it goes out of scope
    print(f)
Field(t, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(z, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(t, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(z, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
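
The single-iteration behaviour can be illustrated with a plain Python generator. This is only an analogy, not how earthkit-data implements streaming: fake_stream() and the message names are hypothetical stand-ins, and no GRIB data or network access is involved.

```python
from typing import Iterable, Iterator


def fake_stream(names: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a stream of GRIB messages."""
    for name in names:
        yield name


stream = fake_stream(["t500", "z500", "t850", "z850"])

# The first pass consumes every item from the generator.
first_pass = list(stream)

# The second pass finds nothing left: the generator is exhausted.
second_pass = list(stream)

print(first_pass)
print(second_pass)
```

To iterate again over a real stream, the source would have to be created again with from_source(), as the following cells do.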

The iteration can be done in batches by using batched() or group_by().

[4]:
ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist()

for f in ds.batched(2):
    # f is a fieldlist
    print(f"len={len(f)} {f.get(['parameter.variable', 'vertical.level'])}")
len=2 [['t', 500], ['z', 500]]
len=2 [['t', 850], ['z', 850]]
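
As a rough sketch of what batching a one-shot stream involves, the snippet below builds fixed-size batches from a plain Python iterator. The batched() helper here is a hypothetical standard-library version, not the earthkit-data method, and the (variable, level) tuples are stand-ins for fields. The grouping part assumes that groups are formed from consecutive items sharing a key value, which is an assumption made for illustration only.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical (variable, level) stand-in for a field.
Field = Tuple[str, int]


def batched(items: Iterable[Field], n: int) -> Iterator[List[Field]]:
    """Yield lists of up to n items, reading the input only once."""
    it = iter(items)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


fields = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]

# Fixed-size batches of two items each.
batches = list(batched(iter(fields), 2))

# Groups of consecutive items that share the same level.
groups = [list(g) for _, g in groupby(iter(fields), key=lambda f: f[1])]

print(batches)
print(groups)
```

In both cases only one batch or group needs to be held at a time, which is why this style of iteration suits a stream.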

Reading the whole stream into memory

We can load the whole stream into memory by using read_all=True in to_fieldlist(). The resulting object will be a SimpleFieldList storing all the GRIB messages in memory.

[5]:
ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist(read_all=True)

len(ds)
[5]:
4
[6]:
ds.ls()
[6]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
1 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
2 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll
3 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll
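
The trade-off behind read_all=True can be sketched with plain Python: once a one-shot iterator is materialised into a list, its length is known and it can be iterated repeatedly, at the cost of keeping everything in memory. The generator below is a hypothetical stand-in for the stream, not earthkit-data code.

```python
from typing import Iterator, Tuple


def fake_stream() -> Iterator[Tuple[str, int]]:
    """Hypothetical stand-in for a stream of four fields."""
    yield from [("t", 500), ("z", 500), ("t", 850), ("z", 850)]


# Materialise the whole stream in memory.
in_memory = list(fake_stream())

# Length and repeated iteration are now available.
count = len(in_memory)
first_pass = [name for name, _ in in_memory]
second_pass = [name for name, _ in in_memory]

print(count, first_pass, second_pass)
```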

Multiple URLs

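The stream option also works when the input is a list of URLs. The URLs are read one after the other as a single stream, so a batch can span two files, as the second batch in the output below shows.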

[7]:
ds = ekd.from_source(
    "url",
    [
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib",
    ],
    stream=True,
).to_fieldlist()

for f in ds.batched(3):
    # f is a fieldlist
    print(f"len={len(f)} {f.get(['parameter.variable', 'vertical.level'])}")
len=3 [['t', 500], ['z', 500], ['t', 850]]
len=3 [['z', 850], ['t', 1000], ['u', 1000]]
len=3 [['v', 1000], ['t', 850], ['u', 850]]
len=1 [['v', 850]]
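
The batch boundaries in the output above can be reproduced with a small standard-library sketch that chains two one-shot iterators and batches the result. The lists test4 and test6 are hypothetical stand-ins holding the (variable, level) pairs printed above, and batched() is a hypothetical helper, not the earthkit-data method.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical (variable, level) stand-in for a field.
Field = Tuple[str, int]


def batched(items: Iterable[Field], n: int) -> Iterator[List[Field]]:
    """Yield lists of up to n items, reading the input only once."""
    it = iter(items)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


# Stand-ins for the contents of the two files, taken from the output above.
test4 = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]
test6 = [("t", 1000), ("u", 1000), ("v", 1000),
         ("t", 850), ("u", 850), ("v", 850)]

# Chain the two sources into one continuous stream, then batch it.
combined = chain(iter(test4), iter(test6))
batches = list(batched(combined, 3))
sizes = [len(b) for b in batches]

print(sizes)
print(batches[1])
```

The second batch mixes the last item of the first source with the first two items of the second, matching the len=3 line that contains both ['z', 850] and ['t', 1000].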