Reading data from URLs as a stream

[1]:

import earthkit.data as ekd

earthkit-data can read GRIB data from a URL as a stream, without writing anything to disk. Streaming is enabled by passing stream=True to from_source().

[2]:

ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist()
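Reading GRIB from a stream means the incoming bytes have to be split into individual messages as they arrive, because a network response cannot be seeked. The sketch below illustrates that idea in plain Python; it is not earthkit-data's implementation (decoding is done via ecCodes), and the helpers `iter_grib_messages` and `fake_grib` are hypothetical. It relies only on the GRIB section 0 layout: each message starts with `GRIB`, octet 8 holds the edition, the total length sits in octets 5-7 (GRIB1) or 9-16 (GRIB2), and each message ends with `7777`. It ignores the special length encoding GRIB1 uses for very large messages.

```python
import io
from typing import BinaryIO, Iterator


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes; network streams may return fewer bytes per read() call."""
    parts = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError(f"stream ended with {n} bytes still expected")
        parts.append(chunk)
        n -= len(chunk)
    return b"".join(parts)


def iter_grib_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw GRIB messages one at a time from a sequential (non-seekable) stream."""
    while True:
        first = stream.read(1)
        if not first:
            return  # clean end of stream between messages
        header = first + _read_exact(stream, 7)  # octets 1-8 of section 0
        if header[:4] != b"GRIB":
            raise ValueError("expected the 'GRIB' start marker")
        edition = header[7]  # octet 8 holds the edition number in GRIB1 and GRIB2
        if edition == 2:
            # GRIB2: octets 9-16 hold the total message length (big-endian)
            length_field = _read_exact(stream, 8)
            total = int.from_bytes(length_field, "big")
            message = header + length_field + _read_exact(stream, total - 16)
        elif edition == 1:
            # GRIB1: octets 5-7 hold the total message length (big-endian)
            total = int.from_bytes(header[4:7], "big")
            message = header + _read_exact(stream, total - 8)
        else:
            raise ValueError(f"unsupported GRIB edition {edition}")
        if not message.endswith(b"7777"):
            raise ValueError("message does not end with the '7777' marker")
        yield message


def fake_grib(payload: bytes, edition: int) -> bytes:
    """Hypothetical minimal message: valid framing, meaningless content."""
    if edition == 2:
        total = 16 + len(payload) + 4
        return b"GRIB\x00\x00\x00\x02" + total.to_bytes(8, "big") + payload + b"7777"
    total = 8 + len(payload) + 4
    return b"GRIB" + total.to_bytes(3, "big") + b"\x01" + payload + b"7777"


# io.BytesIO stands in for the HTTP response body
data = io.BytesIO(fake_grib(b"first", 2) + fake_grib(b"second", 1))
messages = list(iter_grib_messages(data))
print(len(messages))  # 2
```

Because the generator reads only as many bytes as the current message needs, it never has to hold the whole download in memory or on disk.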

The fieldlist created from a streamed source supports only a single iteration: once the iteration has finished, the stream is consumed and no more data is available.

[3]:

for f in ds:
    # f is a GribField object; it is deleted when it goes out of scope
    print(f)

Field(t, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(z, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(t, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(z, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
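The single-pass behaviour described above is the same as that of a Python generator. The plain-Python sketch below uses a hypothetical `stream_fields` generator, with strings standing in for the four fields printed above; it is an analogy, not earthkit-data code, and whether a second loop over a real streamed fieldlist yields nothing or raises is not shown in this notebook.

```python
from typing import Iterator


def stream_fields(names: list) -> Iterator[str]:
    """Hypothetical stand-in for a streamed fieldlist: each item is produced once."""
    for name in names:
        yield name


# Labels taken from the output above; plain strings stand in for GribField objects
stream = stream_fields(["t500", "z500", "t850", "z850"])

first_pass = [f for f in stream]
second_pass = [f for f in stream]

print(first_pass)   # ['t500', 'z500', 't850', 'z850']
print(second_pass)  # [] because the generator is already exhausted
```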

The stream can also be iterated in batches, using batched() or group_by().

[4]:

ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist()

for f in ds.batched(2):
    # f is a fieldlist
    print(f"len={len(f)} {f.get(['parameter.variable', 'vertical.level'])}")

len=2 [['t', 500], ['z', 500]]
len=2 [['t', 850], ['z', 850]]
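Batching a single-pass stream only needs to pull n items at a time from one iterator. The sketch below reproduces the batches above in plain Python, with `(variable, level)` tuples standing in for fields; the `batched` helper here is a hypothetical illustration, not earthkit-data's method. The group_by-style part assumes that groups are runs of consecutive fields sharing the same key value, which is what `itertools.groupby` does; check the earthkit-data documentation for the exact grouping rule.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield lists of up to n consecutive items, consuming the iterable only once."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(items)
    while batch := list(islice(it, n)):
        yield batch


# (variable, level) pairs from the output above stand in for fields
fields = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]

batches = list(batched(iter(fields), 2))
print(batches)  # [[('t', 500), ('z', 500)], [('t', 850), ('z', 850)]]

# group_by-style iteration (assumption: consecutive runs with the same level)
groups = [(level, list(g)) for level, g in groupby(iter(fields), key=lambda f: f[1])]
print(groups)  # [(500, [('t', 500), ('z', 500)]), (850, [('t', 850), ('z', 850)])]
```

Python 3.12 and later also provide `itertools.batched`, which yields tuples instead of lists.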

Reading the whole stream into memory

The whole stream can be loaded into memory by passing read_all=True to to_fieldlist(). The result is a SimpleFieldList that stores all the GRIB messages in memory, so, unlike the streamed fieldlist, it has a length and can be accessed more than once.

[5]:

ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist(read_all=True)

len(ds)

[5]:

4

[6]:

ds.ls()

[6]:

	parameter.variable	time.valid_datetime	time.base_datetime	vertical.level	vertical.level_type	geography.grid_type
0	t	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
1	z	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
2	t	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll
3	z	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll
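In plain-Python terms, read_all=True corresponds to materializing the stream into a list: the data is read once, then kept in memory. The sketch below is only an analogy using a hypothetical `stream_fields` generator, with strings standing in for the four GRIB messages listed above.

```python
from typing import Iterator


def stream_fields() -> Iterator[str]:
    """Hypothetical single-pass source standing in for the streamed GRIB data."""
    yield from ["t500", "z500", "t850", "z850"]


# list() consumes the stream once; afterwards all items live in memory
in_memory = list(stream_fields())

print(len(in_memory))  # 4, a length a generator cannot report
print(in_memory[2])    # t850, random access now works

# Iterating twice gives the same result, unlike the streamed version
first = list(in_memory)
second = list(in_memory)
print(first == second)  # True
```

The trade-off is memory: the whole dataset must fit in RAM, which is what streaming avoids.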

Multiple URLs

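The stream option also works when the input is a list of URLs. The URLs are read one after another as a single stream, so a batch can contain fields from more than one file: in the output below, the second batch holds the last field of test4.grib and the first two fields of test6.grib.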

[7]:

ds = ekd.from_source(
    "url",
    [
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib",
    ],
    stream=True,
).to_fieldlist()

for f in ds.batched(3):
    # f is a fieldlist
    print(f"len={len(f)} {f.get(['parameter.variable', 'vertical.level'])}")

len=3 [['t', 500], ['z', 500], ['t', 850]]
len=3 [['z', 850], ['t', 1000], ['u', 1000]]
len=3 [['v', 1000], ['t', 850], ['u', 850]]
len=1 [['v', 850]]
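The batch sizes above follow from chaining the sources into one sequential stream and cutting it into groups of three. The plain-Python sketch below reproduces them with `itertools.chain`, using the field labels from the output above and assuming, as that output suggests, that the files are read in list order; `batched` is a hypothetical helper, not earthkit-data's method.

```python
from itertools import chain, islice

# Field labels from the output above: test4.grib (4 fields), then test6.grib (6 fields)
test4 = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]
test6 = [("t", 1000), ("u", 1000), ("v", 1000), ("t", 850), ("u", 850), ("v", 850)]


def batched(items, n):
    """Yield lists of up to n consecutive items from a single pass over items."""
    it = iter(items)
    while batch := list(islice(it, n)):
        yield batch


# chain() reads the sources lazily, one after the other, as one stream
batches = list(batched(chain(test4, test6), 3))
sizes = [len(b) for b in batches]
print(sizes)       # [3, 3, 3, 1]
print(batches[1])  # [('z', 850), ('t', 1000), ('u', 1000)], spanning both files
```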
