Reading data from URLs as a stream
[1]:
import earthkit.data as ekd
earthkit-data can read GRIB data from a URL as a stream without writing anything to disk. Streaming is enabled by passing stream=True to from_source().
[2]:
ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist()
The resulting object supports only a single iteration. Once the iteration has finished, the stream is consumed and no more data is available.
[3]:
for f in ds:
    # f is a GribField object. It gets deleted when going out of scope
    print(f)
Field(t, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(z, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(t, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(z, 2007-01-01 12:00:00, 2007-01-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
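The single-pass behaviour described above is the same as that of any Python iterator. The sketch below illustrates it with a plain generator; `field_stream` and its string items are hypothetical stand-ins for the streamed fields and are not part of earthkit-data.

```python
def field_stream(names):
    """Hypothetical stand-in for a stream of fields (not earthkit-data code)."""
    for name in names:
        yield name


stream = field_stream(["t", "z", "t", "z"])

first_pass = list(stream)   # consumes every item in the stream
second_pass = list(stream)  # the stream is exhausted, so nothing is left

print(first_pass)
print(second_pass)
```

To iterate again, the stream has to be recreated from its source, which is why the later cells call from_source() again.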
The iteration can be done in batches by using batched() or group_by().
[4]:
ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist()
for f in ds.batched(2):
    # f is a fieldlist
    print(f"len={len(f)} {f.get(['parameter.variable', 'vertical.level'])}")
len=2 [['t', 500], ['z', 500]]
len=2 [['t', 850], ['z', 850]]
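As a rough illustration of the two batching styles, the sketch below applies them to a plain Python iterator. The helpers `batched_sketch` and `group_by_sketch` are hypothetical and are not the earthkit-data implementation. The sketch also assumes that grouping starts a new group whenever the key changes between consecutive items. The (variable, level) tuples mirror the output shown above.

```python
from itertools import groupby, islice


def batched_sketch(iterable, n):
    """Yield lists of at most n consecutive items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


def group_by_sketch(iterable, key):
    """Yield lists of consecutive items that share the same key value.

    Assumption: a new group starts whenever the key changes.
    """
    for _, group in groupby(iterable, key=key):
        yield list(group)


# (variable, level) pairs standing in for the four streamed fields
fields = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]

batches = list(batched_sketch(iter(fields), 2))
groups = list(group_by_sketch(iter(fields), key=lambda f: f[1]))

print(batches)
print(groups)
```

Both approaches need to hold only one batch or group in memory at a time, so the rest of the stream is read only as the loop advances.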
Reading the whole stream into memory
We can load the whole stream into memory by using read_all=True in to_fieldlist(). The resulting object will be a SimpleFieldList storing all the GRIB messages in memory.
[5]:
ds = ekd.from_source(
    "url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib", stream=True
).to_fieldlist(read_all=True)
len(ds)
[5]:
4
[6]:
ds.ls()
[6]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 1 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 2 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
| 3 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
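The trade-off behind reading everything into memory can be sketched with plain Python: once a one-shot stream is materialised into a list, its length is known, items can be indexed, and it can be iterated repeatedly, at the cost of holding all items in memory. `field_stream` is a hypothetical stand-in and not earthkit-data code.

```python
def field_stream():
    """Hypothetical one-shot stream of (variable, level) pairs."""
    yield from [("t", 500), ("z", 500), ("t", 850), ("z", 850)]


# Materialise the stream: every item is now kept in memory
fields = list(field_stream())

count = len(fields)       # the length is known, unlike for a raw stream
third = fields[2]         # random access by index works
first_pass = list(fields)
second_pass = list(fields)  # repeated iteration yields the same items

print(count, third)
```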
Multiple URLs
The stream option works even when the input is a list of URLs.
[7]:
ds = ekd.from_source(
    "url",
    [
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib",
    ],
    stream=True,
).to_fieldlist()
for f in ds.batched(3):
    # f is a fieldlist
    print(f"len={len(f)} {f.get(['parameter.variable', 'vertical.level'])}")
len=3 [['t', 500], ['z', 500], ['t', 850]]
len=3 [['z', 850], ['t', 1000], ['u', 1000]]
len=3 [['v', 1000], ['t', 850], ['u', 850]]
len=1 [['v', 850]]
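The output above shows that batches can span the boundary between the two files: the second batch contains the last field of the first file and the first two fields of the second. A minimal sketch of that behaviour, using `itertools.chain` on plain lists, is given below. The lists `test4` and `test6` are hypothetical stand-ins whose contents are copied from the output above, and `batched_sketch` is an illustrative helper, not earthkit-data code.

```python
from itertools import chain, islice


def batched_sketch(iterable, n):
    """Yield lists of at most n consecutive items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


# (variable, level) pairs standing in for the fields of the two files
test4 = [("t", 500), ("z", 500), ("t", 850), ("z", 850)]
test6 = [("t", 1000), ("u", 1000), ("v", 1000),
         ("t", 850), ("u", 850), ("v", 850)]

# chain() presents both sources as one continuous stream
combined = chain(iter(test4), iter(test6))
batches = list(batched_sketch(combined, 3))

for b in batches:
    print(f"len={len(b)} {b}")
```

Because the sources are concatenated before batching, only the final batch can be shorter than the requested size.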