Streams
We can read GRIB and CoverageJson data as a stream by using the stream=True option in from_source(). It is only available for the following sources:
Iterating over a stream
When reading a stream the returned object offers limited access to the data and primarily serves as an iterator. Once the iteration is finished the stream data is not available any longer.
The example below shows how we iterate through a GRIB data stream field by field:
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib"
>>> ds = ekd.from_source("url", url, stream=True)
>>> for f in fields:
... print(f)
...
GribField(t,1000,20180801,1200,0,0)
GribField(u,1000,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
GribField(u,850,20180801,1200,0,0)
GribField(v,850,20180801,1200,0,0)
We can also use batched() to iterate in batches of fixed size. Each iteration step now yields a Fieldlist.
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib"
>>> ds = ekd.from_source("url", url, stream=True)
>>> for f in ds.batched(2):
... print(f"len={len(f)} {f.metadata(('param', 'level'))}")
...
len=2 [('t', 1000), ('u', 1000)]
len=2 [('v', 1000), ('t', 850)]
len=2 [('u', 850), ('v', 850)]
Another option is to use group_by() to iterate in groups defined by metadata keys. Each iteration step results in a Fieldlist, which is built by consuming GRIB messages from the stream until the values of the metadata keys change.
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib"
>>> ds = ekd.from_source("url", url, stream=True)
>>> for f in ds._group_by("level"):
... print(f"len={len(f)} {f.metadata(('param', 'level'))}")
...
len=3 [('t', 1000), ('u', 1000), ('v', 1000)]
len=3 [('t', 850), ('u', 850), ('v', 850)]
Reading all the data into memory
We can load the whole stream into memory by using read_all=True in from_source(). The resulting object will be a FieldList storing all the GRIB messages in memory. Use this option carefully!
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib"
>>> ds = ekd.from_source("url", url, stream=True, read_all=True)
>>> len(ds)
6