Streams¶
Warning
This guide is currently under construction and may be incomplete or inaccurate.
We can read grib and CoverageJson data as a stream by using the stream=True option in from_source(). It is only available for the following sources:
Iterating over a stream¶
When reading a stream we need to call to_fieldlist() on the returned object to get an iterable of Fields. Once the iteration is finished the stream data is not available any longer.
The example below shows how we iterate through a GRIB data stream field by field:
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/how-tos/test6.grib"
>>> fl = ekd.from_source("url", url, stream=True).to_fieldlist()
>>> for f in fl:
... print(f)
...
GribField(t,1000,20180801,1200,0,0)
GribField(u,1000,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
GribField(u,850,20180801,1200,0,0)
GribField(v,850,20180801,1200,0,0)
We can also use batched() to iterate in batches of fixed size. Each iteration step now yields a Fieldlist.
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/how-tos/test6.grib"
>>> fl = ekd.from_source("url", url, stream=True).to_fieldlist()
>>> for f in fl.batched(2):
... print(f"len={len(f)} {f.get(('parameter.variable', 'vertical.level'))}")
...
len=2 [('t', 1000), ('u', 1000)]
len=2 [('v', 1000), ('t', 850)]
len=2 [('u', 850), ('v', 850)]
Another option is to use group_by() to iterate in groups defined by metadata keys. Each iteration step results in a Fieldlist, which is built by consuming GRIB messages from the stream until the values of the metadata keys change.
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/how-tos/test6.grib"
>>> fl = ekd.from_source("url", url, stream=True).to_fieldlist()
>>> for f in fl.group_by("vertical.level"):
... print(f"len={len(f)} {f.get(('parameter.variable', 'vertical.level'))}")
...
len=3 [('t', 1000), ('u', 1000), ('v', 1000)]
len=3 [('t', 850), ('u', 850), ('v', 850)]
Reading all the data into memory¶
We can load the whole stream into memory by using read_all=True in to_fieldlist(). The resulting object will be a FieldList storing all the Fields in memory. Use this option carefully!
>>> import earthkit.data as ekd
>>> url = "https://sites.ecmwf.int/repository/earthkit-data/how-tos/test6.grib"
>>> fl = ekd.from_source("url", url, stream=True).to_fieldlist(read_all=True)
>>> len(fl)
6