Data objects
When we call from_source() it will return a data object. The actual object type depends on the source parameters and the data format, but is supposed to implement a common set of methods/operators, some of which will only be available for certain data types.
The list of common methods/operators:
Conversion to scientific Python objects
We can convert data objects into familiar scientific Python objects (including numpy arrays, pandas dataframes, xarray datasets):
ds.to_xarray() # for field data
ds.to_pandas() # for non-field data
ds.to_numpy() # when the data is a n-dimensional array.
Concatenation
Data objects can be concatenated with the “+” operator:
>>> import earthkit.data as ekd
>>> ds1 = ekd.from_source("file", "docs/examples/test.grib")
>>> len(ds1)
2
>>> ds2 = ekd.from_source("file", "docs/examples/test6.grib")
>>> len(ds2)
6
>>> ds = ds1 + ds2
>>> len(ds)
8
Iteration
When an earthkit-data data source or dataset provides a FieldList or message list, we can iterate through it to access each element (in a given order see below).
In the the following example we read a GRIB file from disk. In the iteration each element is a field (representing a GRIB message):
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/test6.grib")
>>> len(ds)
6
>>> for f in ds:
... print(f)
...
GribField(t,1000,20180801,1200,0,0)
GribField(u,1000,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
GribField(u,850,20180801,1200,0,0)
GribField(v,850,20180801,1200,0,0)
Iteration with .batched()
When an earthkit-data data source or dataset provides a FieldList or message list, we can iterate through it in batches of fixed size using batched(). This method also works for streams.
In the the following example we read a GRIB file from disk and iterate through it in batches of 2. Each iteration step yields a FieldList of 2 fields.
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/test6.grib")
>>> for f in ds.batched(2):
... print(f"len={len(f)} {f.metadata(('param', 'level'))}")
...
len=2 [('t', 1000), ('u', 1000)]
len=2 [('v', 1000), ('t', 850)]
len=2 [('u', 850), ('v', 850)]
Iteration with .group_by()
When an earthkit-data data source or dataset provides a FieldList or message list, we can iterate through it in groups defined by metadata keys using group_by(). This method also works for streams.
In the the following example we read a GRIB file from disk and iterate through it in groups defined by the “level” metadata key. Each iteration step yields a FieldList containing fields with the same “level” value.
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/test6.grib")
>>> for f in ds.group_by("level"):
... print(f"len={len(f)} {f.metadata(('param', 'level'))}")
...
len=3 [('t', 1000), ('u', 1000), ('v', 1000)]
len=3 [('t', 850), ('u', 850), ('v', 850)]
Selection with [...]
When an earthkit-data data source or dataset provides a FieldList or message list, a subset of it can be created using the standard python list interface relying on brackets and slices. Slicing also works by providing a list or ndarray of indices.
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/test6.grib")
>>> len(ds)
6
>>> ds[0]
GribField(t,1000,20180801,1200,0,0)
>>> for f in ds[0:3]:
... print(f)
GribField(t,1000,20180801,1200,0,0)
GribField(u,1000,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
>>> for f in ds[0:4:2]:
... print(f)
GribField(t,1000,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
>>> ds[-1]
GribField(v,850,20180801,1200,0,0)
>>> for f in ds[-2:]:
... print(f)
GribField(u,850,20180801,1200,0,0)
GribField(v,850,20180801,1200,0,0)
>>> for f in ds[[1, 3]]:
... print(f)
...
GribField(u,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
>>> for f in ds[np.array([1, 3])]:
... print(f)
...
GribField(u,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
Selection with .sel()
When an earthkit-data data source or dataset provides a FieldList or message list, the method .sel() allows filtering this list and we can select a subset of the list. .sel() returns a view to original data, so no data is copied. The selection offers the same functionality as the original data object, so methods like .to_numpy(), .to_xarray(), etc. are all available.
For more details see: sel().
The following example demonstrates the usage of .sel(). The input data contains temperature and wind fields on various pressure levels.
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/tuv_pl.grib")
>>> len(ds)
18
>>> subset = ds.sel(param="t")
>>> len(subset)
6
>>> for f in subset:
... print(f)
...
GribField(t,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
GribField(t,700,20180801,1200,0,0)
GribField(t,500,20180801,1200,0,0)
GribField(t,400,20180801,1200,0,0)
GribField(t,300,20180801,1200,0,0)
>>> subset = ds.sel(param=["u", "v"], level=slice(400, 700))
>>> len(subset)
6
>>> for f in subset:
... print(f)
...
GribField(u,700,20180801,1200,0,0)
GribField(v,700,20180801,1200,0,0)
GribField(u,500,20180801,1200,0,0)
GribField(v,500,20180801,1200,0,0)
GribField(u,400,20180801,1200,0,0)
GribField(v,400,20180801,1200,0,0)
Selection with .isel()
When an earthkit-data data source or dataset provides a FieldList, the method .isel() allows filtering this list and we can select a subset of the list. .isel() returns a view to the original data, so no data is copied. The selection offers the same functionality as the original data object, so methods like .to_numpy(), .to_xarray() , etc. are all available.
.isel() works similarly to sel but conditions are specified by indices of metadata keys. A metadata index stores the unique, sorted values of the corresponding metadata key from all the fields in the input data.
For more details see: isel()
The following example demonstrates the usage of .isel(). The input data contains temperature and wind fields on various pressure levels.
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/tuv_pl.grib")
>>> len(ds)
18
>>> ds.indices
{'levelist': (1000, 850, 700, 500, 400, 300), 'param': ('t', 'u', 'v')}
>>> subset = ds.isel(param=0)
>>> len(ds)
6
>>> for f in subset:
... print(f)
...
GribField(t,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
GribField(t,700,20180801,1200,0,0)
GribField(t,500,20180801,1200,0,0)
GribField(t,400,20180801,1200,0,0)
GribField(t,300,20180801,1200,0,0)
>>> subset = ds.isel(param=[1, 2], level=slice(2, 4))
>>> len(subset)
4
>>> for f in subset:
... print(f)
...
GribField(u,700,20180801,1200,0,0)
GribField(v,700,20180801,1200,0,0)
GribField(u,500,20180801,1200,0,0)
GribField(v,500,20180801,1200,0,0)
Ordering with .order_by()
When an earthkit-data data source or dataset provides a FieldList or message list, the method .order_by() allows sorting this list.
.order_by() returns a “view” so no new data is generated on disk or in memory. The resulting object offers the same functionality as the original data object, so methods like .to_numpy(), .to_xarray(), etc. are all available.
For more details see: order_by()
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/test6.grib")
>>> len(ds)
6
>>> for f in ds.order_by("param"):
... print(f)
...
GribField(t,850,20180801,1200,0,0)
GribField(t,1000,20180801,1200,0,0)
GribField(u,850,20180801,1200,0,0)
GribField(u,1000,20180801,1200,0,0)
GribField(v,850,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
>>> for f in ds.order_by(["level", "param"]):
... print(f)
...
GribField(t,850,20180801,1200,0,0)
GribField(u,850,20180801,1200,0,0)
GribField(v,850,20180801,1200,0,0)
GribField(t,1000,20180801,1200,0,0)
GribField(u,1000,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
>>> for f in ds.order_by(param=["u", "t", "v"]):
... print(f)
...
GribField(u,850,20180801,1200,0,0)
GribField(u,1000,20180801,1200,0,0)
GribField(t,850,20180801,1200,0,0)
GribField(t,1000,20180801,1200,0,0)
GribField(v,850,20180801,1200,0,0)
GribField(v,1000,20180801,1200,0,0)
Accessing data values
We can extract the values from data objects as an ndarray using the .to_numpy() method or the .values property.
When an earthkit-data source provides a FieldList, these methods can be called both on the whole object and on the individual fields, too.
While .to_numpy(), by default, preserves the shape of the fields, .values always returns a flat array per field. By using flatten=True, we can force .to_numpy() to return a flat ndarray per field.
For more details see: to_numpy().
In the following example the input GRIB data contains 6 fields each defined on a latitude-longitude grid with a shape of (7, 12).
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("file", "docs/examples/test6.grib")
>>> ds.to_numpy().shape
(6, 7, 12)
>>> ds.to_numpy(flatten=True).shape
(6, 84)
>>> ds.values.shape
(6, 84)
>>> for f in ds:
... f.values.shape
...
(84,)
(84,)
(84,)
(84,)
(84,)
(84,)
>>> for f in ds:
... f.to_numpy().shape
...
(7, 12)
(7, 12)
(7, 12)
(7, 12)
(7, 12)
(7, 12)
Accessing metadata
We can extract metadata from data objects using the .metadata() method.
When an earthkit-data source provides a FieldList or message list, this method can be called both on the whole object and on the individual fields, too.
For more details see: FieldList.metadata() and
Field.metadata()
Inspecting contents
On certain data objects (currently only GRIB and BUFR) we can call .ls(), .head() or .tail().
For more details see: ls().