earthkit.data.readers.pandas.featurelist¶

Classes¶

PandasList

Base class for all sources.

Module Contents¶

class earthkit.data.readers.pandas.featurelist.PandasList(df)¶

Bases: earthkit.data.featurelist.simple.IndexFeatureListBase

Base class for all sources.

batched(n)¶

Iterate through the object in batches of n.

Parameters:

n (int) – Batch size.

Returns:

Returns an iterator yielding batches of n elements. Each batch is a new object containing a view to the data in the original object, so no data is copied. The last batch may contain fewer than n elements.

Return type:

object
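
The batching behaviour described above can be pictured with a plain-Python sketch. The helper `batched_sketch` is hypothetical and works on an ordinary list; it is not the library's implementation. Note that list slicing copies, whereas the method is documented to return views.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched_sketch(items: Sequence[T], n: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most n items (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for start in range(0, len(items), n):
        # Slicing copies here; the documented method returns views instead.
        yield list(items[start:start + n])


batches = list(batched_sketch(["a", "b", "c", "d", "e"], 2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

The last batch holds a single element because five items do not divide evenly into batches of two.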

describe(*args, **kwargs)¶: Generate a summary of the fieldlist.

get(keys, default=None, astype=None, raise_on_missing=False, output='auto', group_by_key=False, flatten_dict=False, remapping=None, patch=None)¶

Return values for the specified keys from all the fields.

Parameters:

keys (str, list, tuple) – Specify the field metadata keys to extract. Can be a single key (str) or multiple keys as a list/tuple of str. Keys are assumed to be of the form “component.key”. For example, “time.valid_datetime” or “parameter.name”. It is also allowed to specify just the component name like “time” or “parameter”. In this case the corresponding component’s to_dict() method is called and its result is returned. For other keys, the method looks for them in the private components of the fields (if any) and returns the value from the first private component that contains it.
default (Any, None) – Specify the default value(s) for keys. Returned when the given key is not found and raise_on_missing is False. When default is a single value, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.
astype (type as str, int or float) – Return type for keys. When astype is a single type, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.
raise_on_missing (bool) – When True, raises KeyError if any of keys is not found.
output (type, str) –
Specify the output structure type in conjunction with group_by_key. When group_by_key is False (default), the output is a list with one item per field, and output has the following effect on the items:
- “auto” (default):
  - when keys is a str, returns a single value per field
  - when keys is a list/tuple, returns a list/tuple of values per field
- list or “list”: returns a list of values per field.
- tuple or “tuple”: returns a tuple of values per field.
- dict or “dict”: returns a dictionary with keys and their values per field.
When group_by_key is True, the output is grouped by key: the returned object has one item per key, and each item contains the list of values for that key from all the fields. A dict is returned when output is dict; otherwise a list is returned.
group_by_key (bool) – When True the output is grouped by key as described in output.
flatten_dict (bool) – When True and output is dict, for each field if any of the values in the returned dict is itself a dict, it is flattened to depth 1 by concatenating the keys with a dot. For example, if the returned dict is {"a": {"x": 1, "y": 2}, "b": 3}, it becomes {"a.x": 1, "a.y": 2, "b": 3}. This option is ignored when output is not dict.
remapping (dict, optional) –
Create new metadata keys from existing ones. E.g. to define a new key “param_level” as the concatenated value of the “parameter.variable” and “vertical.level” keys use:
```
remapping={"param_level": "{parameter.variable}{vertical.level}"}
```
patch (dict, optional) – A dictionary of patches to be applied to the returned values.

Returns:

The returned value depends on the output and group_by_key parameters. See above.

Return type:

list, dict

Raises:

KeyError – If raise_on_missing is True and any of keys is not found.
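
The effect of the flatten_dict option can be illustrated with a small stand-alone sketch. `flatten_once` is a hypothetical helper, not part of earthkit-data, and it assumes that "depth 1" means only the first level of nesting is expanded.

```python
from typing import Any, Dict


def flatten_once(d: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one level of nesting, joining keys with a dot (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Expand the nested dict into "outer.inner" keys.
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


result = flatten_once({"a": {"x": 1, "y": 2}, "b": 3})
print(result)  # {'a.x': 1, 'a.y': 2, 'b': 3}
```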

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/tutorials/test.grib")
>>> ds.get("parameter.variable")
['2t', 'msl']
>>> ds.get(["parameter.variable", "parameter.units"])
[['2t', 'K'], ['msl', 'Pa']]
>>> ds.get(("parameter.variable", "parameter.units"))
[('2t', 'K'), ('msl', 'Pa')]
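
How output and group_by_key combine may be easier to follow in code. The sketch below is a plain-Python reading of the parameter description, using hypothetical in-memory metadata dicts in place of real fields. `get_sketch` is not the library's implementation, and its handling of “auto” for list/tuple keys is an assumption based on the description.

```python
from typing import Any, Dict, List

# Hypothetical per-field metadata, standing in for two fields.
FIELDS: List[Dict[str, Any]] = [
    {"parameter.variable": "2t", "parameter.units": "K"},
    {"parameter.variable": "msl", "parameter.units": "Pa"},
]


def get_sketch(fields, keys, output="auto", group_by_key=False):
    """Shape metadata values per field or per key (hypothetical helper)."""
    single = isinstance(keys, str)
    key_list = [keys] if single else list(keys)

    if group_by_key:
        # One entry per key, each holding the values from all fields.
        columns = {k: [f[k] for f in fields] for k in key_list}
        if output in (dict, "dict"):
            return columns
        return [columns[k] for k in key_list]

    rows = []
    for f in fields:
        values = [f[k] for k in key_list]
        if output in (dict, "dict"):
            rows.append(dict(zip(key_list, values)))
        elif output in (tuple, "tuple"):
            rows.append(tuple(values))
        elif output in (list, "list"):
            rows.append(values)
        elif single:
            # "auto" with a str key: one bare value per field.
            rows.append(values[0])
        else:
            # "auto" with list/tuple keys: assumed to mirror the container type.
            rows.append(type(keys)(values))
    return rows


keys = ["parameter.variable", "parameter.units"]
print(get_sketch(FIELDS, "parameter.variable"))
# ['2t', 'msl']
print(get_sketch(FIELDS, keys, output="tuple"))
# [('2t', 'K'), ('msl', 'Pa')]
print(get_sketch(FIELDS, keys, group_by_key=True))
# [['2t', 'msl'], ['K', 'Pa']]
print(get_sketch(FIELDS, keys, output=dict, group_by_key=True))
# {'parameter.variable': ['2t', 'msl'], 'parameter.units': ['K', 'Pa']}
```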

graph(depth=0)¶

group_by(*keys, sort=True)¶

Iterate through the object in groups defined by metadata keys.

Parameters:

*keys (tuple) – Positional arguments specifying the metadata keys to group by. Keys can be given as one or more str arguments, or as a list or tuple of str.
sort (bool, optional) – If True (default), the object is sorted by the metadata keys before grouping. Sorting is only applied if the object supports the sorting operation.

Returns:

Returns an iterator yielding batches of elements grouped by the metadata keys. Each batch is a new object containing a view to the data in the original object, so no data is copied. A new group is generated every time the value of the keys changes.

Return type:

object
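
Starting a new group whenever the key values change is the same behaviour as `itertools.groupby` from the standard library, which makes the role of sort easy to see. The following sketch uses hypothetical metadata dicts and a hypothetical helper `group_by_sketch`. It does not reproduce the library's code and accepts keys only as separate str arguments.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterator, List

# Hypothetical metadata; the "t" entries are deliberately not adjacent.
FIELDS: List[Dict[str, Any]] = [
    {"parameter.variable": "t", "vertical.level": 850},
    {"parameter.variable": "u", "vertical.level": 850},
    {"parameter.variable": "t", "vertical.level": 1000},
]


def group_by_sketch(fields, *keys: str, sort: bool = True) -> Iterator[List[Dict[str, Any]]]:
    """Yield runs of consecutive items sharing the same key values."""
    key_func = itemgetter(*keys)
    if sort:
        # Sorting first brings equal keys together, so each value forms one group.
        fields = sorted(fields, key=key_func)
    for _, group in groupby(fields, key=key_func):
        yield list(group)


sorted_sizes = [len(g) for g in group_by_sketch(FIELDS, "parameter.variable")]
unsorted_sizes = [len(g) for g in group_by_sketch(FIELDS, "parameter.variable", sort=False)]
print(sorted_sizes)    # [2, 1]
print(unsorted_sizes)  # [1, 1, 1]
```

Without sorting, the two "t" entries land in separate groups because a "u" entry sits between them.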

head(n=5, **kwargs)¶

Generate a list-like summary of the first n Fields. Same as calling ls with n.

Parameters:

n (int, None) – The number of messages (n > 0) to be printed from the front.
**kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

See also

ls, tail

classmethod merge(sources)¶

metadata(keys, **kwargs)¶

Return the metadata values for each field.

Parameters:

*args (tuple) – Positional arguments defining the metadata keys. Passed to Field.metadata()
**kwargs (dict, optional) – Keyword arguments passed to Field.metadata()

Returns:

List with one item per Field

Return type:

list

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/tutorials/test.grib")
>>> ds.metadata("param")
['2t', 'msl']
>>> ds.metadata("param", "units")
[('2t', 'K'), ('msl', 'Pa')]
>>> ds.metadata(["param", "units"])
[['2t', 'K'], ['msl', 'Pa']]

mutate()¶

mutate_source()¶

name = None¶

classmethod new_mask_index(*args, **kwargs)¶

order_by(*args, remapping=None, patch=None, **kwargs)¶

Change the order of the elements in a fieldlist.

Parameters:

*args (tuple) – Positional arguments specifying the metadata keys to perform the ordering on. (See below for details)
remapping (dict) –
Defines new metadata keys from existing ones that we can refer to in *args and **kwargs. E.g. to define a new key “param_level” as the concatenated value of the “param” and “level” keys use:
```
remapping={"param_level": "{param}{level}"}
```
See below for a more elaborate example.
**kwargs (dict, optional) – Other keyword arguments specifying the metadata keys to perform the ordering on. (See below for details)

Returns:

Returns a new object with reordered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Examples

Ordering by a single metadata key (“parameter.variable”). The default ordering direction is ascending:

>>> import earthkit.data as ekd
>>> ds = ekd.from_source("sample", "test6.grib").to_fieldlist()
>>> for f in ds.order_by("parameter.variable"):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)

Ordering by multiple keys (first by “vertical.level” then by “parameter.variable”):

>>> for f in ds.order_by(["vertical.level", "parameter.variable"]):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)

Specifying the ordering direction:

>>> for f in ds.order_by(**{"parameter.variable": "ascending", "vertical.level": "descending"}):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)

Using the list of all the values of a key (“parameter.variable”) to define the order:

>>> for f in ds.order_by(**{"parameter.variable": ["u", "t", "v"]}):
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)

Using remapping to specify the order by a key created from two other keys (we created key “param_level” from “parameter.variable” and “vertical.level”):

>>> ordering = ["t850", "t1000", "u1000", "v850", "v1000", "u850"]
>>> remapping = {"param_level": "{parameter.variable}{vertical.level}"}
>>> for f in ds.order_by(param_level=ordering, remapping=remapping):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
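
The combination of remapping and an explicit value list can be mimicked without the library. The sketch below assumes hypothetical metadata dicts and two hypothetical helpers, `remap` and `order_by_sketch`. It uses `re.sub` rather than `str.format`, because `str.format` would treat the dot in a name such as “parameter.variable” as attribute access.

```python
import re
from typing import Any, Dict, List

# Hypothetical metadata standing in for four fields.
FIELDS: List[Dict[str, Any]] = [
    {"parameter.variable": "t", "vertical.level": 1000},
    {"parameter.variable": "t", "vertical.level": 850},
    {"parameter.variable": "u", "vertical.level": 1000},
    {"parameter.variable": "u", "vertical.level": 850},
]


def remap(field: Dict[str, Any], template: str) -> str:
    """Replace each {key} in the template with the field's value for that key."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(field[m.group(1)]), template)


def order_by_sketch(fields, key: str, ordering: List[str], remapping: Dict[str, str]):
    """Sort fields by the position of their remapped key in an explicit list."""
    rank = {value: position for position, value in enumerate(ordering)}
    return sorted(fields, key=lambda f: rank[remap(f, remapping[key])])


ordering = ["t850", "u1000", "t1000", "u850"]
remapping = {"param_level": "{parameter.variable}{vertical.level}"}
ordered = order_by_sketch(FIELDS, "param_level", ordering, remapping)
labels = [remap(f, remapping["param_level"]) for f in ordered]
print(labels)  # ['t850', 'u1000', 't1000', 'u850']
```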

property parent¶: The parent source, if any.

sel(*args, remapping=None, **kwargs)¶

Uses metadata values to select a subset of the elements from a fieldlist object.

Parameters:

*args (tuple) – Positional arguments specifying the filter condition as dict. (See below for details).
remapping (dict) –
Creates new metadata keys from existing ones that we can refer to in *args and **kwargs. E.g. to define a new key “param_level” as the concatenated value of the “param” and “level” keys use:
```
remapping={"param_level": "{param}{level}"}
```
See below for a more elaborate example.
**kwargs (dict, optional) – Other keyword arguments specifying the filter conditions. (See below for details).

Returns:

Returns a new object with the filtered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Notes

Filter conditions are specified by a set of metadata keys, given either as a dictionary (in *args) or as **kwargs. Single or multiple keys can be used, and each key can specify one of the following types of filter value:

single value:
```
ds.sel({"parameter.variable": "t"})
```

list of values:
```
ds.sel({"parameter.variable": ["u", "v"]})
```

slice of values (defines a closed interval, so treated as inclusive of both the start and stop values, unlike normal Python indexing):
```
# filter levels between 300 and 500 inclusively
ds.sel({"vertical.level": slice(300, 500)})
```
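
The three kinds of filter value can be expressed as one matching rule. The sketch below runs on hypothetical metadata dicts; `matches` and `sel_sketch` are illustrative names, not library functions. Treating a missing slice bound as unbounded is an assumption.

```python
from typing import Any, Dict, List

# Hypothetical metadata: two parameters on three pressure levels.
FIELDS: List[Dict[str, Any]] = [
    {"parameter.variable": p, "vertical.level": level}
    for p in ("t", "u")
    for level in (700, 500, 300)
]


def matches(value: Any, condition: Any) -> bool:
    """Test one metadata value against a single value, a list or a slice."""
    if isinstance(condition, slice):
        # Closed interval: both start and stop are included.
        lower_ok = condition.start is None or value >= condition.start
        upper_ok = condition.stop is None or value <= condition.stop
        return lower_ok and upper_ok
    if isinstance(condition, (list, tuple)):
        return value in condition
    return value == condition


def sel_sketch(fields, conditions: Dict[str, Any]):
    """Keep the fields that satisfy every condition."""
    return [f for f in fields if all(matches(f[k], c) for k, c in conditions.items())]


selected = sel_sketch(FIELDS, {"parameter.variable": "t", "vertical.level": slice(300, 500)})
levels = [f["vertical.level"] for f in selected]
print(levels)  # [500, 300]
```

Level 500 is kept because the stop value is included in the interval, unlike in normal Python slicing.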

Examples

>>> import earthkit.data
>>> fl = earthkit.data.from_source("sample", "tuv_pl.grib").to_fieldlist()
>>> len(fl)
18

Selecting by a single key (“parameter.variable”) with a single value:

>>> fl1 = fl.sel({"parameter.variable": "t"})
>>> for f in fl1:
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 300, pressure, 0, regular_ll)

Selecting by multiple keys (“parameter.variable”, “vertical.level”) with a list and slice of values:

>>> fl1 = fl.sel({"parameter.variable": ["u", "v"], "vertical.level": slice(400, 700)})
>>> for f in fl1:
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)

Using remapping to specify the selection by a key created from two other keys (we created key “param_level” from “parameter.variable” and “vertical.level”):

>>> fl1 = fl.sel(
...     {"param_level": ["t850", "u1000"]},
...     remapping={"param_level": "{parameter.variable}{vertical.level}"},
... )
>>> for f in fl1:
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)

source_filename = None¶

tail(n=5, **kwargs)¶

Generate a list-like summary of the last n Fields. Same as calling ls with -n.

Parameters:

n (int, None) – The number of messages (n > 0) to be printed from the back.
**kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame