earthkit.data.readers.pandas.featurelist
Classes
PandasList | Base class for all sources.
Module Contents
- class earthkit.data.readers.pandas.featurelist.PandasList(df)
Bases: earthkit.data.featurelist.simple.IndexFeatureListBase
Base class for all sources.
- batched(n)
Iterate through the object in batches of n.
- Parameters:
n (int) – Batch size.
- Returns:
Returns an iterator yielding batches of n elements. Each batch is a new object containing a view to the data in the original object, so no data is copied. The last batch may contain fewer than n elements.
- Return type:
object
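The batching rule above can be illustrated with a minimal, library-independent sketch. The helper `batched_view` is hypothetical and works on a plain Python sequence; it is not the earthkit-data implementation, and unlike the real method it yields list copies and not views.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched_view(items: Sequence[T], n: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most ``n`` items (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for start in range(0, len(items), n):
        # The final slice is shorter when len(items) is not a multiple of n.
        yield list(items[start:start + n])


fields = ["f0", "f1", "f2", "f3", "f4"]  # stand-ins for fields
batches = list(batched_view(fields, 2))
print(batches)  # [['f0', 'f1'], ['f2', 'f3'], ['f4']]
```

With five elements and `n=2`, the last batch holds a single element, which matches the "may contain fewer than n elements" note.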
- describe(*args, **kwargs)
Generate a summary of the fieldlist.
- get(keys, default=None, astype=None, raise_on_missing=False, output='auto', group_by_key=False, flatten_dict=False, remapping=None, patch=None)
Return values for the specified keys from all the fields.
- Parameters:
keys (str, list, tuple) – Specify the field metadata keys to extract. Can be a single key (str) or multiple keys as a list/tuple of str. Keys are assumed to be of the form “component.key”, for example “time.valid_datetime” or “parameter.name”. It is also allowed to specify just the component name, like “time” or “parameter”. In this case the corresponding component’s to_dict() method is called and its result is returned. For other keys, the method looks for them in the private components of the fields (if any) and returns the value from the first private component that contains it.
default (Any, None) – Specify the default value(s) for keys. Returned when the given key is not found and raise_on_missing is False. When default is a single value, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.
astype (type as str, int or float) – Return type for keys. When astype is a single type, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.
raise_on_missing (bool) – When True, raises KeyError if any of keys is not found.
output (type, str) – Specify the output structure type in conjunction with group_by_key. When group_by_key is False (default) the output is a list with one item per field and output has the following effect on the items:
- “auto” (default): when keys is a str, returns a single value per field; when keys is a list/tuple, returns a list/tuple of values per field.
- list or “list”: returns a list of values per field.
- tuple or “tuple”: returns a tuple of values per field.
- dict or “dict”: returns a dictionary with keys and their values per field.
When group_by_key is True the output is grouped by key and an object with one item per key is returned. Each item contains the list of values for that key from all the fields. When output is dict a dict is returned, otherwise a list.
group_by_key (bool) – When True the output is grouped by key as described in output.
flatten_dict (bool) – When True and output is dict, for each field if any of the values in the returned dict is itself a dict, it is flattened to depth 1 by concatenating the keys with a dot. For example, if the returned dict is {"a": {"x": 1, "y": 2}, "b": 3}, it becomes {"a.x": 1, "a.y": 2, "b": 3}. This option is ignored when output is not dict.
remapping (dict, optional) – Create new metadata keys from existing ones. E.g. to define a new key “param_level” as the concatenated value of the “parameter.variable” and “vertical.level” keys use:
remapping={"param_level": "{parameter.variable}{vertical.level}"}
patch (dict, optional) – A dictionary of patches to be applied to the returned values.
- Returns:
The returned value depends on the output and group_by_key parameters. See above.
- Return type:
list, dict
- Raises:
KeyError – If raise_on_missing is True and any of keys is not found.
Examples
>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.get("parameter.variable")
['2t', 'msl']
>>> ds.get(["parameter.variable", "parameter.units"])
[('2t', 'K'), ('msl', 'Pa')]
>>> ds.get(("parameter.variable", "parameter.units"))
[['2t', 'K'], ['msl', 'Pa']]
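The flatten_dict and group_by_key options describe simple dictionary transformations. The sketch below reproduces them on plain dicts; `flatten_depth_one` and `group_rows_by_key` are hypothetical helper names introduced for illustration only, not part of earthkit-data, and the actual implementation may differ.

```python
from typing import Any, Dict, List


def flatten_depth_one(d: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten nested dict values by one level, joining keys with a dot."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def group_rows_by_key(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn one-dict-per-field output into one-list-per-key output."""
    grouped: Dict[str, List[Any]] = {}
    for row in rows:
        for key, value in row.items():
            grouped.setdefault(key, []).append(value)
    return grouped


# The example from the flatten_dict description.
flat = flatten_depth_one({"a": {"x": 1, "y": 2}, "b": 3})
print(flat)  # {'a.x': 1, 'a.y': 2, 'b': 3}

# Per-field dicts, as output="dict" is described to produce.
per_field = [
    {"parameter.variable": "2t", "parameter.units": "K"},
    {"parameter.variable": "msl", "parameter.units": "Pa"},
]
grouped = group_rows_by_key(per_field)
print(grouped)
# {'parameter.variable': ['2t', 'msl'], 'parameter.units': ['K', 'Pa']}
```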
- graph(depth=0)
- group_by(*keys, sort=True)
Iterate through the object in groups defined by metadata keys.
- Parameters:
*keys (tuple) – Positional arguments specifying the metadata keys to group by. Keys can be a single or multiple str, or a list or tuple of str.
sort (bool, optional) – If True (default), the object is sorted by the metadata keys before grouping. Sorting is only applied if the object supports the sorting operation.
- Returns:
Returns an iterator yielding batches of elements grouped by the metadata keys. Each batch is a new object containing a view to the data in the original object, so no data is copied. A new group is generated every time the value of the keys changes.
- Return type:
object
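The effect of sort on grouping can be sketched with `itertools.groupby`, which also starts a new group whenever the key value changes. This is an illustration on plain dicts under that assumption; `group_records` is a hypothetical helper, and unlike the documented method it also yields the key values alongside each group.

```python
from itertools import groupby
from typing import Any, Dict, Iterator, List, Tuple

Record = Dict[str, Any]


def group_records(
    records: List[Record], *keys: str, sort: bool = True
) -> Iterator[Tuple[tuple, List[Record]]]:
    """Yield (key values, group) pairs for consecutive records sharing keys."""

    def key_values(rec: Record) -> tuple:
        return tuple(rec[k] for k in keys)

    if sort:
        # sorted() is stable, so records with equal keys keep their order.
        records = sorted(records, key=key_values)
    # groupby opens a new group every time the key values change.
    for values, group in groupby(records, key=key_values):
        yield values, list(group)


records = [
    {"param": "t", "level": 850},
    {"param": "u", "level": 850},
    {"param": "t", "level": 1000},
]
sorted_groups = [(v, len(g)) for v, g in group_records(records, "param")]
unsorted_groups = [(v, len(g)) for v, g in group_records(records, "param", sort=False)]
print(sorted_groups)    # [(('t',), 2), (('u',), 1)]
print(unsorted_groups)  # [(('t',), 1), (('u',), 1), (('t',), 1)]
```

Without sorting, the two "t" records end up in separate groups because they are not adjacent.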
- head(n=5, **kwargs)
Generate a list-like summary of the first n Fields. Same as calling ls with n.
- Parameters:
n (int, None) – The number of messages (n > 0) to be printed from the front.
**kwargs (dict, optional) – Other keyword arguments passed to ls.
- Returns:
See ls.
- Return type:
Pandas DataFrame
Notes
The following calls are equivalent:
ds.head()
ds.head(5)
ds.head(n=5)
ds.ls(5)
ds.ls(n=5)
- ignore()
Indicates to ignore this source in concatenation/merging.
- Return type:
bool
- ls(**kwargs)
Generate a list-like summary using a set of metadata keys.
- Parameters:
n (int, None) – The number of Fields to be listed. None means all the fields, n > 0 means fields from the front, while n < 0 means fields from the back of the fieldlist.
keys (list of str, dict, None) – Metadata keys used when namespace is None. If keys is None the following default set of keys will be used: “centre”, “shortName”, “typeOfLevel”, “level”, “dataDate”, “dataTime”, “stepRange”, “dataType”, “number”, “gridType”. To specify a column title for each key in the output use a dict.
extra_keys (list of str, dict, None) – List of additional keys. To specify a column title for each key in the output use a dict.
namespace (str, list of str, None) – The namespace(s) to choose the keys from. When it is set, keys are omitted.
- Returns:
DataFrame with one row per Field.
- Return type:
Pandas DataFrame
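The meaning of n (None for everything, positive from the front, negative from the back) can be sketched on a plain list. `select_for_listing` is a hypothetical helper, not the library's code; rejecting n = 0 is an assumption made here because the description does not cover that case.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_for_listing(items: Sequence[T], n: Optional[int] = None) -> List[T]:
    """Pick the items a listing would show for a given ``n``."""
    if n is None:
        return list(items)  # None: list everything
    if n == 0:
        # Assumption: n == 0 is not described above, so it is rejected here.
        raise ValueError("n must be non-zero or None")
    # n > 0 takes from the front, n < 0 takes from the back.
    return list(items[:n]) if n > 0 else list(items[n:])


fields = list(range(10))  # stand-ins for ten fields
print(select_for_listing(fields, 3))   # [0, 1, 2]
print(select_for_listing(fields, -3))  # [7, 8, 9]
```

In these terms, head(n) corresponds to a positive n and tail(n) to the negated value.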
- classmethod merge(sources)
- metadata(keys, **kwargs)
Return the metadata values for each field.
- Parameters:
*args (tuple) – Positional arguments defining the metadata keys. Passed to Field.metadata()
**kwargs (dict, optional) – Keyword arguments passed to Field.metadata()
- Returns:
List with one item per Field
- Return type:
list
Examples
>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.metadata("param")
['2t', 'msl']
>>> ds.metadata("param", "units")
[('2t', 'K'), ('msl', 'Pa')]
>>> ds.metadata(["param", "units"])
[['2t', 'K'], ['msl', 'Pa']]
- mutate()
- mutate_source()
- name = None
- classmethod new_mask_index(*args, **kwargs)
- order_by(*args, remapping=None, patch=None, **kwargs)
Change the order of the elements in a fieldlist.
- Parameters:
*args (tuple) – Positional arguments specifying the metadata keys to perform the ordering on. (See below for details)
remapping (dict) – Defines new metadata keys from existing ones that we can refer to in *args and **kwargs. E.g. to define a new key “param_level” as the concatenated value of the “param” and “level” keys use:
remapping={"param_level": "{param}{level}"}
See below for a more elaborate example.
**kwargs (dict, optional) – Other keyword arguments specifying the metadata keys to perform the ordering on. (See below for details)
- Returns:
Returns a new object with reordered elements. It contains a view to the data in the original object, so no data is copied.
- Return type:
object
Examples
Ordering by a single metadata key (“parameter.variable”). The default ordering direction is ascending:
>>> import earthkit.data as ekd
>>> ds = ekd.from_source("sample", "test6.grib").to_fieldlist()
>>> for f in ds.order_by("parameter.variable"):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Ordering by multiple keys (first by “vertical.level” then by “parameter.variable”):
>>> for f in ds.order_by(["vertical.level", "parameter.variable"]):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Specifying the ordering direction:
>>> for f in ds.order_by(**{"parameter.variable": "ascending", "vertical.level": "descending"}):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Using the list of all the values of a key (“parameter.variable”) to define the order:
>>> for f in ds.order_by(**{"parameter.variable": ["u", "t", "v"]}):
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Using remapping to specify the order by a key created from two other keys (we created key “param_level” from “parameter.variable” and “vertical.level”):
>>> ordering = ["t850", "t1000", "u1000", "v850", "v1000", "u850"]
>>> remapping = {"param_level": "{parameter.variable}{vertical.level}"}
>>> for f in ds.order_by(param_level=ordering, remapping=remapping):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
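The ordering rules shown in the examples (direction per key, an explicit value list, and keys derived through remapping) can be mimicked on plain dicts. The sketch below is not how earthkit-data implements order_by; `apply_remapping` and `order_records` are hypothetical helpers, and the placeholder substitution is an assumed reading of the remapping template syntax.

```python
import re
from typing import Any, Dict, List, Optional, Union

Record = Dict[str, Any]
Order = Union[str, List[Any]]


def apply_remapping(record: Record, remapping: Optional[Dict[str, str]]) -> Record:
    """Add derived keys by filling "{key}" placeholders from the record."""
    out = dict(record)
    for new_key, template in (remapping or {}).items():
        out[new_key] = re.sub(
            r"\{([^{}]+)\}", lambda m: str(record[m.group(1)]), template
        )
    return out


def order_records(
    records: List[Record],
    remapping: Optional[Dict[str, str]] = None,
    **orders: Order,
) -> List[Record]:
    """Sort by each key: "ascending", "descending", or an explicit value list."""
    result = [apply_remapping(r, remapping) for r in records]
    # Sort by the last key first; list.sort is stable, so earlier keys dominate.
    for key, order in reversed(list(orders.items())):
        if isinstance(order, list):
            rank = {value: i for i, value in enumerate(order)}
            result.sort(key=lambda r: rank[r[key]])
        else:
            result.sort(key=lambda r: r[key], reverse=(order == "descending"))
    return result


records = [
    {"parameter.variable": p, "vertical.level": lev}
    for p in ("t", "u", "v")
    for lev in (1000, 850)
]


def labels(rs: List[Record]) -> List[str]:
    return [f"{r['parameter.variable']}{r['vertical.level']}" for r in rs]


by_direction = labels(order_records(
    records, **{"parameter.variable": "ascending", "vertical.level": "descending"}
))
print(by_direction)  # ['t1000', 't850', 'u1000', 'u850', 'v1000', 'v850']

ordering = ["t850", "t1000", "u1000", "v850", "v1000", "u850"]
by_remapping = labels(order_records(
    records,
    remapping={"param_level": "{parameter.variable}{vertical.level}"},
    param_level=ordering,
))
print(by_remapping)  # ['t850', 't1000', 'u1000', 'v850', 'v1000', 'u850']
```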
- property parent
The parent source, if any.
- sel(*args, remapping=None, **kwargs)
Uses metadata values to select a subset of the elements from a fieldlist object.
- Parameters:
*args (tuple) – Positional arguments specifying the filter condition as dict. (See below for details).
remapping (dict) – Creates new metadata keys from existing ones that we can refer to in *args and **kwargs. E.g. to define a new key “param_level” as the concatenated value of the “param” and “level” keys use:
remapping={"param_level": "{param}{level}"}
See below for a more elaborate example.
**kwargs (dict, optional) – Other keyword arguments specifying the filter conditions. (See below for details).
- Returns:
Returns a new object with the filtered elements. It contains a view to the data in the original object, so no data is copied.
- Return type:
object
Notes
Filter conditions are specified by a set of metadata keys, either as a dictionary (in *args) or as a set of **kwargs. Both single and multiple keys are allowed, and each can specify the following types of filter values:
- single value:
ds.sel({"parameter.variable": "t"})
- list of values:
ds.sel({"parameter.variable": ["u", "v"]})
- slice of values (defines a closed interval, so treated as inclusive of both the start and stop values, unlike normal Python indexing):
# filter levels between 300 and 500 inclusively
ds.sel({"vertical.level": slice(300, 500)})
Examples
>>> import earthkit.data
>>> fl = earthkit.data.from_source("sample", "tuv_pl.grib").to_fieldlist()
>>> len(fl)
18
Selecting by a single key (“parameter.variable”) with a single value:
>>> fl1 = fl.sel({"parameter.variable": "t"})
>>> for f in fl1:
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 300, pressure, 0, regular_ll)
Selecting by multiple keys (“parameter.variable”, “vertical.level”) with a list and slice of values:
>>> fl1 = fl.sel({"parameter.variable": ["u", "v"], "vertical.level": slice(400, 700)})
>>> for f in fl1:
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)
Using remapping to specify the selection by a key created from two other keys (we created key “param_level” from “parameter.variable” and “vertical.level”):
>>> fl1 = fl.sel(
...     {"param_level": ["t850", "u1000"]},
...     remapping={"param_level": "{parameter.variable}{vertical.level}"},
... )
>>> for f in fl1:
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
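The three kinds of filter value described in the Notes (single value, list of values, closed-interval slice) can be expressed as a small matching function. This is a sketch on plain dicts under stated assumptions, not the library's implementation: `matches` and `select_records` are hypothetical names, and treating a missing slice bound as open-ended is an assumption.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def matches(value: Any, condition: Any) -> bool:
    """Check one metadata value against one filter condition."""
    if isinstance(condition, slice):
        # Closed interval: start and stop are both included.
        # Assumption: a bound of None leaves that side unbounded.
        lower_ok = condition.start is None or value >= condition.start
        upper_ok = condition.stop is None or value <= condition.stop
        return lower_ok and upper_ok
    if isinstance(condition, (list, tuple)):
        return value in condition
    return value == condition


def select_records(records: List[Record], conditions: Dict[str, Any]) -> List[Record]:
    """Keep the records that satisfy every condition."""
    return [
        r for r in records
        if all(matches(r[key], cond) for key, cond in conditions.items())
    ]


# 3 variables x 6 levels = 18 records, mirroring the sample used above.
records = [
    {"parameter.variable": p, "vertical.level": lev}
    for lev in (1000, 850, 700, 500, 400, 300)
    for p in ("t", "u", "v")
]

only_t = select_records(records, {"parameter.variable": "t"})
uv_mid = select_records(
    records,
    {"parameter.variable": ["u", "v"], "vertical.level": slice(400, 700)},
)
print(len(only_t))  # 6
print(sorted({r["vertical.level"] for r in uv_mid}))  # [400, 500, 700]
```

Because the slice is inclusive on both ends, levels 400 and 700 are both kept, which differs from ordinary Python slicing.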
- source_filename = None
- tail(n=5, **kwargs)
Generate a list-like summary of the last n Fields. Same as calling ls with -n.
- Parameters:
n (int, None) – The number of messages (n > 0) to be printed from the back.
**kwargs (dict, optional) – Other keyword arguments passed to ls.
- Returns:
See ls.
- Return type:
Pandas DataFrame
Notes
The following calls are equivalent:
ds.tail()
ds.tail(5)
ds.tail(n=5)
ds.ls(-5)
ds.ls(n=-5)
- to_data_object()
Convert this source into a data object, if possible.
- to_numpy(*args, **kwargs)
- to_pandas(**kwargs)
- to_target(target, *args, **kwargs)
- to_xarray(**kwargs)
- unique(*args, sort=False, drop_none=True, squeeze=False, unwrap_single=False, remapping=None, patch=None, progress_bar=False, cache=True)
Given a list of metadata attributes, such as date, param or levels, returns the list of unique values for each attribute.
- Parameters:
*args (tuple) – Positional arguments specifying the metadata keys to collect unique values for.
sort (bool, optional) – Whether to sort the collected unique values. Default is False.
drop_none (bool, optional) – Whether to drop None values from the collected unique values. Default is True.
squeeze (bool, optional) – Whether to return a single value instead of a list if there is only one unique value for a key. Default is False.
remapping (dict, optional) – A dictionary for remapping keys or values during collection. Default is None.
patch (dict, optional) – A dictionary for patching key values during collection. Default is None.
progress_bar (bool, optional) – Whether to display a progress bar during collection. Default is False.
cache (bool, optional) – Whether to use a cached collector. Default is True.
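The interplay of sort, drop_none and squeeze can be sketched on plain dicts. This is an illustration only: `unique_values` is a hypothetical helper, and returning a dict keyed by metadata key is an assumption, since the return structure is not documented above.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def unique_values(
    records: List[Record],
    *keys: str,
    sort: bool = False,
    drop_none: bool = True,
    squeeze: bool = False,
) -> Dict[str, Any]:
    """Collect the unique values of each key, in order of first appearance."""
    result: Dict[str, Any] = {}
    for key in keys:
        seen: List[Any] = []
        for rec in records:
            value = rec.get(key)
            if value is None and drop_none:
                continue
            if value not in seen:
                seen.append(value)
        if sort:
            # Place None (if kept) last so it is never compared with other values.
            seen = sorted(seen, key=lambda v: (v is None, v))
        # Assumed return shape: a single value when squeezing a one-item list.
        result[key] = seen[0] if squeeze and len(seen) == 1 else seen
    return result


records = [
    {"param": "u", "level": 1000, "number": None},
    {"param": "t", "level": 850, "number": None},
    {"param": "t", "level": 1000, "number": None},
]
print(unique_values(records, "param", "level"))
# {'param': ['u', 't'], 'level': [1000, 850]}
print(unique_values(records, "param", "level", sort=True))
# {'param': ['t', 'u'], 'level': [850, 1000]}
print(unique_values(records, "number"))
# {'number': []}
```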