earthkit.data.readers.shapefile.file

Classes

ShapeFileList

Base class for all sources.

Module Contents

class earthkit.data.readers.shapefile.file.ShapeFileList(path)

Bases: earthkit.data.featurelist.simple.IndexFeatureListBase, earthkit.data.readers.shapefile.core.ShapefileReaderBase

Base class for all sources.

property appendable
batched(n)

Iterate through the object in batches of n.

Parameters:

n (int) – Batch size.

Returns:

Returns an iterator yielding batches of n elements. Each batch is a new object containing a view to the data in the original object, so no data is copied. The last batch may contain fewer than n elements.

Return type:

object
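
The batching rule above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: `batched` here is a hypothetical standalone helper that copies items into lists, whereas the method described above returns views on the original object.

```python
from itertools import islice


def batched(items, n):
    """Yield successive lists of at most n items (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


batches = list(batched(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The last batch holds the single leftover element, matching the note that it may contain fewer than n elements.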

property binary
bounding_box()
describe(*args, **kwargs)

Generate a summary of the fieldlist.

property filter
get(keys, default=None, astype=None, raise_on_missing=False, output='auto', group_by_key=False, flatten_dict=False, remapping=None, patch=None)

Return values for the specified keys from all the fields.

Parameters:
  • keys (str, list, tuple) – Specify the field metadata keys to extract. Can be a single key (str) or multiple keys as a list/tuple of str. Keys are assumed to be of the form “component.key”. For example, “time.valid_datetime” or “parameter.name”. It is also allowed to specify just the component name like “time” or “parameter”. In this case the corresponding component’s to_dict() method is called and its result is returned. For other keys, the method looks for them in the private components of the fields (if any) and returns the value from the first private component that contains it.

  • default (Any, None) – Specify the default value(s) for keys. Returned when the given key is not found and raise_on_missing is False. When default is a single value, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • astype (type as str, int or float) – Return type for keys. When astype is a single type, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • raise_on_missing (bool) – When True, raises KeyError if any of keys is not found.

  • output (type, str) –

    Specify the output structure type in conjunction with group_by_key. When group_by_key is False (default), the output is a list with one item per field, and output has the following effect on the items:

    • ”auto” (default):
      • when keys is a str returns a single value per field

      • when keys is a list/tuple returns a list/tuple of values per field

    • list or “list”: returns a list of values per field.

    • tuple or “tuple”: returns a tuple of values per field.

    • dict or “dict”: returns a dictionary with keys and their values per field.

    When group_by_key is True, the output is grouped by key: the returned object has one item per key, and each item contains the list of values for that key from all the fields. When output is dict, a dict is returned; otherwise a list.

  • group_by_key (bool) – When True the output is grouped by key as described in output.

  • flatten_dict (bool) – When True and output is dict, for each field if any of the values in the returned dict is itself a dict, it is flattened to depth 1 by concatenating the keys with a dot. For example, if the returned dict is {"a": {"x": 1, "y": 2}, "b": 3}, it becomes {"a.x": 1, "a.y": 2, "b": 3}. This option is ignored when output is not dict.

  • remapping (dict, optional) –

    Create new metadata keys from existing ones. E.g. to define a new key “param_level” as the concatenated value of the “parameter.variable” and “vertical.level” keys use:

    remapping={"param_level": "{parameter.variable}{vertical.level}"}
    

  • patch (dict, optional) – A dictionary of patches to be applied to the returned values.

Returns:

The returned value depends on the output and group_by_key parameters. See above.

Return type:

list, dict
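
As a rough illustration of how output and group_by_key shape the result, the sketch below mimics the documented structures with plain dicts standing in for fields. `get_sketch` and the sample records are hypothetical; this is not how the library computes the values.

```python
# Hypothetical per-field metadata, standing in for real fields.
fields = [
    {"parameter.variable": "2t", "parameter.units": "K"},
    {"parameter.variable": "msl", "parameter.units": "Pa"},
]


def get_sketch(fields, keys, default=None, output=list, group_by_key=False):
    """Mimic the documented result shapes for a list of keys."""
    rows = [[f.get(k, default) for k in keys] for f in fields]
    if group_by_key:
        # One item per key, each holding the values from all fields.
        columns = [[row[i] for row in rows] for i in range(len(keys))]
        return dict(zip(keys, columns)) if output is dict else columns
    if output is dict:
        return [dict(zip(keys, row)) for row in rows]
    return [output(row) for row in rows]


keys = ["parameter.variable", "parameter.units"]
per_field = get_sketch(fields, keys, output=tuple)
per_key = get_sketch(fields, keys, output=dict, group_by_key=True)
print(per_field)  # [('2t', 'K'), ('msl', 'Pa')]
print(per_key)    # {'parameter.variable': ['2t', 'msl'], 'parameter.units': ['K', 'Pa']}
```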

Raises:

KeyError – If raise_on_missing is True and any of keys is not found.
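
The depth-1 flattening described for flatten_dict can be sketched in a few lines of plain Python. `flatten_depth_one` is a hypothetical helper written only to show the key-joining rule, and it assumes that dicts nested deeper than one level are left as they are.

```python
def flatten_depth_one(d):
    """Flatten nested dict values one level, joining keys with a dot."""
    flat = {}
    for key, value in d.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


flattened = flatten_depth_one({"a": {"x": 1, "y": 2}, "b": 3})
print(flattened)  # {'a.x': 1, 'a.y': 2, 'b': 3}
```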

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.get("parameter.variable")
['2t', 'msl']
>>> ds.get(["parameter.variable", "parameter.units"])
[['2t', 'K'], ['msl', 'Pa']]
>>> ds.get(("parameter.variable", "parameter.units"))
[('2t', 'K'), ('msl', 'Pa')]
graph(depth=0)
group_by(*keys, sort=True)

Iterate through the object in groups defined by metadata keys.

Parameters:
  • *keys (tuple) – Positional arguments specifying the metadata keys to group by. Keys can be a single or multiple str, or a list or tuple of str.

  • sort (bool, optional) – If True (default), the object is sorted by the metadata keys before grouping. Sorting is only applied if the object supports it.

Returns:

Returns an iterator yielding batches of elements grouped by the metadata keys. Each batch is a new object containing a view to the data in the original object, so no data is copied. A new group starts every time the value of the keys changes.

Return type:

object
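
The effect of sort on grouping can be seen in a small pure-Python sketch built on itertools.groupby. The records and `group_by_sketch` are hypothetical stand-ins; the sketch only mirrors the rule that a new group starts whenever the key values change.

```python
from itertools import groupby

# Hypothetical metadata records standing in for fields.
records = [
    {"param": "t", "level": 850},
    {"param": "u", "level": 1000},
    {"param": "t", "level": 1000},
]


def group_by_sketch(items, *keys, sort=True):
    """Yield lists of consecutive items that share the same key values."""

    def key_of(item):
        return tuple(item[k] for k in keys)

    if sort:
        items = sorted(items, key=key_of)
    for _, group in groupby(items, key=key_of):
        yield list(group)


sorted_groups = [
    [r["level"] for r in g] for g in group_by_sketch(records, "param")
]
unsorted_groups = [
    [r["level"] for r in g] for g in group_by_sketch(records, "param", sort=False)
]
print(sorted_groups)    # [[850, 1000], [1000]]
print(unsorted_groups)  # [[850], [1000], [1000]]
```

Without sorting, the two "t" records are not adjacent, so they end up in separate groups.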

head(n=5, **kwargs)

Generate a list-like summary of the first n Fields. Same as calling ls with n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the front.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

See also

ls, tail

Notes

The following calls are equivalent:

ds.head()
ds.head(5)
ds.head(n=5)
ds.ls(5)
ds.ls(n=5)
ignore()

Indicates to ignore this source in concatenation/merging.

Return type:

bool

ls(**kwargs)

Generate a list-like summary using a set of metadata keys.

Parameters:
  • n (int, None) – The number of Fields to be listed. None means all the fields, n > 0 means fields from the front, while n < 0 means fields from the back of the fieldlist.

  • keys (list of str, dict, None) – Metadata keys used when namespace is None. If keys is None the following default set of keys will be used: “centre”, “shortName”, “typeOfLevel”, “level”, “dataDate”, “dataTime”, “stepRange”, “dataType”, “number”, “gridType”. To specify a column title for each key in the output use a dict.

  • extra_keys (list of str, dict, None) – List of additional keys. To specify a column title for each key in the output use a dict.

  • namespace (str, list of str, None) – The namespace(s) to choose the keys from. When it is set, keys is ignored.

Returns:

DataFrame with one row per Field.

Return type:

Pandas DataFrame
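
The meaning of n (None, positive, negative) maps onto ordinary Python slicing. The sketch below uses a plain list in place of the rows of the summary; `select_rows` is a hypothetical helper, and rejecting n == 0 is an assumption, since that case is not documented above.

```python
def select_rows(rows, n=None):
    """Pick rows the way the n parameter is documented to."""
    if n is None:
        return list(rows)  # all rows
    if n == 0:
        # Assumption: n == 0 is not covered by the documentation.
        raise ValueError("n must be non-zero")
    # n > 0 takes rows from the front, n < 0 from the back.
    return list(rows[:n]) if n > 0 else list(rows[n:])


rows = ["f0", "f1", "f2", "f3", "f4", "f5"]
print(select_rows(rows, 2))   # ['f0', 'f1']
print(select_rows(rows, -2))  # ['f4', 'f5']
```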

See also

head, tail

classmethod merge(sources)
property merger
metadata(keys, **kwargs)

Return the metadata values for each field.

Parameters:
  • *args (tuple) – Positional arguments defining the metadata keys. Passed to Field.metadata()

  • **kwargs (dict, optional) – Keyword arguments passed to Field.metadata()

Returns:

List with one item per Field

Return type:

list

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.metadata("param")
['2t', 'msl']
>>> ds.metadata("param", "units")
[('2t', 'K'), ('msl', 'Pa')]
>>> ds.metadata(["param", "units"])
[['2t', 'K'], ['msl', 'Pa']]
mutate()
mutate_source()
name = None
classmethod new_mask_index(*args, **kwargs)
order_by(*args, remapping=None, patch=None, **kwargs)

Change the order of the elements in a fieldlist.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to perform the ordering on. (See below for details)

  • remapping (dict) –

    Defines new metadata keys from existing ones that we can refer to in *args and **kwargs. E.g. to define a new key “param_level” as the concatenated value of the “param” and “level” keys use:

    remapping={"param_level": "{param}{level}"}
    

    See below for a more elaborate example.

  • **kwargs (dict, optional) – Other keyword arguments specifying the metadata keys to perform the ordering on. (See below for details)

Returns:

Returns a new object with reordered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Examples

Ordering by a single metadata key (“parameter.variable”). The default ordering direction is ascending:

>>> import earthkit.data as ekd
>>> ds = ekd.from_source("sample", "test6.grib").to_fieldlist()
>>> for f in ds.order_by("parameter.variable"):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)

Ordering by multiple keys (first by “vertical.level” then by “parameter.variable”):

>>> for f in ds.order_by(["vertical.level", "parameter.variable"]):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)

Specifying the ordering direction:

>>> for f in ds.order_by(**{"parameter.variable": "ascending", "vertical.level": "descending"}):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)

Using the list of all the values of a key (“parameter.variable”) to define the order:

>>> for f in ds.order_by(**{"parameter.variable": ["u", "t", "v"]}):
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)

Using remapping to specify the order by a key created from two other keys (we created key “param_level” from “parameter.variable” and “vertical.level”):

>>> ordering = ["t850", "t1000", "u1000", "v850", "v1000", "u850"]
>>> remapping = {"param_level": "{parameter.variable}{vertical.level}"}
>>> for f in ds.order_by(param_level=ordering, remapping=remapping):
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
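
To make the remapping step more concrete, here is a pure-Python sketch of how a template such as "{parameter.variable}{vertical.level}" could be expanded and then used to rank records against an explicit list of values. The records, `remap` and `order_by_values` are hypothetical; the library's own mechanism may differ.

```python
import re

# Hypothetical metadata records standing in for fields.
records = [
    {"parameter.variable": p, "vertical.level": lev}
    for lev in (1000, 850)
    for p in ("t", "u", "v")
]


def remap(record, template):
    """Fill each {key} placeholder with the record's value for that key."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(record[m.group(1)]), template)


def order_by_values(records, template, ordering):
    """Sort records by the position of their remapped key in ordering."""
    rank = {value: i for i, value in enumerate(ordering)}
    return sorted(records, key=lambda r: rank[remap(r, template)])


template = "{parameter.variable}{vertical.level}"
ordering = ["t850", "t1000", "u1000", "v850", "v1000", "u850"]
ordered = order_by_values(records, template, ordering)
labels = [remap(r, template) for r in ordered]
print(labels)  # ['t850', 't1000', 'u1000', 'v850', 'v1000', 'u850']
```

A regular expression is used instead of str.format because the keys contain dots, which str.format would interpret as attribute access.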
property parent

The parent source, if any.

property parts
path
sel(*args, remapping=None, **kwargs)

Uses metadata values to select a subset of the elements from a fieldlist object.

Parameters:
  • *args (tuple) – Positional arguments specifying the filter condition as dict. (See below for details).

  • remapping (dict) –

    Creates new metadata keys from existing ones that we can refer to in *args and **kwargs. E.g. to define a new key “param_level” as the concatenated value of the “param” and “level” keys use:

    remapping={"param_level": "{param}{level}"}
    

    See below for a more elaborate example.

  • **kwargs (dict, optional) – Other keyword arguments specifying the filter conditions. (See below for details).

Returns:

Returns a new object with the filtered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Notes

Filter conditions are specified by a set of metadata keys, given either as a dictionary (in *args) or as **kwargs. Single or multiple keys can be used, and each key can take one of the following types of filter values:

  • single value:

    ds.sel({"parameter.variable": "t"})
    
  • list of values:

    ds.sel({"parameter.variable": ["u", "v"]})
    
  • slice of values (defines a closed interval, so treated as inclusive of both the start and stop values, unlike normal Python indexing):

    # filter levels between 300 and 500 inclusively
    ds.sel({"vertical.level": slice(300, 500)})
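
The three kinds of filter value listed above can be summarised in a small matching function. This is only a sketch of the documented rules: `matches` is a hypothetical helper, and treating a None bound in a slice as unbounded is an assumption.

```python
def matches(value, condition):
    """Return True if value satisfies a single-value, list or slice condition."""
    if isinstance(condition, slice):
        # Closed interval: both ends are included.
        # Assumption: a None bound means "unbounded" on that side.
        lower_ok = condition.start is None or value >= condition.start
        upper_ok = condition.stop is None or value <= condition.stop
        return lower_ok and upper_ok
    if isinstance(condition, (list, tuple)):
        return value in condition
    return value == condition


levels = [1000, 850, 700, 500, 400, 300]
selected = [lev for lev in levels if matches(lev, slice(300, 500))]
print(selected)  # [500, 400, 300]
```

Both 300 and 500 are kept, unlike list slicing, where the stop value is excluded.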

Examples

>>> import earthkit.data
>>> fl = earthkit.data.from_source("sample", "tuv_pl.grib").to_fieldlist()
>>> len(fl)
18

Selecting by a single key (“parameter.variable”) with a single value:

>>> fl1 = fl.sel({"parameter.variable": "t"})
>>> for f in fl1:
...     print(f)
...
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 300, pressure, 0, regular_ll)

Selecting by multiple keys (“parameter.variable”, “vertical.level”) with a list and slice of values:

>>> fl1 = fl.sel({"parameter.variable": ["u", "v"], "vertical.level": slice(400, 700)})
>>> for f in fl1:
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 700, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 500, pressure, 0, regular_ll)
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)
Field(v, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 400, pressure, 0, regular_ll)

Using remapping to specify the selection by a key created from two other keys (we created key “param_level” from “parameter.variable” and “vertical.level”):

>>> fl1 = fl.sel(
...     {"param_level": ["t850", "u1000"]},
...     remapping={"param_level": "{parameter.variable}{vertical.level}"},
... )
>>> for f in fl1:
...     print(f)
...
Field(u, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 1000, pressure, 0, regular_ll)
Field(t, 2018-08-01 12:00:00, 2018-08-01 12:00:00, 0:00:00, 850, pressure, 0, regular_ll)
property source
source_filename = None
property stream
tail(n=5, **kwargs)

Generate a list-like summary of the last n Fields. Same as calling ls with -n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the back.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

See also

head, ls

Notes

The following calls are equivalent:

ds.tail()
ds.tail(5)
ds.tail(n=5)
ds.ls(-5)
ds.ls(n=-5)
to_data_object()

Convert this source into a data object, if possible.

to_geopandas(**kwargs)
to_numpy(flatten=False, **kwargs)
to_pandas(**kwargs)
classmethod to_pandas_from_multi_paths(paths, **kwargs)
to_target(target, *args, **kwargs)
to_xarray(**kwargs)
unique(*args, sort=False, drop_none=True, squeeze=False, unwrap_single=False, remapping=None, patch=None, progress_bar=False, cache=True)

Given a list of metadata attributes, such as date, param or levels, return the list of unique values for each attribute.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to collect unique values for.

  • sort (bool, optional) – Whether to sort the collected unique values. Default is False.

  • drop_none (bool, optional) – Whether to drop None values from the collected unique values. Default is True.

  • squeeze (bool, optional) – Whether to return a single value instead of a list if there is only one unique value for a key. Default is False.

  • remapping (dict, optional) – A dictionary for remapping keys or values during collection. Default is None.

  • patch (dict, optional) – A dictionary for patching key values during collection. Default is None.

  • progress_bar (bool, optional) – Whether to display a progress bar during collection. Default is False.

  • cache (bool, optional) – Whether to use a cached collector. Default is True.
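
The interplay of sort, drop_none and squeeze can be sketched with plain dicts standing in for fields. `unique_sketch` is hypothetical; returning a dict keyed by metadata key, and keeping first-appearance order when sort is False, are both assumptions rather than documented behaviour.

```python
# Hypothetical metadata records standing in for fields.
records = [
    {"param": "u", "level": 1000, "date": 20180801},
    {"param": "t", "level": 850, "date": 20180801},
    {"param": "t", "level": None, "date": 20180801},
]


def unique_sketch(records, *keys, sort=False, drop_none=True, squeeze=False):
    """Collect the unique values of each key, in order of first appearance."""
    result = {}
    for key in keys:
        # dict.fromkeys removes duplicates while keeping insertion order.
        values = list(dict.fromkeys(r.get(key) for r in records))
        if drop_none:
            values = [v for v in values if v is not None]
        if sort:
            values = sorted(values)
        if squeeze and len(values) == 1:
            values = values[0]
        result[key] = values
    return result


uniques = unique_sketch(records, "param", "level", "date", sort=True, squeeze=True)
print(uniques)  # {'param': ['t', 'u'], 'level': [850, 1000], 'date': 20180801}
```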