earthkit.data.readers.bufr.file

Attributes

Classes

BUFRList

Represent a list of BUFRMessages.

BUFRListInFile

Represent a list of BUFRMessages.

BUFRReader

Base class for all sources.

MaskBUFRList

Represent a list of BUFRMessages.

MultiBUFRList

Represent a list of BUFRMessages.

MultiBUFRReader

Base class for all sources.

Module Contents

class earthkit.data.readers.bufr.file.BUFRList(*args, **kwargs)

Bases: earthkit.data.featurelist.indexed.IndexFeatureListBase

Represent a list of BUFRMessages.

batched(n)

Iterate through the object in batches of n.

Parameters:

n (int) – Batch size.

Returns:

Returns an iterator yielding batches of n elements. Each batch is a new object containing a view to the data in the original object, so no data is copied. The last batch may contain fewer than n elements.

Return type:

object
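
As a rough illustration of the batching behaviour described above, the following plain-Python sketch uses a list as a stand-in for a BUFRList. The helper name batched_sketch is hypothetical and not part of earthkit-data; unlike the real method, it builds new lists rather than views.

```python
from itertools import islice


def batched_sketch(items, n):
    """Yield successive batches of at most n items (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(items)
    # islice pulls up to n items at a time; an empty batch ends the loop.
    while batch := list(islice(it, n)):
        yield batch


# Strings stand in for BUFR messages.
messages = ["msg0", "msg1", "msg2", "msg3", "msg4"]
batches = list(batched_sketch(messages, 2))
print(batches)  # the last batch holds the single leftover element
```

With five items and n=2, the final batch contains one element, matching the note that the last batch may be shorter than n.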

describe(*args, **kwargs)

Generate a summary of the fieldlist.

get(keys, default=None, astype=None, raise_on_missing=False, output='auto', group_by_key=False, flatten_dict=False)

Return values for the specified keys from all the messages.

Parameters:
  • keys (str, list, tuple) – Specify the metadata keys to extract. Can be a single key (str) or multiple keys as a list/tuple of str. Keys are assumed to be of the form “component.key”. For example, “time.valid_datetime” or “parameter.name”. It is also allowed to specify just the component name like “time” or “parameter”. In this case the corresponding component’s to_dict() method is called and its result is returned. For other keys, the method looks for them in the private components of the fields (if any) and returns the value from the first private component that contains it.

  • default (Any, None) – Specify the default value(s) for keys. Returned when the given key is not found and raise_on_missing is False. When default is a single value, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • astype (type as str, int or float) – Return type for keys. When astype is a single type, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • raise_on_missing (bool) – When True, raises KeyError if any of keys is not found.

  • output (type, str) –

    Specify the output structure type in conjunction with group_by_key. When group_by_key is False (default), the output is a list with one item per field, and output has the following effect on the items:

    • “auto” (default):
      • when keys is a str returns a single value per field

      • when keys is a list/tuple returns a list/tuple of values per field

    • list or “list”: returns a list of values per field.

    • tuple or “tuple”: returns a tuple of values per field.

    • dict or “dict”: returns a dictionary with keys and their values per field.

    When group_by_key is True, the output is grouped by key: an object with one item per key is returned, and each item contains the list of values for that key from all the fields. When output is dict, a dict is returned; otherwise a list is returned.

  • group_by_key (bool) – When True the output is grouped by key as described in output.

  • flatten_dict (bool) – When True and output is dict, for each field if any of the values in the returned dict is itself a dict, it is flattened to depth 1 by concatenating the keys with a dot. For example, if the returned dict is {"a": {"x": 1, "y": 2}, "b": 3}, it becomes {"a.x": 1, "a.y": 2, "b": 3}. This option is ignored when output is not dict.

  • remapping (dict, optional) –

    Create new metadata keys from existing ones. E.g. to define a new key “param_level” as the concatenated value of the “parameter.variable” and “vertical.level” keys use:

    remapping={"param_level": "{parameter.variable}{vertical.level}"}
    

  • patch (dict, optional) – A dictionary of patches to be applied to the returned values.

Returns:

The returned value depends on the output and group_by_key parameters. See above.

Return type:

list, dict

Raises:

KeyError – If raise_on_missing is True and any of keys is not found.
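
The depth-1 flattening described for flatten_dict can be sketched in plain Python as below. The function name flatten_depth_one is hypothetical and this is not the library's implementation; it only reproduces the documented example.

```python
def flatten_depth_one(d, sep="."):
    """Flatten nested dict values by one level, joining keys with sep."""
    flat = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Promote each nested entry to the top level as "outer.inner".
            for sub_key, sub_value in value.items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


result = flatten_depth_one({"a": {"x": 1, "y": 2}, "b": 3})
print(result)  # {'a.x': 1, 'a.y': 2, 'b': 3}
```

Only one level is flattened: a dict nested two levels deep would remain a dict value under its joined key.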

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.get("parameter.variable")
['2t', 'msl']
>>> ds.get(["parameter.variable", "parameter.units"])
[['2t', 'K'], ['msl', 'Pa']]
>>> ds.get(("parameter.variable", "parameter.units"))
[('2t', 'K'), ('msl', 'Pa')]
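
To make the difference between the per-message and per-key layouts concrete, here is a small plain-Python sketch. The records are made-up dictionaries standing in for message metadata, and the code does not call earthkit-data; it only mirrors the shapes described for output=dict with group_by_key False and True.

```python
# Made-up metadata records, one per message (illustrative values only).
records = [
    {"dataCategory": 2, "numberOfSubsets": 1},
    {"dataCategory": 2, "numberOfSubsets": 3},
]
keys = ["dataCategory", "numberOfSubsets"]

# Shape for group_by_key=False, output=dict: a list with one dict per message.
per_message = [{k: r[k] for k in keys} for r in records]

# Shape for group_by_key=True, output=dict: one entry per key,
# each holding the values for that key from all messages.
per_key = {k: [r[k] for r in records] for k in keys}

print(per_message)
print(per_key)
```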
graph(depth=0)
group_by(*keys, sort=True)

Iterate through the object in groups defined by metadata keys.

Parameters:
  • *keys (tuple) – Positional arguments specifying the metadata keys to group by. Keys can be a single or multiple str, or a list or tuple of str.

  • sort (bool, optional) – If True (default), the object is sorted by the metadata keys before grouping. Sorting is only applied if the object is supporting the sorting operation.

Returns:

Returns an iterator yielding batches of elements grouped by the metadata keys. Each batch is a new object containing a view to the data in the original object, so no data is copied. It generates a new group every time the value of the keys change.

Return type:

object
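
The effect of sort on grouping can be illustrated with itertools.groupby, which also starts a new group whenever the key value changes. This is a sketch under assumptions: group_by_sketch is a hypothetical helper, the records are made-up dictionaries, and groups are plain lists rather than views.

```python
from itertools import groupby
from operator import itemgetter


def group_by_sketch(records, *keys, sort=True):
    """Yield groups of consecutive records sharing the same key values."""
    key_func = itemgetter(*keys)
    if sort:
        # Sorting first guarantees one group per distinct key value.
        records = sorted(records, key=key_func)
    for _, group in groupby(records, key=key_func):
        yield list(group)


records = [
    {"dataCategory": 2, "ident": "A"},
    {"dataCategory": 0, "ident": "B"},
    {"dataCategory": 2, "ident": "C"},
]

sorted_groups = [[r["ident"] for r in g] for g in group_by_sketch(records, "dataCategory")]
unsorted_groups = [
    [r["ident"] for r in g] for g in group_by_sketch(records, "dataCategory", sort=False)
]
print(sorted_groups)    # [['B'], ['A', 'C']]
print(unsorted_groups)  # [['A'], ['B'], ['C']]
```

Without sorting, the two records with dataCategory 2 end up in separate groups because they are not adjacent.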

head(n=5, **kwargs)

Generate a list-like summary of the first n BUFRMessages using a set of metadata keys. Same as calling ls with n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the front.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.head()
ds.head(5)
ds.head(n=5)
ds.ls(5)
ds.ls(n=5)
ignore()

Indicates to ignore this source in concatenation/merging.

Return type:

bool

ls(n=None, keys='default', extra_keys=None)

Generate a list-like summary of the BUFR message list using a set of metadata keys.

Parameters:
  • n (int, None) – The number of BUFRMessages to be listed. None means all the messages, n > 0 means messages from the front, while n < 0 means messages from the back of the list.

  • keys (list of str, dict, None) –

    Metadata keys. To specify a column title for each key in the output, use a dict with the metadata keys as keys and the column titles as values. If keys is None, the following list is used to define both the keys and the titles:

    [
        "edition",
        "dataCategory",
        "dataSubCategory",
        "bufrHeaderCentre",
        "masterTablesVersionNumber",
        "localTablesVersionNumber",
        "numberOfSubsets",
        "compressedData",
        "typicalDate",
        "typicalTime",
        "ident",
        "localLatitude",
        "localLongitude",
    ]
    

  • extra_keys (list of str, dict, None) – List of keys to use in addition to keys. To specify a column title for each key in the output, use a dict.

Returns:

DataFrame with one row per BUFRMessage.

Return type:

Pandas DataFrame
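
The sign convention for n shared by ls, head and tail can be sketched with plain list slicing. The helper select_for_listing is hypothetical and returns the selected items rather than a DataFrame; rejecting n == 0 is an assumption, since the text above does not say how that value is handled.

```python
def select_for_listing(messages, n=None):
    """Pick the messages that a listing with the given n would cover."""
    if n is None:
        return list(messages)  # all messages
    if n == 0:
        # Assumption: n == 0 is not described above, so the sketch rejects it.
        raise ValueError("n must be non-zero")
    # Positive n counts from the front, negative n from the back.
    return list(messages[:n]) if n > 0 else list(messages[n:])


messages = list(range(10))  # integers stand in for BUFR messages
front = select_for_listing(messages, 5)
back = select_for_listing(messages, -5)
print(front)  # [0, 1, 2, 3, 4]
print(back)   # [5, 6, 7, 8, 9]
```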

Examples

BUFR: using TEMP data

classmethod merge(sources)
metadata(*args, **kwargs)

Return the metadata values for each message.

Returns:

List with one item per BUFRMessage

Return type:

list

mutate()
mutate_source()
name = None
classmethod new_mask_index(*args, **kwargs)
order_by(*args, **kwargs)

Change the order of the messages in a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to perform the ordering on. (See below for details)

  • **kwargs (dict, optional) – Other keyword arguments specifying the metadata keys to perform the ordering on. (See below for details)

Returns:

Returns a new object with reordered messages. It contains a view to the data in the original object, so no data is copied.

Return type:

object

property parent

The parent source, if any.

sel(*args, remapping=None, **kwargs)

Use header metadata values to select only certain messages from a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the filter condition as dict. (See below for details).

  • **kwargs (dict, optional) – Other keyword arguments specifying the filter conditions. (See below for details).

Returns:

Returns a new object with the filtered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Notes

Filter conditions are specified by a set of metadata keys, either as a dictionary (in *args) or as a set of **kwargs. Single or multiple keys can be used, and each key can specify one of the following types of filter value:

  • single value:

    ds.sel(dataCategory="2")
    
  • list of values:

    ds.sel(dataCategory=[1, 2])
    
  • slice of values (defines a closed interval, so treated as inclusive of both the start and stop values, unlike normal Python indexing):

    # filter dataCategory between 1 and 4 inclusively
    ds.sel(dataCategory=slice(1, 4))
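
The three kinds of filter value, and in particular the closed-interval meaning of a slice, can be sketched in plain Python as follows. The names matches and sel_sketch are hypothetical, the records are made-up dictionaries, and treating a missing slice bound (None) as unbounded is an assumption not stated above.

```python
def matches(value, condition):
    """Check one metadata value against a single, list or slice condition."""
    if isinstance(condition, slice):
        # Closed interval: both start and stop are included.
        # Assumption: a bound of None means "unbounded" on that side.
        lower_ok = condition.start is None or value >= condition.start
        upper_ok = condition.stop is None or value <= condition.stop
        return lower_ok and upper_ok
    if isinstance(condition, (list, tuple)):
        return value in condition
    return value == condition


def sel_sketch(records, **conditions):
    """Keep the records that satisfy every condition."""
    return [r for r in records if all(matches(r[k], c) for k, c in conditions.items())]


records = [{"dataCategory": c} for c in range(6)]
by_slice = [r["dataCategory"] for r in sel_sketch(records, dataCategory=slice(1, 4))]
by_list = [r["dataCategory"] for r in sel_sketch(records, dataCategory=[1, 2])]
print(by_slice)  # [1, 2, 3, 4]
print(by_list)   # [1, 2]
```

Note that slice(1, 4) keeps the value 4, whereas ordinary Python slicing would stop at 3.
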
source_filename = None
tail(n=5, **kwargs)

Generate a list-like summary of the last n BUFRMessages using a set of metadata keys. Same as calling ls with -n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the back.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.tail()
ds.tail(5)
ds.tail(n=5)
ds.ls(-5)
ds.ls(n=-5)
to_data_object()

Convert this source into a data object, if possible.

to_numpy(*args, **kwargs)
to_pandas(columns=None, filters=None, **kwargs)

Extract BUFR data into a pandas DataFrame using pdbufr.

Parameters:
  • columns (str, sequence[str]) – List of ecCodes BUFR keys to extract for each BUFR message/subset. See: pdbufr.read_bufr() for details.

  • filters (dict) – Defines the conditions when to extract the specified columns. See: pdbufr.read_bufr() for details.

  • **kwargs (dict, optional) – Other keyword arguments passed to pdbufr.read_bufr().

Return type:

Pandas DataFrame

Examples

to_target(target, *args, **kwargs)
unique(*args, sort=False, drop_none=True, squeeze=False, unwrap_single=False, remapping=None, patch=None, progress_bar=False, cache=True)

Given a list of metadata attributes, such as date, param, levels, return the list of unique values for each attribute.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to collect unique values for.

  • sort (bool, optional) – Whether to sort the collected unique values. Default is False.

  • drop_none (bool, optional) – Whether to drop None values from the collected unique values. Default is True.

  • squeeze (bool, optional) – Whether to return a single value instead of a list if there is only one unique value for a key. Default is False.

  • remapping (dict, optional) – A dictionary for remapping keys or values during collection. Default is None.

  • patch (dict, optional) – A dictionary for patching key values during collection. Default is None.

  • progress_bar (bool, optional) – Whether to display a progress bar during collection. Default is False.

  • cache (bool, optional) – Whether to use a cached collector. Default is True.
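
How sort, drop_none and squeeze interact can be sketched in plain Python as below. This is not the library's implementation: unique_sketch is a hypothetical helper working on made-up dictionaries, and returning a dict keyed by metadata key is an assumption, since the return type is not stated above.

```python
def unique_sketch(records, *keys, sort=False, drop_none=True, squeeze=False):
    """Collect the unique values of each key across all records."""
    result = {}
    for key in keys:
        values = []
        for record in records:
            value = record.get(key)
            if value is None and drop_none:
                continue
            if value not in values:
                values.append(value)  # keeps first-seen order
        if sort:
            values.sort()
        if squeeze and len(values) == 1:
            values = values[0]  # single unique value returned bare
        result[key] = values
    return result


records = [
    {"dataCategory": 2, "ident": "B"},
    {"dataCategory": 2, "ident": None},
    {"dataCategory": 2, "ident": "A"},
]
squeezed = unique_sketch(records, "dataCategory", "ident", sort=True, squeeze=True)
with_none = unique_sketch(records, "ident", drop_none=False)
print(squeezed)   # {'dataCategory': 2, 'ident': ['A', 'B']}
print(with_none)  # {'ident': ['B', None, 'A']}
```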

class earthkit.data.readers.bufr.file.BUFRListInFile(path, parts=None, positions=None)

Bases: BUFRList, earthkit.data.readers.bufr.core.BUFRReaderBase

Represent a list of BUFRMessages.

property appendable
batched(n)

Iterate through the object in batches of n.

Parameters:

n (int) – Batch size.

Returns:

Returns an iterator yielding batches of n elements. Each batch is a new object containing a view to the data in the original object, so no data is copied. The last batch may contain fewer than n elements.

Return type:

object

property binary
describe(*args, **kwargs)

Generate a summary of the fieldlist.

property filter
get(keys, default=None, astype=None, raise_on_missing=False, output='auto', group_by_key=False, flatten_dict=False)

Return values for the specified keys from all the messages.

Parameters:
  • keys (str, list, tuple) – Specify the metadata keys to extract. Can be a single key (str) or multiple keys as a list/tuple of str. Keys are assumed to be of the form “component.key”. For example, “time.valid_datetime” or “parameter.name”. It is also allowed to specify just the component name like “time” or “parameter”. In this case the corresponding component’s to_dict() method is called and its result is returned. For other keys, the method looks for them in the private components of the fields (if any) and returns the value from the first private component that contains it.

  • default (Any, None) – Specify the default value(s) for keys. Returned when the given key is not found and raise_on_missing is False. When default is a single value, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • astype (type as str, int or float) – Return type for keys. When astype is a single type, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • raise_on_missing (bool) – When True, raises KeyError if any of keys is not found.

  • output (type, str) –

    Specify the output structure type in conjunction with group_by_key. When group_by_key is False (default), the output is a list with one item per field, and output has the following effect on the items:

    • “auto” (default):
      • when keys is a str returns a single value per field

      • when keys is a list/tuple returns a list/tuple of values per field

    • list or “list”: returns a list of values per field.

    • tuple or “tuple”: returns a tuple of values per field.

    • dict or “dict”: returns a dictionary with keys and their values per field.

    When group_by_key is True, the output is grouped by key: an object with one item per key is returned, and each item contains the list of values for that key from all the fields. When output is dict, a dict is returned; otherwise a list is returned.

  • group_by_key (bool) – When True the output is grouped by key as described in output.

  • flatten_dict (bool) – When True and output is dict, for each field if any of the values in the returned dict is itself a dict, it is flattened to depth 1 by concatenating the keys with a dot. For example, if the returned dict is {"a": {"x": 1, "y": 2}, "b": 3}, it becomes {"a.x": 1, "a.y": 2, "b": 3}. This option is ignored when output is not dict.

  • remapping (dict, optional) –

    Create new metadata keys from existing ones. E.g. to define a new key “param_level” as the concatenated value of the “parameter.variable” and “vertical.level” keys use:

    remapping={"param_level": "{parameter.variable}{vertical.level}"}
    

  • patch (dict, optional) – A dictionary of patches to be applied to the returned values.

Returns:

The returned value depends on the output and group_by_key parameters. See above.

Return type:

list, dict

Raises:

KeyError – If raise_on_missing is True and any of keys is not found.

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.get("parameter.variable")
['2t', 'msl']
>>> ds.get(["parameter.variable", "parameter.units"])
[['2t', 'K'], ['msl', 'Pa']]
>>> ds.get(("parameter.variable", "parameter.units"))
[('2t', 'K'), ('msl', 'Pa')]
graph(depth=0)
group_by(*keys, sort=True)

Iterate through the object in groups defined by metadata keys.

Parameters:
  • *keys (tuple) – Positional arguments specifying the metadata keys to group by. Keys can be a single or multiple str, or a list or tuple of str.

  • sort (bool, optional) – If True (default), the object is sorted by the metadata keys before grouping. Sorting is only applied if the object is supporting the sorting operation.

Returns:

Returns an iterator yielding batches of elements grouped by the metadata keys. Each batch is a new object containing a view to the data in the original object, so no data is copied. It generates a new group every time the value of the keys change.

Return type:

object

head(n=5, **kwargs)

Generate a list-like summary of the first n BUFRMessages using a set of metadata keys. Same as calling ls with n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the front.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.head()
ds.head(5)
ds.head(n=5)
ds.ls(5)
ds.ls(n=5)
ignore()

Indicates to ignore this source in concatenation/merging.

Return type:

bool

ls(n=None, keys='default', extra_keys=None)

Generate a list-like summary of the BUFR message list using a set of metadata keys.

Parameters:
  • n (int, None) – The number of BUFRMessages to be listed. None means all the messages, n > 0 means messages from the front, while n < 0 means messages from the back of the list.

  • keys (list of str, dict, None) –

    Metadata keys. To specify a column title for each key in the output, use a dict with the metadata keys as keys and the column titles as values. If keys is None, the following list is used to define both the keys and the titles:

    [
        "edition",
        "dataCategory",
        "dataSubCategory",
        "bufrHeaderCentre",
        "masterTablesVersionNumber",
        "localTablesVersionNumber",
        "numberOfSubsets",
        "compressedData",
        "typicalDate",
        "typicalTime",
        "ident",
        "localLatitude",
        "localLongitude",
    ]
    

  • extra_keys (list of str, dict, None) – List of keys to use in addition to keys. To specify a column title for each key in the output, use a dict.

Returns:

DataFrame with one row per BUFRMessage.

Return type:

Pandas DataFrame

Examples

BUFR: using TEMP data

classmethod merge(sources)
property merger
metadata(*args, **kwargs)

Return the metadata values for each message.

Returns:

List with one item per BUFRMessage

Return type:

list

mutate()
mutate_source()
name = None
classmethod new_mask_index(*args, **kwargs)
number_of_parts()
order_by(*args, **kwargs)

Change the order of the messages in a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to perform the ordering on. (See below for details)

  • **kwargs (dict, optional) – Other keyword arguments specifying the metadata keys to perform the ordering on. (See below for details)

Returns:

Returns a new object with reordered messages. It contains a view to the data in the original object, so no data is copied.

Return type:

object

property parent

The parent source, if any.

part(n)
property parts
path
sel(*args, remapping=None, **kwargs)

Use header metadata values to select only certain messages from a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the filter condition as dict. (See below for details).

  • **kwargs (dict, optional) – Other keyword arguments specifying the filter conditions. (See below for details).

Returns:

Returns a new object with the filtered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Notes

Filter conditions are specified by a set of metadata keys, either as a dictionary (in *args) or as a set of **kwargs. Single or multiple keys can be used, and each key can specify one of the following types of filter value:

  • single value:

    ds.sel(dataCategory="2")
    
  • list of values:

    ds.sel(dataCategory=[1, 2])
    
  • slice of values (defines a closed interval, so treated as inclusive of both the start and stop values, unlike normal Python indexing):

    # filter dataCategory between 1 and 4 inclusively
    ds.sel(dataCategory=slice(1, 4))
property source
source_filename = None
property stream
tail(n=5, **kwargs)

Generate a list-like summary of the last n BUFRMessages using a set of metadata keys. Same as calling ls with -n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the back.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.tail()
ds.tail(5)
ds.tail(n=5)
ds.ls(-5)
ds.ls(n=-5)
to_data_object()

Convert this source into a data object, if possible.

to_numpy(*args, **kwargs)
to_pandas(columns=None, filters=None, **kwargs)

Extract BUFR data into a pandas DataFrame using pdbufr.

Parameters:
  • columns (str, sequence[str]) – List of ecCodes BUFR keys to extract for each BUFR message/subset. See: pdbufr.read_bufr() for details.

  • filters (dict) – Defines the conditions when to extract the specified columns. See: pdbufr.read_bufr() for details.

  • **kwargs (dict, optional) – Other keyword arguments passed to pdbufr.read_bufr().

Return type:

Pandas DataFrame

Examples

to_target(target, *args, **kwargs)
unique(*args, sort=False, drop_none=True, squeeze=False, unwrap_single=False, remapping=None, patch=None, progress_bar=False, cache=True)

Given a list of metadata attributes, such as date, param, levels, return the list of unique values for each attribute.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to collect unique values for.

  • sort (bool, optional) – Whether to sort the collected unique values. Default is False.

  • drop_none (bool, optional) – Whether to drop None values from the collected unique values. Default is True.

  • squeeze (bool, optional) – Whether to return a single value instead of a list if there is only one unique value for a key. Default is False.

  • remapping (dict, optional) – A dictionary for remapping keys or values during collection. Default is None.

  • patch (dict, optional) – A dictionary for patching key values during collection. Default is None.

  • progress_bar (bool, optional) – Whether to display a progress bar during collection. Default is False.

  • cache (bool, optional) – Whether to use a cached collector. Default is True.

class earthkit.data.readers.bufr.file.BUFRReader(source, path, parts=None, positions=None)

Bases: earthkit.data.sources.Source, earthkit.data.readers.bufr.core.BUFRReaderBase

Base class for all sources.

property appendable
property binary
property filter
graph(depth=0)
ignore()

Indicates to ignore this source in concatenation/merging.

Return type:

bool

is_streamable_file()
classmethod merge(sources)
property merger
mutate()
mutate_source()
name = None
property parent

The parent source, if any.

property parts
path
property source
source_filename = None
property stream
to_data_object()

Convert this source into a data object, if possible.

to_featurelist()
to_pandas(*args, **kwargs)
to_target(target, *args, **kwargs)
earthkit.data.readers.bufr.file.BUFR_LS_KEYS = ['edition', 'dataCategory', 'dataSubCategory', 'bufrHeaderCentre', 'masterTablesVersionNumber',...
earthkit.data.readers.bufr.file.COLUMNS = ('latitude', 'longitude', 'data_datetime')
class earthkit.data.readers.bufr.file.MaskBUFRList(*args, **kwargs)

Bases: BUFRList, earthkit.data.core.index.MaskIndex

Represent a list of BUFRMessages.

batched(n)

Iterate through the object in batches of n.

Parameters:

n (int) – Batch size.

Returns:

Returns an iterator yielding batches of n elements. Each batch is a new object containing a view to the data in the original object, so no data is copied. The last batch may contain fewer than n elements.

Return type:

object

describe(*args, **kwargs)

Generate a summary of the fieldlist.

get(keys, default=None, astype=None, raise_on_missing=False, output='auto', group_by_key=False, flatten_dict=False)

Return values for the specified keys from all the messages.

Parameters:
  • keys (str, list, tuple) – Specify the metadata keys to extract. Can be a single key (str) or multiple keys as a list/tuple of str. Keys are assumed to be of the form “component.key”. For example, “time.valid_datetime” or “parameter.name”. It is also allowed to specify just the component name like “time” or “parameter”. In this case the corresponding component’s to_dict() method is called and its result is returned. For other keys, the method looks for them in the private components of the fields (if any) and returns the value from the first private component that contains it.

  • default (Any, None) – Specify the default value(s) for keys. Returned when the given key is not found and raise_on_missing is False. When default is a single value, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • astype (type as str, int or float) – Return type for keys. When astype is a single type, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • raise_on_missing (bool) – When True, raises KeyError if any of keys is not found.

  • output (type, str) –

    Specify the output structure type in conjunction with group_by_key. When group_by_key is False (default), the output is a list with one item per field, and output has the following effect on the items:

    • “auto” (default):
      • when keys is a str returns a single value per field

      • when keys is a list/tuple returns a list/tuple of values per field

    • list or “list”: returns a list of values per field.

    • tuple or “tuple”: returns a tuple of values per field.

    • dict or “dict”: returns a dictionary with keys and their values per field.

    When group_by_key is True, the output is grouped by key: an object with one item per key is returned, and each item contains the list of values for that key from all the fields. When output is dict, a dict is returned; otherwise a list is returned.

  • group_by_key (bool) – When True the output is grouped by key as described in output.

  • flatten_dict (bool) – When True and output is dict, for each field if any of the values in the returned dict is itself a dict, it is flattened to depth 1 by concatenating the keys with a dot. For example, if the returned dict is {"a": {"x": 1, "y": 2}, "b": 3}, it becomes {"a.x": 1, "a.y": 2, "b": 3}. This option is ignored when output is not dict.

  • remapping (dict, optional) –

    Create new metadata keys from existing ones. E.g. to define a new key “param_level” as the concatenated value of the “parameter.variable” and “vertical.level” keys use:

    remapping={"param_level": "{parameter.variable}{vertical.level}"}
    

  • patch (dict, optional) – A dictionary of patches to be applied to the returned values.

Returns:

The returned value depends on the output and group_by_key parameters. See above.

Return type:

list, dict

Raises:

KeyError – If raise_on_missing is True and any of keys is not found.

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.get("parameter.variable")
['2t', 'msl']
>>> ds.get(["parameter.variable", "parameter.units"])
[['2t', 'K'], ['msl', 'Pa']]
>>> ds.get(("parameter.variable", "parameter.units"))
[('2t', 'K'), ('msl', 'Pa')]
graph(depth=0)
group_by(*keys, sort=True)

Iterate through the object in groups defined by metadata keys.

Parameters:
  • *keys (tuple) – Positional arguments specifying the metadata keys to group by. Keys can be a single or multiple str, or a list or tuple of str.

  • sort (bool, optional) – If True (default), the object is sorted by the metadata keys before grouping. Sorting is only applied if the object is supporting the sorting operation.

Returns:

Returns an iterator yielding batches of elements grouped by the metadata keys. Each batch is a new object containing a view to the data in the original object, so no data is copied. It generates a new group every time the value of the keys change.

Return type:

object

head(n=5, **kwargs)

Generate a list-like summary of the first n BUFRMessages using a set of metadata keys. Same as calling ls with n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the front.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.head()
ds.head(5)
ds.head(n=5)
ds.ls(5)
ds.ls(n=5)
ignore()

Indicates to ignore this source in concatenation/merging.

Return type:

bool

ls(n=None, keys='default', extra_keys=None)

Generate a list-like summary of the BUFR message list using a set of metadata keys.

Parameters:
  • n (int, None) – The number of BUFRMessages to be listed. None means all the messages, n > 0 means messages from the front, while n < 0 means messages from the back of the list.

  • keys (list of str, dict, None) –

    Metadata keys. To specify a column title for each key in the output, use a dict with the metadata keys as keys and the column titles as values. If keys is None, the following list is used to define both the keys and the titles:

    [
        "edition",
        "dataCategory",
        "dataSubCategory",
        "bufrHeaderCentre",
        "masterTablesVersionNumber",
        "localTablesVersionNumber",
        "numberOfSubsets",
        "compressedData",
        "typicalDate",
        "typicalTime",
        "ident",
        "localLatitude",
        "localLongitude",
    ]
    

  • extra_keys (list of str, dict, None) – List of keys to use in addition to keys. To specify a column title for each key in the output, use a dict.

Returns:

DataFrame with one row per BUFRMessage.

Return type:

Pandas DataFrame

Examples

BUFR: using TEMP data

classmethod merge(sources)
metadata(*args, **kwargs)

Return the metadata values for each message.

Returns:

List with one item per BUFRMessage

Return type:

list

mutate()
mutate_source()
name = None
classmethod new_mask_index(*args, **kwargs)
order_by(*args, **kwargs)

Change the order of the messages in a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to perform the ordering on. (See below for details)

  • **kwargs (dict, optional) – Other keyword arguments specifying the metadata keys to perform the ordering on. (See below for details)

Returns:

Returns a new object with reordered messages. It contains a view to the data in the original object, so no data is copied.

Return type:

object

property parent

The parent source, if any.

sel(*args, remapping=None, **kwargs)

Use header metadata values to select only certain messages from a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the filter condition as dict. (See below for details).

  • **kwargs (dict, optional) – Other keyword arguments specifying the filter conditions. (See below for details).

Returns:

Returns a new object with the filtered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Notes

Filter conditions are specified by a set of metadata keys, either as a dictionary (in *args) or as a set of **kwargs. Single or multiple keys can be used, and each key can specify one of the following types of filter value:

  • single value:

    ds.sel(dataCategory="2")
    
  • list of values:

    ds.sel(dataCategory=[1, 2])
    
  • slice of values (defines a closed interval, so treated as inclusive of both the start and stop values, unlike normal Python indexing):

    # filter dataCategory between 1 and 4 inclusively
    ds.sel(dataCategory=slice(1, 4))
source_filename = None
tail(n=5, **kwargs)

Generate a list-like summary of the last n BUFRMessages using a set of metadata keys. Same as calling ls with -n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the back.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.tail()
ds.tail(5)
ds.tail(n=5)
ds.ls(-5)
ds.ls(n=-5)
to_data_object()

Convert this source into a data object, if possible.

to_numpy(*args, **kwargs)
to_pandas(columns=None, filters=None, **kwargs)

Extract BUFR data into a pandas DataFrame using pdbufr.

Parameters:
  • columns (str, sequence[str]) – List of ecCodes BUFR keys to extract for each BUFR message/subset. See: pdbufr.read_bufr() for details.

  • filters (dict) – Defines the conditions when to extract the specified columns. See: pdbufr.read_bufr() for details.

  • **kwargs (dict, optional) – Other keyword arguments passed to pdbufr.read_bufr().

Return type:

Pandas DataFrame

Examples

to_target(target, *args, **kwargs)
unique(*args, sort=False, drop_none=True, squeeze=False, unwrap_single=False, remapping=None, patch=None, progress_bar=False, cache=True)

Given a list of metadata attributes, such as date, param, levels, return the list of unique values for each attribute.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to collect unique values for.

  • sort (bool, optional) – Whether to sort the collected unique values. Default is False.

  • drop_none (bool, optional) – Whether to drop None values from the collected unique values. Default is True.

  • squeeze (bool, optional) – Whether to return a single value instead of a list if there is only one unique value for a key. Default is False.

  • remapping (dict, optional) – A dictionary for remapping keys or values during collection. Default is None.

  • patch (dict, optional) – A dictionary for patching key values during collection. Default is None.

  • progress_bar (bool, optional) – Whether to display a progress bar during collection. Default is False.

  • cache (bool, optional) – Whether to use a cached collector. Default is True.
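
The effect of sort, drop_none and squeeze can be sketched in plain Python. The function `collect_unique`, the sample records and the dict return shape are assumptions made for illustration; the actual return type of unique() is not stated above.

```python
def collect_unique(records, *keys, sort=False, drop_none=True, squeeze=False):
    """Collect the unique values of each key (hypothetical stand-in for unique())."""
    result = {}
    for key in keys:
        values = []
        for rec in records:
            value = rec.get(key)
            if value is None and drop_none:
                continue
            if value not in values:  # keep first-seen order
                values.append(value)
        if sort:
            values = sorted(values)
        if squeeze and len(values) == 1:
            values = values[0]  # single value instead of a one-element list
        result[key] = values
    return result


# Made-up header metadata standing in for BUFR messages.
records = [
    {"dataCategory": 2, "bufrHeaderCentre": 98},
    {"dataCategory": 0, "bufrHeaderCentre": 98},
    {"dataCategory": 2, "bufrHeaderCentre": None},
]

out = collect_unique(records, "dataCategory", "bufrHeaderCentre",
                     sort=True, squeeze=True)
print(out)  # {'dataCategory': [0, 2], 'bufrHeaderCentre': 98}
```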

class earthkit.data.readers.bufr.file.MultiBUFRList(*args, **kwargs)

Bases: BUFRList, earthkit.data.core.index.MultiIndex

Represent a list of BUFRMessages.

batched(n)

Iterate through the object in batches of n.

Parameters:

n (int) – Batch size.

Returns:

Returns an iterator yielding batches of n elements. Each batch is a new object containing a view to the data in the original object, so no data is copied. The last batch may contain fewer than n elements.

Return type:

object
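
The batching behaviour, including the shorter final batch, can be sketched with a plain list. This is an illustration only: the function below is a hypothetical stand-in, and list slices copy their elements, whereas the method documented here returns views on the original object.

```python
def batched(items, n):
    """Yield consecutive batches of at most n items (illustrative only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for start in range(0, len(items), n):
        yield items[start:start + n]


# Made-up placeholders standing in for seven BUFR messages.
messages = [f"msg{i}" for i in range(7)]

sizes = [len(batch) for batch in batched(messages, 3)]
print(sizes)  # [3, 3, 1]
```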

describe(*args, **kwargs)

Generate a summary of the fieldlist.

get(keys, default=None, astype=None, raise_on_missing=False, output='auto', group_by_key=False, flatten_dict=False)

Return values for the specified keys from all the messages.

Parameters:
  • keys (str, list, tuple) – Specify the metadata keys to extract. Can be a single key (str) or multiple keys as a list/tuple of str. Keys are assumed to be of the form “component.key”. For example, “time.valid_datetime” or “parameter.name”. It is also allowed to specify just the component name like “time” or “parameter”. In this case the corresponding component’s to_dict() method is called and its result is returned. For other keys, the method looks for them in the private components of the fields (if any) and returns the value from the first private component that contains it.

  • default (Any, None) – Specify the default value(s) for keys. Returned when the given key is not found and raise_on_missing is False. When default is a single value, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • astype (type as str, int or float) – Return type for keys. When astype is a single type, it is used for all the keys. Otherwise it must be a list/tuple of the same length as keys.

  • raise_on_missing (bool) – When True, raises KeyError if any of keys is not found.

  • output (type, str) –

    Specify the output structure type in conjunction with group_by_key. When group_by_key is False (default), the output is a list with one item per field, and output has the following effect on the items:

    • ”auto” (default):
      • when keys is a str returns a single value per field

      • when keys is a list/tuple returns a list/tuple of values per field

    • list or “list”: returns a list of values per field.

    • tuple or “tuple”: returns a tuple of values per field.

    • dict or “dict”: returns a dictionary with keys and their values per field.

    When group_by_key is True, the output is grouped by key: the returned object has one item per key, and each item contains the list of values for that key from all the fields. When output is dict, a dict is returned; otherwise a list is returned.

  • group_by_key (bool) – When True the output is grouped by key as described in output.

  • flatten_dict (bool) – When True and output is dict, for each field if any of the values in the returned dict is itself a dict, it is flattened to depth 1 by concatenating the keys with a dot. For example, if the returned dict is {"a": {"x": 1, "y": 2}, "b": 3}, it becomes {"a.x": 1, "a.y": 2, "b": 3}. This option is ignored when output is not dict.

  • remapping (dict, optional) –

    Create new metadata keys from existing ones. E.g. to define a new key “param_level” as the concatenated value of the “parameter.variable” and “vertical.level” keys use:

    remapping={"param_level": "{parameter.variable}{vertical.level}"}
    

  • patch (dict, optional) – A dictionary of patch to be applied to the returned values.
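
How a remapping template combines existing keys can be sketched as follows. The helper `remap` and the flat metadata dictionary are hypothetical; the library's own template handling may differ. A regular expression is used instead of str.format because the placeholder names contain dots.

```python
import re


def remap(template, metadata):
    """Fill each {key} placeholder in template from a flat metadata dict."""
    return re.sub(r"\{([^{}]+)\}",
                  lambda match: str(metadata[match.group(1)]),
                  template)


# Made-up metadata values for a single field.
metadata = {"parameter.variable": "t", "vertical.level": 500}
remapping = {"param_level": "{parameter.variable}{vertical.level}"}

derived = {name: remap(template, metadata)
           for name, template in remapping.items()}
print(derived)  # {'param_level': 't500'}
```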

Returns:

The returned value depends on the output and group_by_key parameters. See above.

Return type:

list, dict

Raises:

KeyError – If raise_on_missing is True and any of keys is not found.
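
The difference between per-field and per-key output can be sketched with plain dictionaries. The sample fields are made up, and the comprehensions only mirror the shapes described above; they are not the library code.

```python
# Made-up metadata for two fields.
fields = [
    {"parameter.variable": "2t", "parameter.units": "K"},
    {"parameter.variable": "msl", "parameter.units": "Pa"},
]
keys = ["parameter.variable", "parameter.units"]

# group_by_key=False: one item per field.
per_field = [[field[key] for key in keys] for field in fields]

# group_by_key=True: one item per key, holding the values from all fields.
per_key_list = [[field[key] for field in fields] for key in keys]
per_key_dict = {key: [field[key] for field in fields] for key in keys}

print(per_field)     # [['2t', 'K'], ['msl', 'Pa']]
print(per_key_list)  # [['2t', 'msl'], ['K', 'Pa']]
print(per_key_dict)
```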

Examples

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "docs/how-tos/test.grib")
>>> ds.get("parameter.variable")
['2t', 'msl']
>>> ds.get(["parameter.variable", "parameter.units"])
[['2t', 'K'], ['msl', 'Pa']]
>>> ds.get(("parameter.variable", "parameter.units"))
[('2t', 'K'), ('msl', 'Pa')]
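
The depth-1 flattening described for flatten_dict can be sketched as below. The function name `flatten_once` is hypothetical; the input is the example dictionary given in the parameter description.

```python
def flatten_once(d):
    """Flatten nested dict values to depth 1, joining keys with a dot."""
    flat = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Only one level is expanded; deeper dicts are kept as values.
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


flattened = flatten_once({"a": {"x": 1, "y": 2}, "b": 3})
print(flattened)  # {'a.x': 1, 'a.y': 2, 'b': 3}
```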
graph(depth=0)
group_by(*keys, sort=True)

Iterate through the object in groups defined by metadata keys.

Parameters:
  • *keys (tuple) – Positional arguments specifying the metadata keys to group by. Keys can be a single or multiple str, or a list or tuple of str.

  • sort (bool, optional) – If True (default), the object is sorted by the metadata keys before grouping. Sorting is only applied if the object supports it.

Returns:

Returns an iterator yielding batches of elements grouped by the metadata keys. Each batch is a new object containing a view to the data in the original object, so no data is copied. A new group is generated every time the value of the keys changes.

Return type:

object
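
Because a new group starts whenever the key values change, sorting first determines whether equal values end up in a single group. A minimal sketch using itertools.groupby, which has the same "group on change" behaviour, is shown below; the function and sample records are hypothetical and work on plain dictionaries rather than BUFR messages.

```python
from itertools import groupby


def group_by(records, *keys, sort=True):
    """Yield lists of consecutive records sharing the same key values."""
    def key_values(record):
        return tuple(record[key] for key in keys)

    if sort:
        records = sorted(records, key=key_values)
    for _, group in groupby(records, key=key_values):
        yield list(group)


# Made-up header metadata with alternating categories.
records = [{"dataCategory": c} for c in (2, 0, 2, 0)]

sorted_groups = [[r["dataCategory"] for r in group]
                 for group in group_by(records, "dataCategory")]
unsorted_groups = [[r["dataCategory"] for r in group]
                   for group in group_by(records, "dataCategory", sort=False)]
print(sorted_groups)    # [[0, 0], [2, 2]]
print(unsorted_groups)  # [[2], [0], [2], [0]]
```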

head(n=5, **kwargs)

Generate a list-like summary of the first n BUFRMessages using a set of metadata keys. Same as calling ls with n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the front.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.head()
ds.head(5)
ds.head(n=5)
ds.ls(5)
ds.ls(n=5)
ignore()

Indicates to ignore this source in concatenation/merging.

Return type:

bool

ls(n=None, keys='default', extra_keys=None)

Generate a list-like summary of the BUFR message list using a set of metadata keys.

Parameters:
  • n (int, None) – The number of BUFRMessages to be listed. None means all the messages, n > 0 means messages from the front, while n < 0 means messages from the back of the list.

  • keys (list of str, dict, None) –

    Metadata keys. To specify a column title for each key in the output, use a dict with the metadata keys as keys and the column titles as values. If keys is None, the following list will be used to define the keys and the column titles:

    [
        "edition",
        "dataCategory",
        "dataSubCategory",
        "bufrHeaderCentre",
        "masterTablesVersionNumber",
        "localTablesVersionNumber",
        "numberOfSubsets",
        "compressedData",
        "typicalDate",
        "typicalTime",
        "ident",
        "localLatitude",
        "localLongitude",
    ]
    

  • extra_keys (list of str, dict, None) – List of additional keys to add to keys. To specify a column title for each key in the output, use a dict.

Returns:

DataFrame with one row per BUFRMessage.

Return type:

Pandas DataFrame
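
The meaning of n (all, from the front, from the back) can be sketched with list slicing. The helper `take` is hypothetical and operates on a plain list; the case n == 0 is not covered by the description above, so returning an empty list for it is an assumption.

```python
def take(items, n=None):
    """Select items the way n is described for ls (illustrative only)."""
    if n is None:
        return list(items)      # all messages
    if n > 0:
        return list(items[:n])  # first n, as head(n) would list
    if n < 0:
        return list(items[n:])  # last |n|, as tail(|n|) would list
    return []                   # assumption: n == 0 selects nothing


# Integers standing in for ten BUFR messages.
messages = list(range(10))

first_three = take(messages, 3)
last_three = take(messages, -3)
print(first_three)  # [0, 1, 2]
print(last_three)   # [7, 8, 9]
```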

Examples

BUFR: using TEMP data

classmethod merge(sources)
metadata(*args, **kwargs)

Return the metadata values for each message.

Returns:

List with one item per BUFRMessage

Return type:

list

mutate()
mutate_source()
name = None
classmethod new_mask_index(*args, **kwargs)
order_by(*args, **kwargs)

Change the order of the messages in a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to perform the ordering on. (See below for details)

  • **kwargs (dict, optional) – Other keyword arguments specifying the metadata keys to perform the ordering on. (See below for details)

Returns:

Returns a new object with reordered messages. It contains a view to the data in the original object, so no data is copied.

Return type:

object

property parent

The parent source, if any.

sel(*args, remapping=None, **kwargs)

Use header metadata values to select only certain messages from a BUFRList object.

Parameters:
  • *args (tuple) – Positional arguments specifying the filter condition as dict. (See below for details).

  • **kwargs (dict, optional) – Other keyword arguments specifying the filter conditions. (See below for details).

Returns:

Returns a new object with the filtered elements. It contains a view to the data in the original object, so no data is copied.

Return type:

object

Notes

Filter conditions are specified as a set of metadata keys, given either as a dictionary (in *args) or as **kwargs. Single or multiple keys may be used, and each key can take one of the following types of filter value:

  • single value:

    ds.sel(dataCategory="2")
    
  • list of values:

    ds.sel(dataCategory=[1, 2])
    
  • slice of values (defines a closed interval, so it is inclusive of both the start and stop values, unlike normal Python indexing):

    # filter dataCategory between 1 and 4 inclusively
    ds.sel(dataCategory=slice(1, 4))
source_filename = None
tail(n=5, **kwargs)

Generate a list-like summary of the last n BUFRMessages using a set of metadata keys. Same as calling ls with -n.

Parameters:
  • n (int, None) – The number of messages (n > 0) to be printed from the back.

  • **kwargs (dict, optional) – Other keyword arguments passed to ls.

Returns:

See ls.

Return type:

Pandas DataFrame

Notes

The following calls are equivalent:

ds.tail()
ds.tail(5)
ds.tail(n=5)
ds.ls(-5)
ds.ls(n=-5)
to_data_object()

Convert this source into a data object, if possible.

to_numpy(*args, **kwargs)
to_pandas(columns=None, filters=None, **kwargs)

Extract BUFR data into a pandas DataFrame using pdbufr.

Parameters:
  • columns (str, sequence[str]) – List of ecCodes BUFR keys to extract for each BUFR message/subset. See: pdbufr.read_bufr() for details.

  • filters (dict) – Defines the conditions when to extract the specified columns. See: pdbufr.read_bufr() for details.

  • **kwargs (dict, optional) – Other keyword arguments passed to pdbufr.read_bufr().

Return type:

Pandas DataFrame

Examples

to_target(target, *args, **kwargs)
unique(*args, sort=False, drop_none=True, squeeze=False, unwrap_single=False, remapping=None, patch=None, progress_bar=False, cache=True)

Given a list of metadata attributes, such as date, param, levels, return the list of unique values for each attribute.

Parameters:
  • *args (tuple) – Positional arguments specifying the metadata keys to collect unique values for.

  • sort (bool, optional) – Whether to sort the collected unique values. Default is False.

  • drop_none (bool, optional) – Whether to drop None values from the collected unique values. Default is True.

  • squeeze (bool, optional) – Whether to return a single value instead of a list if there is only one unique value for a key. Default is False.

  • remapping (dict, optional) – A dictionary for remapping keys or values during collection. Default is None.

  • patch (dict, optional) – A dictionary for patching key values during collection. Default is None.

  • progress_bar (bool, optional) – Whether to display a progress bar during collection. Default is False.

  • cache (bool, optional) – Whether to use a cached collector. Default is True.

class earthkit.data.readers.bufr.file.MultiBUFRReader(sources)

Bases: earthkit.data.sources.Source, earthkit.data.readers.bufr.core.BUFRReaderBase

Base class for all sources.

property appendable
property binary
property filter
graph(depth=0)
ignore()

Indicates to ignore this source in concatenation/merging.

Return type:

bool

classmethod merge(sources)
property merger
mutate()
mutate_source()
name = None
property parent

The parent source, if any.

property parts
path
property source
source_filename = None
sources
property stream
to_data_object()

Convert this source into a data object, if possible.

to_featurelist()
to_pandas(*args, **kwargs)
to_target(target, *args, **kwargs)