GRIB: selection using metadata

We read a GRIB file containing 18 messages. First we ensure the example file is available.

[1]:
import earthkit.data as ekd
ekd.download_example_file("tuv_pl.grib")
[2]:
ds = ekd.from_source("file", "tuv_pl.grib")
[3]:
len(ds)
[3]:
18

Using sel

Calling sel() provides a “view”:

[4]:
a = ds.sel(level=500)
a.ls()
[4]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
1 ecmf u isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
2 ecmf v isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
[5]:
type(a)
[5]:
earthkit.data.readers.grib.index.GribMaskFieldList
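The GribMaskFieldList type suggests that a view keeps a selection mask over the original fieldlist instead of copying the fields. The sketch below illustrates that idea in plain Python. It assumes a view is nothing more than a list of positions into its parent. MaskView and the dict-based fields are hypothetical stand-ins, not earthkit-data classes.

```python
# Hypothetical stand-in for tuv_pl.grib: 18 plain dicts, one per field
# (6 levels x 3 parameters); the field order is an assumption.
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]


class MaskView:
    """Hypothetical view: stores positions into the parent, not copies of the fields."""

    def __init__(self, parent, positions):
        self.parent = parent
        self.positions = list(positions)

    def __len__(self):
        return len(self.positions)

    def __getitem__(self, i):
        # Every access is redirected to the parent container.
        return self.parent[self.positions[i]]

    def __iter__(self):
        return (self.parent[p] for p in self.positions)


view = MaskView(fields, [i for i, f in enumerate(fields) if f["level"] == 500])
print(len(view))                             # 3
print([f["shortName"] for f in view])        # ['t', 'u', 'v']
print(view[0] is fields[view.positions[0]])  # True: no copy was made
```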

We can use a dict instead of keyword arguments:

[6]:
a = ds.sel({"level": 500, "shortName": "v"})
a.ls()
[6]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf v isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
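Passing a dict and passing keyword arguments describe the same selection. A minimal sketch of how such a method could merge both forms into one set of criteria before filtering. The sel() function here is hypothetical and only tests for exact equality.

```python
# Hypothetical stand-in fields (see the assumptions stated above).
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]


def sel(fields, *args, **kwargs):
    """Hypothetical sel(): keep the fields whose metadata equal every criterion."""
    criteria = {}
    for arg in args:          # positional dicts, e.g. {"level": 500}
        criteria.update(arg)
    criteria.update(kwargs)   # keyword arguments are merged in afterwards
    return [f for f in fields
            if all(f[key] == value for key, value in criteria.items())]


by_dict = sel(fields, {"level": 500, "shortName": "v"})
by_kwargs = sel(fields, level=500, shortName="v")
print(by_dict == by_kwargs)  # True
print(by_dict)               # [{'shortName': 'v', 'level': 500}]
```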

Lists are accepted:

[7]:
a = ds.sel(level=[500, 850])
a.ls()
[7]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
1 ecmf u isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
2 ecmf v isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
3 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
4 ecmf u isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
5 ecmf v isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
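A list means "any of these values". In the output above the 850 hPa fields come first even though 500 was listed first, so the selection follows the order of the fields, not the order of the list. The sketch below reproduces that behaviour under the hypothetical field order used here. matches() and sel() are illustrative helpers, not earthkit-data code.

```python
# Hypothetical stand-in fields; the order mimics the listings above.
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]


def matches(value, wanted):
    # A list or tuple means "any of these"; anything else must match exactly.
    if isinstance(wanted, (list, tuple)):
        return value in wanted
    return value == wanted


def sel(fields, **criteria):
    # Filtering the list keeps the original field order.
    return [f for f in fields
            if all(matches(f[key], wanted) for key, wanted in criteria.items())]


result = sel(fields, level=[500, 850])
print([(f["shortName"], f["level"]) for f in result])
# [('t', 850), ('u', 850), ('v', 850), ('t', 500), ('u', 500), ('v', 500)]
```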

In sel(), slices define closed intervals: both the start and the stop values are included, unlike in normal Python indexing:

[8]:
a = ds.sel(param="t", level=slice(500, 850))
a.ls()
[8]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 700 20180801 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
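The closed-interval rule amounts to two inclusive comparisons. In this sketch in_closed_interval() is a hypothetical helper, and treating a missing start or stop as unbounded is an assumption.

```python
def in_closed_interval(value, s):
    """Hypothetical test: is value inside the closed interval [s.start, s.stop]?"""
    lower_ok = s.start is None or value >= s.start
    upper_ok = s.stop is None or value <= s.stop
    return lower_ok and upper_ok


levels = [1000, 850, 700, 500, 400, 300]
print([lev for lev in levels if in_closed_interval(lev, slice(500, 850))])
# [850, 700, 500]: both 500 and 850 are kept

# For comparison, Python's own ranges stop before the stop value:
print(list(range(500, 850, 50))[-1])  # 800
```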

Using isel

isel() works similarly to sel() but takes indices instead of values. For isel(), the indices refer to the sorted unique values of each key, so level=0 selects the lowest level, 300:

[9]:
a = ds.isel(level=0)
a.ls()
[9]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 300 20180801 1200 0 an 0 regular_ll
1 ecmf u isobaricInhPa 300 20180801 1200 0 an 0 regular_ll
2 ecmf v isobaricInhPa 300 20180801 1200 0 an 0 regular_ll
[10]:
a = ds.isel({"level": 2, "shortName": 1})
a.ls()
[10]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf u isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
[11]:
a = ds.isel(level=[2, 3], param=0)
a.ls()
[11]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 700 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
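The outputs above are consistent with isel() looking up each position in the sorted unique values of the key (for level: 300, 400, 500, 700, 850, 1000) and then selecting by value. The sketch below follows that reading. isel() is hypothetical, and it uses shortName where the notebook uses param, because the toy fields only carry shortName.

```python
# Hypothetical stand-in fields (order is an assumption).
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]


def isel(fields, **positions):
    """Hypothetical isel(): map positions to values, then filter by value."""
    wanted = {}
    for key, pos in positions.items():
        unique = sorted({f[key] for f in fields})  # sorted unique values
        if isinstance(pos, slice):
            wanted[key] = unique[pos]               # ordinary Python slice
        elif isinstance(pos, (list, tuple)):
            wanted[key] = [unique[p] for p in pos]
        else:
            wanted[key] = [unique[pos]]
    return [f for f in fields
            if all(f[key] in values for key, values in wanted.items())]


print([(f["shortName"], f["level"]) for f in isel(fields, level=0)])
# [('t', 300), ('u', 300), ('v', 300)]
print([(f["shortName"], f["level"]) for f in isel(fields, level=[2, 3], shortName=0)])
# [('t', 700), ('t', 500)]
```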

With isel(), slices behave as in normal Python indexing, so the stop index is excluded:

[12]:
a = ds.isel(level=slice(2, 5), param=0)
a.ls()
[12]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 700 20180801 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
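Because isel() slices follow ordinary Python rules, slice(2, 5) covers positions 2, 3 and 4 of the sorted levels. The result above therefore matches the earlier sel(param="t", level=slice(500, 850)) call: one slice excludes its stop position, the other includes its stop value. A plain-Python check of that arithmetic:

```python
# Sorted unique levels, as returned by ds.index("level") in this notebook.
levels = [300, 400, 500, 700, 850, 1000]

by_position = levels[slice(2, 5)]                         # stop position 5 is excluded
by_value = [lev for lev in levels if 500 <= lev <= 850]   # stop value 850 is included
print(by_position)              # [500, 700, 850]
print(by_position == by_value)  # True
```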

Using order_by

Calling order_by() provides a “view”:

[13]:
b = a.order_by()
type(b)
[13]:
earthkit.data.readers.grib.index.GribMaskFieldList
[14]:
b.ls()
[14]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 700 20180801 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll

The sorting keys can be specified as a list:

[15]:
b = a.order_by(["shortName"])
b.ls()
[15]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 700 20180801 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
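Conceptually, ordering by a list of keys is a sort on a tuple built from those keys, with the first key the most significant. A minimal sketch: order_by() here is a hypothetical function on plain dicts, and the ascending direction is an assumption of the sketch.

```python
# Hypothetical stand-in fields (order is an assumption).
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]


def order_by(fields, keys):
    """Hypothetical order_by(): sort on the listed keys, first key most significant."""
    return sorted(fields, key=lambda f: tuple(f[k] for k in keys))


t_fields = [f for f in fields if f["shortName"] == "t" and 500 <= f["level"] <= 850]
print([f["level"] for f in t_fields])                       # [850, 700, 500]
print([f["level"] for f in order_by(t_fields, ["level"])])  # [500, 700, 850]
```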

We can also prescribe the order of the values within a key. This only works when all the possible values of that key are specified:

[16]:
a = a.order_by(shortName=["v", "t", "u"])
a.ls()
[16]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 700 20180801 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
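At this point a only contains temperature fields, so the prescribed order has no visible effect in the output above. One way to implement a prescribed order is to rank each value by its position in the given list. The sketch below (order_by_values() is hypothetical) applies this to the three 500 hPa fields and shows why an incomplete list fails: a value with no rank cannot be placed. Raising KeyError is a property of this sketch, not necessarily of earthkit-data.

```python
# Hypothetical stand-in fields (order is an assumption).
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]


def order_by_values(fields, key, order):
    """Hypothetical order_by(key=[...]): sort by each value's position in `order`."""
    rank = {value: i for i, value in enumerate(order)}
    # A value that is missing from `order` has no rank and raises KeyError.
    return sorted(fields, key=lambda f: rank[f[key]])


level500 = [f for f in fields if f["level"] == 500]
ordered = order_by_values(level500, "shortName", ["v", "t", "u"])
print([f["shortName"] for f in ordered])  # ['v', 't', 'u']

try:
    order_by_values(level500, "shortName", ["v", "t"])
except KeyError as exc:
    print("no rank for", exc)  # no rank for 'u'
```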

Combining sel and order_by

[17]:
a = ds.sel(level=[500, 850]).order_by(["shortName"])
a.ls()
[17]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
2 ecmf u isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
3 ecmf u isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
4 ecmf v isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
5 ecmf v isobaricInhPa 500 20180801 1200 0 an 0 regular_ll
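Within each shortName the 850 hPa fields still precede the 500 hPa ones, i.e. they keep the order they had after sel(). That is what a stable sort produces. Python's built-in sorted() is guaranteed to be stable, as this sketch on hypothetical dict fields shows; whether earthkit-data relies on the same mechanism is an open assumption.

```python
# Hypothetical stand-in fields (order is an assumption).
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]

picked = [f for f in fields if f["level"] in (500, 850)]  # like sel(level=[500, 850])
ordered = sorted(picked, key=lambda f: f["shortName"])    # like order_by(["shortName"])
print([(f["shortName"], f["level"]) for f in ordered])
# [('t', 850), ('t', 500), ('u', 850), ('u', 500), ('v', 850), ('v', 500)]
```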

Using indices

To get the unique values for a predefined set of metadata keys (the MARS ecCodes keys), we can call indices(). By default, it returns all the keys that have valid values:

[18]:
ds.indices()
[18]:
{'class': ['od'],
 'stream': ['oper'],
 'levtype': ['pl'],
 'type': ['an'],
 'expver': ['0001'],
 'date': [20180801],
 'time': [1200],
 'domain': ['g'],
 'number': [0],
 'levelist': [300, 400, 500, 700, 850, 1000],
 'param': ['t', 'u', 'v'],
 'level': [300, 400, 500, 700, 850, 1000],
 'shortName': ['t', 'u', 'v']}
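The result maps each key to its sorted unique values. A rough equivalent over plain dicts, assuming a key counts as having valid values when at least one field has a non-None value for it. indices(), the toy fields and the key name unknownKey are all hypothetical.

```python
# Hypothetical stand-in fields with three metadata keys.
fields = [{"shortName": p, "level": lev, "date": 20180801}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]


def indices(fields, keys):
    """Hypothetical indices(): sorted unique values for every key that has any."""
    result = {}
    for key in keys:
        values = {f.get(key) for f in fields} - {None}  # ignore missing values
        if values:
            result[key] = sorted(values)
    return result


print(indices(fields, ["date", "level", "shortName", "unknownKey"]))
# {'date': [20180801], 'level': [300, 400, 500, 700, 850, 1000], 'shortName': ['t', 'u', 'v']}
```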

We can use the squeeze option to see only the keys having more than one value:

[19]:
ds.indices(squeeze=True)
[19]:
{'levelist': [300, 400, 500, 700, 850, 1000],
 'param': ['t', 'u', 'v'],
 'level': [300, 400, 500, 700, 850, 1000],
 'shortName': ['t', 'u', 'v']}
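The squeeze rule keeps only the keys whose value lists have more than one entry. Applied to a dict shaped like the indices() output above (only a few of its keys are copied here), it is a one-line filter:

```python
full = {
    "date": [20180801],
    "time": [1200],
    "level": [300, 400, 500, 700, 850, 1000],
    "shortName": ["t", "u", "v"],
}
# Keep only the keys that vary across the fields.
squeezed = {key: values for key, values in full.items() if len(values) > 1}
print(squeezed)
# {'level': [300, 400, 500, 700, 850, 1000], 'shortName': ['t', 'u', 'v']}
```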

We can get the unique values for a given key with index():

[20]:
ds.index("param")
[20]:
['t', 'u', 'v']
[21]:
ds.index("date")
[21]:
[20180801]

Aliases can be used. E.g. instead of levelist we can use level:

[22]:
ds.index("level")
[22]:
[300, 400, 500, 700, 850, 1000]
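One way to support aliases is to translate an alias to a canonical key before looking up the values. In the sketch below the alias table is hypothetical (the indices() output above only shows that level/levelist and param/shortName carry identical values for this file), and index() is an illustrative function over plain dicts.

```python
# Hypothetical toy fields stored under MARS-style key names.
fields = [{"param": p, "levelist": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]

# Hypothetical alias table: alternative name -> key actually stored.
ALIASES = {"level": "levelist", "shortName": "param"}


def index(fields, key):
    """Hypothetical index(): sorted unique values of one key, aliases allowed."""
    canonical = ALIASES.get(key, key)
    return sorted({f[canonical] for f in fields})


print(index(fields, "level"))      # [300, 400, 500, 700, 850, 1000]
print(index(fields, "levelist"))   # [300, 400, 500, 700, 850, 1000]
print(index(fields, "shortName"))  # ['t', 'u', 'v']
```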

Count the number of fields for each available level:

[23]:
for level in ds.index("level"):
    print(f"level={level} len={len(ds.sel(level=level))}")
level=300 len=3
level=400 len=3
level=500 len=3
level=700 len=3
level=850 len=3
level=1000 len=3
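The loop above calls sel() once per level. When only the counts are needed, a single pass with collections.Counter over the level values gives the same numbers. The sketch works on hypothetical dict fields rather than on the GRIB fieldlist.

```python
from collections import Counter

# Hypothetical stand-in fields (order is an assumption).
fields = [{"shortName": p, "level": lev}
          for lev in (1000, 850, 700, 500, 400, 300)
          for p in ("t", "u", "v")]

# One pass over the fields instead of one selection per level.
counts = Counter(f["level"] for f in fields)
for level in sorted(counts):
    print(f"level={level} len={counts[level]}")
```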