GRIB: selection using metadata
We read a GRIB file containing 18 messages. First we ensure the example file is available.
[1]:
import earthkit.data as ekd
ekd.download_example_file("tuv_pl.grib")
[2]:
ds = ekd.from_source("file", "tuv_pl.grib")
[3]:
len(ds)
[3]:
18
Using sel
Calling sel() provides a “view”:
[4]:
a = ds.sel(level=500)
a.ls()
[4]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
[5]:
type(a)
[5]:
earthkit.data.readers.grib.index.GribMaskFieldList
We can use a dict instead of keyword arguments:
[6]:
a = ds.sel({"level": 500, "shortName": "v"})
a.ls()
[6]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
Lists are accepted:
[7]:
a = ds.sel(level=[500, 850])
a.ls()
[7]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | u | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | v | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 4 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 5 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
Slices define closed intervals: both the start and the stop value are included, unlike in normal Python indexing:
[8]:
a = ds.sel(param="t", level=slice(500, 850))
a.ls()
[8]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 700 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
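The matching rules shown above (single value, list of values, closed-interval slice) can be illustrated with a small pure-Python sketch. It is not how earthkit-data implements sel(): the `fields` list, `matches` and `toy_sel` are hypothetical names invented for illustration, and plain dicts stand in for GRIB messages.

```python
# Toy stand-in for GRIB metadata: one dict per message (not earthkit objects).
fields = [
    {"shortName": name, "level": level}
    for level in (1000, 850, 700, 500, 400, 300)
    for name in ("t", "u", "v")
]


def matches(value, condition):
    """Return True if value satisfies a scalar, list or slice condition."""
    if isinstance(condition, slice):
        # Closed interval: both the start and the stop value are included.
        return condition.start <= value <= condition.stop
    if isinstance(condition, (list, tuple)):
        return value in condition
    return value == condition


def toy_sel(items, **conditions):
    """Keep the items whose metadata satisfies every condition."""
    return [
        item
        for item in items
        if all(matches(item[key], cond) for key, cond in conditions.items())
    ]


selected = toy_sel(fields, shortName="t", level=slice(500, 850))
print([f["level"] for f in selected])  # [850, 700, 500]
```

As in the table above, level 850 is part of the result even though it is the stop value of the slice.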
Using isel
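isel() works similarly to sel() but takes indices instead of values. Please note that for isel() the indices refer to the sorted list of unique values of each key, so level=0 selects the lowest level value (300) and not the first level in the file.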
[9]:
a = ds.isel(level=0)
a.ls()
[9]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 300 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | u | isobaricInhPa | 300 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | v | isobaricInhPa | 300 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
[10]:
a = ds.isel({"level": 2, "shortName": 1})
a.ls()
[10]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
[11]:
a = ds.isel(level=[2,3], param=0)
a.ls()
[11]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 700 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
Slices are used as in normal Python indexing:
[12]:
a = ds.isel(level=slice(2,5), param=0)
a.ls()
[12]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 700 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
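The isel() results above are consistent with the indices being looked up in the sorted unique values of each key. The sketch below reproduces that lookup in plain Python. `levels_in_file` (an assumed message order) and `toy_index_to_values` are hypothetical names, and this is only an illustration of the observed behaviour, not earthkit-data's implementation.

```python
levels_in_file = [1000, 850, 700, 500, 400, 300]  # assumed message order

# Sorted unique values: these are what the isel-style indices point into.
unique_levels = sorted(set(levels_in_file))


def toy_index_to_values(unique_values, index):
    """Translate an index, a list of indices or a slice into values."""
    if isinstance(index, slice):
        return unique_values[index]  # ordinary Python slicing: stop is excluded
    if isinstance(index, (list, tuple)):
        return [unique_values[i] for i in index]
    return [unique_values[index]]


print(toy_index_to_values(unique_levels, 0))            # [300]
print(toy_index_to_values(unique_levels, [2, 3]))       # [500, 700]
print(toy_index_to_values(unique_levels, slice(2, 5)))  # [500, 700, 850]
```

The selected values agree with the levels in the tables above; the rows there appear in a different order because the fields keep the order they have in the field list.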
Using order_by
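Calling order_by() provides a “view”. Note that a still holds the three t fields selected in the previous cell, which is why the examples in this section all list the same fields: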
[13]:
b = a.order_by()
type(b)
[13]:
earthkit.data.readers.grib.index.GribMaskFieldList
[14]:
b.ls()
[14]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 700 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
The sorting keys can be specified as a list:
[15]:
b = a.order_by(["shortName"])
b.ls()
[15]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 700 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
We can also prescribe the actual order of the values for a key. This only works when all the possible values of the key are specified:
[16]:
a = a.order_by(shortName=["v", "t", "u"])
a.ls()
[16]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 700 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
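To make the difference between sorting by key and sorting by a prescribed value order easier to see, here is a pure-Python sketch on a toy list containing all three parameters. The `fields` list and the variable names are hypothetical, and the code only mimics the idea with `sorted()`; it is not earthkit-data's implementation.

```python
# Toy stand-in for a field list with three parameters on two levels.
fields = [
    {"shortName": name, "level": level}
    for level in (850, 500)
    for name in ("t", "u", "v")
]

# Ordering by key: ascending on the listed keys, like sorting on a tuple.
by_name = sorted(fields, key=lambda f: (f["shortName"],))

# Prescribed ordering: rank each value by its position in the given list.
# list.index() raises ValueError for a value that is not listed, which is
# why this toy version needs every possible value to be present.
wanted = ["v", "t", "u"]
by_wanted = sorted(fields, key=lambda f: wanted.index(f["shortName"]))

print([(f["shortName"], f["level"]) for f in by_name])
print([(f["shortName"], f["level"]) for f in by_wanted])
```

Because Python's sort is stable, fields with the same shortName keep their original relative order (850 before 500) in both results.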
Combining sel and order_by
[17]:
a = ds.sel(level=[500, 850]).order_by(["shortName"])
a.ls()
[17]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | u | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 4 | ecmf | v | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 5 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
Using indices
To get the unique values for a predefined set of metadata keys (the MARS ecCodes keys) we can call indices(). By default it returns all the keys that have valid values:
[18]:
ds.indices()
[18]:
{'class': ['od'],
'stream': ['oper'],
'levtype': ['pl'],
'type': ['an'],
'expver': ['0001'],
'date': [20180801],
'time': [1200],
'domain': ['g'],
'number': [0],
'levelist': [300, 400, 500, 700, 850, 1000],
'param': ['t', 'u', 'v'],
'level': [300, 400, 500, 700, 850, 1000],
'shortName': ['t', 'u', 'v']}
We can use the squeeze option to see only the keys that have more than one value:
[19]:
ds.indices(squeeze=True)
[19]:
{'levelist': [300, 400, 500, 700, 850, 1000],
'param': ['t', 'u', 'v'],
'level': [300, 400, 500, 700, 850, 1000],
'shortName': ['t', 'u', 'v']}
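Conceptually, the result of indices() is a mapping from each key to its sorted unique values, and squeeze drops the keys that have a single value. The sketch below shows that idea on toy data; `fields` and `toy_indices` are hypothetical names and the code is only an illustration, not earthkit-data's implementation.

```python
# Toy stand-in for GRIB metadata: one dict per message (not earthkit objects).
fields = [
    {"date": 20180801, "param": name, "levelist": level}
    for level in (1000, 850, 700, 500, 400, 300)
    for name in ("t", "u", "v")
]


def toy_indices(items, keys, squeeze=False):
    """Collect the sorted unique values of each key across all items."""
    result = {key: sorted({item[key] for item in items}) for key in keys}
    if squeeze:
        # Drop the keys that take the same single value in every item.
        result = {key: vals for key, vals in result.items() if len(vals) > 1}
    return result


keys = ["date", "param", "levelist"]
print(toy_indices(fields, keys))
print(toy_indices(fields, keys, squeeze=True))
```

With squeeze=True the date key disappears from the toy result, in the same way that the single-valued keys such as class and stream are absent from the output above.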
We can get the unique values for a given key with index():
[20]:
ds.index("param")
[20]:
['t', 'u', 'v']
[21]:
ds.index("date")
[21]:
[20180801]
Aliases can be used. For example, instead of levelist we can use level:
[22]:
ds.index("level")
[22]:
[300, 400, 500, 700, 850, 1000]
Count the number of fields for each available level:
[23]:
for level in ds.index("level"):
    print(f"level={level} len={len(ds.sel(level=level))}")
level=300 len=3
level=400 len=3
level=500 len=3
level=700 len=3
level=850 len=3
level=1000 len=3