GRIB: selection using metadata

We read a GRIB file containing 18 messages. First we ensure the example file is available.

[1]:

import earthkit.data as ekd
ekd.download_example_file("tuv_pl.grib")

[2]:

ds = ekd.from_source("file", "tuv_pl.grib")

[3]:

len(ds)

[3]:

18

Using sel

Calling sel() provides a “view”:

[4]:

a = ds.sel(level=500)
a.ls()

[4]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
1	ecmf	u	isobaricInhPa	500	20180801	1200	an	regular_ll
2	ecmf	v	isobaricInhPa	500	20180801	1200	an	regular_ll

[5]:

type(a)

[5]:

earthkit.data.readers.grib.index.GribMaskFieldList

We can use a dict instead of keyword arguments:

[6]:

a = ds.sel({"level": 500, "shortName": "v"})
a.ls()

[6]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	stepRange	dataType	number	gridType
0	ecmf	v	isobaricInhPa	500	20180801	1200	0	an	0	regular_ll

Lists are accepted:

[7]:

a = ds.sel(level=[500, 850])
a.ls()

[7]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	u	isobaricInhPa	850	20180801	1200	an	regular_ll
2	ecmf	v	isobaricInhPa	850	20180801	1200	an	regular_ll
3	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
4	ecmf	u	isobaricInhPa	500	20180801	1200	an	regular_ll
5	ecmf	v	isobaricInhPa	500	20180801	1200	an	regular_ll

Slices define closed intervals: both the start and the stop values are included, unlike in normal Python slicing:

[8]:

a = ds.sel(param="t", level=slice(500, 850))
a.ls()

[8]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	t	isobaricInhPa	700	20180801	1200	an	regular_ll
2	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
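
The closed-interval rule can be modelled in plain Python. The sketch below is not earthkit-data's implementation: `LEVELS`, `fields` and `sel_closed` are hypothetical names, and the list of dicts is a synthetic stand-in for the GRIB messages, laid out in the order that the listings above suggest.

```python
# Synthetic stand-in for the 18 GRIB messages: one dict per field.
# The level order mimics the listings above; nothing is read from the file.
LEVELS = (1000, 850, 700, 500, 400, 300)
fields = [{"shortName": p, "level": lev} for lev in LEVELS for p in ("t", "u", "v")]


def sel_closed(fields, key, interval):
    """Keep the fields whose value lies in [interval.start, interval.stop]."""
    lo, hi = interval.start, interval.stop
    # Open-ended slices (None bounds) are an assumption of this sketch.
    return [
        f
        for f in fields
        if (lo is None or f[key] >= lo) and (hi is None or f[key] <= hi)
    ]


t_fields = [f for f in fields if f["shortName"] == "t"]
picked = sel_closed(t_fields, "level", slice(500, 850))
print([f["level"] for f in picked])  # [850, 700, 500]
```

Both 500 and 850 survive the filter, whereas a Python list slice would drop the stop value.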

Using isel

isel() works similarly to sel() but takes indices instead of values. Please note that the indices refer to positions in the sorted list of unique values of each key, not to the order of the fields in the file.

[9]:

a = ds.isel(level=0)
a.ls()

[9]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	300	20180801	1200	an	regular_ll
1	ecmf	u	isobaricInhPa	300	20180801	1200	an	regular_ll
2	ecmf	v	isobaricInhPa	300	20180801	1200	an	regular_ll

[10]:

a = ds.isel({"level": 2, "shortName": 1})
a.ls()

[10]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	stepRange	dataType	number	gridType
0	ecmf	u	isobaricInhPa	500	20180801	1200	0	an	0	regular_ll

[11]:

a = ds.isel(level=[2,3], param=0)
a.ls()

[11]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	stepRange	dataType	number	gridType
0	ecmf	t	isobaricInhPa	700	20180801	1200	0	an	0	regular_ll
1	ecmf	t	isobaricInhPa	500	20180801	1200	0	an	0	regular_ll

Slices behave as in normal Python indexing, so the stop index is excluded:

[12]:

a = ds.isel(level=slice(2,5), param=0)
a.ls()

[12]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	t	isobaricInhPa	700	20180801	1200	an	regular_ll
2	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
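
The index-based selections above can be reproduced with a small plain-Python model. This is a sketch under assumptions, not the library's code: `isel_model` and `fields` are hypothetical, and the model simply indexes into the sorted unique values of each key, which is consistent with the listings in this section.

```python
# Synthetic stand-in for the 18 GRIB messages, in the order the listings suggest.
LEVELS = (1000, 850, 700, 500, 400, 300)
fields = [{"shortName": p, "level": lev} for lev in LEVELS for p in ("t", "u", "v")]


def isel_model(fields, **indices):
    """Select by position in the sorted unique values of each key."""
    selected = fields
    for key, idx in indices.items():
        unique = sorted({f[key] for f in fields})
        if isinstance(idx, slice):
            wanted = unique[idx]  # ordinary slicing: stop index excluded
        elif isinstance(idx, (list, tuple)):
            wanted = [unique[i] for i in idx]
        else:
            wanted = [unique[idx]]
        selected = [f for f in selected if f[key] in wanted]
    return selected


print(sorted({f["level"] for f in fields}))  # [300, 400, 500, 700, 850, 1000]
picked = isel_model(fields, level=slice(2, 5), shortName=0)
print([(f["shortName"], f["level"]) for f in picked])
# [('t', 850), ('t', 700), ('t', 500)]
```

Index 0 on level maps to 300 because the unique values are sorted first, even though the first field in this stand-in list is at 1000.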

Using order_by

Calling order_by() provides a “view”:

[13]:

b = a.order_by()
type(b)

[13]:

earthkit.data.readers.grib.index.GribMaskFieldList

[14]:

b.ls()

[14]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	t	isobaricInhPa	700	20180801	1200	an	regular_ll
2	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll

The sorting keys can be specified as a list:

[15]:

b = a.order_by(["shortName"])
b.ls()

[15]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	t	isobaricInhPa	700	20180801	1200	an	regular_ll
2	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll

We can also prescribe the order of the values within a key. This only works when all the possible values of the key are specified. Since a contains only t fields here, the listing does not change:

[16]:

a = a.order_by(shortName=["v", "t", "u"])
a.ls()

[16]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	t	isobaricInhPa	700	20180801	1200	an	regular_ll
2	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
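
Because the list above holds only t fields, the effect of a prescribed order is not visible in it. The following plain-Python sketch shows the idea on synthetic data. `order_by_values` is a hypothetical helper, and raising an error for unlisted values is a choice made for this sketch, not a statement about how earthkit-data reacts.

```python
# Three synthetic fields at one level, in parameter order t, u, v.
fields = [{"shortName": p, "level": 500} for p in ("t", "u", "v")]


def order_by_values(fields, key, order):
    """Sort fields so that the values of `key` follow the prescribed order."""
    rank = {value: i for i, value in enumerate(order)}
    missing = {f[key] for f in fields} - set(rank)
    if missing:
        # Every value present must be listed, otherwise it has no rank.
        raise ValueError(f"no position given for {sorted(missing)}")
    return sorted(fields, key=lambda f: rank[f[key]])


ordered = order_by_values(fields, "shortName", ["v", "t", "u"])
print([f["shortName"] for f in ordered])  # ['v', 't', 'u']
```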

Combining sel and order_by

[17]:

a = ds.sel(level=[500, 850]).order_by(["shortName"])
a.ls()

[17]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
2	ecmf	u	isobaricInhPa	850	20180801	1200	an	regular_ll
3	ecmf	u	isobaricInhPa	500	20180801	1200	an	regular_ll
4	ecmf	v	isobaricInhPa	850	20180801	1200	an	regular_ll
5	ecmf	v	isobaricInhPa	500	20180801	1200	an	regular_ll
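
The listing above is consistent with a selection that keeps the incoming field order, followed by a stable sort on shortName. The sketch below models that in plain Python on synthetic data. It is an illustration under that assumption, and `fields`, `selected` and `ordered` are hypothetical names.

```python
# Synthetic stand-in for the 18 GRIB messages, in the order the listings suggest.
LEVELS = (1000, 850, 700, 500, 400, 300)
fields = [{"shortName": p, "level": lev} for lev in LEVELS for p in ("t", "u", "v")]

# Step 1: value-based selection, keeping the incoming order (850 before 500).
selected = [f for f in fields if f["level"] in (500, 850)]

# Step 2: sorted() is stable, so fields sharing a shortName keep their
# relative order from step 1.
ordered = sorted(selected, key=lambda f: f["shortName"])

print([(f["shortName"], f["level"]) for f in ordered])
# [('t', 850), ('t', 500), ('u', 850), ('u', 500), ('v', 850), ('v', 500)]
```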

Using indices

To get the unique values for a predefined set of metadata keys (the MARS ecCodes keys), we can call indices(). By default, it returns all the keys that have valid values:

[18]:

ds.indices()

[18]:

{'class': ['od'],
 'stream': ['oper'],
 'levtype': ['pl'],
 'type': ['an'],
 'expver': ['0001'],
 'date': [20180801],
 'time': [1200],
 'domain': ['g'],
 'number': [0],
 'levelist': [300, 400, 500, 700, 850, 1000],
 'param': ['t', 'u', 'v'],
 'level': [300, 400, 500, 700, 850, 1000],
 'shortName': ['t', 'u', 'v']}

We can use the squeeze option to see only the keys that have more than one value:

[19]:

ds.indices(squeeze=True)

[19]:

{'levelist': [300, 400, 500, 700, 850, 1000],
 'param': ['t', 'u', 'v'],
 'level': [300, 400, 500, 700, 850, 1000],
 'shortName': ['t', 'u', 'v']}
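
What squeeze does can be modelled in a few lines of plain Python. The sketch below is an illustration only: `indices_model` is a hypothetical function, the synthetic fields carry just three keys, and the real method works on the MARS keys of the GRIB messages.

```python
# Synthetic stand-in for the GRIB messages with three metadata keys each.
LEVELS = (1000, 850, 700, 500, 400, 300)
fields = [
    {"date": 20180801, "shortName": p, "level": lev}
    for lev in LEVELS
    for p in ("t", "u", "v")
]


def indices_model(fields, keys, squeeze=False):
    """Map each key to its sorted unique values; optionally drop constant keys."""
    result = {}
    for key in keys:
        values = sorted({f[key] for f in fields if key in f})
        if not values:
            continue  # key has no valid value in any field
        if squeeze and len(values) == 1:
            continue  # a single value carries no selection information
        result[key] = values
    return result


keys = ["date", "level", "shortName"]
print(indices_model(fields, keys))
# {'date': [20180801], 'level': [300, 400, 500, 700, 850, 1000], 'shortName': ['t', 'u', 'v']}
print(indices_model(fields, keys, squeeze=True))
# {'level': [300, 400, 500, 700, 850, 1000], 'shortName': ['t', 'u', 'v']}
```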

We can get the unique values for a given key with index():

[20]:

ds.index("param")

[20]:

['t', 'u', 'v']

[21]:

ds.index("date")

[21]:

[20180801]

Aliases can be used, e.g. we can use level instead of levelist:

[22]:

ds.index("level")

[22]:

[300, 400, 500, 700, 850, 1000]

Count the number of fields for each available level:

[23]:

for level in ds.index("level"):
    print(f"level={level} len={len(ds.sel(level=level))}")

level=300 len=3
level=400 len=3
level=500 len=3
level=700 len=3
level=850 len=3
level=1000 len=3
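
The loop above performs one selection per level. When only the counts are needed, a single pass over the per-field level values gives the same numbers. The sketch below shows this with `collections.Counter` on synthetic data. `fields` is a hypothetical stand-in, and feeding the counter from the real dataset would require a list with one level value per field, which is assumed here rather than shown.

```python
from collections import Counter

# Synthetic stand-in for the 18 GRIB messages: one dict per field.
LEVELS = (1000, 850, 700, 500, 400, 300)
fields = [{"shortName": p, "level": lev} for lev in LEVELS for p in ("t", "u", "v")]

# Count the fields per level in one pass instead of one selection per level.
counts = Counter(f["level"] for f in fields)

for level in sorted(counts):
    print(f"level={level} len={counts[level]}")
```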
