GRIB: indexing

[1]:

import earthkit.data as ekd

First we prepare the data:

[2]:

ekd.download_example_file(["test.grib", "tuv_pl.grib"])

[3]:

!test -d _grib_dir_with_sql || mkdir -p _grib_dir_with_sql
!cp test.grib tuv_pl.grib _grib_dir_with_sql/

Indexing

We can index the input GRIB data by passing indexing=True to from_source(). The index is built on first data access using the MARS ecCodes keys and is stored in a SQLite database in the earthkit-data cache. Subsequent loads of the same data are very fast because they reuse the cached index.
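To make the caching idea concrete, here is a minimal sketch that uses only Python's standard library. It is not earthkit-data's implementation: the table layout, the `build_index` helper and the hard-coded `FAKE_SCAN` records are all hypothetical, and a real index would be filled by scanning the GRIB file with ecCodes. The only point is that the expensive scan runs once and later calls reuse the stored rows.

```python
import sqlite3

# Hypothetical stand-in for the metadata a GRIB scan would produce:
# (param, level, byte offset of the message, message length in bytes).
FAKE_SCAN = [
    ("t", 1000, 0, 240),
    ("u", 1000, 240, 240),
    ("t", 500, 480, 240),
    ("u", 500, 720, 240),
]


def build_index(conn, path, scan):
    """Index `path` once; return True only if a scan was needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(path TEXT, param TEXT, level INTEGER, "
        "byte_offset INTEGER, byte_length INTEGER)"
    )
    already = conn.execute(
        "SELECT COUNT(*) FROM entries WHERE path = ?", (path,)
    ).fetchone()[0]
    if already:
        return False  # index hit: the file does not have to be parsed again
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?, ?, ?, ?)",
        [(path, *row) for row in scan],
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")  # a persistent cache would use a file on disk
first = build_index(conn, "example.grib", FAKE_SCAN)   # scans and stores
second = build_index(conn, "example.grib", FAKE_SCAN)  # reuses stored rows
count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(first, second, count)
```

Storing the byte offset and length next to the keys is what would let a reader jump straight to a single message later without rescanning the file.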

[4]:

fs = ekd.from_source("file", "tuv_pl.grib", indexing=True)


[5]:

len(fs)

[5]:

Indexing also works for a list of files or directories (here “_grib_dir_with_sql” is a directory):

[6]:

fs = ekd.from_source("file", "./_grib_dir_with_sql", indexing=True)


[7]:

len(fs)

[7]:

[8]:

type(fs)

[8]:

earthkit.data.readers.grib.index.sql.FieldListInFilesWithSqlIndex

Methods using the SQL database

When calling sel(), isel() or order_by(), the metadata is read directly from the index database, so there is no need to load or open any of the GRIB messages.
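As a rough illustration of why these calls can avoid touching the GRIB messages, the sketch below expresses each of them as a query on a small in-memory SQLite table. The table and column names are hypothetical and do not reflect the schema earthkit-data uses. The isel() part also assumes that positions refer to the sorted distinct values of a key, which is an assumption made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (param TEXT, level INTEGER, byte_offset INTEGER)"
)

# Fill the toy index: three parameters on three levels, in file order.
rows = []
for lev in (1000, 850, 500):
    for p in ("t", "u", "v"):
        rows.append((p, lev, len(rows) * 240))
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)

# sel(level=500) corresponds to a WHERE clause on the indexed column.
sel_rows = conn.execute(
    "SELECT param, level FROM entries WHERE level = ? ORDER BY byte_offset",
    (500,),
).fetchall()

# order_by("param") corresponds to an ORDER BY on the indexed column.
ordered = conn.execute(
    "SELECT param, level FROM entries ORDER BY param, byte_offset"
).fetchall()

# isel(level=2): look up the distinct values first, then select by value.
levels = [
    r[0]
    for r in conn.execute("SELECT DISTINCT level FROM entries ORDER BY level")
]
isel_rows = conn.execute(
    "SELECT param, level FROM entries WHERE level = ? ORDER BY byte_offset",
    (levels[2],),
).fetchall()

print(sel_rows)
print([p for p, _ in ordered])
print(isel_rows)
```

In all three cases only the index table is read; the byte offsets would be needed only once field values are actually requested.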

[9]:

a = fs.sel(level=500)
len(a)

[9]:

[10]:

a = fs.order_by("param")
len(a)

[10]:

[11]:

a = fs.isel(level=2)
len(a)

[11]:

For keys that are not present in the index database, the methods above do not work and raise a KeyError:

[12]:

try:
    a = fs.sel(gridType="regular_ll")
    print(len(a))
except KeyError as e:
    print(f"error: {e}")

error: 'i_gridType'
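One possible workaround for a non-indexed key is to fall back to filtering in Python with metadata(), at the cost of opening the GRIB messages. The sketch below is duck-typed and runs without any GRIB data: `sel_or_scan` is a hypothetical helper, and `FakeField` and `FakeList` are stand-ins written for this example. It assumes the real field list raises KeyError as in the cell above, can be iterated, and that each field offers metadata(key). Note that the fallback returns a plain Python list, not a field list.

```python
class FakeField:
    """Stand-in for a GRIB field; only metadata(key) is modelled."""

    def __init__(self, **meta):
        self._meta = meta

    def metadata(self, key):
        return self._meta[key]


class FakeList(list):
    """Stand-in for an indexed field list with a limited set of keys."""

    INDEXED = {"param", "level"}  # hypothetical set of indexed keys

    def sel(self, **kwargs):
        for key in kwargs:
            if key not in self.INDEXED:
                raise KeyError(f"i_{key}")  # mimics the error shown above
        return FakeList(
            f for f in self
            if all(f.metadata(k) == v for k, v in kwargs.items())
        )


def sel_or_scan(fields, **kwargs):
    """Try the indexed sel(); on KeyError, filter field by field instead."""
    try:
        return list(fields.sel(**kwargs))
    except KeyError:
        # Slow path: every field's metadata has to be read.
        return [
            f for f in fields
            if all(f.metadata(k) == v for k, v in kwargs.items())
        ]


fields = FakeList([
    FakeField(param="t", level=500, gridType="regular_ll"),
    FakeField(param="u", level=500, gridType="reduced_gg"),
])
by_grid = sel_or_scan(fields, gridType="regular_ll")  # falls back to scanning
by_param = sel_or_scan(fields, param="u")              # uses the "index"
print(len(by_grid), len(by_param))
```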

Methods not using the SQL database

Most of the other methods still need to load/open the GRIB messages to extract the required metadata. Ideally, they would all use the index database.

[13]:

print(fs[1])

GribField(u,1000,20180801,1200,0,0)

[14]:

fs[2:4].metadata("param")

[14]:

['v', 't']

from_source() arguments

Selection and sorting arguments can be directly passed to from_source():

[15]:

fs = ekd.from_source("file", "./_grib_dir_with_sql", indexing=True, level=500)
fs.ls()

[15]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
1	ecmf	u	isobaricInhPa	500	20180801	1200	an	regular_ll
2	ecmf	v	isobaricInhPa	500	20180801	1200	an	regular_ll

[16]:

fs = ekd.from_source(
    "file", "./_grib_dir_with_sql", indexing=True, level=[500, 850], order_by="variable"
)
fs.ls()

[16]:

	centre	shortName	typeOfLevel	level	dataDate	dataTime	dataType	gridType
0	ecmf	t	isobaricInhPa	850	20180801	1200	an	regular_ll
1	ecmf	t	isobaricInhPa	500	20180801	1200	an	regular_ll
2	ecmf	u	isobaricInhPa	850	20180801	1200	an	regular_ll
3	ecmf	u	isobaricInhPa	500	20180801	1200	an	regular_ll
4	ecmf	v	isobaricInhPa	850	20180801	1200	an	regular_ll
5	ecmf	v	isobaricInhPa	500	20180801	1200	an	regular_ll