GRIB: indexing

[1]:
import earthkit.data as ekd

First we prepare the data:

[2]:
ekd.download_example_file(["test.grib", "tuv_pl.grib"])
[3]:
!test -d _grib_dir_with_sql || mkdir -p _grib_dir_with_sql
!cp test.grib tuv_pl.grib _grib_dir_with_sql/

Indexing

We can index the input GRIB data by passing the indexing option to from_source(). The index is built on first data access using the MARS ecCodes keys. The index data is stored in a sqlite database located in the earthkit-data cache. Subsequent loading of the same data is very fast because it uses the cached index data.

[4]:
fs = ekd.from_source("file", "tuv_pl.grib", indexing=True)
  0%|          | 0/1 [00:00<?, ?it/s]
Parsing tuv_pl.grib:   0%|          | 0.00/4.22k [00:00<?, ?B/s]
100%|██████████| 1/1 [00:00<00:00, 12.27it/s]
[5]:
len(fs)
[5]:
18
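
To illustrate the idea of a metadata index, the following minimal sketch stores a few metadata keys per GRIB message in an in-memory sqlite database, using only the Python standard library. The table layout, the column names, the build_index() helper and the sample records (including the byte offsets and lengths) are all hypothetical; this does not reproduce the schema or the code used by earthkit-data.

```python
import sqlite3

# Hypothetical metadata records, one per GRIB message:
# (path, offset, length, param, level). A real index would extract
# these values from the GRIB file once, on first data access.
RECORDS = [
    ("tuv_pl.grib", 0, 240, "t", 1000),
    ("tuv_pl.grib", 240, 240, "u", 1000),
    ("tuv_pl.grib", 480, 240, "v", 1000),
    ("tuv_pl.grib", 720, 240, "t", 500),
]


def build_index(records):
    """Create an in-memory sqlite table holding one row per message."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE entries (path TEXT, offset INTEGER, length INTEGER, "
        "param TEXT, level INTEGER)"
    )
    con.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", records)
    con.commit()
    return con


con = build_index(RECORDS)

# The number of messages can be read from the index alone,
# without touching the GRIB file again.
num_messages = con.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(num_messages)
```

Storing the file path together with a byte offset and length is one common way to let a reader jump straight to a message later; whether earthkit-data stores exactly these fields is an assumption here.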

Indexing also works for a list of files or for directories (here “_grib_dir_with_sql” is a directory):

[6]:
fs = ekd.from_source("file", "./_grib_dir_with_sql", indexing=True)
  0%|          | 0/2 [00:00<?, ?it/s]
Parsing ./_grib_dir_with_sql/tuv_pl.grib:   0%|          | 0.00/4.22k [00:00<?, ?B/s]
Parsing ./_grib_dir_with_sql/test.grib:   0%|          | 0.00/1.03k [00:00<?, ?B/s]
100%|██████████| 2/2 [00:00<00:00, 92.85it/s]
[7]:
len(fs)
[7]:
20
[8]:
type(fs)
[8]:
earthkit.data.readers.grib.index.sql.FieldListInFilesWithSqlIndex

Methods using the SQL database

When calling sel(), isel() or order_by() the metadata is directly read from the index database, so there is no need to load/open any of the GRIB messages.

[9]:
a = fs.sel(level=500)
len(a)
[9]:
3
[10]:
a = fs.order_by("param")
len(a)
[10]:
20
[11]:
a = fs.isel(level=2)
len(a)
[11]:
3
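
The reason such calls can avoid opening the GRIB messages is that a selection or a sort maps naturally onto a database query. The sketch below shows this with the standard sqlite3 module. The entries table, its columns, and the standalone sel() and order_by() functions are hypothetical illustrations and are not the earthkit-data implementation.

```python
import sqlite3

# Hypothetical index: one row per message with its position in the
# original data and two metadata keys.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entries (position INTEGER, param TEXT, level INTEGER)")
con.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(0, "t", 1000), (1, "u", 1000), (2, "v", 1000),
     (3, "t", 500), (4, "u", 500), (5, "v", 500)],
)


def _check_keys(con, keys):
    """Raise KeyError for any key that is not a column of the index."""
    # Column names cannot be bound as SQL parameters, so they are
    # validated against the table definition before being interpolated.
    columns = {row[1] for row in con.execute("PRAGMA table_info(entries)")}
    for key in keys:
        if key not in columns:
            raise KeyError(key)


def sel(con, **filters):
    """Return positions of messages whose indexed keys match all filters."""
    _check_keys(con, filters)
    where = " AND ".join(f"{key} = ?" for key in filters) or "1"
    sql = f"SELECT position FROM entries WHERE {where} ORDER BY position"
    return [row[0] for row in con.execute(sql, tuple(filters.values()))]


def order_by(con, key):
    """Return all positions sorted by an indexed key (ties keep file order)."""
    _check_keys(con, [key])
    sql = f"SELECT position FROM entries ORDER BY {key}, position"
    return [row[0] for row in con.execute(sql)]


selected = sel(con, level=500)
ordered = order_by(con, "param")
print(selected, ordered)
```

In this sketch only the positions of the matching messages are returned; the message data itself would be read later, and only for the fields that are actually used.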

For keys not present in the index database, sel(), isel() and order_by() do not work. In the example below sel() raises a KeyError:

[12]:
try:
    a = fs.sel(gridType="regular_ll")
    print(len(a))
except KeyError as e:
    print(f"error: {e}")
error: 'i_gridType'
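
One possible workaround is to fall back to reading the metadata field by field when the index lookup fails. The helper below is a hypothetical sketch and not part of earthkit-data. It assumes only that the object passed in has a sel() method that raises KeyError for unknown keys, that it can be iterated field by field, and that each field has a metadata(key) method, as used elsewhere in this notebook.

```python
def select_with_fallback(fieldlist, **filters):
    """Try the index-backed sel() first; on KeyError filter field by field.

    Hypothetical helper. Note that the fallback returns a plain Python
    list of fields rather than a fieldlist object.
    """
    try:
        return fieldlist.sel(**filters)
    except KeyError:
        # Slow path: every field has to be opened to read the requested keys.
        return [
            field
            for field in fieldlist
            if all(field.metadata(key) == value for key, value in filters.items())
        ]


# Possible usage with the indexed fieldlist from this notebook:
# a = select_with_fallback(fs, gridType="regular_ll")
```

The fallback gives up the speed advantage of the index, so it is only a reasonable choice when the key is needed occasionally.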

Methods not using the SQL database

Most of the other methods still need to load/open the GRIB messages to extract the required metadata. Ideally they should all use the index database.

[13]:
print(fs[1])
GribField(u,1000,20180801,1200,0,0)
[14]:
fs[2:4].metadata("param")
[14]:
['v', 't']

from_source() arguments

Selection and sorting arguments can be directly passed to from_source():

[15]:
fs = ekd.from_source("file", "./_grib_dir_with_sql", indexing=True, level=500)
fs.ls()
[15]:
   centre  shortName  typeOfLevel    level  dataDate  dataTime  stepRange  dataType  number  gridType
0  ecmf    t          isobaricInhPa  500    20180801  1200      0          an        0       regular_ll
1  ecmf    u          isobaricInhPa  500    20180801  1200      0          an        0       regular_ll
2  ecmf    v          isobaricInhPa  500    20180801  1200      0          an        0       regular_ll
[16]:
fs = ekd.from_source(
    "file", "./_grib_dir_with_sql", indexing=True, level=[500, 850], order_by="variable"
)
fs.ls()
[16]:
   centre  shortName  typeOfLevel    level  dataDate  dataTime  stepRange  dataType  number  gridType
0  ecmf    t          isobaricInhPa  850    20180801  1200      0          an        0       regular_ll
1  ecmf    t          isobaricInhPa  500    20180801  1200      0          an        0       regular_ll
2  ecmf    u          isobaricInhPa  850    20180801  1200      0          an        0       regular_ll
3  ecmf    u          isobaricInhPa  500    20180801  1200      0          an        0       regular_ll
4  ecmf    v          isobaricInhPa  850    20180801  1200      0          an        0       regular_ll
5  ecmf    v          isobaricInhPa  500    20180801  1200      0          an        0       regular_ll