GRIB: indexing
[1]:
import earthkit.data as ekd
First we prepare the data:
[2]:
ekd.download_example_file(["test.grib", "tuv_pl.grib"])
[3]:
!test -d _grib_dir_with_sql || mkdir -p _grib_dir_with_sql
!cp test.grib tuv_pl.grib _grib_dir_with_sql/
Indexing
We can index the input GRIB data by passing indexing=True to from_source(). The index is built on first data access from the MARS ecCodes keys and stored in a SQLite database in the earthkit-data cache. Subsequent loads of the same data are very fast because they reuse the cached index.
[4]:
fs = ekd.from_source("file", "tuv_pl.grib", indexing=True)
0%| | 0/1 [00:00<?, ?it/s]
Parsing tuv_pl.grib: 0%| | 0.00/4.22k [00:00<?, ?B/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12.27it/s]
[5]:
len(fs)
[5]:
18
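The caching behaviour described above can be illustrated with a minimal sketch built only on Python's standard sqlite3 module. It is not earthkit-data's actual implementation or schema: the table name entries, its columns, the FAKE_MESSAGES data, and the scan() and load_index() helpers are all assumptions made for illustration. The point is only that the expensive scan happens once, and later loads read the stored rows.

```python
import sqlite3

# Hypothetical stand-in for parsing a GRIB file: one dict per message.
FAKE_MESSAGES = {
    "tuv_pl.grib": [
        {"pos": 0, "size": 240, "param": "t", "level": 1000},
        {"pos": 240, "size": 240, "param": "u", "level": 1000},
        {"pos": 480, "size": 240, "param": "v", "level": 1000},
    ]
}

scan_calls = []  # records every (expensive) scan so they can be counted


def scan(path):
    """Pretend to parse the file and return per-message metadata."""
    scan_calls.append(path)
    return FAKE_MESSAGES[path]


def load_index(conn, path):
    """Return the number of indexed messages, scanning only if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(path TEXT, pos INTEGER, size INTEGER, param TEXT, level INTEGER)"
    )
    count = conn.execute(
        "SELECT COUNT(*) FROM entries WHERE path = ?", (path,)
    ).fetchone()[0]
    if count == 0:
        # First access: scan the file and store its metadata in the index.
        rows = [
            (path, m["pos"], m["size"], m["param"], m["level"])
            for m in scan(path)
        ]
        conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", rows)
        conn.commit()
        count = len(rows)
    return count


conn = sqlite3.connect(":memory:")  # a real cache would use a file on disk
first = load_index(conn, "tuv_pl.grib")   # triggers the scan
second = load_index(conn, "tuv_pl.grib")  # served from the index
print(first, second, len(scan_calls))
```

The second call returns the same count without calling scan() again, which is the effect that makes repeated loading fast.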
Indexing also works for a list of files or for directories (here “_grib_dir_with_sql” is a directory):
[6]:
fs = ekd.from_source("file", "./_grib_dir_with_sql", indexing=True)
0%| | 0/2 [00:00<?, ?it/s]
Parsing ./_grib_dir_with_sql/tuv_pl.grib: 0%| | 0.00/4.22k [00:00<?, ?B/s]
Parsing ./_grib_dir_with_sql/test.grib: 0%| | 0.00/1.03k [00:00<?, ?B/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.85it/s]
[7]:
len(fs)
[7]:
20
[8]:
type(fs)
[8]:
earthkit.data.readers.grib.index.sql.FieldListInFilesWithSqlIndex
Methods using the SQL database
When calling sel(), isel() or order_by() the metadata is directly read from the index database, so there is no need to load/open any of the GRIB messages.
[9]:
a = fs.sel(level=500)
len(a)
[9]:
3
[10]:
a = fs.order_by("param")
len(a)
[10]:
20
[11]:
a = fs.isel(level=2)
len(a)
[11]:
3
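To see why such calls need not open any GRIB message, here is a hedged sketch of how a selection and an ordering could be answered purely from an index table. The schema (entries, pos, i_param, i_level) and the sel() and order_by() helpers are hypothetical and written with the standard sqlite3 module; they are not earthkit-data's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (path TEXT, pos INTEGER, i_param TEXT, i_level INTEGER)"
)

# Nine made-up messages: three parameters on three pressure levels.
rows = []
for level in (1000, 850, 500):
    for param in ("t", "u", "v"):
        rows.append(("tuv_pl.grib", 240 * len(rows), param, level))
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)


def sel(conn, **filters):
    """Return (path, pos) of messages matching all key=value filters."""
    # Keys are interpolated into the SQL text, so this sketch assumes they
    # come from trusted code; the values are passed as bound parameters.
    where = " AND ".join(f"i_{key} = ?" for key in filters)
    sql = "SELECT path, pos FROM entries"
    if where:
        sql += " WHERE " + where
    sql += " ORDER BY pos"
    return conn.execute(sql, tuple(filters.values())).fetchall()


def order_by(conn, key):
    """Return (param, level) pairs sorted by the given indexed key."""
    sql = f"SELECT i_param, i_level FROM entries ORDER BY i_{key}, pos"
    return conn.execute(sql).fetchall()


selected = sel(conn, level=500)
ordered = order_by(conn, "param")
print(len(selected))  # 3
print(ordered[:3])    # [('t', 1000), ('t', 850), ('t', 500)]
```

Only the file path and the byte position come back from the query, so a message would have to be read from disk only when its values are actually requested.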
For keys not present in the index database, the methods above raise a KeyError:
[12]:
try:
    a = fs.sel(gridType="regular_ll")
    print(len(a))
except KeyError as e:
    print(f"error: {e}")
error: 'i_gridType'
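The i_ prefix in the error message suggests that the lookup fails on a column name. One plausible reading, sketched below with the standard sqlite3 module, is that each key is mapped to a column and rejected if that column was never indexed. The entries table, the indexed_columns() helper, and this sel() function are assumptions for illustration, not the library's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (path TEXT, i_param TEXT, i_level INTEGER)")
conn.execute("INSERT INTO entries VALUES ('tuv_pl.grib', 't', 500)")


def indexed_columns(conn):
    """Return the set of column names of the (hypothetical) index table."""
    # PRAGMA table_info yields one row per column; the name is at index 1.
    return {row[1] for row in conn.execute("PRAGMA table_info(entries)")}


def sel(conn, **filters):
    """Select paths by indexed keys; raise KeyError for unknown keys."""
    columns = indexed_columns(conn)
    for key in filters:
        column = f"i_{key}"
        if column not in columns:
            raise KeyError(column)
    sql = "SELECT path FROM entries"
    if filters:
        sql += " WHERE " + " AND ".join(f"i_{key} = ?" for key in filters)
    return conn.execute(sql, tuple(filters.values())).fetchall()


ok = sel(conn, level=500)  # "level" is indexed, so this succeeds
try:
    sel(conn, gridType="regular_ll")  # "gridType" is not indexed
    message = None
except KeyError as e:
    message = f"error: {e}"
print(message)
```

Validating keys against the known columns before building the query also keeps arbitrary text out of the SQL statement.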
Methods not using the SQL database
Most of the other methods still need to load/open the GRIB messages to extract the required metadata. Ideally they should all use the index database.
[13]:
print(fs[1])
GribField(u,1000,20180801,1200,0,0)
[14]:
fs[2:4].metadata("param")
[14]:
['v', 't']
from_source() arguments
Selection and sorting arguments can be directly passed to from_source():
[15]:
fs = ekd.from_source("file", "./_grib_dir_with_sql", indexing=True, level=500)
fs.ls()
[15]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
[16]:
fs = ekd.from_source("file", "./_grib_dir_with_sql", indexing=True, level=[500, 850],
                     order_by="variable")
fs.ls()
[16]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | u | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | u | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 4 | ecmf | v | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 5 | ecmf | v | isobaricInhPa | 500 | 20180801 | 1200 | 0 | an | 0 | regular_ll |