Reading file parts

[1]:
import earthkit.data as ekd

This notebook demonstrates how to read only parts (byte ranges) of GRIB files.

First we ensure the example files are available.

[2]:
ekd.download_example_file(["test.grib", "test6.grib", "tuv_pl.grib"])

We load one of the files and inspect its contents with ls(). By using the “metadata.offset” key we get the byte position where each message starts within the file.

[3]:
ds = ekd.from_source("file", "test6.grib")
ds.to_fieldlist().ls(keys="metadata.offset")
[3]:
metadata.offset
0 0.0
1 240.0
2 480.0
3 720.0
4 960.0
5 1200.0
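
For background, the sketch below shows one way such offsets could be found by scanning raw bytes directly. It is not how earthkit-data does it; it only relies on the GRIB framing (each message starts with "GRIB", stores its total length in the header, and ends with "7777"). The names scan_grib_messages and fake_grib1_message are hypothetical, the test data is synthetic, and the special length encoding of very large GRIB 1 messages is ignored.

```python
def scan_grib_messages(data):
    """Return a list of (offset, length) for each GRIB message found in data."""
    found = []
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        if start < 0 or start + 8 > len(data):
            break
        edition = data[start + 7]  # octet 8 holds the GRIB edition number
        if edition == 1:
            # GRIB 1: total message length in octets 5-7
            length = int.from_bytes(data[start + 4:start + 7], "big")
        elif edition == 2 and start + 16 <= len(data):
            # GRIB 2: total message length in octets 9-16
            length = int.from_bytes(data[start + 8:start + 16], "big")
        else:
            length = 0
        end = start + length
        if length >= 12 and end <= len(data) and data[end - 4:end] == b"7777":
            found.append((start, length))
            pos = end
        else:
            # Not a valid message start; keep searching after this marker
            pos = start + 4
    return found


def fake_grib1_message(length):
    """Build a synthetic GRIB 1 shaped message (header, zero padding, end marker)."""
    header = b"GRIB" + length.to_bytes(3, "big") + bytes([1])
    return header + bytes(length - len(header) - 4) + b"7777"


data = fake_grib1_message(240) + fake_grib1_message(240)
print(scan_grib_messages(data))  # [(0, 240), (240, 240)]
```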

Single files

The parts option in from_source() specifies the byte range(s) we want to read from a file. A single part is a tuple or list in the following format: (offset, length).

Using the offsets from the example above we can specify the part for the first message.

[4]:
ds = ekd.from_source("file", "test6.grib", parts=(0, 240))
ds.to_fieldlist().ls()
[4]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
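
The (offset, length) pair for any message can be derived from the list of offsets: the length of a message is the distance to the next offset, and the last message ends at the end of the file. A minimal sketch follows; offsets_to_parts is a hypothetical helper and the total file size used here is an assumed value for illustration.

```python
def offsets_to_parts(offsets, total_size):
    """Pair each message offset with its length in bytes."""
    ends = list(offsets[1:]) + [total_size]
    return [(int(start), int(end - start)) for start, end in zip(offsets, ends)]


# Offsets as reported by ls(keys="metadata.offset") for test6.grib
offsets = [0.0, 240.0, 480.0, 720.0, 960.0, 1200.0]
# Assumed file size; in practice use os.path.getsize("test6.grib")
total_size = 1440

parts = offsets_to_parts(offsets, total_size)
print(parts[0])  # (0, 240)
```

Any element of the resulting list could then be passed as the parts option.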

The call above can also be written as:

[5]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240)])
ds.to_fieldlist().ls()
[5]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll

A part can extend past a message boundary. Here bytes 240-244 belong to the second message, which is not read because the part does not cover all of its bytes.

[6]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 245)])
ds.to_fieldlist().ls()
[6]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
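
The rule described above can be modelled in a few lines: a message counts as readable only when the part covers every one of its bytes. This is a sketch of the behaviour as described here, not earthkit-data's implementation, and fully_covered is a hypothetical name.

```python
def fully_covered(messages, part):
    """Return the indices of messages lying entirely inside the byte range of part."""
    part_start, part_length = part
    part_end = part_start + part_length
    return [
        i
        for i, (offset, length) in enumerate(messages)
        if offset >= part_start and offset + length <= part_end
    ]


# (offset, length) of the first three messages in test6.grib
messages = [(0, 240), (240, 240), (480, 240)]

print(fully_covered(messages, (0, 245)))  # [0]
print(fully_covered(messages, (0, 480)))  # [0, 1]
```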

Multiple parts can be specified as a list. Here the second part, (480, 480), spans two consecutive messages.

[7]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (480, 480)])
ds.to_fieldlist().ls()
[7]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
1 v 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
2 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 850 pressure 0 regular_ll

Parts cannot overlap and must be listed in increasing offset order; otherwise an error is raised.

[8]:
try:
    ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (220, 240)])
except Exception as e:
    print(e)
Offsets and lengths must be in order, and not overlapping: offset=220, end of previous part=240
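
To catch this kind of mistake before calling from_source(), the parts can be checked up front. The sketch below mirrors the condition stated in the error message; check_parts is a hypothetical helper, not part of earthkit-data, and treating directly adjacent parts as valid is an assumption of this sketch.

```python
def check_parts(parts):
    """Raise ValueError unless parts are ordered by offset and do not overlap."""
    previous_end = 0
    for offset, length in parts:
        if offset < 0 or length <= 0:
            raise ValueError(f"invalid part: offset={offset}, length={length}")
        if offset < previous_end:
            raise ValueError(
                f"part at offset={offset} starts before "
                f"end of previous part={previous_end}"
            )
        previous_end = offset + length
    return list(parts)


check_parts([(0, 240), (480, 480)])  # passes silently

try:
    check_parts([(0, 240), (220, 240)])
except ValueError as e:
    print(e)
```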

Multiple files

When using multiple files we can specify the parts for each file with the following syntax:

[9]:
ds = ekd.from_source("file", [["test.grib", (0, 526)], ["test6.grib", [(0, 240), (480, 240)]]])
ds.to_fieldlist().ls()
[9]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 2t 2020-05-13 12:00:00 2020-05-13 12:00:00 0 days 0 surface 0 regular_ll
1 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
2 v 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll

When the parts entry for a given file is None, the whole file is used.

[10]:
ds = ekd.from_source("file", [["test.grib", None], ["test6.grib", [(0, 240), (480, 240)]]])
ds.to_fieldlist().ls()
[10]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 2t 2020-05-13 12:00:00 2020-05-13 12:00:00 0 days 0 surface 0 regular_ll
1 msl 2020-05-13 12:00:00 2020-05-13 12:00:00 0 days 0 surface 0 regular_ll
2 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
3 v 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll

The parts kwarg can still be used with multiple files; in this case the same parts are applied to each file in turn.

[11]:
ds = ekd.from_source("file", ["test6.grib", "tuv_pl.grib"], parts=(0, 240))
ds.to_fieldlist().ls()
[11]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
1 t 2018-08-01 12:00:00 2018-08-01 12:00:00 0 days 1000 pressure 0 regular_ll
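
Assuming that applying parts to each file is equivalent to repeating the same parts in the per-file syntax shown earlier, the relationship between the two forms can be sketched as follows. per_file_specs is a hypothetical helper used only for illustration.

```python
def per_file_specs(paths, parts=None):
    """Expand one shared parts value into the [path, parts] per-file form."""
    return [[path, parts] for path in paths]


specs = per_file_specs(["test6.grib", "tuv_pl.grib"], parts=(0, 240))
print(specs)  # [['test6.grib', (0, 240)], ['tuv_pl.grib', (0, 240)]]
```

Under that assumption, passing specs as the second argument of ekd.from_source("file", ...) should select the same messages as the parts kwarg in the cell above.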