Reading file parts

[1]:

import earthkit.data as ekd

This notebook demonstrates how to read only selected parts (byte ranges) of GRIB files.

First we ensure the example files are available.

[2]:

ekd.download_example_file(["test.grib", "test6.grib", "tuv_pl.grib"])

We load one of the files and inspect its contents with ls(). The “metadata.offset” key gives the byte position at which each message starts within the file.

[3]:

ds = ekd.from_source("file", "test6.grib")
ds.to_fieldlist().ls(keys="metadata.offset")

[3]:

	metadata.offset
0	0.0
1	240.0
2	480.0
3	720.0
4	960.0
5	1200.0

Single files

The parts option in from_source() specifies the byte range(s) we want to read from a file. A single part is a tuple or list in the following format: (offset, length).

Using the offsets from the example above we can specify the part for the first message.

[4]:

ds = ekd.from_source("file", "test6.grib", parts=(0, 240))
ds.to_fieldlist().ls()

[4]:

	parameter.variable	time.valid_datetime	time.base_datetime	time.step	vertical.level	vertical.level_type	ensemble.member	geography.grid_type
0	t	2018-08-01 12:00:00	2018-08-01 12:00:00	0 days	1000	pressure	0	regular_ll

The call above can also be written as:

[5]:

ds = ekd.from_source("file", "test6.grib", parts=[(0, 240)])
ds.to_fieldlist().ls()

[5]:

	parameter.variable	time.valid_datetime	time.base_datetime	time.step	vertical.level	vertical.level_type	ensemble.member	geography.grid_type
0	t	2018-08-01 12:00:00	2018-08-01 12:00:00	0 days	1000	pressure	0	regular_ll
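Because each part is an (offset, length) pair, the parts for individual messages can be derived from the offsets listed earlier. The following is a minimal sketch in plain Python. `parts_from_offsets` is a hypothetical helper, not part of earthkit-data, and the total file size of 1440 bytes is an assumption (six messages of 240 bytes each).

```python
def parts_from_offsets(offsets, total_size):
    """Return one (offset, length) tuple per message.

    offsets: sorted byte positions where each message starts.
    total_size: size of the file in bytes, i.e. the end of the last message.
    """
    # The end of each message is the start of the next one;
    # the last message ends at the end of the file.
    ends = list(offsets[1:]) + [total_size]
    return [(start, end - start) for start, end in zip(offsets, ends)]


# Offsets as listed by ls(keys="metadata.offset"); 1440 is an assumed file size.
offsets = [0, 240, 480, 720, 960, 1200]
parts = parts_from_offsets(offsets, total_size=1440)
print(parts[0])
```

Any element of `parts` could then be passed as the parts option, for example `parts=parts[0]` for the first message.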

A part can extend beyond a message boundary. Here bytes 240-244 belong to the second message, which is not read because the part does not cover all of its bytes.

[6]:

ds = ekd.from_source("file", "test6.grib", parts=[(0, 245)])
ds.to_fieldlist().ls()

[6]:

	parameter.variable	time.valid_datetime	time.base_datetime	time.step	vertical.level	vertical.level_type	ensemble.member	geography.grid_type
0	t	2018-08-01 12:00:00	2018-08-01 12:00:00	0 days	1000	pressure	0	regular_ll
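The rule described above (only messages whose bytes are all covered by the part are read) can be illustrated with a small plain-Python model. This is only a sketch of the stated behaviour, not how earthkit-data scans the file; `complete_messages` is a hypothetical name and the message layout is taken from the offsets shown earlier.

```python
def complete_messages(messages, part):
    """Return the indices of the messages lying entirely inside the part.

    messages: list of (offset, length) tuples, one per message.
    part: a single (offset, length) byte range.
    """
    part_start, part_length = part
    part_end = part_start + part_length
    return [
        i
        for i, (offset, length) in enumerate(messages)
        if offset >= part_start and offset + length <= part_end
    ]


# First three messages of the example file, 240 bytes each.
messages = [(0, 240), (240, 240), (480, 240)]
print(complete_messages(messages, (0, 245)))  # second message only partly covered
print(complete_messages(messages, (0, 480)))  # first two messages fully covered
```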

Multiple parts can be given as a list. Here the second part, (480, 480), spans two consecutive messages.

[7]:

ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (480, 480)])
ds.to_fieldlist().ls()

[7]:

	parameter.variable	time.valid_datetime	time.base_datetime	vertical.level	vertical.level_type	geography.grid_type
0	t	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
1	v	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
2	t	2018-08-01 12:00:00	2018-08-01 12:00:00	850	pressure	regular_ll
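A part such as (480, 480) is equivalent to two adjacent single-message parts. As a rough sketch, contiguous parts can be merged in plain Python before being passed on; `merge_contiguous` is a hypothetical helper introduced here for illustration and is not part of earthkit-data.

```python
def merge_contiguous(parts):
    """Merge (offset, length) parts that touch each other end to start."""
    merged = []
    for offset, length in sorted(parts):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This part starts exactly where the previous one ends: extend it.
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged


print(merge_contiguous([(0, 240), (480, 240), (720, 240)]))
```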

Parts cannot overlap and must be listed in ascending order of offset.

[8]:

try:
    ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (220, 240)])
except Exception as e:
    print(e)

Offsets and lengths must be in order, and not overlapping: offset=220, end of previous part=240
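To catch such problems before calling from_source(), the ordering and overlap rule can be checked up front. The sketch below mirrors the rule stated in the error message; it is not the library's own validation code, and `check_parts` is a hypothetical name.

```python
def check_parts(parts):
    """Raise ValueError unless the parts are in ascending order and do not overlap."""
    previous_end = 0
    for offset, length in parts:
        if offset < previous_end:
            raise ValueError(
                f"part at offset={offset} starts before "
                f"the end of the previous part={previous_end}"
            )
        previous_end = offset + length


check_parts([(0, 240), (480, 480)])  # valid: passes silently

try:
    check_parts([(0, 240), (220, 240)])  # second part starts inside the first
except ValueError as e:
    print(e)
```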

Multiple files

When using multiple files we can specify the parts for each file with the following syntax:

[9]:

ds = ekd.from_source("file", [["test.grib", (0, 526)], ["test6.grib", [(0, 240), (480, 240)]]])
ds.to_fieldlist().ls()

[9]:

	parameter.variable	time.valid_datetime	time.base_datetime	vertical.level	vertical.level_type	geography.grid_type
0	2t	2020-05-13 12:00:00	2020-05-13 12:00:00	0	surface	regular_ll
1	t	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
2	v	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll

When the parts for a given file are None, the whole file is read.

[10]:

ds = ekd.from_source("file", [["test.grib", None], ["test6.grib", [(0, 240), (480, 240)]]])
ds.to_fieldlist().ls()

[10]:

	parameter.variable	time.valid_datetime	time.base_datetime	vertical.level	vertical.level_type	geography.grid_type
0	2t	2020-05-13 12:00:00	2020-05-13 12:00:00	0	surface	regular_ll
1	msl	2020-05-13 12:00:00	2020-05-13 12:00:00	0	surface	regular_ll
2	t	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
3	v	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
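When the file names and their parts come from elsewhere in a program, the nested list can be built from a mapping. This is a small sketch in plain Python; `per_file_spec` is a hypothetical helper and not part of earthkit-data.

```python
def per_file_spec(mapping):
    """Turn {path: parts_or_None} into the nested [[path, parts], ...] list."""
    # Dictionaries keep insertion order, so the files stay in the order given.
    return [[path, parts] for path, parts in mapping.items()]


spec = per_file_spec(
    {
        "test.grib": None,  # None: read the whole file
        "test6.grib": [(0, 240), (480, 240)],
    }
)
print(spec)
```

The resulting `spec` has the same shape as the list used in the cell above, so it could be passed as `ekd.from_source("file", spec)`.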

The parts kwarg can still be used for multiple files; in this case it will be applied to each of them one by one.

[11]:

ds = ekd.from_source("file", ["test6.grib", "tuv_pl.grib"], parts=(0, 240))
ds.to_fieldlist().ls()

[11]:

	parameter.variable	time.valid_datetime	time.base_datetime	time.step	vertical.level	vertical.level_type	ensemble.member	geography.grid_type
0	t	2018-08-01 12:00:00	2018-08-01 12:00:00	0 days	1000	pressure	0	regular_ll
1	t	2018-08-01 12:00:00	2018-08-01 12:00:00	0 days	1000	pressure	0	regular_ll
