Reading file parts

[1]:
import earthkit.data as ekd

This notebook demonstrates how to read only parts (byte ranges) of GRIB files.

First we ensure the example files are available.

[2]:
ekd.download_example_file(["test.grib", "test6.grib", "tuv_pl.grib"])

We load one of the files and inspect its contents with ls(). Passing the “offset” key via extra_keys shows the byte position at which each message starts within the file.

[3]:
ds = ekd.from_source("file", "test6.grib")
ds.ls(extra_keys="offset")
[3]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType offset
0 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll 0.0
1 ecmf u isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll 240.0
2 ecmf v isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll 480.0
3 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll 720.0
4 ecmf u isobaricInhPa 850 20180801 1200 0 an 0 regular_ll 960.0
5 ecmf v isobaricInhPa 850 20180801 1200 0 an 0 regular_ll 1200.0
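The offsets above mark where each message's "GRIB" indicator begins. To show roughly where such numbers come from, the sketch below scans a byte string and reads the total message length stored in section 0 of each message. It is an illustration only, not how earthkit-data or ecCodes scan files. It assumes GRIB edition 1 (3-byte length at octets 5-7) or edition 2 (8-byte length at octets 9-16) and ignores the extended-length convention used by very large GRIB1 messages. The function name scan_grib_messages is hypothetical.

```python
def scan_grib_messages(data: bytes) -> list[tuple[int, int]]:
    """Return an (offset, length) pair for each complete GRIB message in data.

    Sketch only: handles GRIB edition 1 and 2 section 0 layouts and skips
    truncated messages or stray "GRIB" byte sequences.
    """
    parts = []
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        # Section 0 is at most 16 bytes long; stop if it cannot be complete
        if start < 0 or start + 16 > len(data):
            break
        edition = data[start + 7]  # octet 8 holds the edition number
        if edition == 1:
            length = int.from_bytes(data[start + 4:start + 7], "big")
        elif edition == 2:
            length = int.from_bytes(data[start + 8:start + 16], "big")
        else:
            pos = start + 1  # not an indicator section, keep searching
            continue
        end = start + length
        # A complete message ends with the "7777" end section
        if length < 16 or end > len(data) or data[end - 4:end] != b"7777":
            pos = start + 1
            continue
        parts.append((start, length))
        pos = end
    return parts
```

Because each entry is already an (offset, length) pair, the result has the same shape as the parts used in the rest of this notebook.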

Single files

The parts option in from_source() specifies the byte range(s) to read from a file. A single part is a tuple or list of the form (offset, length), with both values given in bytes.
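In plain Python, reading a part amounts to a seek followed by a read. The hypothetical helper read_parts below does this for a list of (offset, length) pairs. It only illustrates what a part means and makes no claim about how earthkit-data reads files internally.

```python
import os
import tempfile


def read_parts(path, parts):
    """Return the raw bytes of each (offset, length) part of the file at path."""
    chunks = []
    with open(path, "rb") as f:
        for offset, length in parts:
            f.seek(offset)                 # jump to the start of the part
            chunks.append(f.read(length))  # read at most `length` bytes
    return chunks


# Demonstration on a small temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
print(read_parts(tmp.name, [(0, 3), (5, 2)]))  # [b'012', b'56']
os.remove(tmp.name)
```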

Using the offsets from the example above, we can specify the part for the first message.

[4]:
ds = ekd.from_source("file", "test6.grib", parts=(0, 240))
ds.ls()
[4]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll

The call above can also be written as:

[5]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240)])
ds.ls()
[5]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
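Both calls describe the same request. A hypothetical normalization step that treats a bare (offset, length) pair and a list of pairs alike could look like this; it sketches the idea and is not earthkit-data's own code.

```python
def normalize_parts(parts):
    """Return parts as a list of (offset, length) tuples, or None unchanged.

    Accepts a single part given as a tuple or list of two ints, e.g. (0, 240)
    or [0, 240], or a list of such parts, e.g. [(0, 240), (480, 240)].
    """
    if parts is None:
        return None
    # A single part: exactly two integers
    if len(parts) == 2 and all(isinstance(x, int) for x in parts):
        return [tuple(parts)]
    # Otherwise treat it as a sequence of parts
    return [tuple(p) for p in parts]


print(normalize_parts((0, 240)) == normalize_parts([(0, 240)]))  # True
```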

A part can extend past a message boundary. Here bytes 240-244 belong to the second message, which is not read because the part does not cover all of its bytes.

[6]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 245)])
ds.ls()
[6]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
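The behaviour in the cells so far can be modelled by a simple rule: a message is read only if a single part contains all of its bytes. The sketch below applies that rule. It is a model of the observed results under the assumption that each part is scanned on its own, not a description of earthkit-data's implementation. The message list assumes every message in test6.grib is 240 bytes long; the ls() offsets confirm this for all but the last message.

```python
def messages_read(parts, messages):
    """Return the indices of messages lying entirely inside one part.

    parts and messages are lists of (offset, length) pairs. A message that
    is only partly covered by a part is skipped.
    """
    selected = []
    for i, (m_off, m_len) in enumerate(messages):
        m_end = m_off + m_len
        for p_off, p_len in parts:
            if p_off <= m_off and m_end <= p_off + p_len:
                selected.append(i)
                break
    return selected


# Messages of test6.grib, assuming each one is 240 bytes long
messages = [(i * 240, 240) for i in range(6)]
print(messages_read([(0, 245)], messages))  # [0]
```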

Multiple parts can be given as a list. Here the second part, (480, 480), covers the third and fourth messages.

[7]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (480, 480)])
ds.ls()
[7]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
1 ecmf v isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
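Going the other way, one may want to pick messages by index and derive the parts. The hypothetical helper below sorts the chosen indices and merges adjacent messages into a single part, which reproduces the parts used in the cell above. Message offsets and lengths are assumed known, for example from the ls() output, with the last length again assumed to be 240.

```python
def parts_for_messages(indices, messages):
    """Build an ordered list of parts covering the chosen messages.

    messages is a list of (offset, length) pairs; adjacent chosen messages
    are merged into a single (offset, length) part.
    """
    parts = []
    for i in sorted(set(indices)):
        off, length = messages[i]
        if parts and parts[-1][0] + parts[-1][1] == off:
            # This message starts where the previous part ends: extend it
            prev_off, prev_len = parts[-1]
            parts[-1] = (prev_off, prev_len + length)
        else:
            parts.append((off, length))
    return parts


messages = [(i * 240, 240) for i in range(6)]
print(parts_for_messages([0, 2, 3], messages))  # [(0, 240), (480, 480)]
```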

Parts must be given in ascending order of offset and must not overlap.

[8]:
try:
    ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (220, 240)])
except Exception as e:
    print(e)
Offsets and lengths must be in order, and not overlapping: offset=220, end of previous part=240
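A check with the same intent as the error above can be sketched in a few lines. The function check_parts is hypothetical and its message wording is illustrative, not copied from earthkit-data. It also assumes that adjacent parts (one starting exactly where the previous one ends) are acceptable, which the notebook does not test.

```python
def check_parts(parts):
    """Raise ValueError unless parts are in ascending order and do not overlap."""
    end_of_previous = 0
    for offset, length in parts:
        if offset < end_of_previous:
            raise ValueError(
                "parts must be ordered and non-overlapping: "
                f"offset={offset}, end of previous part={end_of_previous}"
            )
        end_of_previous = offset + length


check_parts([(0, 240), (480, 480)])  # passes silently
try:
    check_parts([(0, 240), (220, 240)])
except ValueError as e:
    print(e)  # parts must be ordered and non-overlapping: offset=220, end of previous part=240
```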

Multiple files

When using multiple files, we can specify the parts for each file by passing a list of [path, parts] pairs:

[9]:
ds = ekd.from_source("file", [
                               ["test.grib", (0,526)],
                               ["test6.grib", [(0, 240), (480, 240)]]
                              ])
ds.ls()
[9]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf 2t surface 0 20200513 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
2 ecmf v isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
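When the wanted messages come from several files in no particular order, the per-file list can be assembled programmatically. The hypothetical helper below groups (path, offset, length) records by file, keeps files in order of first appearance, sorts each file's parts and rejects overlaps. It assumes that, as with a single file, a one-element list such as [(0, 526)] is accepted in place of a bare tuple.

```python
def build_spec(records):
    """Group (path, offset, length) records into [[path, parts], ...]."""
    by_file = {}  # dicts preserve insertion order, so files keep first-seen order
    for path, offset, length in records:
        by_file.setdefault(path, []).append((offset, length))
    spec = []
    for path, parts in by_file.items():
        parts.sort()  # parts must be in ascending order of offset
        for (prev_off, prev_len), (off, _) in zip(parts, parts[1:]):
            if off < prev_off + prev_len:
                raise ValueError(f"overlapping parts in {path}: offset={off}")
        spec.append([path, parts])
    return spec


records = [("test.grib", 0, 526), ("test6.grib", 480, 240), ("test6.grib", 0, 240)]
print(build_spec(records))
# [['test.grib', [(0, 526)]], ['test6.grib', [(0, 240), (480, 240)]]]
```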

When the parts for a given file are None, the whole file is read.

[10]:
ds = ekd.from_source("file", [
                               ["test.grib", None],
                               ["test6.grib", [(0,240), (480, 240)]]
                              ])
ds.ls()
[10]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf 2t surface 0 20200513 1200 0 an 0 regular_ll
1 ecmf msl surface 0 20200513 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
3 ecmf v isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
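Conceptually, a None entry is equivalent to a single part covering the whole file. The hypothetical helper below makes that explicit using os.path.getsize; earthkit-data may well handle None differently internally, for example by simply reading the file in full.

```python
import os
import tempfile


def resolve_parts(spec):
    """Replace a None part with one part covering the whole file.

    spec is a list of [path, parts] pairs; other entries are left unchanged.
    """
    resolved = []
    for path, parts in spec:
        if parts is None:
            parts = [(0, os.path.getsize(path))]
        resolved.append([path, parts])
    return resolved


# Demonstration on a 10-byte temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10)
print(resolve_parts([[tmp.name, None]])[0][1])  # [(0, 10)]
os.remove(tmp.name)
```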

The parts keyword argument can also be used with multiple files; in this case the same parts are applied to each file separately.

[11]:
ds = ekd.from_source("file", ["test6.grib", "tuv_pl.grib"], parts=(0,240))
ds.ls()
[11]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
1 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
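Read this way, the call above presumably matches the per-file list syntax with the same parts repeated for every file. That equivalence is an interpretation of the description, not something verified here. The hypothetical helper below shows the expansion.

```python
def expand_parts(paths, parts):
    """Pair every path with the same parts, giving [[path, parts], ...]."""
    return [[path, parts] for path in paths]


print(expand_parts(["test6.grib", "tuv_pl.grib"], (0, 240)))
# [['test6.grib', (0, 240)], ['tuv_pl.grib', (0, 240)]]
```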