Reading file parts¶
[1]:
import earthkit.data as ekd
This notebook demonstrates how to read only parts (byte ranges) of GRIB files.
First we ensure the example files are available.
[2]:
ekd.download_example_file(["test.grib", "test6.grib", "tuv_pl.grib"])
We load one of the files and inspect its contents with ls(). By passing the "metadata.offset" key we get the byte position where each message starts within the file.
[3]:
ds = ekd.from_source("file", "test6.grib")
ds.to_fieldlist().ls(keys="metadata.offset")
[3]:
| | metadata.offset |
|---|---|
| 0 | 0.0 |
| 1 | 240.0 |
| 2 | 480.0 |
| 3 | 720.0 |
| 4 | 960.0 |
| 5 | 1200.0 |
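The offsets above are enough to work out an (offset, length) pair for every message, provided the total file size is known. The following is a minimal sketch in plain Python; `parts_from_offsets` is a hypothetical helper, not part of earthkit-data, and the file size of 1440 bytes is an assumption (six messages of 240 bytes each). In practice the size could be taken from `os.path.getsize()`.

```python
def parts_from_offsets(offsets, file_size):
    """Turn message start offsets into (offset, length) tuples.

    `offsets` must be sorted in ascending order; `file_size` is the total
    size of the file in bytes and closes the last message.
    """
    offsets = [int(o) for o in offsets]  # ls() displayed the offsets as floats
    ends = offsets[1:] + [int(file_size)]
    return [(start, end - start) for start, end in zip(offsets, ends)]


# Offsets as listed above; 1440 is an ASSUMED file size (6 messages x 240 bytes).
parts = parts_from_offsets([0.0, 240.0, 480.0, 720.0, 960.0, 1200.0], 1440)
print(parts[0])
```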
Single files¶
The parts option in from_source() specifies the byte range(s) we want to read from a file. A single part is a tuple or list in the following format: (offset, length).
Using the offsets from the example above we can specify the part for the first message.
[4]:
ds = ekd.from_source("file", "test6.grib", parts=(0, 240))
ds.to_fieldlist().ls()
[4]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
The call above can also be written as:
[5]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240)])
ds.to_fieldlist().ls()
[5]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
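To make the (offset, length) convention concrete, here is a minimal sketch of what a single part corresponds to at the byte level. `read_part` is a hypothetical helper and is not how earthkit-data reads the file internally; it only shows that a part means "seek to `offset`, then read `length` bytes". The demo uses a throwaway file so that it runs without any GRIB data.

```python
import os
import tempfile


def read_part(path, offset, length):
    """Return the raw bytes covered by one (offset, length) part."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Demo on a temporary file so the sketch runs without any GRIB data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AAAABBBBCCCC")
    demo_path = tmp.name

chunk = read_part(demo_path, 4, 4)  # 4 bytes starting at byte 4
os.remove(demo_path)
print(chunk)
```

Applied to the example file, `read_part("test6.grib", 0, 240)` would be expected to return the bytes of the first message. A GRIB message starts with the bytes `GRIB` and ends with `7777`, which gives a quick sanity check on a chosen range.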
A part can extend past a message boundary. Here bytes 240-244 belong to the second message, which is not read because the part does not cover all of its bytes.
[6]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 245)])
ds.to_fieldlist().ls()
[6]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
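The rule illustrated above (a message is read only when a part covers all of its bytes) can be modelled in a few lines of plain Python. This is a sketch of the behaviour as described here, not the library's actual scanning logic; `fully_covered` is a hypothetical helper, and the 240-byte length of the second message is inferred from the offset of the third.

```python
def fully_covered(messages, parts):
    """Return the indices of messages lying entirely inside some part.

    Both arguments are lists of (offset, length) tuples.
    """
    selected = []
    for i, (m_off, m_len) in enumerate(messages):
        for p_off, p_len in parts:
            # The message must start and end within the part.
            if p_off <= m_off and m_off + m_len <= p_off + p_len:
                selected.append(i)
                break
    return selected


# First two messages of test6.grib, based on the offsets listed earlier.
messages = [(0, 240), (240, 240)]
covered = fully_covered(messages, [(0, 245)])
print(covered)
```

With the part (0, 245) only the first message is selected; widening the part to (0, 480) would select both.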
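Multiple parts can be used, and a single part can cover several messages. Here the second part, (480, 480), spans the third and fourth messages.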
[7]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (480, 480)])
ds.to_fieldlist().ls()
[7]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 1 | v | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 2 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
Parts cannot overlap.
[8]:
try:
    ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (220, 240)])
except Exception as e:
    print(e)
Offsets and lengths must be in order, and not overlapping: offset=220, end of previous part=240
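If the parts come from an external index and might be unordered or overlapping, they can be normalised before being passed to from_source(). The following is a minimal sketch in plain Python; `merge_parts` is a hypothetical helper, not part of earthkit-data, and it merges ranges that touch as well as ranges that overlap.

```python
def merge_parts(parts):
    """Sort (offset, length) parts and merge any that overlap or touch."""
    merged = []  # list of [start, end] byte ranges
    for offset, length in sorted(parts):
        end = offset + length
        if merged and offset <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    # Convert (start, end) back to (offset, length).
    return [(start, end - start) for start, end in merged]


clean = merge_parts([(0, 240), (220, 240)])
print(clean)
```

Note that merging only makes the list acceptable in terms of ordering and overlap. A merged range still needs to cover whole messages for those messages to be read.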
Multiple files¶
When using multiple files we can specify the parts for each file with the following syntax:
[9]:
ds = ekd.from_source("file", [["test.grib", (0, 526)], ["test6.grib", [(0, 240), (480, 240)]]])
ds.to_fieldlist().ls()
[9]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2t | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 1 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 2 | v | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
When the part is None for a given file, the whole file will be used.
[10]:
ds = ekd.from_source("file", [["test.grib", None], ["test6.grib", [(0, 240), (480, 240)]]])
ds.to_fieldlist().ls()
[10]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2t | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 1 | msl | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 2 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 3 | v | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
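When the per-file parts are kept in a mapping, the nested list used above can be built programmatically. This is a minimal sketch; `build_file_specs` is a hypothetical helper, not part of earthkit-data, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
def build_file_specs(parts_by_file):
    """Build the [[path, parts], ...] list from a {path: parts} mapping.

    A value of None means "use the whole file".
    """
    return [[path, parts] for path, parts in parts_by_file.items()]


specs = build_file_specs({
    "test.grib": None,
    "test6.grib": [(0, 240), (480, 240)],
})
print(specs)
```

The resulting list has the same shape as the second argument in the cell above, so it could be passed as `ekd.from_source("file", specs)`.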
The parts keyword argument can still be used with multiple files; in this case the same parts are applied to each file in turn.
[11]:
ds = ekd.from_source("file", ["test6.grib", "tuv_pl.grib"], parts=(0, 240))
ds.to_fieldlist().ls()
[11]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 1 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |