Reading file parts
[1]:
import earthkit.data as ekd
This notebook demonstrates how to use only parts (byte ranges) from GRIB files.
First we ensure the example files are available.
[2]:
ekd.download_example_file(["test.grib", "test6.grib", "tuv_pl.grib"])
We load one of the files and inspect the contents with ls(). By using the “offset” key we can get the byte positions where each message starts within the file.
[3]:
ds = ekd.from_source("file", "test6.grib")
ds.ls(extra_keys="offset")
[3]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType | offset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 0.0 |
| 1 | ecmf | u | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 240.0 |
| 2 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 480.0 |
| 3 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 720.0 |
| 4 | ecmf | u | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 960.0 |
| 5 | ecmf | v | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 1200.0 |
Single files
The parts option in from_source() specifies the byte range(s) we want to read from a file. A single part is a tuple or list in the following format: (offset, length).
Using the offsets from the example above we can specify the part for the first message.
[4]:
ds = ekd.from_source("file", "test6.grib", parts=(0, 240))
ds.ls()
[4]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
The call above can also be written as:
[5]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240)])
ds.ls()
[5]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
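The (offset, length) pairs do not have to be typed by hand. The following is a minimal sketch, not part of earthkit-data, of a hypothetical helper `parts_from_offsets()` that derives one part per message from the offsets reported by ls(). It assumes that the messages are stored back to back with no padding and that the total file size is known; the 1440-byte size used here is itself an assumption (the last message being 240 bytes like the others).

```python
def parts_from_offsets(offsets, file_size):
    """Turn message start offsets into (offset, length) parts.

    offsets   -- start byte of each message, in ascending order
    file_size -- total size of the file in bytes (end of the last message)
    """
    offsets = [int(o) for o in offsets]  # ls() reports offsets as floats
    ends = offsets[1:] + [int(file_size)]
    return [(start, end - start) for start, end in zip(offsets, ends)]


# Offsets as listed by ls(extra_keys="offset") for test6.grib.
offsets = [0.0, 240.0, 480.0, 720.0, 960.0, 1200.0]

# Assumption: the file is 1440 bytes long, i.e. the last message is also
# 240 bytes. In practice use os.path.getsize("test6.grib").
parts = parts_from_offsets(offsets, file_size=1440)
print(parts[0])  # (0, 240)
```

Any element of `parts`, or a selection of them in ascending order, could then be passed as the parts option.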
A part can go over a message boundary. Here bytes 240-244 belong to the second message, which is not read because not all of its bytes are specified.
[6]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 245)])
ds.ls()
[6]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
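The rule illustrated here, that a message is only read when all of its bytes fall inside the part, can be modelled in a few lines of plain Python. This is only a sketch of the behaviour shown in this notebook, not the way earthkit-data implements it; `messages_in_part()` is a hypothetical helper.

```python
def messages_in_part(messages, part):
    """Return indices of the messages that lie entirely inside a part.

    messages -- list of (offset, length) pairs, one per message
    part     -- a single (offset, length) byte range
    """
    part_start, part_length = part
    part_end = part_start + part_length  # exclusive end
    return [
        i
        for i, (start, length) in enumerate(messages)
        if start >= part_start and start + length <= part_end
    ]


# Message layout of test6.grib taken from the offsets listed earlier;
# the length of the last message (240 bytes) is an assumption.
messages = [(i * 240, 240) for i in range(6)]

print(messages_in_part(messages, (0, 245)))    # [0]
print(messages_in_part(messages, (480, 480)))  # [2, 3]
```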
Multiple parts can be used.
[7]:
ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (480, 480)])
ds.ls()
[7]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
Parts must be given in ascending order of offset and cannot overlap; otherwise an exception is raised.
[8]:
try:
    ds = ekd.from_source("file", "test6.grib", parts=[(0, 240), (220, 240)])
except Exception as e:
    print(e)
Offsets and lengths must be in order, and not overlapping: offset=220, end of previous part=240
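To catch such problems before calling from_source(), the parts can be checked up front. The sketch below uses a hypothetical helper, `check_parts()`, that applies the same rule (ascending order, no overlap); it is an illustration only, and its error messages differ from those of earthkit-data.

```python
def check_parts(parts):
    """Raise ValueError unless parts are in ascending order and disjoint."""
    previous_end = 0
    for offset, length in parts:
        if offset < 0 or length <= 0:
            raise ValueError(f"invalid part: offset={offset}, length={length}")
        if offset < previous_end:
            raise ValueError(
                f"part starts at {offset}, before the previous part ends at {previous_end}"
            )
        previous_end = offset + length
    return list(parts)


check_parts([(0, 240), (480, 480)])  # accepted

try:
    check_parts([(0, 240), (220, 240)])
except ValueError as e:
    print(e)
```

Parts that touch without overlapping, such as (0, 240) followed by (240, 240), pass this check.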
Multiple files
When using multiple files we can specify the parts for each file with the following syntax:
[9]:
ds = ekd.from_source("file", [
    ["test.grib", (0, 526)],
    ["test6.grib", [(0, 240), (480, 240)]],
])
ds.ls()
[9]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
When a part is None for a given file the whole file will be used.
[10]:
ds = ekd.from_source("file", [
    ["test.grib", None],
    ["test6.grib", [(0, 240), (480, 240)]],
])
ds.ls()
[10]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | msl | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
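When the parts for each file are kept in a mapping, the list of [path, parts] pairs used above can be built programmatically. The helper `file_specs()` below is hypothetical and only rearranges the data; a value of None stands for the whole file, as in the previous cell.

```python
def file_specs(parts_by_file):
    """Build the [[path, parts], ...] list from a {path: parts} mapping."""
    # Dictionaries preserve insertion order, so files keep their order.
    return [[path, parts] for path, parts in parts_by_file.items()]


specs = file_specs({
    "test.grib": None,                      # whole file
    "test6.grib": [(0, 240), (480, 240)],   # first and third message
})
print(specs)

# The result has the same shape as the list passed to from_source() above:
# ds = ekd.from_source("file", specs)
```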
The parts kwarg can also be used with multiple files; in this case the same parts are applied to each file separately.
[11]:
ds = ekd.from_source("file", ["test6.grib", "tuv_pl.grib"], parts=(0,240))
ds.ls()
[11]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |