Writing GRIB data to Zarr
[1]:
# get input GRIB data
import earthkit.data as ekd
ds = ekd.from_source("sample", "pl.grib")
This data contains 32 fields: forecasts for 2 parameters (r and t) on 2 pressure levels, from 4 forecast reference times with 2 steps each. We can check its content with describe().
[2]:
ds.describe()
[2]:
| shortName | typeOfLevel | level | date | time | step | paramId | class | stream | type | experimentVersionNumber |
|---|---|---|---|---|---|---|---|---|---|---|
| r | isobaricInhPa | 700,500 | 20240603,20240604 | 0,1200 | 0,6 | 157 | od | oper | fc | 0001 |
| t | isobaricInhPa | 700,500 | 20240603,20240604 | 0,1200 | 0,6 | 130 | od | oper | fc | 0001 |
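The count of 32 fields can be cross-checked against the describe() output: each combination of parameter, date, time, step and level corresponds to one field. The following is a minimal sketch that uses only the values listed in the table (it does not read the GRIB file) and assumes every combination is present in the data.

```python
from itertools import product

# Values copied from the describe() output
params = ["r", "t"]
dates = [20240603, 20240604]
times = [0, 1200]
steps = [0, 6]
levels = [700, 500]

# Assumption: the data forms a full hypercube, i.e. one field per combination
fields = list(product(params, dates, times, steps, levels))
num_fields = len(fields)

# Each distinct (date, time) pair is one forecast reference time
num_reference_times = len({(d, t) for _, d, t, _, _ in fields})

print(num_fields, num_reference_times)
```

Under that assumption, the 2 dates and 2 times combine into the 4 forecast reference times that later appear as a dimension in the Zarr store.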
Using to_target() on the data object
We use to_target() to write the GRIB fieldlist (or a single field) into a Zarr store. First, the data is converted to Xarray, then xarray.Dataset.to_zarr() is called to generate the Zarr store. The options for these two steps are passed via earthkit_to_xarray_kwargs and xarray_to_zarr_kwargs, respectively.
[3]:
# with these options each field will be a separate chunk
ds.to_target(
    "zarr",
    earthkit_to_xarray_kwargs={
        "chunks": {"forecast_reference_time": 1, "step": 1, "level": 1}
    },
    xarray_to_zarr_kwargs={"store": "_pl.zarr", "mode": "w"},
)
/opt/homebrew/Caskroom/miniforge/base/envs/dev/lib/python3.11/site-packages/zarr/api/asynchronous.py:205: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
warnings.warn(
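Since to_target() with the "zarr" target is described as a conversion to Xarray followed by xarray.Dataset.to_zarr(), the same result can presumably be obtained in two explicit steps. The sketch below illustrates that idea. It assumes that the fieldlist's to_xarray() method accepts the same keyword arguments that are passed in earthkit_to_xarray_kwargs; write_zarr_in_two_steps is a hypothetical helper name and is not part of earthkit-data.

```python
def write_zarr_in_two_steps(fieldlist, earthkit_to_xarray_kwargs, xarray_to_zarr_kwargs):
    """Hypothetical helper mirroring the two steps described for to_target("zarr")."""
    # Step 1: GRIB fieldlist -> xarray.Dataset
    # (assumption: to_xarray() accepts these keyword arguments, e.g. "chunks")
    xr_ds = fieldlist.to_xarray(**earthkit_to_xarray_kwargs)

    # Step 2: xarray.Dataset -> Zarr store
    xr_ds.to_zarr(**xarray_to_zarr_kwargs)
    return xr_ds


# Possible usage with the fieldlist from this notebook
# (commented out so that the sketch has no side effects):
# write_zarr_in_two_steps(
#     ds,
#     {"chunks": {"forecast_reference_time": 1, "step": 1, "level": 1}},
#     {"store": "_pl.zarr", "mode": "w"},
# )
```

Doing the steps by hand can be useful when the intermediate Xarray dataset needs to be inspected or modified before it is written.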
[4]:
import zarr
root = zarr.group("_pl.zarr")
root.tree()
[4]:
/
├── forecast_reference_time (4,) int64
├── latitude (19,) float64
├── level (2,) int64
├── longitude (36,) float64
├── r (4, 2, 2, 19, 36) float64
├── step (2,) int64
└── t (4, 2, 2, 19, 36) float64
[5]:
root["t"].info
[5]:
Type : Array
Zarr format : 3
Data type : DataType.float64
Shape : (4, 2, 2, 19, 36)
Chunk shape : (1, 1, 1, 19, 36)
Order : C
Read-only : False
Store type : LocalStore
Filters : ()
Serializer : BytesCodec(endian=<Endian.little: 'little'>)
Compressors : (ZstdCodec(level=0, checksum=False),)
No. bytes : 87552 (85.5K)
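The chunk shape is consistent with the comment in the to_target() call: each chunk holds one complete latitude-longitude grid, i.e. one GRIB field. The following sketch redoes the arithmetic with the values copied from the info output. The sizes are uncompressed, so the number of bytes actually stored on disk will generally differ.

```python
from math import prod

# Values copied from the array info output
shape = (4, 2, 2, 19, 36)
chunk_shape = (1, 1, 1, 19, 36)
itemsize = 8  # bytes per float64 value

# Number of chunks along each dimension (ceiling division)
chunks_per_dim = [-(-s // c) for s, c in zip(shape, chunk_shape)]
num_chunks = prod(chunks_per_dim)

# Uncompressed sizes in bytes
bytes_per_chunk = prod(chunk_shape) * itemsize
total_bytes = prod(shape) * itemsize

print(num_chunks, bytes_per_chunk, total_bytes)
```

This gives 16 chunks per variable, which for the two variables r and t adds up to the 32 fields of the input, and the total of 87552 bytes matches the "No. bytes" entry above.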
The Zarr store can be loaded into Xarray to check its content.
[6]:
import xarray
xarray.open_dataset("_pl.zarr")
/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/ipykernel_45349/754541422.py:2: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
xarray.open_dataset("_pl.zarr")
[6]:
<xarray.Dataset> Size: 176kB
Dimensions: (step: 2, longitude: 36,
forecast_reference_time: 4, latitude: 19, level: 2)
Coordinates:
* step (step) timedelta64[ns] 16B 00:00:00 06:00:00
* longitude (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
* forecast_reference_time (forecast_reference_time) datetime64[ns] 32B 202...
* latitude (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
* level (level) int64 16B 500 700
Data variables:
r (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
t (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
class: od
stream: oper
levtype: pl
type: fc
expver: 0001
date: 20240603
time: 0
domain: g
number: 0
Conventions: CF-1.8
institution: ECMWF