Writing GRIB data to Zarr

[1]:
# get input GRIB data
import earthkit.data as ekd
ds = ekd.from_source("sample", "pl.grib")

This data contains 32 fields: forecasts for 2 parameters on 2 pressure levels, from 4 forecast runs with 2 steps each. We can check its content with describe().

[2]:
ds.describe()
[2]:
                          level    date               time    step  paramId  class  stream  type  experimentVersionNumber
shortName  typeOfLevel
r          isobaricInhPa  700,500  20240603,20240604  0,1200  0,6   157      od     oper    fc    0001
t          isobaricInhPa  700,500  20240603,20240604  0,1200  0,6   130      od     oper    fc    0001
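
The field count can be cross-checked against the describe() output above. The following is a minimal sketch in plain Python: the value lists are copied by hand from that output, and it assumes that every combination of parameter, level, date, time and step is present in the file (the variable names are illustrative and not part of earthkit-data).

```python
from itertools import product

# Values copied by hand from the describe() output above
short_names = ["r", "t"]
levels = [700, 500]
dates = [20240603, 20240604]
times = [0, 1200]
steps = [0, 6]

# Assumption: one field exists for every combination of these values
fields = list(product(short_names, levels, dates, times, steps))
n_fields = len(fields)

# Each (date, time) pair is one forecast run
n_reference_times = len(dates) * len(times)

print(n_fields, n_reference_times)
```

Under that assumption the product gives 32 fields from 4 forecast runs, which matches the number of fields stated for this data.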

Using to_target() on the data object

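We use to_target() to write the GRIB fieldlist (or a single field) to a Zarr store. The data is first converted to Xarray, then xarray.Dataset.to_zarr() is called to generate the store. Each stage takes its own options: earthkit_to_xarray_kwargs is passed to the Xarray conversion and xarray_to_zarr_kwargs is passed to to_zarr().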

[3]:
# with these options each field will be a separate chunk
ds.to_target("zarr",
             earthkit_to_xarray_kwargs={"chunks": {"forecast_reference_time": 1,
                                                   "step": 1,
                                                   "level": 1}},
             xarray_to_zarr_kwargs={"store": "_pl.zarr", "mode": "w"})
/opt/homebrew/Caskroom/miniforge/base/envs/dev/lib/python3.11/site-packages/zarr/api/asynchronous.py:205: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
[4]:
import zarr
root = zarr.group("_pl.zarr")
root.tree()
[4]:
/
├── forecast_reference_time (4,) int64
├── latitude (19,) float64
├── level (2,) int64
├── longitude (36,) float64
├── r (4, 2, 2, 19, 36) float64
├── step (2,) int64
└── t (4, 2, 2, 19, 36) float64
[5]:
root["t"].info
[5]:
Type               : Array
Zarr format        : 3
Data type          : DataType.float64
Shape              : (4, 2, 2, 19, 36)
Chunk shape        : (1, 1, 1, 19, 36)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (ZstdCodec(level=0, checksum=False),)
No. bytes          : 87552 (85.5K)
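
The numbers in the root["t"].info output above can be checked by hand. The following is a small sketch in plain Python: the shapes are copied from that output and the 8-byte item size follows from the float64 data type. It only reproduces the arithmetic and does not read the store.

```python
import math

# Values copied from the root["t"].info output above
shape = (4, 2, 2, 19, 36)
chunk_shape = (1, 1, 1, 19, 36)
itemsize = 8  # bytes per float64 value

# Number of chunks along each dimension, then in total for one variable
chunks_per_dim = [math.ceil(s / c) for s, c in zip(shape, chunk_shape)]
n_chunks = math.prod(chunks_per_dim)

# Uncompressed size of one chunk and of the whole array
chunk_nbytes = math.prod(chunk_shape) * itemsize
total_nbytes = math.prod(shape) * itemsize

print(chunks_per_dim, n_chunks, chunk_nbytes, total_nbytes)
```

Each variable is split into 16 chunks of one 19 x 36 grid each, so the two variables together give 32 chunks, one per GRIB field. The total of 87552 bytes agrees with the "No. bytes" entry reported by zarr.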

The Zarr store can be loaded into Xarray to check its content.

[6]:
import xarray
xarray.open_dataset("_pl.zarr")
/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/ipykernel_45349/754541422.py:2: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  xarray.open_dataset("_pl.zarr")
[6]:
<xarray.Dataset> Size: 176kB
Dimensions:                  (step: 2, longitude: 36,
                              forecast_reference_time: 4, latitude: 19, level: 2)
Coordinates:
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * level                    (level) int64 16B 500 700
Data variables:
    r                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
    t                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
    class:        od
    stream:       oper
    levtype:      pl
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF