Writing GRIB data to Zarr

[1]:

# get input GRIB data
import earthkit.data as ekd
ds = ekd.from_source("sample", "pl.grib")

This data contains 32 fields: forecasts of 2 parameters (r and t) on 2 pressure levels, for 4 forecast reference times (2 dates, 2 times each) and 2 steps. We can check its content with describe().

[2]:

ds.describe()

[2]:

shortName	typeOfLevel	level	date	time	step	paramId	class	stream	type	experimentVersionNumber
r	isobaricInhPa	700,500	20240603,20240604	0,1200	0,6	157	od	oper	fc	0001
t	isobaricInhPa	700,500	20240603,20240604	0,1200	0,6	130	od	oper	fc	0001
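The count of 32 fields follows from the table above if every combination of parameter, level, date, time and step is present. A quick check using only the Python standard library, with the values copied by hand from the describe() output (the variable names are illustrative and not part of earthkit-data):

```python
from itertools import product

# values copied from the describe() output above
params = ["r", "t"]
levels = [700, 500]
dates = [20240603, 20240604]
times = [0, 1200]
steps = [0, 6]

# assumption: there is exactly one field for each combination of the keys
fields = list(product(params, levels, dates, times, steps))
num_fields = len(fields)
print(num_fields)  # 32

# each date/time pair is a separate forecast reference time
num_reference_times = len(set(product(dates, times)))
print(num_reference_times)  # 4
```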

Using to_target() on the data object

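We use to_target() to write the GRIB fieldlist (or a single field) to a Zarr store. Internally, the data is first converted to an Xarray dataset, then xarray.Dataset.to_zarr() is called to generate the store. The two steps are configured separately: earthkit_to_xarray_kwargs controls the Xarray conversion and xarray_to_zarr_kwargs is passed on to to_zarr().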

[3]:

# with these options each field will be a separate chunk
ds.to_target("zarr",
             earthkit_to_xarray_kwargs={"chunks": {"forecast_reference_time": 1,
                                                   "step": 1,
                                                   "level": 1}},
             xarray_to_zarr_kwargs={"store": "_pl.zarr", "mode": "w"})

/opt/homebrew/Caskroom/miniforge/base/envs/dev/lib/python3.11/site-packages/zarr/api/asynchronous.py:205: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
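To see why these options give one chunk per field, the sketch below resolves a chunks mapping against the dimension sizes of one variable in the resulting store. It assumes that dimensions not named in chunks are kept whole and that the dimension order is (forecast_reference_time, step, level, latitude, longitude). resolve_chunk_shape is a hypothetical helper written for illustration, not an earthkit-data or Xarray function.

```python
from math import ceil, prod


def resolve_chunk_shape(sizes, chunks):
    """Return the chunk length per dimension; unlisted dimensions stay whole."""
    return tuple(chunks.get(dim, size) for dim, size in sizes.items())


# dimension sizes of one variable (assumed order)
sizes = {
    "forecast_reference_time": 4,
    "step": 2,
    "level": 2,
    "latitude": 19,
    "longitude": 36,
}
chunks = {"forecast_reference_time": 1, "step": 1, "level": 1}

chunk_shape = resolve_chunk_shape(sizes, chunks)

# number of chunks = product over dimensions of ceil(size / chunk length)
num_chunks = prod(ceil(size / c) for size, c in zip(sizes.values(), chunk_shape))

print(chunk_shape)  # (1, 1, 1, 19, 36)
print(num_chunks)   # 16
```

With 16 chunks for each of the 2 variables, the store holds 32 data chunks, one per GRIB field.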

[4]:

import zarr

# open the root group of the store and print its hierarchy
root = zarr.group("_pl.zarr")
root.tree()

[4]:

/
├── forecast_reference_time (4,) int64
├── latitude (19,) float64
├── level (2,) int64
├── longitude (36,) float64
├── r (4, 2, 2, 19, 36) float64
├── step (2,) int64
└── t (4, 2, 2, 19, 36) float64

[5]:

root["t"].info

[5]:

Type               : Array
Zarr format        : 3
Data type          : DataType.float64
Shape              : (4, 2, 2, 19, 36)
Chunk shape        : (1, 1, 1, 19, 36)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (ZstdCodec(level=0, checksum=False),)
No. bytes          : 87552 (85.5K)
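The sizes reported by info can be reproduced by hand. The following is a small arithmetic check, assuming 8 bytes per float64 value; it uses only the standard library and is not part of the zarr API.

```python
from math import prod

shape = (4, 2, 2, 19, 36)        # "Shape" from the info output
chunk_shape = (1, 1, 1, 19, 36)  # "Chunk shape" from the info output
itemsize = 8                     # bytes per float64 value

total_bytes = prod(shape) * itemsize
chunk_bytes = prod(chunk_shape) * itemsize

print(total_bytes)  # 87552
print(chunk_bytes)  # 5472
```

The total matches the "No. bytes" entry, which is the uncompressed size of the array. Each chunk holds a single 19 x 36 field.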

The Zarr store can be opened with Xarray to check its content.

[6]:

import xarray
xarray.open_dataset("_pl.zarr")

/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/ipykernel_45349/754541422.py:2: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  xarray.open_dataset("_pl.zarr")
