Xarray engine: mono variable with remapping
This notebook demonstrates how to generate an Xarray dataset containing a single DataArray that holds all the parameters from a GRIB fieldlist. This data structure is often needed for machine learning, where all input channels are typically fed to a model as one array.
First, we load a sample GRIB file containing multiple forecasts with both surface and pressure-level parameters, and select a single forecast from it (the one with base date 2024-06-03 00 UTC).
[1]:
import earthkit.data as ekd
# Load the sample GRIB file and keep only the forecast started at 2024-06-03 00 UTC
ds_fl = ekd.from_source("sample", "mixed_pl_sfc.grib").to_fieldlist().sel({"time.base_datetime": "2024-06-03T00"})
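Conceptually, `sel()` keeps only the fields whose metadata match the given key/value pairs. The sketch below mimics that behaviour on a hypothetical list of plain metadata dicts; the `fields` list and the `select` helper are illustrative assumptions, not earthkit-data API. It also shows that selecting a single base time can still leave several valid times.

```python
from datetime import datetime

# Hypothetical, simplified stand-in for a GRIB fieldlist: one metadata dict per field.
# Real earthkit-data fields carry much richer metadata; only the keys used here are mimicked.
fields = [
    {"parameter.variable": "2t", "time.base_datetime": datetime(2024, 6, 3, 0),
     "time.valid_datetime": datetime(2024, 6, 3, 0)},
    {"parameter.variable": "2t", "time.base_datetime": datetime(2024, 6, 3, 0),
     "time.valid_datetime": datetime(2024, 6, 3, 6)},
    {"parameter.variable": "2t", "time.base_datetime": datetime(2024, 6, 3, 12),
     "time.valid_datetime": datetime(2024, 6, 3, 12)},
]


def select(fields, criteria):
    """Hypothetical helper: keep fields whose metadata match every key/value in `criteria`."""
    return [f for f in fields if all(f[key] == value for key, value in criteria.items())]


# Parse the same string used in the notebook ("YYYY-MM-DDTHH")
base = datetime.strptime("2024-06-03T00", "%Y-%m-%dT%H")
selected = select(fields, {"time.base_datetime": base})
print(len(selected))  # 2: both valid times of the 00 UTC forecast remain
```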
Next, we convert the GRIB fieldlist to Xarray with to_xarray(). The goal is to create a single variable called “data” in the dataset. Since the fieldlist contains both surface and pressure-level parameters, the data does not form a full hypercube: the surface parameters exist on only one level, while the pressure-level parameters exist on several. To overcome this, we use the remapping option to combine the “parameter.variable” and “vertical.level” metadata keys into a single new key, “param” (for example “r_1000” or “2t_0”). With fixed_dims we specify which dimensions to use and in what order, and mono_variable=True ensures that a single DataArray is created.
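The hypercube problem and the effect of the remapping can be illustrated without earthkit-data. The sketch below uses hypothetical field metadata and a hypothetical `apply_template` helper that fills `{key}` placeholders in the way the remapping template is described above; it then checks whether every combination of the chosen keys is present.

```python
import re
from itertools import product

# Hypothetical metadata: surface parameters have a single level (0),
# pressure-level parameters have several levels.
fields = [
    {"parameter.variable": "2t", "vertical.level": 0},
    {"parameter.variable": "msl", "vertical.level": 0},
    {"parameter.variable": "r", "vertical.level": 1000},
    {"parameter.variable": "r", "vertical.level": 850},
    {"parameter.variable": "z", "vertical.level": 1000},
    {"parameter.variable": "z", "vertical.level": 850},
]


def is_full_hypercube(fields, keys):
    """True if every combination of the distinct values of `keys` occurs in `fields`."""
    present = {tuple(f[k] for k in keys) for f in fields}
    axes = [sorted({f[k] for f in fields}, key=str) for k in keys]
    return all(combo in present for combo in product(*axes))


def apply_template(template, metadata):
    """Hypothetical helper: replace each {key} (dots allowed) with its metadata value."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(metadata[m.group(1)]), template)


template = "{parameter.variable}_{vertical.level}"
for f in fields:
    f["param"] = apply_template(template, f)

print(is_full_hypercube(fields, ["parameter.variable", "vertical.level"]))  # False
print(is_full_hypercube(fields, ["param"]))  # True
print([f["param"] for f in fields])  # ['2t_0', 'msl_0', 'r_1000', 'r_850', 'z_1000', 'z_850']
```

A regular expression is used instead of `str.format` because `str.format` would treat the dot in `{parameter.variable}` as attribute access.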
[2]:
ds = ds_fl.to_xarray(
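    fixed_dims=["time.valid_datetime", "param", "ensemble.member"],  # dimensions and their order
    mono_variable=True,  # put all parameters into a single DataArray
    chunks={"valid_datetime": 1},  # one chunk per valid time; keys must match the output dimension names
    flatten_values=True,  # store each field along a single 1D "values" dimension
    add_earthkit_attrs=False,
    remapping={"param": "{parameter.variable}_{vertical.level}"},  # e.g. "r_1000", "2t_0"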
)
ds
[2]:
<xarray.Dataset> Size: 362kB
Dimensions: (valid_datetime: 2, param: 32, member: 1, values: 684)
Coordinates:
* valid_datetime (valid_datetime) datetime64[ns] 16B 2024-06-03 2024-06-03...
* param (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'
* member (member) <U1 4B '0'
latitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>
longitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables:
data (valid_datetime, param, member, values) float64 350kB dask.array<chunksize=(2, 32, 1, 684), meta=np.ndarray>
Attributes:
Conventions: CF-1.8
institution: ECMWF
When generating the Xarray, we flattened the field values, so that each field is stored along a single 1D “values” dimension, and chose the chunking so that each chunk contains all the data belonging to one valid time.
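The two options can be pictured with plain NumPy arrays. This is a sketch with assumed shapes: a hypothetical 19 × 36 latitude/longitude grid, chosen only because 19 × 36 = 684 matches the size of the values dimension above. earthkit-data itself builds a lazily evaluated dask array, as the output shows, rather than materialising the data like this.

```python
import numpy as np

# Hypothetical shapes for illustration: 2 valid times, 3 parameters,
# 1 ensemble member and a 19 x 36 grid (19 * 36 = 684 points).
n_time, n_param, n_member, ny, nx = 2, 3, 1, 19, 36
gridded = np.random.default_rng(0).random((n_time, n_param, n_member, ny, nx))

# Flattening: merge the two horizontal axes into a single "values" axis.
# With C-order reshaping, grid point (j, i) ends up at position j * nx + i.
flat = gridded.reshape(n_time, n_param, n_member, ny * nx)

# Chunking with one valid time per chunk: blocks of size 1 along axis 0.
chunks = [flat[t : t + 1] for t in range(n_time)]

print(flat.shape)                 # (2, 3, 1, 684)
print([c.shape for c in chunks])  # [(1, 3, 1, 684), (1, 3, 1, 684)]
```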
[3]:
ds["data"]
[3]:
<xarray.DataArray 'data' (valid_datetime: 2, param: 32, member: 1, values: 684)> Size: 350kB
dask.array<open_dataset-data, shape=(2, 32, 1, 684), dtype=float64, chunksize=(2, 32, 1, 684), chunktype=numpy.ndarray>
Coordinates:
* valid_datetime (valid_datetime) datetime64[ns] 16B 2024-06-03 2024-06-03...
* param (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'
* member (member) <U1 4B '0'
latitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>
longitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>
Dimensions without coordinates: values
Attributes:
standard_name: unknown
long_name: 2 metre temperature
units: kelvin
level_type: surface
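A single mono variable is conceptually the per-parameter arrays stacked along a “param” axis. The NumPy sketch below assumes hypothetical random data and a subset of the labels shown above. In the actual dataset, the same channel selection would be `ds["data"].sel(param="z_850")`. Because one array mixes several parameters, the attributes printed above (for example units: kelvin) cannot describe every channel, so the sketch also recovers the variable and level from each label with a hypothetical `split_label` helper.

```python
import numpy as np

# Hypothetical per-parameter arrays with shape (valid_datetime, member, values);
# labels follow the "{parameter.variable}_{vertical.level}" remapping used above.
labels = ["2t_0", "msl_0", "r_1000", "z_850"]
n_time, n_member, n_values = 2, 1, 684
rng = np.random.default_rng(42)
per_param = {label: rng.random((n_time, n_member, n_values)) for label in labels}

# Stack all parameters along axis 1, giving the
# (valid_datetime, param, member, values) layout of ds["data"].
data = np.stack([per_param[label] for label in labels], axis=1)

# Select one channel by its label (label-based .sel() does this in xarray).
z850 = data[:, labels.index("z_850")]


def split_label(label):
    """Hypothetical helper: recover (variable, level); rsplit keeps underscores in names."""
    variable, level = label.rsplit("_", 1)
    return variable, int(level)


print(data.shape)  # (2, 4, 1, 684)
print([split_label(lab) for lab in labels])  # [('2t', 0), ('msl', 0), ('r', 1000), ('z', 850)]
```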