Xarray engine: mono variable
This notebook demonstrates how to generate an Xarray dataset with a single DataArray containing all the parameters from a GRIB fieldlist. This data structure is often needed for machine learning.
First, we get 2m temperature and dewpoint data for a whole year on a low-resolution regular latitude-longitude grid. The data contains two fields per day (at 0 and 12 UTC) for each parameter.
[1]:
import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "t2_td2_1_year.grib")
len(ds_fl)
[1]:
1464
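The field count can be cross-checked with a little arithmetic. The sketch below assumes the sample covers the leap year 2020 (the valid_time coordinates in the outputs further down run from 2020-01-01 to 2020-12-31), with two parameters and two fields per day. The variable names are illustrative only.

```python
import calendar

# Assumption: the sample covers the whole of 2020, which is a leap year.
year = 2020
days = 366 if calendar.isleap(year) else 365

fields_per_day = 2  # one field at 0 UTC and one at 12 UTC
n_params = 2        # 2m temperature and 2m dewpoint

n_valid_times = days * fields_per_day
n_fields = n_valid_times * n_params
print(n_valid_times, n_fields)  # 732 1464
```

The first number matches the length of the valid_time dimension and the second matches len(ds_fl).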
Next, we convert the GRIB fieldlist to Xarray with to_xarray(). Because we set mono_variable=True, the resulting dataset contains a single variable called “data”.
[2]:
ds = ds_fl.to_xarray(
    fixed_dims=["valid_time", "param"],
    mono_variable=True,
    chunks={"valid_time": 1},
    flatten_values=True,
    add_earthkit_attrs=False,
)
ds
[2]:
<xarray.Dataset> Size: 111kB
Dimensions: (valid_time: 732, param: 2, values: 9)
Coordinates:
* valid_time (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...
* param (param) <U2 16B '2d' '2t'
latitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
longitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables:
data (valid_time, param, values) float64 105kB dask.array<chunksize=(1, 2, 9), meta=np.ndarray>
Attributes:
paramId: 168
class: d1
stream: clte
levtype: sfc
type: fc
expver: 0001
date: 20200101
time: 0
domain: g
Conventions: CF-1.8
institution: ECMWF
When generating the Xarray, we flattened the field values and chose the chunking so that one chunk contains all the data belonging to a given valid time.
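To illustrate why this layout suits machine learning, here is a minimal sketch that uses a plain NumPy array as a stand-in for the “data” variable. The shape (732, 2, 9) is taken from the output above. The random values and the names `data`, `sample` and `features` are placeholders and not part of earthkit-data.

```python
import numpy as np

# Stand-in for ds["data"] with dimensions (valid_time, param, values),
# filled with random numbers instead of real GRIB values.
rng = np.random.default_rng(0)
data = rng.random((732, 2, 9))

# A chunk of size (1, 2, 9) corresponds to a single valid time:
# both parameters on all 9 grid points.
sample = data[0]
print(sample.shape)  # (2, 9)

# Flattening each valid time gives one feature vector per sample.
features = data.reshape(data.shape[0], -1)
print(features.shape)  # (732, 18)
```

With the real dataset, each chunk could presumably be loaded on demand in the same per-valid-time fashion, since the array is backed by dask.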
[3]:
ds["data"]
[3]:
<xarray.DataArray 'data' (valid_time: 732, param: 2, values: 9)> Size: 105kB
dask.array<open_dataset-data, shape=(732, 2, 9), dtype=float64, chunksize=(1, 2, 9), chunktype=numpy.ndarray>
Coordinates:
* valid_time (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...
* param (param) <U2 16B '2d' '2t'
latitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
longitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
Dimensions without coordinates: values
Attributes:
standard_name: unknown
long_name: 2 metre dewpoint temperature
units: K
Adding ensemble dimension
We now add the ensemble member as an additional dimension of the generated Xarray. Because the input is not ensemble data, the “number” ecCodes key can be missing from the fields. We therefore need to provide a meaningful default with the fill_metadata kwarg to be able to build the “number” dimension.
[4]:
ds = ds_fl.to_xarray(
    fixed_dims=["valid_time", "param", "number"],
    mono_variable=True,
    chunks={"valid_time": 1},
    flatten_values=True,
    add_earthkit_attrs=False,
    fill_metadata={"number": 0},
)
ds
[4]:
<xarray.Dataset> Size: 111kB
Dimensions: (valid_time: 732, param: 2, number: 1, values: 9)
Coordinates:
* valid_time (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...
* param (param) <U2 16B '2d' '2t'
* number (number) int64 8B 0
latitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
longitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables:
data (valid_time, param, number, values) float64 105kB dask.array<chunksize=(1, 2, 1, 9), meta=np.ndarray>
Attributes:
paramId: 168
class: d1
stream: clte
levtype: sfc
type: fc
expver: 0001
date: 20200101
time: 0
domain: g
Conventions: CF-1.8
institution: ECMWF
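Adding a dimension of length one does not change the amount of data, which is why the reported size of “data” stays at 105kB. A rough check, assuming 8 bytes per float64 value and the dimension lengths from the two outputs above:

```python
import math

itemsize = 8  # bytes per float64 value

# Dimension lengths of "data" without and with the "number" dimension.
shape_without = (732, 2, 9)
shape_with = (732, 2, 1, 9)

nbytes_without = math.prod(shape_without) * itemsize
nbytes_with = math.prod(shape_with) * itemsize
print(nbytes_without, nbytes_with)  # 105408 105408
```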