Xarray engine: mono variable
This notebook demonstrates how to generate an Xarray dataset with a single DataArray containing all the parameters from a GRIB fieldlist. This data structure is often needed for machine learning.
First, we get 2m temperature and 2m dewpoint temperature data for a whole year on a low-resolution regular latitude-longitude grid. It contains 2 fields per day (at 00 and 12 UTC) per parameter.
[1]:
import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "t2_td2_1_year.grib").to_fieldlist()
len(ds_fl)
[1]:
1464
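The field count can be checked with a little arithmetic. The sketch below assumes the data covers the year 2020 (a leap year, as suggested by the coordinate values shown later in this notebook) and uses only the standard library; the variable names are illustrative.

```python
from datetime import date

# Assumption: the sample covers the whole of 2020, which is a leap year.
n_days = (date(2021, 1, 1) - date(2020, 1, 1)).days

times_per_day = 2  # 00 and 12 UTC
n_params = 2       # 2m temperature and 2m dewpoint temperature

n_valid_times = n_days * times_per_day
n_fields = n_valid_times * n_params

print(n_days, n_valid_times, n_fields)  # 366 732 1464
```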
Next, we convert the GRIB fieldlist to Xarray with to_xarray(). Because we set mono_variable=True, the resulting dataset contains a single variable called “data”.
[2]:
ds = ds_fl.to_xarray(
    fixed_dims=["time.valid_datetime", "parameter.variable"],
    mono_variable=True,
    chunks={"valid_time": 1},
    flatten_values=True,
    add_earthkit_attrs=False,
)
ds
[2]:
<xarray.Dataset> Size: 111kB
Dimensions: (valid_datetime: 732, variable: 2, values: 9)
Coordinates:
* valid_datetime (valid_datetime) datetime64[ns] 6kB 2020-01-01 ... 2020-1...
* variable (variable) <U2 16B '2d' '2t'
latitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
longitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables:
data (valid_datetime, variable, values) float64 105kB dask.array<chunksize=(732, 2, 9), meta=np.ndarray>
Attributes:
Conventions: CF-1.8
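institution: ECMWF
When generating the Xarray, we flattened the field values (flatten_values=True) and specified the chunking with the intent that one chunk would contain all the data belonging to a given valid time. Note that the output above reports chunksize=(732, 2, 9), i.e. a single chunk, which suggests that the chunks key "valid_time" does not match the dimension name valid_datetime.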
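To make the layout concrete, here is a minimal NumPy sketch of the same idea on synthetic data. It is not how earthkit-data builds the dataset internally; the 3x3 grid, the random values and all variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Toy sizes: 4 valid times on an assumed 3x3 grid (9 points per field).
n_times, n_lat, n_lon = 4, 3, 3
rng = np.random.default_rng(0)

# Synthetic stand-ins for the two parameters (values in kelvin).
t2 = 280.0 + rng.normal(0.0, 5.0, size=(n_times, n_lat, n_lon))
td2 = t2 - rng.uniform(0.0, 5.0, size=(n_times, n_lat, n_lon))

# Flatten each field: (time, lat, lon) -> (time, values).
t2_flat = t2.reshape(n_times, -1)
td2_flat = td2.reshape(n_times, -1)

# Stack the parameters along a new "variable" axis:
# the result has shape (time, variable, values).
data = np.stack([td2_flat, t2_flat], axis=1)
print(data.shape)  # (4, 2, 9)

# One block per valid time, emulating the intended chunking.
per_time = [data[i] for i in range(n_times)]
print(per_time[0].shape)  # (2, 9)

# Size of the real float64 array with shape (732, 2, 9), in bytes.
full_nbytes = 732 * 2 * 9 * np.dtype("float64").itemsize
print(full_nbytes)  # 105408
```

The last number agrees with the 105kB reported for the "data" variable in the output above.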
[3]:
ds["data"]
[3]:
<xarray.DataArray 'data' (valid_datetime: 732, variable: 2, values: 9)> Size: 105kB
dask.array<open_dataset-data, shape=(732, 2, 9), dtype=float64, chunksize=(732, 2, 9), chunktype=numpy.ndarray>
Coordinates:
* valid_datetime (valid_datetime) datetime64[ns] 6kB 2020-01-01 ... 2020-1...
* variable (variable) <U2 16B '2d' '2t'
latitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
longitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
Dimensions without coordinates: values
Attributes:
standard_name: unknown
long_name: 2 metre dewpoint temperature
units: kelvin
level_type: height_above_ground_level
Adding ensemble dimension
We add the ensemble member as an additional dimension to the generated Xarray. Because the input is not ensemble data, the value of ensemble.member can be None for some or all fields, which prevents us from forming this dimension. To overcome this problem, we provide a meaningful default with the fill_metadata kwarg.
[4]:
ds = ds_fl.to_xarray(
    fixed_dims=["time.valid_datetime", "parameter.variable", "ensemble.member"],
    mono_variable=True,
    chunks={"valid_time": 1},
    flatten_values=True,
    add_earthkit_attrs=False,
    fill_metadata={"ensemble.member": "0"},
)
ds
[4]:
<xarray.Dataset> Size: 111kB
Dimensions: (valid_datetime: 732, variable: 2, member: 1, values: 9)
Coordinates:
* valid_datetime (valid_datetime) datetime64[ns] 6kB 2020-01-01 ... 2020-1...
* variable (variable) <U2 16B '2d' '2t'
* member (member) <U1 4B '0'
latitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
longitude (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>
Dimensions without coordinates: values
Data variables:
data (valid_datetime, variable, member, values) float64 105kB dask.array<chunksize=(732, 2, 1, 9), meta=np.ndarray>
Attributes:
Conventions: CF-1.8
institution: ECMWF
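The effect of the default can be illustrated with plain dictionaries. This is only a conceptual sketch: fill_missing is a hypothetical helper written for this example, not part of earthkit-data, and the metadata dictionaries are made up.

```python
# Illustrative only: fill_missing is a hypothetical helper,
# not the earthkit-data implementation of fill_metadata.
def fill_missing(metadata, defaults):
    """Return a copy of metadata where missing or None keys take the default."""
    filled = dict(metadata)
    for key, value in defaults.items():
        if filled.get(key) is None:
            filled[key] = value
    return filled


# Made-up per-field metadata: non-ensemble data has no member value.
fields = [
    {"parameter.variable": "2t", "ensemble.member": None},
    {"parameter.variable": "2d", "ensemble.member": None},
]
defaults = {"ensemble.member": "0"}

filled = [fill_missing(f, defaults) for f in fields]

# With the default in place, every field has a usable member value,
# so a dimension of size 1 can be formed.
members = sorted({f["ensemble.member"] for f in filled})
print(members)  # ['0']
```

This mirrors the output above, where the member dimension has a single coordinate value '0'.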