{ "cells": [ { "cell_type": "markdown", "id": "b74f6a93-148e-4668-a401-6104f54838e1", "metadata": {}, "source": [ "## Xarray engine: mono variable with remapping" ] }, { "cell_type": "markdown", "id": "8b3010be-7895-4fd2-8ad6-a57db7845396", "metadata": {}, "source": [ "This notebook demonstrates how to generate an Xarray dataset with a single DataArray containing all the parameters from a GRIB fieldlist. This data structure is often needed for machine learning." ] }, { "cell_type": "markdown", "id": "190cab07-5f50-410c-9c56-604a95a64ef0", "metadata": {}, "source": [ "First, we get GRIB data containing multiple forecasts on surface and pressure levels, and select a single forecast from it." ] }, { "cell_type": "code", "execution_count": 1, "id": "613fc34e-0a19-49cb-9e12-129a9b50f08c", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "edad67397eaf4b6eb526d2e06a6717bd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "mixed_pl_sfc.grib: 0%| | 0.00/390k [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import earthkit.data as ekd\n", "\n", "# Load the sample GRIB file and keep only the forecast with base date 2024-06-03 and base time 0\n", "ds_fl = ekd.from_source(\"sample\", \"mixed_pl_sfc.grib\").sel(date=20240603, time=0)" ] }, { "cell_type": "raw", "id": "20e27e09-02fb-4f80-9224-03c3c54ecbf8", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Next, we convert the GRIB fieldlist to Xarray with :py:meth:`~data.readers.grib.index.GribFieldList.to_xarray`. The goal is to create a dataset with a single variable called \"data\". Since we have both surface and pressure level parameters, the input data does not form a full hypercube. To overcome this, we use the ``remapping`` option to merge the \"param\" and \"level\" metadata keys into a single key, so that each parameter and level combination (such as ``2t_0`` or ``r_1000``) becomes its own entry. ``fixed_dims`` defines the dimensions to use and their order, and ``mono_variable=True`` ensures that a single DataArray is created.
" ] }, { "cell_type": "code", "execution_count": 2, "id": "9ec900dd-2670-41f6-be0c-cdcee6205ee1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 362kB\n",
"Dimensions: (valid_time: 2, param: 32, number: 1, values: 684)\n",
"Coordinates:\n",
" * valid_time (valid_time) datetime64[ns] 16B 2024-06-03 2024-06-03T06:00:00\n",
" * param (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'\n",
" * number (number) int64 8B 0\n",
" latitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
" longitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
"Dimensions without coordinates: values\n",
"Data variables:\n",
" data (valid_time, param, number, values) float64 350kB dask.array<chunksize=(1, 32, 1, 684), meta=np.ndarray>\n",
"Attributes:\n",
" paramId: 167\n",
" class: od\n",
" stream: oper\n",
" levtype: sfc\n",
" type: fc\n",
" expver: 0001\n",
" date: 20240603\n",
" time: 0\n",
" domain: g\n",
" Conventions: CF-1.8\n",
 institution: ECMWF\n",
"<xarray.DataArray 'data' (valid_time: 2, param: 32, number: 1, values: 684)> Size: 350kB\n",
"dask.array<open_dataset-data, shape=(2, 32, 1, 684), dtype=float64, chunksize=(1, 32, 1, 684), chunktype=numpy.ndarray>\n",
"Coordinates:\n",
" * valid_time (valid_time) datetime64[ns] 16B 2024-06-03 2024-06-03T06:00:00\n",
" * param (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'\n",
" * number (number) int64 8B 0\n",
" latitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
" longitude (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
"Dimensions without coordinates: values\n",
"Attributes:\n",
" standard_name: unknown\n",
" long_name: 2 metre temperature\n",
" units: K