{ "cells": [ { "cell_type": "markdown", "id": "b74f6a93-148e-4668-a401-6104f54838e1", "metadata": {}, "source": [ "## Xarray engine: mono variable" ] }, { "cell_type": "markdown", "id": "8b3010be-7895-4fd2-8ad6-a57db7845396", "metadata": {}, "source": [ "This notebook demonstrates how to generate an Xarray dataset with a single DataArray containing all the parameters of a GRIB fieldlist. This data layout is often needed for machine learning." ] }, { "cell_type": "markdown", "id": "190cab07-5f50-410c-9c56-604a95a64ef0", "metadata": {}, "source": [ "First, we retrieve 2m temperature and 2m dewpoint temperature data for a whole year on a low-resolution regular latitude-longitude grid. The data contains 2 fields per day (at 0 and 12 UTC) for each parameter." ] }, { "cell_type": "code", "execution_count": 1, "id": "613fc34e-0a19-49cb-9e12-129a9b50f08c", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "30e0db26f4d74fcdaa6088a2a2fbc6e9", "version_major": 2, "version_minor": 0 }, "text/plain": [ "t2_td2_1_year.grib: 0%| | 0.00/515k [00:00\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 111kB\n",
       "Dimensions:     (valid_time: 732, param: 2, values: 9)\n",
       "Coordinates:\n",
       "  * valid_time  (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...\n",
       "  * param       (param) <U2 16B '2d' '2t'\n",
       "    latitude    (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n",
       "    longitude   (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n",
       "Dimensions without coordinates: values\n",
       "Data variables:\n",
       "    data        (valid_time, param, values) float64 105kB dask.array<chunksize=(1, 2, 9), meta=np.ndarray>\n",
       "Attributes:\n",
       "    paramId:      168\n",
       "    class:        d1\n",
       "    stream:       clte\n",
       "    levtype:      sfc\n",
       "    type:         fc\n",
       "    expver:       0001\n",
       "    date:         20200101\n",
       "    time:         0\n",
       "    domain:       g\n",
       "    Conventions:  CF-1.8\n",
       "    institution:  ECMWF
" ], "text/plain": [ "<xarray.Dataset> Size: 111kB\n", "Dimensions:     (valid_time: 732, param: 2, values: 9)\n", "Coordinates:\n", "  * valid_time  (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...\n", "  * param       (param) <U2 16B '2d' '2t'\n", "    latitude    (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n", "    longitude   (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n", "Dimensions without coordinates: values\n", "Data variables:\n", "    data        (valid_time, param, values) float64 105kB dask.array<chunksize=(1, 2, 9), meta=np.ndarray>\n", "Attributes:\n", "    paramId:      168\n", "    class:        d1\n", "    stream:       clte\n", "    levtype:      sfc\n", "    type:         fc\n", "    expver:       0001\n", "    date:         20200101\n", "    time:         0\n", "    domain:       g\n", "    Conventions:  CF-1.8\n", "    institution:  ECMWF" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ds_fl.to_xarray(\n", "    fixed_dims=[\"valid_time\", \"param\"],  # dimensions to build from the GRIB metadata\n", "    mono_variable=True,  # store all the parameters in a single DataArray\n", "    chunks={\"valid_time\": 1},  # one chunk per valid time\n", "    flatten_values=True,  # flatten each field into the 1D \"values\" dimension\n", "    add_earthkit_attrs=False,  # do not add earthkit-specific attributes\n", ")\n", "ds" ] }, { "cell_type": "markdown", "id": "312fcd49-3625-41ef-8b21-84d0cd24f9a0", "metadata": {}, "source": [ "When generating the Xarray, we flattened the field values and chose the chunking so that each chunk contains all the data belonging to a single valid time." ] }, { "cell_type": "code", "execution_count": 3, "id": "5b687a1d-2a9a-4584-894a-d2175b9b0317", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'data' (valid_time: 732, param: 2, values: 9)> Size: 105kB\n",
       "dask.array<open_dataset-data, shape=(732, 2, 9), dtype=float64, chunksize=(1, 2, 9), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * valid_time  (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...\n",
       "  * param       (param) <U2 16B '2d' '2t'\n",
       "    latitude    (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n",
       "    longitude   (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n",
       "Dimensions without coordinates: values\n",
       "Attributes:\n",
       "    standard_name:  unknown\n",
       "    long_name:      2 metre dewpoint temperature\n",
       "    units:          K
" ], "text/plain": [ "<xarray.DataArray 'data' (valid_time: 732, param: 2, values: 9)> Size: 105kB\n", "dask.array<open_dataset-data, shape=(732, 2, 9), dtype=float64, chunksize=(1, 2, 9), chunktype=numpy.ndarray>\n", "Coordinates:\n", "  * valid_time  (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...\n", "  * param       (param) <U2 16B '2d' '2t'\n", "    latitude    (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n", "    longitude   (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n", "Dimensions without coordinates: values\n", "Attributes:\n", "    standard_name:  unknown\n", "    long_name:      2 metre dewpoint temperature\n", "    units:          K" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds[\"data\"]" ] }, { "cell_type": "markdown", "id": "ae085ca1-8899-4cf3-aef9-44a052353120", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "#### Adding an ensemble dimension" ] }, { "cell_type": "markdown", "id": "7db65a70-e815-4e81-835e-2a705251355d", "metadata": {}, "source": [ "Next, we add the ensemble member number as an additional dimension of the generated Xarray. Because the input is not ensemble data, the \"number\" ecCodes key may be missing, so we have to provide a meaningful default value with the ``fill_metadata`` kwarg to be able to build the \"number\" dimension." ] }, { "cell_type": "code", "execution_count": 4, "id": "6f2a78d3-3ace-49e7-a99e-49154bd17620", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 111kB\n",
       "Dimensions:     (valid_time: 732, param: 2, number: 1, values: 9)\n",
       "Coordinates:\n",
       "  * valid_time  (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...\n",
       "  * param       (param) <U2 16B '2d' '2t'\n",
       "  * number      (number) int64 8B 0\n",
       "    latitude    (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n",
       "    longitude   (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n",
       "Dimensions without coordinates: values\n",
       "Data variables:\n",
       "    data        (valid_time, param, number, values) float64 105kB dask.array<chunksize=(1, 2, 1, 9), meta=np.ndarray>\n",
       "Attributes:\n",
       "    paramId:      168\n",
       "    class:        d1\n",
       "    stream:       clte\n",
       "    levtype:      sfc\n",
       "    type:         fc\n",
       "    expver:       0001\n",
       "    date:         20200101\n",
       "    time:         0\n",
       "    domain:       g\n",
       "    Conventions:  CF-1.8\n",
       "    institution:  ECMWF
" ], "text/plain": [ "<xarray.Dataset> Size: 111kB\n", "Dimensions:     (valid_time: 732, param: 2, number: 1, values: 9)\n", "Coordinates:\n", "  * valid_time  (valid_time) datetime64[ns] 6kB 2020-01-01 ... 2020-12-31T12:...\n", "  * param       (param) <U2 16B '2d' '2t'\n", "  * number      (number) int64 8B 0\n", "    latitude    (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n", "    longitude   (values) float64 72B dask.array<chunksize=(9,), meta=np.ndarray>\n", "Dimensions without coordinates: values\n", "Data variables:\n", "    data        (valid_time, param, number, values) float64 105kB dask.array<chunksize=(1, 2, 1, 9), meta=np.ndarray>\n", "Attributes:\n", "    paramId:      168\n", "    class:        d1\n", "    stream:       clte\n", "    levtype:      sfc\n", "    type:         fc\n", "    expver:       0001\n", "    date:         20200101\n", "    time:         0\n", "    domain:       g\n", "    Conventions:  CF-1.8\n", "    institution:  ECMWF" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ds_fl.to_xarray(\n", "    fixed_dims=[\"valid_time\", \"param\", \"number\"],\n", "    mono_variable=True,\n", "    chunks={\"valid_time\": 1},\n", "    flatten_values=True,\n", "    add_earthkit_attrs=False,\n", "    fill_metadata={\"number\": 0},  # default for the \"number\" key, missing in non-ensemble data\n", ")\n", "ds" ] } ], "metadata": { "kernelspec": { "display_name": "dev", "language": "python", "name": "dev" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 5 }