{ "cells": [ { "cell_type": "markdown", "id": "b74f6a93-148e-4668-a401-6104f54838e1", "metadata": {}, "source": [ "## Xarray engine: mono variable with remapping" ] }, { "cell_type": "markdown", "id": "8b3010be-7895-4fd2-8ad6-a57db7845396", "metadata": {}, "source": [ "This notebook demonstrates how to generate an Xarray dataset with a single DataArray containing all the parameters from a GRIB fieldlist. This data structure is often needed for machine learning." ] }, { "cell_type": "markdown", "id": "190cab07-5f50-410c-9c56-604a95a64ef0", "metadata": {}, "source": [ "First, we get GRIB data containing multiple forecasts on surface and pressure levels, then select a single forecast from it." ] }, { "cell_type": "code", "execution_count": 1, "id": "613fc34e-0a19-49cb-9e12-129a9b50f08c", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "edad67397eaf4b6eb526d2e06a6717bd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "mixed_pl_sfc.grib: 0%| | 0.00/390k [00:00\n", "
<xarray.Dataset> Size: 362kB\n",
       "Dimensions:     (valid_time: 2, param: 32, number: 1, values: 684)\n",
       "Coordinates:\n",
       "  * valid_time  (valid_time) datetime64[ns] 16B 2024-06-03 2024-06-03T06:00:00\n",
       "  * param       (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'\n",
       "  * number      (number) int64 8B 0\n",
       "    latitude    (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
       "    longitude   (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
       "Dimensions without coordinates: values\n",
       "Data variables:\n",
       "    data        (valid_time, param, number, values) float64 350kB dask.array<chunksize=(1, 32, 1, 684), meta=np.ndarray>\n",
       "Attributes:\n",
       "    paramId:      167\n",
       "    class:        od\n",
       "    stream:       oper\n",
       "    levtype:      sfc\n",
       "    type:         fc\n",
       "    expver:       0001\n",
       "    date:         20240603\n",
       "    time:         0\n",
       "    domain:       g\n",
       "    Conventions:  CF-1.8\n",
       "    institution:  ECMWF
" ], "text/plain": [ "<xarray.Dataset> Size: 362kB\n", "Dimensions:     (valid_time: 2, param: 32, number: 1, values: 684)\n", "Coordinates:\n", "  * valid_time  (valid_time) datetime64[ns] 16B 2024-06-03 2024-06-03T06:00:00\n", "  * param       (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'\n", "  * number      (number) int64 8B 0\n", "    latitude    (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n", "    longitude   (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n", "Dimensions without coordinates: values\n", "Data variables:\n", "    data        (valid_time, param, number, values) float64 350kB dask.array<chunksize=(1, 32, 1, 684), meta=np.ndarray>\n", "Attributes:\n", "    paramId:      167\n", "    class:        od\n", "    stream:       oper\n", "    levtype:      sfc\n", "    type:         fc\n", "    expver:       0001\n", "    date:         20240603\n", "    time:         0\n", "    domain:       g\n", "    Conventions:  CF-1.8\n", "    institution:  ECMWF" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ds_fl.to_xarray(\n", "    fixed_dims=[\"valid_time\", \"param\", \"number\"],\n", "    mono_variable=True,  # a single \"data\" variable holds all the parameters\n", "    chunks={\"valid_time\": 1},  # one chunk per valid time\n", "    flatten_values=True,  # collapse the grid into a single \"values\" dimension\n", "    add_earthkit_attrs=False,\n", "    # build the \"param\" coordinate from param and level, e.g. \"r_1000\"\n", "    remapping={\"param\": \"{param}_{level}\"},\n", ")\n", "ds" ] }, { "cell_type": "markdown", "id": "312fcd49-3625-41ef-8b21-84d0cd24f9a0", "metadata": {}, "source": [ "When generating the Xarray dataset, we flattened the field values and chose the chunking so that each chunk contains all the data belonging to a single valid time." ] }, { "cell_type": "code", "execution_count": 3, "id": "5b687a1d-2a9a-4584-894a-d2175b9b0317", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray 'data' (valid_time: 2, param: 32, number: 1, values: 684)> Size: 350kB\n",
       "dask.array<open_dataset-data, shape=(2, 32, 1, 684), dtype=float64, chunksize=(1, 32, 1, 684), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * valid_time  (valid_time) datetime64[ns] 16B 2024-06-03 2024-06-03T06:00:00\n",
       "  * param       (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'\n",
       "  * number      (number) int64 8B 0\n",
       "    latitude    (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
       "    longitude   (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n",
       "Dimensions without coordinates: values\n",
       "Attributes:\n",
       "    standard_name:  unknown\n",
       "    long_name:      2 metre temperature\n",
       "    units:          K
" ], "text/plain": [ "<xarray.DataArray 'data' (valid_time: 2, param: 32, number: 1, values: 684)> Size: 350kB\n", "dask.array<open_dataset-data, shape=(2, 32, 1, 684), dtype=float64, chunksize=(1, 32, 1, 684), chunktype=numpy.ndarray>\n", "Coordinates:\n", "  * valid_time  (valid_time) datetime64[ns] 16B 2024-06-03 2024-06-03T06:00:00\n", "  * param       (param) <U6 768B '2t_0' 'msl_0' 'r_1000' ... 'z_700' 'z_850'\n", "  * number      (number) int64 8B 0\n", "    latitude    (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n", "    longitude   (values) float64 5kB dask.array<chunksize=(684,), meta=np.ndarray>\n", "Dimensions without coordinates: values\n", "Attributes:\n", "    standard_name:  unknown\n", "    long_name:      2 metre temperature\n", "    units:          K" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds[\"data\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "a64a67a1-53f4-4576-b4d5-ef093a9ed58c", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "dev", "language": "python", "name": "dev" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 5 }