{ "cells": [ { "cell_type": "markdown", "id": "hourly-multimedia", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Retrieving data from FDB" ] }, { "cell_type": "code", "execution_count": 1, "id": "behind-carry", "metadata": {}, "outputs": [], "source": [ "import earthkit.data" ] }, { "cell_type": "markdown", "id": "numerous-france", "metadata": {}, "source": [ "FDB (Fields DataBase) is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data. For more information on FDB, please consult the following pages:\n", "\n", "- [FDB](https://fields-database.readthedocs.io/en/latest/)\n", "- [pyfdb](https://pyfdb.readthedocs.io/en/latest/)\n", "\n", "This example requires access to an FDB, and the FDB_HOME environment variable must be set correctly." ] }, { "cell_type": "markdown", "id": "concrete-wallet", "metadata": {}, "source": [ "The following request was written to retrieve data from the operational FDB at ECMWF. Note that the **date** must be adjusted because the FDB at ECMWF only stores the most recent dates." ] }, { "cell_type": "code", "execution_count": 2, "id": "drawn-renewal", "metadata": {}, "outputs": [], "source": [ "request = {\n", "    'class': 'od',\n", "    'expver': '0001',\n", "    'stream': 'oper',\n", "    'date': '20240421',\n", "    'time': [0, 12],\n", "    'domain': 'g',\n", "    'type': 'an',\n", "    'levtype': 'sfc',\n", "    'step': 0,\n", "    'param': [151, 167, 168]\n", "}" ] }, { "cell_type": "markdown", "id": "compound-pastor", "metadata": {}, "source": [ "### Reading as a stream" ] }, { "cell_type": "raw", "id": "a4ceed12-9fa3-4766-a72c-28b32a1660c1", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "By default, :ref:`from_source() ` retrieves data from an :ref:`FDB ` source as a stream."
] }, { "cell_type": "markdown", "id": "a8431339-1814-4f56-a071-84175ddf5775", "metadata": { "editable": true, "raw_mimetype": "", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "#### Iteration with one field at a time in memory" ] }, { "cell_type": "raw", "id": "eccee7c9-7769-436a-b764-18045c04d86b", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "When we use the default arguments in :ref:`from_source() `, the resulting object can only be used for iteration, and only one field is kept in memory at a time. Fields created during the iteration are deleted when they go out of scope." ] }, { "cell_type": "code", "execution_count": 3, "id": "signal-rocket", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GribField(msl,None,20240421,0,0,0)\n", "GribField(2t,None,20240421,0,0,0)\n", "GribField(2d,None,20240421,0,0,0)\n", "GribField(msl,None,20240421,1200,0,0)\n", "GribField(2t,None,20240421,1200,0,0)\n", "GribField(2d,None,20240421,1200,0,0)\n" ] } ], "source": [ "ds = earthkit.data.from_source(\"fdb\", request=request)\n", "for f in ds:\n", "    print(f)" ] }, { "cell_type": "markdown", "id": "white-lebanon", "metadata": {}, "source": [ "Once the iteration is complete, the stream is exhausted and nothing is left in *ds*."
] }, { "cell_type": "code", "execution_count": 4, "id": "blank-affiliate", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum([1 for _ in ds])" ] }, { "cell_type": "markdown", "id": "7e63bb45-b15d-4a89-b882-bf4db27d4f39", "metadata": { "tags": [] }, "source": [ "#### Iteration with group_by" ] }, { "cell_type": "raw", "id": "1f3f2802-7af9-48da-a964-7769c951ce48", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "When we use the :py:meth:`group_by ` method, we can iterate through the stream in groups defined by metadata keys. Each iteration step results in a :py:class:`FieldList ` object, which is built by consuming GRIB messages from the stream until the values of the metadata keys change. The generated :py:class:`FieldList ` keeps its GRIB messages in memory and is deleted when it goes out of scope." ] }, { "cell_type": "code", "execution_count": 5, "id": "574757de-cc05-4d73-b71c-cd49af36f655", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len=3 [('msl', 0), ('2t', 0), ('2d', 0)]\n", "len=3 [('msl', 0), ('2t', 0), ('2d', 0)]\n" ] } ], "source": [ "ds = earthkit.data.from_source(\"fdb\", request=request)\n", "for f in ds.group_by(\"time\"):\n", "    print(f\"len={len(f)} {f.metadata(('param', 'level'))}\")" ] }, { "cell_type": "markdown", "id": "ethical-canyon", "metadata": {}, "source": [ "#### Iteration with batched" ] }, { "cell_type": "raw", "id": "731f4f11-d050-4754-8862-5ca5b2837934", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "When we use the :py:meth:`batched ` method, we can iterate through the stream in batches of fixed size. 
In this example we create a stream and read two fields from it at a time." ] }, { "cell_type": "code", "execution_count": 6, "id": "precise-guyana", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len=2 [('msl', 0), ('2t', 0)]\n", "len=2 [('2d', 0), ('msl', 0)]\n", "len=2 [('2t', 0), ('2d', 0)]\n" ] } ], "source": [ "ds = earthkit.data.from_source(\"fdb\", request=request)\n", "for f in ds.batched(2):\n", "    print(f\"len={len(f)} {f.metadata(('param', 'level'))}\")" ] }, { "cell_type": "markdown", "id": "through-seven", "metadata": {}, "source": [ "#### Storing all the fields in memory" ] }, { "cell_type": "raw", "id": "1391881c-3484-405a-ab6f-4ee9d032bc35", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "We can load the whole stream into memory by using ``read_all=True`` in :ref:`from_source() `. The resulting object will be a FieldList." ] }, { "cell_type": "code", "execution_count": 7, "id": "bizarre-basket", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "ds = earthkit.data.from_source(\"fdb\", request=request, read_all=True)" ] }, { "cell_type": "code", "execution_count": 8, "id": "exciting-accused", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ds)" ] }, { "cell_type": "code", "execution_count": 9, "id": "minus-horizon", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange dataType \\\n", "0 ecmf msl surface 0 20240421 0 0 an \n", "1 ecmf 2t surface 0 20240421 0 0 an \n", "2 ecmf 2d surface 0 20240421 0 0 an \n", "3 ecmf msl surface 0 20240421 1200 0 an \n", "4 ecmf 2t surface 0 20240421 1200 0 an \n", "5 ecmf 2d surface 0 20240421 1200 0 an \n", "\n", " number gridType \n", "0 0 reduced_gg \n", "1 0 reduced_gg \n", "2 0 reduced_gg \n", "3 0 reduced_gg \n", "4 0 reduced_gg \n", "5 0 reduced_gg " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.ls()" ] }, { "cell_type": "code", "execution_count": 10, "id": "tamil-tattoo", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange dataType \\\n", "0 ecmf 2t surface 0 20240421 0 0 an \n", "1 ecmf 2t surface 0 20240421 1200 0 an \n", "\n", " number gridType \n", "0 0 reduced_gg \n", "1 0 reduced_gg " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.sel(param=\"2t\").ls()" ] }, { "cell_type": "code", "execution_count": 11, "id": "assumed-month", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
<xarray.Dataset>\n",
       "Dimensions:     (number: 1, time: 2, step: 1, surface: 1, values: 6599680)\n",
       "Coordinates:\n",
       "  * number      (number) int64 0\n",
       "  * time        (time) datetime64[ns] 2024-04-21 2024-04-21T12:00:00\n",
       "  * step        (step) timedelta64[ns] 00:00:00\n",
       "  * surface     (surface) float64 0.0\n",
       "    latitude    (values) float64 ...\n",
       "    longitude   (values) float64 ...\n",
       "    valid_time  (time, step) datetime64[ns] ...\n",
       "Dimensions without coordinates: values\n",
       "Data variables:\n",
       "    msl         (number, time, step, surface, values) float32 ...\n",
       "    t2m         (number, time, step, surface, values) float32 ...\n",
       "    d2m         (number, time, step, surface, values) float32 ...\n",
       "Attributes:\n",
       "    GRIB_edition:            1\n",
       "    GRIB_centre:             ecmf\n",
       "    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts\n",
       "    GRIB_subCentre:          0\n",
       "    Conventions:             CF-1.7\n",
       "    institution:             European Centre for Medium-Range Weather Forecasts\n",
       "    history:                 2024-04-22T11:01 GRIB to CDM+CF via cfgrib-0.9.1...
" ], "text/plain": [ "<xarray.Dataset>\n", "Dimensions:     (number: 1, time: 2, step: 1, surface: 1, values: 6599680)\n", "Coordinates:\n", "  * number      (number) int64 0\n", "  * time        (time) datetime64[ns] 2024-04-21 2024-04-21T12:00:00\n", "  * step        (step) timedelta64[ns] 00:00:00\n", "  * surface     (surface) float64 0.0\n", "    latitude    (values) float64 ...\n", "    longitude   (values) float64 ...\n", "    valid_time  (time, step) datetime64[ns] ...\n", "Dimensions without coordinates: values\n", "Data variables:\n", "    msl         (number, time, step, surface, values) float32 ...\n", "    t2m         (number, time, step, surface, values) float32 ...\n", "    d2m         (number, time, step, surface, values) float32 ...\n", "Attributes:\n", "    GRIB_edition:            1\n", "    GRIB_centre:             ecmf\n", "    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts\n", "    GRIB_subCentre:          0\n", "    Conventions:             CF-1.7\n", "    institution:             European Centre for Medium-Range Weather Forecasts\n", "    history:                 2024-04-22T11:01 GRIB to CDM+CF via cfgrib-0.9.1..." ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.to_xarray()" ] }, { "cell_type": "markdown", "id": "decent-algeria", "metadata": {}, "source": [ "### Reading into a file" ] }, { "cell_type": "raw", "id": "c7d8304a-5103-43a2-84f9-87311c051b12", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "With ``stream=False`` the data is retrieved from FDB into a file, which is stored in the :ref:`cache `:" ] }, { "cell_type": "code", "execution_count": 12, "id": "passing-georgia", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "ds = earthkit.data.from_source(\"fdb\", request=request, stream=False)" ] }, { "cell_type": "code", "execution_count": 13, "id": "foster-profile", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange dataType \\\n", "0 ecmf msl surface 0 20240421 0 0 an \n", "1 ecmf 2t surface 0 20240421 0 0 an \n", "2 ecmf 2d surface 0 20240421 0 0 an \n", "3 ecmf msl surface 0 20240421 1200 0 an \n", "4 ecmf 2t surface 0 20240421 1200 0 an \n", "5 ecmf 2d surface 0 20240421 1200 0 an \n", "\n", " number gridType \n", "0 0 reduced_gg \n", "1 0 reduced_gg \n", "2 0 reduced_gg \n", "3 0 reduced_gg \n", "4 0 reduced_gg \n", "5 0 reduced_gg " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.ls()" ] }, { "cell_type": "raw", "id": "72daa46e-7aa3-4612-bd46-beb7dc6e0375", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The data is now :ref:`cached `. Subsequent retrievals will use the cached file directly." ] }, { "cell_type": "code", "execution_count": null, "id": "outer-accommodation", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "dev", "language": "python", "name": "dev" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 5 }