{ "cells": [ { "cell_type": "markdown", "id": "recovered-organizer", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Reading files as a stream" ] }, { "attachments": {}, "cell_type": "raw", "id": "2da4f1e8-e4ac-489f-9695-72683c779496", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "earthkit-data can read GRIB data from a file as a :ref:`stream`. This is activated by passing ``stream=True`` to :ref:`from_source() `.\n", "\n", "First, we ensure the example data is available." ] }, { "cell_type": "code", "execution_count": 1, "id": "fc405471-29e5-4e7e-b8f3-9d6e32dab190", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import earthkit.data as ekd\n", "\n", "ekd.download_example_file(\"test6.grib\")" ] }, { "cell_type": "markdown", "id": "prescribed-giant", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Getting single items from the stream" ] }, { "cell_type": "code", "execution_count": 2, "id": "durable-helicopter", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "ds = ekd.from_source(\"file\", \"test6.grib\", stream=True)" ] }, { "cell_type": "raw", "id": "694df8fb-23a8-46dd-87c4-f9234e14d854", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Using the resulting object we can iterate through the stream. As the iteration progresses, :py:class:`~data.readers.grib.codes.GribField` objects are created and then deleted when they go out of scope. As a result, only one GRIB message is kept in memory at a time."
] }, { "cell_type": "code", "execution_count": 3, "id": "animated-prayer", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GribField(t,1000,20180801,1200,0,0)\n", "GribField(u,1000,20180801,1200,0,0)\n", "GribField(v,1000,20180801,1200,0,0)\n", "GribField(t,850,20180801,1200,0,0)\n", "GribField(u,850,20180801,1200,0,0)\n", "GribField(v,850,20180801,1200,0,0)\n" ] } ], "source": [ "for f in ds:\n", "    # f is a GribField object; it is deleted when it goes out of scope\n", "    print(f)" ] }, { "cell_type": "markdown", "id": "brilliant-struggle", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Once the iteration has finished, the stream is consumed and there is no data left in *ds*." ] }, { "cell_type": "code", "execution_count": 4, "id": "de7520fa-abad-40dd-97c5-bb5868859e2b", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len([f for f in ds])" ] }, { "cell_type": "markdown", "id": "judicial-backing", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Using batched" ] }, { "cell_type": "raw", "id": "fb4528e4-f649-4a5b-92b4-e92e9391851b", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "When we use the :py:meth:`batched ` method we can iterate through the stream in batches of fixed size. In this example we create a stream and read 2 fields from it at a time."
] }, { "cell_type": "code", "execution_count": 5, "id": "placed-blues", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len=2 [('t', 1000), ('u', 1000)]\n", "len=2 [('v', 1000), ('t', 850)]\n", "len=2 [('u', 850), ('v', 850)]\n" ] } ], "source": [ "ds = ekd.from_source(\"file\", \"test6.grib\", stream=True)\n", "\n", "# f is a fieldlist\n", "for f in ds.batched(2):\n", "    print(f\"len={len(f)} {f.metadata(('param', 'level'))}\")" ] }, { "cell_type": "raw", "id": "ed0b0e9c-1016-474b-8ea0-c65d864d2427", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The batch size does not have to divide the total number of fields in the stream evenly. When it does not, the last batch simply contains fewer fields than the specified batch size." ] }, { "cell_type": "code", "execution_count": 6, "id": "bf94a190-ec0e-4172-8e75-6518e48f50a4", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len=4 [('t', 1000), ('u', 1000), ('v', 1000), ('t', 850)]\n", "len=2 [('u', 850), ('v', 850)]\n" ] } ], "source": [ "ds = ekd.from_source(\"file\", \"test6.grib\", stream=True)\n", "\n", "# f is a fieldlist\n", "for f in ds.batched(4):\n", "    print(f\"len={len(f)} {f.metadata(('param', 'level'))}\")" ] }, { "cell_type": "markdown", "id": "7122a9a4-9ca0-4d75-9194-144074c6dcad", "metadata": {}, "source": [ "### Using group_by" ] }, { "cell_type": "raw", "id": "d970d832-7203-498f-81d6-99434ce42b88", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "When we use the :py:meth:`group_by ` method we can iterate through the stream in groups defined by metadata keys. 
Each iteration step results in a :py:class:`FieldList ` object, which is built by consuming GRIB messages from the stream until the values of the metadata keys change. The generated :py:class:`FieldList ` keeps its GRIB messages in memory and is deleted when it goes out of scope." ] }, { "cell_type": "code", "execution_count": 7, "id": "8e1be478-6eb6-4732-bb96-9d6fa942c20d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len=3 [('t', 1000), ('u', 1000), ('v', 1000)]\n", "len=3 [('t', 850), ('u', 850), ('v', 850)]\n" ] } ], "source": [ "ds = ekd.from_source(\"file\", \"test6.grib\", stream=True)\n", "\n", "# f is a fieldlist\n", "for f in ds.group_by(\"level\"):\n", "    print(f\"len={len(f)} {f.metadata(('param', 'level'))}\")" ] }, { "cell_type": "markdown", "id": "permanent-uncertainty", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Storing each GRIB message in memory" ] }, { "cell_type": "raw", "id": "0b9b01c1-b528-42a2-9f1b-ccae28eb65b5", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "We can load the whole stream into memory by using ``read_all=True`` in :ref:`from_source() `. The resulting object will be a :py:class:`FieldList` storing all the GRIB messages in memory."
] }, { "cell_type": "code", "execution_count": 8, "id": "simple-london", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "ds = ekd.from_source(\"file\", \"test6.grib\", stream=True, read_all=True)" ] }, { "cell_type": "code", "execution_count": 9, "id": "meaning-oxide", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ds)" ] }, { "cell_type": "code", "execution_count": 10, "id": "copyrighted-walnut", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>centre</th><th>shortName</th><th>typeOfLevel</th><th>level</th><th>dataDate</th><th>dataTime</th><th>stepRange</th><th>dataType</th><th>number</th><th>gridType</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>ecmf</td><td>t</td><td>isobaricInhPa</td><td>1000</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"<tr><th>1</th><td>ecmf</td><td>u</td><td>isobaricInhPa</td><td>1000</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"<tr><th>2</th><td>ecmf</td><td>v</td><td>isobaricInhPa</td><td>1000</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"<tr><th>3</th><td>ecmf</td><td>t</td><td>isobaricInhPa</td><td>850</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"<tr><th>4</th><td>ecmf</td><td>u</td><td>isobaricInhPa</td><td>850</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"<tr><th>5</th><td>ecmf</td><td>v</td><td>isobaricInhPa</td><td>850</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "1 ecmf u isobaricInhPa 1000 20180801 1200 0 \n", "2 ecmf v isobaricInhPa 1000 20180801 1200 0 \n", "3 ecmf t isobaricInhPa 850 20180801 1200 0 \n", "4 ecmf u isobaricInhPa 850 20180801 1200 0 \n", "5 ecmf v isobaricInhPa 850 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll \n", "1 an 0 regular_ll \n", "2 an 0 regular_ll \n", "3 an 0 regular_ll \n", "4 an 0 regular_ll \n", "5 an 0 regular_ll " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.ls()" ] }, { "cell_type": "code", "execution_count": 11, "id": "static-reasoning", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>centre</th><th>shortName</th><th>typeOfLevel</th><th>level</th><th>dataDate</th><th>dataTime</th><th>stepRange</th><th>dataType</th><th>number</th><th>gridType</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>ecmf</td><td>t</td><td>isobaricInhPa</td><td>1000</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"<tr><th>1</th><td>ecmf</td><td>t</td><td>isobaricInhPa</td><td>850</td><td>20180801</td><td>1200</td><td>0</td><td>an</td><td>0</td><td>regular_ll</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "1 ecmf t isobaricInhPa 850 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll \n", "1 an 0 regular_ll " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = ds.sel(param=\"t\")\n", "a.ls()" ] }, { "cell_type": "code", "execution_count": 12, "id": "spanish-wagon", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset>\n",
       "Dimensions:    (levelist: 2, latitude: 7, longitude: 12)\n",
       "Coordinates:\n",
       "  * levelist   (levelist) int64 850 1000\n",
       "  * latitude   (latitude) float64 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0\n",
       "  * longitude  (longitude) float64 0.0 30.0 60.0 90.0 ... 270.0 300.0 330.0\n",
       "Data variables:\n",
       "    t          (levelist, latitude, longitude) float64 ...\n",
       "Attributes:\n",
       "    param:        t\n",
       "    class:        od\n",
       "    stream:       oper\n",
       "    levtype:      pl\n",
       "    type:         an\n",
       "    expver:       0001\n",
       "    date:         20180801\n",
       "    time:         1200\n",
       "    domain:       g\n",
       "    number:       0\n",
       "    Conventions:  CF-1.8\n",
       "    institution:  ECMWF
" ], "text/plain": [ "\n", "Dimensions: (levelist: 2, latitude: 7, longitude: 12)\n", "Coordinates:\n", " * levelist (levelist) int64 850 1000\n", " * latitude (latitude) float64 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0\n", " * longitude (longitude) float64 0.0 30.0 60.0 90.0 ... 270.0 300.0 330.0\n", "Data variables:\n", " t (levelist, latitude, longitude) float64 ...\n", "Attributes:\n", " param: t\n", " class: od\n", " stream: oper\n", " levtype: pl\n", " type: an\n", " expver: 0001\n", " date: 20180801\n", " time: 1200\n", " domain: g\n", " number: 0\n", " Conventions: CF-1.8\n", " institution: ECMWF" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = a.to_xarray()\n", "a" ] } ], "metadata": { "kernelspec": { "display_name": "dev_ecc", "language": "python", "name": "dev_ecc" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 5 }