{ "cells": [ { "cell_type": "markdown", "id": "d3575125-5f56-43e9-8080-4094cf309c31", "metadata": { "editable": true, "raw_mimetype": "", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Reading data from URLs as a stream" ] }, { "cell_type": "code", "execution_count": 1, "id": "c23ffb07-747c-4535-b608-66b080d6d4e5", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import earthkit.data as ekd" ] }, { "cell_type": "raw", "id": "e55b54da-4011-442e-99e8-9d1c82773d31", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "earthkit-data can read GRIB data from a URL as a stream without writing anything to disk. This can be activated with the ``stream=True`` kwarg when calling :ref:`from_source() `." ] }, { "cell_type": "code", "execution_count": 2, "id": "1d2b07e3-1820-45ec-88d5-c78765533735", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "ds = ekd.from_source(\"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib\", \n", " stream=True)" ] }, { "cell_type": "markdown", "id": "1661f2d1-2c30-4021-826f-1c2c5c565fd0", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The resulting object only supports one iteration. Having finsihed the iteration the stream is consumed and no more data is available." ] }, { "cell_type": "code", "execution_count": 3, "id": "51964579-bf3f-4948-918d-fcd35581950e", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GribField(t,500,20070101,1200,0,0)\n", "GribField(z,500,20070101,1200,0,0)\n", "GribField(t,850,20070101,1200,0,0)\n", "GribField(z,850,20070101,1200,0,0)\n" ] } ], "source": [ "for f in ds:\n", " # f is GribField object. It gets deleted when going out of scope\n", " print(f)" ] }, { "cell_type": "raw", "id": "df31a770-f5e9-46c7-8d5f-c9f73c57483c", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The iteration can be done in batches by using :py:meth:`batched ` or :py:meth:`group_by `." ] }, { "cell_type": "code", "execution_count": 4, "id": "754dc030-da97-4076-88ba-f31fed2f091d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len=2 [('t', 500), ('z', 500)]\n", "len=2 [('t', 850), ('z', 850)]\n" ] } ], "source": [ "ds = ekd.from_source(\"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib\", \n", " stream=True)\n", "\n", "for f in ds.batched(2):\n", " # f is a fieldlist\n", " print(f\"len={len(f)} {f.metadata(('param', 'level'))}\")" ] }, { "cell_type": "markdown", "id": "9db87b26-20e2-4cb5-89a1-2643d97fcf13", "metadata": {}, "source": [ "#### Reading the whole stream into memory" ] }, { "cell_type": "raw", "id": "f8622372-d2c9-4261-86cc-db46531b0cd5", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "We can load the whole stream into memory by using ``read_all=True`` in :ref:`from_source() `. The resulting object will be a :py:class:`FieldList` storing all the GRIB messages in memory." ] }, { "cell_type": "code", "execution_count": 5, "id": "359ceb8d-3c88-411d-af86-78c7cb403a08", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib\", \n", " stream=True, read_all=True)\n", "\n", "len(ds)" ] }, { "cell_type": "code", "execution_count": 6, "id": "4608964a-3849-4c40-bcd3-086f54b2f67a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
centreshortNametypeOfLevelleveldataDatedataTimestepRangedataTypenumbergridType
0ecmftisobaricInhPa5002007010112000an0regular_ll
1ecmfzisobaricInhPa5002007010112000an0regular_ll
2ecmftisobaricInhPa8502007010112000an0regular_ll
3ecmfzisobaricInhPa8502007010112000an0regular_ll
\n", "
" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 500 20070101 1200 0 \n", "1 ecmf z isobaricInhPa 500 20070101 1200 0 \n", "2 ecmf t isobaricInhPa 850 20070101 1200 0 \n", "3 ecmf z isobaricInhPa 850 20070101 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll \n", "1 an 0 regular_ll \n", "2 an 0 regular_ll \n", "3 an 0 regular_ll " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.ls()" ] }, { "cell_type": "markdown", "id": "cb7d74b5-73e0-4e57-9d80-e45a9dc725d8", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Multiple URLs" ] }, { "cell_type": "markdown", "id": "6d5efce5-c9d0-4edf-a2cf-312ced9339b5", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The stream option works even when the input is a list of URLs." ] }, { "cell_type": "code", "execution_count": 7, "id": "891af3e5-5aa5-451f-b7bb-a86eaf7c8ffe", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len=3 [('t', 500), ('z', 500), ('t', 850)]\n", "len=3 [('z', 850), ('t', 1000), ('u', 1000)]\n", "len=3 [('v', 1000), ('t', 850), ('u', 850)]\n", "len=1 [('v', 850)]\n" ] } ], "source": [ "ds = ekd.from_source(\"url\", \n", " [\"https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\"], \n", " stream=True)\n", "\n", "for f in ds.batched(3):\n", " # f is a fieldlist\n", " print(f\"len={len(f)} {f.metadata(('param', 'level'))}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "d287fb70-66fb-4b2f-8c07-a3cca7c3035d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "dev", "language": "python", "name": "dev" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 5 }