{ "cells": [ { "cell_type": "markdown", "id": "7b42e29a-7d97-4b49-ab49-db2d73d79311", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Reading data parts from URLs" ] }, { "cell_type": "code", "execution_count": 1, "id": "2beb1bac-4968-4f0a-aa1c-8fedbf234b0e", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import earthkit.data as ekd" ] }, { "cell_type": "raw", "id": "a499dce2-5b62-4d7c-b994-ee2020c35468", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "This notebook demonstrates how to download only :ref:`parts ` (byte ranges) from URLs." ] }, { "cell_type": "raw", "id": "e9ccbcb7-a8d2-4e61-9055-1412cdc372ed", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "We download one of the files and inspect the contents with :py:meth:`~data.readers.grib.index.GribFieldList.ls`. By using the \"offset\" key we can get the byte positions where each message starts within the file. 
" ] }, { "cell_type": "code", "execution_count": 2, "id": "27820099-8b9c-4c6c-9ee0-a72ef4224dac", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f01720cc546944568e8762d85e48b0e3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "test6.grib: 0%| | 0.00/1.41k [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
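| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType | offset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 0.0 |
| 1 | ecmf | u | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 240.0 |
| 2 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 480.0 |
| 3 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 720.0 |
| 4 | ecmf | u | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 960.0 |
| 5 | ecmf | v | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll | 1200.0 |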
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "1 ecmf u isobaricInhPa 1000 20180801 1200 0 \n", "2 ecmf v isobaricInhPa 1000 20180801 1200 0 \n", "3 ecmf t isobaricInhPa 850 20180801 1200 0 \n", "4 ecmf u isobaricInhPa 850 20180801 1200 0 \n", "5 ecmf v isobaricInhPa 850 20180801 1200 0 \n", "\n", " dataType number gridType offset \n", "0 an 0 regular_ll 0.0 \n", "1 an 0 regular_ll 240.0 \n", "2 an 0 regular_ll 480.0 \n", "3 an 0 regular_ll 720.0 \n", "4 an 0 regular_ll 960.0 \n", "5 an 0 regular_ll 1200.0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\n", " \"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\")\n", "ds.ls(extra_keys=\"offset\")" ] }, { "cell_type": "markdown", "id": "f348d7e7-d864-4e23-bb1b-adf6f1143c9c", "metadata": {}, "source": [ "### Single files" ] }, { "cell_type": "raw", "id": "e8346eb4-804f-4093-bda8-80acec9873ae", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The **parts** option in :ref:`from_source() ` specifies the **byte range(s)** we want to read from a remote file. A single :ref:`part ` is a tuple or list in the following format: *(offset, length)*. \n", "\n", "Using the offsets from the example above we can specify the :ref:`part ` for the first message." 
] }, { "cell_type": "code", "execution_count": 3, "id": "5a048a9f-7ea2-4613-997f-4d7464a7db98", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "56e29bc65a514b23826f9206e37f7387", "version_major": 2, "version_minor": 0 }, "text/plain": [ "test6.grib: 0%| | 0.00/240 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
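| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |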
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\n", " \"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\",\n", " parts=(0, 240))\n", "ds.ls()" ] }, { "cell_type": "markdown", "id": "b40e2a4d-d000-4a3f-b756-0c8780e6bf09", "metadata": {}, "source": [ "The call above can also be written as:" ] }, { "cell_type": "code", "execution_count": 4, "id": "661e7ebb-13f7-4db6-8e87-ee59848c47b2", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ddf0f38792694bbca01831299bbe04a6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "test6.grib: 0%| | 0.00/240 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
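| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |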
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\n", " \"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\",\n", " parts=[(0, 240)])\n", "ds.ls()" ] }, { "cell_type": "markdown", "id": "5c8001e3-5413-4c8f-ac4e-ff3a7d168a43", "metadata": {}, "source": [ "A part can go over a message boundary. Here bytes 240-244 belong to the second message, which is not read because not all of its bytes are specified." ] }, { "cell_type": "code", "execution_count": 5, "id": "cdbd5356-8696-4e73-9d1d-1996667f1792", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "45b101589a9645679207ce2597de7359", "version_major": 2, "version_minor": 0 }, "text/plain": [ "test6.grib: 0%| | 0.00/245 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
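| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |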
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\n", " \"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\",\n", " parts=[(0, 245)])\n", "ds.ls()" ] }, { "cell_type": "raw", "id": "d2c2aa43-802c-4739-b7da-92c5fc22b7b2", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Multiple :ref:`parts ` can be used." ] }, { "cell_type": "code", "execution_count": 6, "id": "53c60306-ff16-40ce-9922-6552bab3244f", "metadata": { "editable": true, "scrolled": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "798521ecb2944154bc0001fe32a6f101", "version_major": 2, "version_minor": 0 }, "text/plain": [ "test6.grib: 0%| | 0.00/720 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
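| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |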
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "1 ecmf v isobaricInhPa 1000 20180801 1200 0 \n", "2 ecmf t isobaricInhPa 850 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll \n", "1 an 0 regular_ll \n", "2 an 0 regular_ll " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\n", " \"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\",\n", " parts=[(0, 240), (480, 480)])\n", "ds.ls()" ] }, { "cell_type": "markdown", "id": "a9b26883-6f17-4933-90ad-d2342ac1a26b", "metadata": {}, "source": [ "Parts must be listed in increasing offset order and cannot overlap." ] }, { "cell_type": "code", "execution_count": 7, "id": "01e41b9b-6900-4057-bbe8-958ba2b021bf", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Offsets and lengths must be in order, and not overlapping: offset=220, end of previous part=240\n" ] } ], "source": [ "try:\n", " ds = ekd.from_source(\n", " \"url\", \n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\",\n", " parts=[(0, 240), (220, 240)])\n", "except Exception as e:\n", " print(e)" ] }, { "cell_type": "markdown", "id": "f090b522-b8dc-4d54-8b49-3ea2a78ed16f", "metadata": {}, "source": [ "### Multiple files" ] }, { "cell_type": "raw", "id": "28957198-65d5-480d-809d-26a1aac27a30", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "When using multiple URLs we can specify the :ref:`part ` for each file with the following syntax:" ] }, { "cell_type": "code", "execution_count": 8, "id": "4b1a35fe-db43-4e33-b00e-f55c5c2332d1", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5df52b2ee6b04bff8bbaa8fef75cd001", 
"version_major": 2, "version_minor": 0 }, "text/plain": [ ": 0%| | 0.00/0.98k [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
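| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |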
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf 2t surface 0 20200513 1200 0 \n", "1 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "2 ecmf v isobaricInhPa 1000 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll \n", "1 an 0 regular_ll \n", "2 an 0 regular_ll " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\"url\", [\n", " [\"https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib\", (0,526)], \n", " [\"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\", [(0, 240), (480, 240)]]\n", " ])\n", "ds.ls()" ] }, { "cell_type": "markdown", "id": "b77c770f-0f3c-48f4-afe4-bcac35d424a0", "metadata": {}, "source": [ "When a part is None for a given file the whole file will be used." ] }, { "cell_type": "code", "execution_count": 9, "id": "ba09e8ea-19c1-4763-b603-acedad380ed5", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "601c3fcadc19440db059514c0a129164", "version_major": 2, "version_minor": 0 }, "text/plain": [ ": 0%| | 0.00/1.50k [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
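| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | msl | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |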
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf 2t surface 0 20200513 1200 0 \n", "1 ecmf msl surface 0 20200513 1200 0 \n", "2 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "3 ecmf v isobaricInhPa 1000 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll \n", "1 an 0 regular_ll \n", "2 an 0 regular_ll \n", "3 an 0 regular_ll " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\n", " \"url\", [\n", " [\"https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib\", None], \n", " [\"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\", [(0,240), (480, 240)]]\n", " ])\n", "ds.ls()" ] }, { "cell_type": "markdown", "id": "77361bdd-5294-4a0b-8b9e-0a283483ff80", "metadata": {}, "source": [ "The **parts** kwarg can still be used for multiple files; in this case it will be applied to each of them one by one." ] }, { "cell_type": "code", "execution_count": 10, "id": "007d715f-291a-4f17-b252-7cf717ad7426", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8ed4d0ae0b7e45dca0d693db672ea014", "version_major": 2, "version_minor": 0 }, "text/plain": [ ": 0%| | 0.00/480 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
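| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |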
\n", "" ], "text/plain": [ " centre shortName typeOfLevel level dataDate dataTime stepRange \\\n", "0 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "1 ecmf t isobaricInhPa 1000 20180801 1200 0 \n", "\n", " dataType number gridType \n", "0 an 0 regular_ll \n", "1 an 0 regular_ll " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = ekd.from_source(\n", " \"url\", \n", " [\"https://sites.ecmwf.int/repository/earthkit-data/examples/test6.grib\",\n", " \"https://sites.ecmwf.int/repository/earthkit-data/examples/tuv_pl.grib\"], \n", " parts=(0,240))\n", "ds.ls()" ] }, { "cell_type": "code", "execution_count": null, "id": "53be8b67-8260-456b-bc58-12aa4a3380d3", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "dev", "language": "python", "name": "dev" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }