{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## NetCDF: working with fieldlists" ] }, { "attachments": {}, "cell_type": "raw", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "We :ref:`read ` a NetCDF file containing 3 variables on pressure levels on a 2D latitude-longitude grid. First we ensure the example file is available." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import earthkit.data as ekd\n", "ekd.download_example_file(\"tuv_pl.nc\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "ds = ekd.from_source(\"file\", \"tuv_pl.nc\")" ] }, { "cell_type": "raw", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Our NetCDF data is represented as a :py:class:`~data.core.fieldlist.FieldList` consisting of NetCDFFields. :py:class:`~data.core.fieldlist.Field` in this context means a 2D geographical coverage (horizontal slices). " ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Iteration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can itearte through the fields (we use the first 3 fields for simplicity):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NetCDFField(t,level=1000,time=2018-08-01 12:00:00)\n", "NetCDFField(t,level=850,time=2018-08-01 12:00:00)\n", "NetCDFField(t,level=700,time=2018-08-01 12:00:00)\n" ] } ], "source": [ "for f in ds[:3]:\n", " print(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspecting the contents" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "18" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ds)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
variablelevelvalid_datetimeunits
0t10002018-08-01T12:00:00K
1t8502018-08-01T12:00:00K
2t7002018-08-01T12:00:00K
3t5002018-08-01T12:00:00K
4t4002018-08-01T12:00:00K
5t3002018-08-01T12:00:00K
6u10002018-08-01T12:00:00m s**-1
7u8502018-08-01T12:00:00m s**-1
8u7002018-08-01T12:00:00m s**-1
9u5002018-08-01T12:00:00m s**-1
10u4002018-08-01T12:00:00m s**-1
11u3002018-08-01T12:00:00m s**-1
12v10002018-08-01T12:00:00m s**-1
13v8502018-08-01T12:00:00m s**-1
14v7002018-08-01T12:00:00m s**-1
15v5002018-08-01T12:00:00m s**-1
16v4002018-08-01T12:00:00m s**-1
17v3002018-08-01T12:00:00m s**-1
\n", "
" ], "text/plain": [ " variable level valid_datetime units\n", "0 t 1000 2018-08-01T12:00:00 K\n", "1 t 850 2018-08-01T12:00:00 K\n", "2 t 700 2018-08-01T12:00:00 K\n", "3 t 500 2018-08-01T12:00:00 K\n", "4 t 400 2018-08-01T12:00:00 K\n", "5 t 300 2018-08-01T12:00:00 K\n", "6 u 1000 2018-08-01T12:00:00 m s**-1\n", "7 u 850 2018-08-01T12:00:00 m s**-1\n", "8 u 700 2018-08-01T12:00:00 m s**-1\n", "9 u 500 2018-08-01T12:00:00 m s**-1\n", "10 u 400 2018-08-01T12:00:00 m s**-1\n", "11 u 300 2018-08-01T12:00:00 m s**-1\n", "12 v 1000 2018-08-01T12:00:00 m s**-1\n", "13 v 850 2018-08-01T12:00:00 m s**-1\n", "14 v 700 2018-08-01T12:00:00 m s**-1\n", "15 v 500 2018-08-01T12:00:00 m s**-1\n", "16 v 400 2018-08-01T12:00:00 m s**-1\n", "17 v 300 2018-08-01T12:00:00 m s**-1" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.ls()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Standard Python slicing is available." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NetCDFField(t,level=850,time=2018-08-01 12:00:00)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = ds[1]\n", "g" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
variablelevelvalid_datetimeunits
0t8502018-08-01T12:00:00K
1t7002018-08-01T12:00:00K
\n", "
" ], "text/plain": [ " variable level valid_datetime units\n", "0 t 850 2018-08-01T12:00:00 K\n", "1 t 700 2018-08-01T12:00:00 K" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = ds[1:3]\n", "g.ls()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NetCDFField(v,level=300,time=2018-08-01 12:00:00)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = ds[-1]\n", "g" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting data values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using values" ] }, { "cell_type": "raw", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The :py:attr:`~data.core.fieldlist.Field.values` property always returns a flat array per field:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(84,)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v = ds[0].values\n", "v.shape" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([272.56486405, 272.56486405, 272.56486405, 272.56486405])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v[0:4]" ] }, { "cell_type": "raw", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "When called on the whole fieldlist :py:attr:`~data.core.fieldlist.FieldList.values` returns a 2D array:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(18, 84)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v = ds.values\n", "v.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using to_numpy()" ] }, { "cell_type": "raw", "metadata": { "editable": true, "raw_mimetype": "text/restructuredtext", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "With :py:meth:`~data.core.fieldlist.Field.to_numpy` the field shape is set on the array: " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(7, 12)\n", "(7, 12)\n" ] } ], "source": [ "v = ds[0].to_numpy()\n", "print(v.shape)\n", "print(ds[0].shape)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(18, 7, 12)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v = ds.to_numpy()\n", "v.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Metadata access works both on individual fields and slices:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'t'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds[0].metadata(\"variable\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[1000, 't'], [850, 't']]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds[0:2].metadata([\"level\", \"variable\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and on all the fields:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1000,\n", " 850,\n", " 700,\n", " 500,\n", " 400,\n", " 300,\n", " 1000,\n", " 850,\n", " 700,\n", " 500,\n", " 400,\n", " 300,\n", " 1000,\n", " 850,\n", " 700,\n", " 500,\n", " 400,\n", " 300]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.metadata(\"level\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each filed we can get the metadata as an object:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NetCDFMetadata({'units': 'K', 'long_name': 'Temperature', 'standard_name': 'air_temperature', 'date': 20180801, 'time': 1200, 'variable': 't', 'level': 1000, 'levtype': 'level'})" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "md = ds[0].metadata()\n", "md" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "md[\"level\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Selection by metadata is always creating a \"view\", no copying of data is involved." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
variablelevelvalid_datetimeunits
0u8502018-08-01T12:00:00m s**-1
1v8502018-08-01T12:00:00m s**-1
\n", "
" ], "text/plain": [ " variable level valid_datetime units\n", "0 u 850 2018-08-01T12:00:00 m s**-1\n", "1 v 850 2018-08-01T12:00:00 m s**-1" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = ds.sel(variable=[\"u\", \"v\"], level=850)\n", "g.ls()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
variablelevelvalid_datetimeunits
0t10002018-08-01T12:00:00K
1t8502018-08-01T12:00:00K
2t7002018-08-01T12:00:00K
3t5002018-08-01T12:00:00K
4t4002018-08-01T12:00:00K
5t3002018-08-01T12:00:00K
\n", "
" ], "text/plain": [ " variable level valid_datetime units\n", "0 t 1000 2018-08-01T12:00:00 K\n", "1 t 850 2018-08-01T12:00:00 K\n", "2 t 700 2018-08-01T12:00:00 K\n", "3 t 500 2018-08-01T12:00:00 K\n", "4 t 400 2018-08-01T12:00:00 K\n", "5 t 300 2018-08-01T12:00:00 K" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = ds.sel(variable=\"t\")\n", "g.ls()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Xarray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Xarray conversion does not involve disk writing." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 12kB\n",
       "Dimensions:    (longitude: 12, latitude: 7, level: 6, time: 1)\n",
       "Coordinates:\n",
       "  * longitude  (longitude) float32 48B 0.0 30.0 60.0 90.0 ... 270.0 300.0 330.0\n",
       "  * latitude   (latitude) float32 28B 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0\n",
       "  * level      (level) int32 24B 1000 850 700 500 400 300\n",
       "  * time       (time) datetime64[ns] 8B 2018-08-01T12:00:00\n",
       "Data variables:\n",
       "    t          (time, level, latitude, longitude) float64 4kB dask.array<chunksize=(1, 6, 7, 12), meta=np.ndarray>\n",
       "    u          (time, level, latitude, longitude) float64 4kB dask.array<chunksize=(1, 6, 7, 12), meta=np.ndarray>\n",
       "    v          (time, level, latitude, longitude) float64 4kB dask.array<chunksize=(1, 6, 7, 12), meta=np.ndarray>\n",
       "Attributes:\n",
       "    Conventions:  CF-1.6\n",
       "    history:      2023-08-07 18:24:35 GMT by grib_to_netcdf-2.30.2: grib_to_n...
" ], "text/plain": [ " Size: 12kB\n", "Dimensions: (longitude: 12, latitude: 7, level: 6, time: 1)\n", "Coordinates:\n", " * longitude (longitude) float32 48B 0.0 30.0 60.0 90.0 ... 270.0 300.0 330.0\n", " * latitude (latitude) float32 28B 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0\n", " * level (level) int32 24B 1000 850 700 500 400 300\n", " * time (time) datetime64[ns] 8B 2018-08-01T12:00:00\n", "Data variables:\n", " t (time, level, latitude, longitude) float64 4kB dask.array\n", " u (time, level, latitude, longitude) float64 4kB dask.array\n", " v (time, level, latitude, longitude) float64 4kB dask.array\n", "Attributes:\n", " Conventions: CF-1.6\n", " history: 2023-08-07 18:24:35 GMT by grib_to_netcdf-2.30.2: grib_to_n..." ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds1 = ds.to_xarray()\n", "ds1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "dev_ecc", "language": "python", "name": "dev_ecc" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }