NetCDF: working with fieldlists
We read a NetCDF file containing 3 variables on pressure levels on a 2D latitude-longitude grid. First we ensure the example file is available.
[1]:
import earthkit.data as ekd
ekd.download_example_file("tuv_pl.nc")
[2]:
ds = ekd.from_source("file", "tuv_pl.nc")
Our NetCDF data is represented as a FieldList consisting of NetCDFFields. A field in this context is a 2D geographical coverage, i.e. a single horizontal slice of the data.
Iteration
We can iterate through the fields (we use only the first 3 fields for simplicity):
[3]:
for f in ds[:3]:
    print(f)
NetCDFField(t,level=1000,time=2018-08-01 12:00:00)
NetCDFField(t,level=850,time=2018-08-01 12:00:00)
NetCDFField(t,level=700,time=2018-08-01 12:00:00)
Inspecting the contents
[4]:
len(ds)
[4]:
18
[5]:
ds.ls()
[5]:
| | variable | level | valid_datetime | units |
|---|---|---|---|---|
| 0 | t | 1000 | 2018-08-01T12:00:00 | K |
| 1 | t | 850 | 2018-08-01T12:00:00 | K |
| 2 | t | 700 | 2018-08-01T12:00:00 | K |
| 3 | t | 500 | 2018-08-01T12:00:00 | K |
| 4 | t | 400 | 2018-08-01T12:00:00 | K |
| 5 | t | 300 | 2018-08-01T12:00:00 | K |
| 6 | u | 1000 | 2018-08-01T12:00:00 | m s**-1 |
| 7 | u | 850 | 2018-08-01T12:00:00 | m s**-1 |
| 8 | u | 700 | 2018-08-01T12:00:00 | m s**-1 |
| 9 | u | 500 | 2018-08-01T12:00:00 | m s**-1 |
| 10 | u | 400 | 2018-08-01T12:00:00 | m s**-1 |
| 11 | u | 300 | 2018-08-01T12:00:00 | m s**-1 |
| 12 | v | 1000 | 2018-08-01T12:00:00 | m s**-1 |
| 13 | v | 850 | 2018-08-01T12:00:00 | m s**-1 |
| 14 | v | 700 | 2018-08-01T12:00:00 | m s**-1 |
| 15 | v | 500 | 2018-08-01T12:00:00 | m s**-1 |
| 16 | v | 400 | 2018-08-01T12:00:00 | m s**-1 |
| 17 | v | 300 | 2018-08-01T12:00:00 | m s**-1 |
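The listing follows a regular pattern: 3 variables times 6 levels gives the 18 fields, with all levels of one variable listed before the next variable starts. The sketch below reproduces that ordering in plain Python from the values in the table. `field_index` is a hypothetical helper for illustration only, and the ordering is read off this particular file rather than guaranteed for every NetCDF source.

```python
from itertools import product

# Values copied from the ds.ls() listing above.
variables = ["t", "u", "v"]
levels = [1000, 850, 700, 500, 400, 300]

# product() varies the last argument fastest, which matches the listing:
# all levels of "t", then all levels of "u", then all levels of "v".
layout = list(product(variables, levels))


def field_index(variable, level):
    """Position of a (variable, level) pair in the listing (hypothetical helper)."""
    return variables.index(variable) * len(levels) + levels.index(level)


print(len(layout))            # 18
print(layout[7])              # ('u', 850)
print(field_index("v", 300))  # 17
```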
Slicing
Standard Python slicing is available.
[6]:
g = ds[1]
g
[6]:
NetCDFField(t,level=850,time=2018-08-01 12:00:00)
[7]:
g = ds[1:3]
g.ls()
[7]:
| | variable | level | valid_datetime | units |
|---|---|---|---|---|
| 0 | t | 850 | 2018-08-01T12:00:00 | K |
| 1 | t | 700 | 2018-08-01T12:00:00 | K |
[8]:
g = ds[-1]
g
[8]:
NetCDFField(v,level=300,time=2018-08-01 12:00:00)
Getting data values
Using values
The values property always returns a flat array per field:
[9]:
v = ds[0].values
v.shape
[9]:
(84,)
[10]:
v[0:4]
[10]:
array([272.56486405, 272.56486405, 272.56486405, 272.56486405])
When called on the whole fieldlist, values returns a 2D array with one row per field:
[11]:
v = ds.values
v.shape
[11]:
(18, 84)
Using to_numpy()
With to_numpy() the returned array keeps the shape of the field:
[12]:
v = ds[0].to_numpy()
print(v.shape)
print(ds[0].shape)
(7, 12)
(7, 12)
[13]:
v = ds.to_numpy()
v.shape
[13]:
(18, 7, 12)
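The two access methods differ only in shape: 84 = 7 x 12, so each flat row holds one field's grid points. The sketch below uses a synthetic NumPy array in place of the real data and assumes the flat layout is the row-major flattening of the 2D field; that ordering is an assumption here, not something the outputs above prove.

```python
import numpy as np

n_fields, n_lat, n_lon = 18, 7, 12  # shapes reported above

# Stand-in for ds.to_numpy(); the numbers are synthetic, not the file's data.
stack = np.arange(n_fields * n_lat * n_lon, dtype=float).reshape(
    n_fields, n_lat, n_lon
)

# Assumed equivalent of ds.values: each field flattened in row-major order.
flat = stack.reshape(n_fields, -1)

# Map a flat index within one field back to its (latitude, longitude) indices.
row, col = np.unravel_index(15, (n_lat, n_lon))

print(flat.shape)          # (18, 84)
print(int(row), int(col))  # 1 3
```

Under that assumption the first 12 flat values of a field form its first latitude row, which would be consistent with the identical values seen in v[0:4].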
Metadata
Metadata access works on both individual fields and slices:
[14]:
ds[0].metadata("variable")
[14]:
't'
[15]:
ds[0:2].metadata(["level", "variable"])
[15]:
[[1000, 't'], [850, 't']]
and on all the fields:
[16]:
ds.metadata("level")
[16]:
[1000,
850,
700,
500,
400,
300,
1000,
850,
700,
500,
400,
300,
1000,
850,
700,
500,
400,
300]
For each field we can get the metadata as an object:
[17]:
md = ds[0].metadata()
md
[17]:
NetCDFMetadata({'units': 'K', 'long_name': 'Temperature', 'standard_name': 'air_temperature', 'date': 20180801, 'time': 1200, 'variable': 't', 'level': 1000, 'levtype': 'level'})
[18]:
md["level"]
[18]:
1000
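The metadata object stores the date and time as the integers 20180801 and 1200, while the listings show 2018-08-01T12:00:00. A minimal sketch of how the two can be related, assuming the integers encode YYYYMMDD and HHMM; `to_datetime` is a hypothetical helper and not part of earthkit-data.

```python
from datetime import datetime


def to_datetime(date, time):
    """Combine an integer date (assumed YYYYMMDD) and time (assumed HHMM)."""
    # Zero-pad so that e.g. time=0 becomes "0000".
    return datetime.strptime(f"{date:08d}{time:04d}", "%Y%m%d%H%M")


valid = to_datetime(20180801, 1200)
print(valid.isoformat())  # 2018-08-01T12:00:00
```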
Selection
Selection by metadata always creates a “view”; no data is copied.
[19]:
g = ds.sel(variable=["u", "v"], level=850)
g.ls()
[19]:
| | variable | level | valid_datetime | units |
|---|---|---|---|---|
| 0 | u | 850 | 2018-08-01T12:00:00 | m s**-1 |
| 1 | v | 850 | 2018-08-01T12:00:00 | m s**-1 |
[20]:
g = ds.sel(variable="t")
g.ls()
[20]:
| | variable | level | valid_datetime | units |
|---|---|---|---|---|
| 0 | t | 1000 | 2018-08-01T12:00:00 | K |
| 1 | t | 850 | 2018-08-01T12:00:00 | K |
| 2 | t | 700 | 2018-08-01T12:00:00 | K |
| 3 | t | 500 | 2018-08-01T12:00:00 | K |
| 4 | t | 400 | 2018-08-01T12:00:00 | K |
| 5 | t | 300 | 2018-08-01T12:00:00 | K |
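The matching rule used in the two calls above (a list means "any of these values", and several keywords must all match) can be sketched in plain Python. This is an illustration of the idea only, not how earthkit-data implements sel(); `sel_indices` is a hypothetical helper operating on plain dicts. It returns positions into the original list rather than copies, which is one simple way to picture a "view".

```python
from itertools import product

# Metadata records mirroring the ds.ls() listing (plain dicts, not NetCDFFields).
records = [
    {"variable": v, "level": l}
    for v, l in product(["t", "u", "v"], [1000, 850, 700, 500, 400, 300])
]


def sel_indices(records, **conditions):
    """Return indices of records matching every condition (hypothetical helper).

    A condition value may be a single value or a list/tuple of accepted values.
    """

    def matches(record):
        for key, wanted in conditions.items():
            accepted = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if record.get(key) not in accepted:
                return False
        return True

    return [i for i, record in enumerate(records) if matches(record)]


print(sel_indices(records, variable=["u", "v"], level=850))  # [7, 13]
print(sel_indices(records, variable="t"))                    # [0, 1, 2, 3, 4, 5]
```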
Xarray
Converting to Xarray does not write anything to disk.
[21]:
ds1 = ds.to_xarray()
ds1
[21]:
<xarray.Dataset> Size: 12kB
Dimensions: (longitude: 12, latitude: 7, level: 6, time: 1)
Coordinates:
* longitude (longitude) float32 48B 0.0 30.0 60.0 90.0 ... 270.0 300.0 330.0
* latitude (latitude) float32 28B 90.0 60.0 30.0 0.0 -30.0 -60.0 -90.0
* level (level) int32 24B 1000 850 700 500 400 300
* time (time) datetime64[ns] 8B 2018-08-01T12:00:00
Data variables:
t (time, level, latitude, longitude) float64 4kB dask.array<chunksize=(1, 6, 7, 12), meta=np.ndarray>
u (time, level, latitude, longitude) float64 4kB dask.array<chunksize=(1, 6, 7, 12), meta=np.ndarray>
v (time, level, latitude, longitude) float64 4kB dask.array<chunksize=(1, 6, 7, 12), meta=np.ndarray>
Attributes:
Conventions: CF-1.6
history: 2023-08-07 18:24:35 GMT by grib_to_netcdf-2.30.2: grib_to_n...
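The sizes in the Dataset summary follow directly from the dimensions and dtypes it lists. The arithmetic below is a quick sanity check in plain Python; the byte widths are the standard sizes of the dtypes shown in the repr, and the rounding to "4kB" and "12kB" is taken to be Xarray's display formatting.

```python
# Dimension sizes and dtype widths read from the Dataset repr above.
dims = {"time": 1, "level": 6, "latitude": 7, "longitude": 12}
FLOAT64, FLOAT32, INT32, DATETIME64 = 8, 4, 4, 8  # bytes per element

# Each of t, u, v spans all four dimensions as float64.
n_points = 1
for size in dims.values():
    n_points *= size
var_bytes = n_points * FLOAT64

# Coordinates: longitude and latitude are float32, level is int32,
# time is datetime64[ns].
coord_bytes = (
    dims["longitude"] * FLOAT32
    + dims["latitude"] * FLOAT32
    + dims["level"] * INT32
    + dims["time"] * DATETIME64
)

total_bytes = 3 * var_bytes + coord_bytes
print(var_bytes, coord_bytes, total_bytes)  # 4032 108 12204
```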