GRIB: using array namespaces

In this example we will use a GRIB file containing 4 messages.

[1]:
import earthkit.data as ekd

fl_in = ekd.from_source("sample", "test4.grib").to_fieldlist()

Using the to_fieldlist() method we can convert this object into an in-memory fieldlist where each field stores its values as an array. The array format is controlled by the array_namespace keyword argument of to_fieldlist(). With its default value (None), the underlying array format of the original fieldlist is kept. For GRIB data read from a file or stream this is “numpy”.
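The idea behind an array namespace can be sketched without earthkit-data. In the minimal example below, a function receives the array module (called xp here, a name chosen only for this illustration) and calls everything through it, so the same code could run on NumPy or, assuming it is installed, on PyTorch. The helper field_mean is hypothetical and not part of earthkit-data.

```python
import numpy as np


def field_mean(values, xp):
    """Return the mean of values, computed with the array namespace xp.

    xp is a module such as numpy or torch. This helper is hypothetical
    and only illustrates the concept of an array namespace.
    """
    arr = xp.asarray(values)
    return xp.mean(arr)


# With NumPy as the namespace the result is a NumPy scalar.
m = field_mean([228.0, 229.0, 230.0], np)
print(float(m))
```

Passing the namespace explicitly keeps the sketch free of any library-specific lookup. How earthkit-data resolves the namespace internally is not shown here.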

NumPy array fieldlist

The “numpy” fieldlist we generate in the cell below works in exactly the same way as the original one, but stores all the data in memory.

[2]:
fl = fl_in.to_fieldlist()
len(fl)
[2]:
4

PyTorch array fieldlist

For the next example we choose the “torch” array namespace. Since PyTorch is an optional dependency of earthkit-data, we need to ensure it is installed in the environment.

[3]:
!pip install torch --quiet
[4]:
fl = fl_in.to_fieldlist(array_namespace="torch")
[5]:
fl.ls()
[5]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
1 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
2 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll
3 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll

values

With the “torch” namespace, both Field.values and FieldList.values now return a PyTorch Tensor.

[6]:
fl[0].values[:10]
[6]:
tensor([228.0460, 228.0460, 228.0460, 228.0460, 228.0460, 228.0460, 228.0460,
        228.0460, 228.0460, 228.0460], dtype=torch.float64)
[7]:
fl[0].values.shape
[7]:
torch.Size([65160])
[8]:
fl.values.shape
[8]:
torch.Size([4, 65160])

to_array()

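Field.to_array() and FieldList.to_array() return the values as an array of the underlying namespace, here a PyTorch Tensor. By default the field shape is kept; with flatten=True each field is returned as a 1D array.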

[9]:
fl[0].to_array()[:2, :2]
[9]:
tensor([[228.0460, 228.0460],
        [228.6085, 228.5792]], dtype=torch.float64)
[10]:
fl.to_array().shape
[10]:
torch.Size([4, 181, 360])
[11]:
fl.to_array(flatten=True).shape
[11]:
torch.Size([4, 65160])
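The relation between the two shapes above can be sketched with plain NumPy stand-ins: 181 latitudes times 360 longitudes gives the 65160 points per field. The arrays below are synthetic and do not come from the GRIB file, and the row-major reshape is an assumption about the memory layout that the outputs above do not confirm.

```python
import numpy as np

# Shapes taken from the outputs above: 4 fields on a 181 x 360 grid.
n_fields, n_lat, n_lon = 4, 181, 360

# Synthetic stand-in for fl.to_array(flatten=True): one row of points per field.
flat = np.arange(n_fields * n_lat * n_lon, dtype=np.float64).reshape(
    n_fields, n_lat * n_lon
)

# Synthetic stand-in for fl.to_array(): the same data with the 2D grid restored.
grid = flat.reshape(n_fields, n_lat, n_lon)

print(flat.shape, grid.shape)
```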

to_numpy()

Field.to_numpy() and FieldList.to_numpy() still return NumPy ndarrays, regardless of the array namespace of the fieldlist.

[12]:
fl[0].to_numpy()[:2, :2]
[12]:
array([[228.04600525, 228.04600525],
       [228.60850525, 228.57920837]])
[13]:
fl.to_numpy().shape
[13]:
(4, 181, 360)

Building a fieldlist in a loop

The following cell adds 2 to each field value and creates a new fieldlist from the modified fields.

[18]:
fields = []
for f in fl:
    fields.append(f.set(values=f.values + 2.0))
r1 = ekd.create_fieldlist(fields)
r1.ls()
[18]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
1 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
2 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll
3 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll

As expected, the values in r1 now differ by 2 from those in the original fieldlist (fl).

[15]:
r1[0].values[:10]
[15]:
tensor([230.0460, 230.0460, 230.0460, 230.0460, 230.0460, 230.0460, 230.0460,
        230.0460, 230.0460, 230.0460], dtype=torch.float64)
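The pattern used in the loop (derive a modified copy of each field, collect the copies, then compare with the originals) can be illustrated with a toy stand-in. ToyField below is hypothetical and is not the earthkit-data field class. Its set() method is written to return a copy, which is an assumption about how such a method might behave, and the values are made up for the illustration.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class ToyField:
    """Hypothetical stand-in for a field: a name plus an array of values."""

    name: str
    values: np.ndarray

    def set(self, **changes):
        # Return a modified copy and leave this object untouched.
        return replace(self, **changes)


# Made-up sample data, not read from the GRIB file.
original = [
    ToyField("t", np.array([228.046, 228.6085])),
    ToyField("z", np.array([5500.0, 5510.0])),
]

# Same pattern as the loop above, written as a list comprehension.
shifted = [f.set(values=f.values + 2.0) for f in original]

# Check that every shifted field differs from its source by 2.
all_shifted = all(
    np.allclose(s.values - o.values, 2.0) for s, o in zip(shifted, original)
)
print(all_shifted)
```

Comparing with np.allclose rather than exact equality is a deliberate choice, since floating-point addition need not reproduce the offset bit for bit.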

Saving to GRIB

We can save this fieldlist to a GRIB file. Reading the file back with the default settings gives a “numpy” fieldlist again.

[16]:
path = "_from_pytorch.grib"
r1.to_target("file", path)
fl1 = ekd.from_source("file", path).to_fieldlist()
fl1.ls()
[16]:
parameter.variable time.valid_datetime time.base_datetime time.step vertical.level vertical.level_type ensemble.member geography.grid_type
0 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
1 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 500 pressure 0 regular_ll
2 t 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll
3 z 2007-01-01 12:00:00 2007-01-01 12:00:00 0 days 850 pressure 0 regular_ll
[17]:
# the modified values were correctly written to the GRIB file
fl1[0].values[:10]
[17]:
array([230.04600525, 230.04600525, 230.04600525, 230.04600525,
       230.04600525, 230.04600525, 230.04600525, 230.04600525,
       230.04600525, 230.04600525])