Xarray: using CuPy

This notebook demonstrates how to use Xarray on a GPU with CuPy. CuPy is not a dependency of earthkit-data, so it has to be installed separately. The notebook also requires a working CUDA-based GPU environment.
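Before running the cells below, it can help to confirm that both requirements are met. The following is a minimal sketch, not part of earthkit-data: the helper name `cupy_gpu_available` is hypothetical, and it assumes that a usable environment means CuPy can be imported and reports at least one CUDA device.

```python
import importlib.util


def cupy_gpu_available() -> bool:
    """Return True if CuPy is importable and sees at least one CUDA device.

    Hypothetical helper for illustration; not part of earthkit-data.
    """
    # CuPy is an optional package, so first check that it is installed.
    if importlib.util.find_spec("cupy") is None:
        return False
    try:
        import cupy

        # Raises if the CUDA driver or runtime is missing or broken.
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False


gpu_ready = cupy_gpu_available()
print("CuPy GPU available:", gpu_ready)
```

If this prints `False`, the GPU cells in this notebook are expected to fail.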

[1]:
# Get GRIB data on pressure levels
import earthkit.data as ekd

ds = ekd.from_source("sample", "pl.grib").to_fieldlist()

[2]:
# Create a lazily loaded Xarray dataset backed by NumPy arrays
r = ds.to_xarray()
r
[2]:
<xarray.Dataset> Size: 176kB
Dimensions:                  (forecast_reference_time: 4, step: 2, level: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 32B 202...
  * step                     (step) timedelta64[ns] 16B 00:00:00 06:00:00
  * level                    (level) int64 16B 500 700
  * latitude                 (latitude) float64 152B 90.0 80.0 ... -80.0 -90.0
  * longitude                (longitude) float64 288B 0.0 10.0 ... 340.0 350.0
Data variables:
    r                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
    t                        (forecast_reference_time, step, level, latitude, longitude) float64 88kB ...
Attributes:
    class:        od
    stream:       oper
    levtype:      pl
    type:         fc
    expver:       0001
    date:         20240603
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF
[3]:
type(r.t.data)
[3]:
numpy.ndarray

Move to the GPU as CuPy

We use the to_device() method, which is available on the earthkit Xarray accessor. The first argument specifies the device. When the device is not “cpu” and the array_backend keyword argument is not specified, it is automatically set to “cupy”.

[4]:
r_cp = r.earthkit.to_device("cuda:0")
# equivalent code:
# r_cp = r.earthkit.to_device("cuda:0", array_backend="cupy")
[5]:
type(r_cp.t.data)
[5]:
cupy.ndarray
[6]:
# Xarray computations work
r_cp.t.mean()
[6]:
<xarray.DataArray 't' ()> Size: 8B
array(261.56490497)
[7]:
# Alter the values in place (adds 1 to every data variable)
r_cp += 1
type(r_cp.t.data)
[7]:
cupy.ndarray

Move back to the CPU as NumPy

We use to_device() again to move the dataset back to the CPU. When the device is “cpu” and the array_backend keyword argument is not specified, it is automatically set to “numpy”.
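The default backend rule described in this notebook can be summarised as a small function. This is only an illustrative sketch of the documented behaviour, not earthkit-data's implementation; the name `default_array_backend` is hypothetical.

```python
def default_array_backend(device, array_backend=None):
    """Illustrate how the array backend is chosen when it is not given.

    Hypothetical helper mirroring the rule stated in the text; it is not
    a function provided by earthkit-data.
    """
    # An explicitly specified backend is used as is.
    if array_backend is not None:
        return array_backend
    # Otherwise the backend follows the device: NumPy on the CPU,
    # CuPy on any other device (e.g. "cuda:0").
    return "numpy" if device == "cpu" else "cupy"


print(default_array_backend("cuda:0"))
print(default_array_backend("cpu"))
```

This is why to_device("cuda:0") and to_device("cpu") can be called without the array_backend keyword argument.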

[8]:
r_np = r_cp.earthkit.to_device("cpu")
# equivalent code:
# r_np = r_cp.earthkit.to_device("cpu", array_backend="numpy")
[9]:
type(r_np.t.data)
[9]:
numpy.ndarray
[10]:
# The dataset contains the values altered on the GPU
r_np.t.mean()
[10]:
<xarray.DataArray 't' ()> Size: 8B
array(262.56490497)