Migration guide for 1.0.0

from_source

The returned object

The return type of from_source() has changed: it now returns a data object. This object provides some basic information about the data, but its primary purpose is to convert the data into a given representation for further work. The actual data loading is deferred as much as possible, until the data is converted into a given type.

For example, when we read GRIB data with from_source(), it returns a data object that can be converted to a fieldlist with to_fieldlist(). Previously, from_source() returned a fieldlist directly. E.g.:

Old way:

import earthkit.data as ekd

fl = ekd.from_source("file", "test6.grib")

New way:

import earthkit.data as ekd

fl = ekd.from_source("file", "test6.grib").to_fieldlist()

The list of available conversion types can be obtained by accessing available_types on the returned object. E.g.:

import earthkit.data as ekd

data = ekd.from_source("file", "test6.grib")
print(data.available_types)
['fieldlist', 'xarray', 'pandas', 'numpy']

Then we can call any of the corresponding to_* methods to convert the data to the desired type. E.g. to convert to an Xarray Dataset we can do:

import earthkit.data as ekd

data = ekd.from_source("file", "test6.grib")
ds = data.to_xarray()
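Since each entry in available_types has a matching to_* method, the conversion can also be selected by name at runtime. The sketch below shows one way to do that. The function name convert and the _FakeData stand-in are hypothetical and only serve to make the example runnable without a data file; the only properties taken from the text above are the available_types attribute and the to_* naming.

```python
def convert(data, kind):
    """Convert ``data`` to ``kind`` via the matching ``to_<kind>()`` method."""
    available = list(data.available_types)
    if kind not in available:
        raise ValueError(f"{kind!r} is not available; choose one of {available}")
    return getattr(data, f"to_{kind}")()


class _FakeData:
    """Stand-in for the object returned by from_source(); not earthkit code."""

    available_types = ["fieldlist", "numpy"]

    def to_fieldlist(self):
        return ["field-1"]

    def to_numpy(self):
        return [[1.0, 2.0]]


fake = _FakeData()
as_list = convert(fake, "fieldlist")
```

Checking against available_types first gives a clear error message for an unsupported type instead of an AttributeError.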


The read_all kwarg

Previously, the read_all kwarg could be used in from_source() when stream=True was also set (the latter is only available for sources that support streams). This kwarg has been removed from from_source(); the same functionality is achieved by passing read_all as a kwarg to to_fieldlist(). E.g.:

Old way:

import earthkit.data as ekd

url = "https://sites.ecmwf.int/repository/earthkit-data/how-tos/test.grib"
fl = ekd.from_source("url", url, stream=True, read_all=True)

New way:

import earthkit.data as ekd

url = "https://sites.ecmwf.int/repository/earthkit-data/how-tos/test.grib"
fl = ekd.from_source("url", url, stream=True).to_fieldlist(read_all=True)

See more details in Reading all the data into memory.
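When the old kwargs are stored in a dictionary (for instance in a configuration), the move of read_all can be handled in one place. This is a minimal sketch; split_legacy_kwargs is a hypothetical helper name and not part of earthkit-data.

```python
def split_legacy_kwargs(kwargs):
    """Split old from_source() kwargs into (source kwargs, to_fieldlist kwargs)."""
    source_kwargs = dict(kwargs)  # copy so the caller's dict is left untouched
    fieldlist_kwargs = {}
    if "read_all" in source_kwargs:
        # read_all is no longer accepted by from_source(); it moves to to_fieldlist()
        fieldlist_kwargs["read_all"] = source_kwargs.pop("read_all")
    return source_kwargs, fieldlist_kwargs


legacy = {"stream": True, "read_all": True}
source_kwargs, fieldlist_kwargs = split_legacy_kwargs(legacy)

# Intended use with earthkit-data installed:
# fl = ekd.from_source("url", url, **source_kwargs).to_fieldlist(**fieldlist_kwargs)
```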

Concatenation

Previously, fieldlists and some sources could be concatenated using the + operator. This has been replaced with the concat function:

from earthkit.data import concat

ds3 = concat(ds1, ds2)

Please note that the + operator is now used as an arithmetic operator for Fields and FieldLists, so it is still available but with a different meaning.

Field

The Field is no longer polymorphic; instead, it is made up of polymorphic components that use format-independent metadata. So far the following components are implemented:

Each component has its own set of metadata keys and methods. There are two ways to access the related values from the components:

# use the get() method
f.get("time.base_datetime")

# use the key method on the component
f.time.base_datetime()

Raw metadata keys are still available, but they are accessible only through the “metadata.” prefix in get() or through the metadata() method. E.g. if the Field was created from a GRIB message, we can access the “shortName” key from the raw metadata like this:

f.get("metadata.shortName")
f.metadata("shortName")
f.metadata("metadata.shortName")
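The access rules above can be pictured with a toy model. The classes below are purely illustrative stand-ins written for this guide; they are not the earthkit-data implementation and cover only the two key forms shown above ("component.key" and "metadata.key").

```python
from datetime import datetime


class _TimeComponent:
    """Toy component exposing one key as a method."""

    def __init__(self, base):
        self._base = base

    def base_datetime(self):
        return self._base


class _ToyField:
    """Toy field: one component (time) plus a raw metadata dictionary."""

    def __init__(self, time, raw):
        self.time = time
        self._raw = raw

    def metadata(self, key):
        # Accept the raw key with or without the "metadata." prefix.
        if key.startswith("metadata."):
            key = key[len("metadata."):]
        return self._raw[key]  # a missing key raises KeyError

    def get(self, key):
        prefix, _, name = key.partition(".")
        if prefix == "metadata":
            return self._raw[name]
        # "time.base_datetime" -> self.time.base_datetime()
        return getattr(getattr(self, prefix), name)()


f = _ToyField(_TimeComponent(datetime(2024, 1, 1, 12)), {"shortName": "2t"})
```

In this toy model f.get("time.base_datetime") and f.time.base_datetime() return the same value, and all three raw-metadata calls return "2t".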

Field modification

Fields can be modified using the set() method. This method allows setting new data values and/or changing metadata keys. See the notebook examples:

Field arithmetic

Field arithmetic has been added. The basic arithmetic operators can now be used to perform operations on fields. The operations are performed on the data arrays of the fields; the resulting field has the same metadata as the left operand and is stored entirely in memory. For example:

f3 = f1 + f2
f4 = f1 - f2
f5 = f1 * f2
f6 = f1 / f2
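The rule that the result takes its metadata from the left operand can be illustrated with a toy class. This is only a sketch of the described behaviour using plain Python lists; _ArithField is a hypothetical stand-in and not how earthkit-data implements its fields.

```python
from dataclasses import dataclass, field


@dataclass
class _ArithField:
    """Toy field: a list of values plus a metadata dictionary."""

    values: list
    metadata: dict = field(default_factory=dict)

    def __add__(self, other):
        if len(self.values) != len(other.values):
            raise ValueError("fields must have the same number of values")
        # Element-wise sum of the data; the result lives entirely in memory.
        summed = [a + b for a, b in zip(self.values, other.values)]
        # The metadata is copied from the left operand only.
        return _ArithField(summed, dict(self.metadata))


f1 = _ArithField([1.0, 2.0], {"shortName": "2t"})
f2 = _ArithField([10.0, 20.0], {"shortName": "msl"})
f3 = f1 + f2
```

Here f3 holds the summed values and the metadata of f1, so swapping the operands changes the metadata of the result but not its values.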

Notebook examples

  • /how-tos/field/overview.ipynb

Changes in the Field API

The Field API has been redesigned and many methods have been removed or changed. The following table gives an overview of the changes in the Field API:

| Old API | New API | Notes |
|---|---|---|
| to_numpy() | to_numpy() | New kwarg: copy=True |
| to_array() | to_array() | New kwarg: copy=True |
| to_latlon() | N/A | Use: f.geography.latlons(). This returns a tuple of arrays (lats, lons). |
| to_points() | N/A | Use: f.geography.points(), f.geography.xys(). These functions return a tuple of arrays (x, y). |
| grid_points() | N/A | Use: f.geography.latlons(). This returns a tuple of arrays (lats, lons). |
| projection() | N/A | Use: f.geography.projection() |
| bounding_box() | N/A | Use: f.geography.bounding_box() |
| clone() | N/A | Functionality not needed. Use f.set() instead. |
| copy() | N/A | Functionality not needed. Use f.set() instead. |
| as_namespace() | N/A | |
| datetime() | N/A | Use f.time.base_datetime() and f.time.valid_datetime() instead. |
| valid_datetime() | N/A | Use: f.time.valid_datetime() |
| base_datetime() | N/A | Use: f.time.base_datetime() |
| metadata() | metadata() | Has limited scope now. Can only access keys in the raw metadata belonging to the object the field was created from. E.g. for GRIB both f.metadata("shortName") and f.metadata("metadata.shortName") work. When the key does not exist in the raw metadata, it raises a KeyError. |
| MetaData object accessed by calling metadata() without args/kwargs | N/A | |
| dump() | N/A | Use: f.describe() |
| describe() | describe() | Still exists but functionality changed. |
| handle | N/A | |
| mars_area | N/A | Use: f.geography.area() |
| mars_grid | N/A | |
| resolution | N/A | |
| rotation | N/A | N/A |
| grid_points_unrotated() | N/A | N/A |
| save() | N/A | Use: f.to_target() |
| write() | N/A | Use: f.to_target() |
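To locate calls that need attention in an existing code base, part of the table above can be turned into a simple text scan. The sketch below is a rough aid under stated assumptions: the names REPLACEMENTS and find_old_calls are hypothetical, only a subset of the table is covered, and matching is done on method names alone, so unrelated objects with methods such as write() or save() will be reported too.

```python
import re

# Old method name -> suggested replacement, taken from the table above.
REPLACEMENTS = {
    "to_latlon": "geography.latlons()",
    "to_points": "geography.points() or geography.xys()",
    "grid_points": "geography.latlons()",
    "projection": "geography.projection()",
    "bounding_box": "geography.bounding_box()",
    "valid_datetime": "time.valid_datetime()",
    "base_datetime": "time.base_datetime()",
    "dump": "describe()",
    "save": "to_target()",
    "write": "to_target()",
}

_NAMES = "|".join(sorted(REPLACEMENTS, key=len, reverse=True))
# Skip calls that already go through the new geography/time components.
_PATTERN = re.compile(r"(?<!\.geography)(?<!\.time)\.(" + _NAMES + r")\s*\(")


def find_old_calls(source):
    """Return (line number, old name, suggestion) for each suspicious call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, REPLACEMENTS[name]))
    return hits


sample = '''lat, lon = f.to_latlon()
box = f.geography.bounding_box()
f.save("out.grib")
'''
hits = find_old_calls(sample)
```

For the sample text, the scan reports the to_latlon() call on line 1 and the save() call on line 3, and skips the already migrated call on line 2. Each hit still needs a manual check before it is changed.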

Fieldlist

FieldList arithmetic

FieldList arithmetic has been added. The basic arithmetic operators can now be used to perform operations on fieldlists. The operations are performed on the data arrays of the fieldlists; the resulting fieldlist has the same metadata as the left operand and is stored entirely in memory. For example:

fl3 = fl1 + fl2
fl4 = fl1 - fl2
fl5 = fl1 * fl2
fl6 = fl1 / fl2

Changes in the FieldList API

The following table gives an overview of the changes in the FieldList API:

| Old API | New API | Notes |
|---|---|---|
| to_numpy() | to_numpy() | New kwarg: copy=True |
| to_array() | to_array() | New kwarg: copy=True |
| to_latlon() | N/A | Use: fl.geography.latlons(). This returns a tuple of arrays (lats, lons). |
| to_points() | N/A | Use: fl.geography.points(), fl.geography.xys(). These functions return a tuple of arrays (x, y). |
| projection() | N/A | Use: fl.geography.projection() |
| bounding_box() | N/A | Use: fl.geography.bounding_box() |
| datetime() | N/A | Use fl.time.base_datetime() and fl.time.valid_datetime() instead. |
| metadata() | metadata() | Has limited scope now. Can only access keys in the raw metadata belonging to the object the field was created from. E.g. for GRIB both fl.metadata("shortName") and fl.metadata("metadata.shortName") work. When the key does not exist in the raw metadata, it raises a KeyError. |
| save() | N/A | Use: fl.to_target() |
| write() | N/A | Use: fl.to_target() |

Xarray engine

The Xarray engine has been refactored and many of the internal classes and methods have been changed. The following list gives an overview of the changes in the Xarray engine:

  • a new default profile, earthkit, has been added; it is used when no profile is specified. This profile is designed to work with the new format-independent metadata keys from Field to generate the Xarray dataset.

  • the old mars and grib profiles have been kept, but they now use some of the new format-independent metadata keys to generate the Xarray dataset.

  • the “number” dim_role was renamed to “member” in line with the new format independent metadata keys. See: Dimensions for more details.

  • the time_dim_mode kwarg in to_xarray() was replaced by time_dims and the meaning of some temporal dimensions in the dim_roles also changed. See Temporal dimensions for more details.
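For code that builds its to_xarray() kwargs programmatically, the last two changes can be caught in a small pre-processing step. The sketch below is hypothetical: migrate_xarray_kwargs is not part of earthkit-data, and it assumes that dim_roles is passed as a dictionary keyed by role name. Because the mapping from time_dim_mode values to time_dims is not given here, the helper only reports the removed kwarg instead of translating it.

```python
def migrate_xarray_kwargs(kwargs):
    """Return a copy of old to_xarray() kwargs adapted to the new names."""
    migrated = dict(kwargs)
    if "time_dim_mode" in migrated:
        # No automatic translation: the new option has to be chosen by hand.
        raise ValueError("time_dim_mode was replaced by time_dims; update it manually")
    roles = migrated.get("dim_roles")
    if roles and "number" in roles:
        # Assumption: dim_roles is a dict keyed by role name.
        roles = dict(roles)  # copy so the caller's dict is left untouched
        roles["member"] = roles.pop("number")
        migrated["dim_roles"] = roles
    return migrated


old_kwargs = {"dim_roles": {"number": "perturbationNumber", "level": "level"}}
new_kwargs = migrate_xarray_kwargs(old_kwargs)
```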