Reading data from URLs

Using individual URLs

We can read individual files from URLs with from_source():

[1]:
import earthkit.data as ekd

fs = ekd.from_source("url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib")
[2]:
fs.to_fieldlist().ls()
[2]:
  parameter.variable  time.valid_datetime  time.base_datetime   time.step  vertical.level  vertical.level_type  ensemble.member  geography.grid_type
0 2t                  2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll
1 msl                 2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll


Tar and zip archives can also be loaded from a URL:

[3]:
fs = ekd.from_source("url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test_gribs.tar")
[4]:
fs.to_fieldlist().ls()
[4]:
  parameter.variable  time.valid_datetime  time.base_datetime   time.step  vertical.level  vertical.level_type  ensemble.member  geography.grid_type
0 2t                  2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll
1 msl                 2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll
2 t                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     500             pressure             0                regular_ll
3 z                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     500             pressure             0                regular_ll
4 t                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     850             pressure             0                regular_ll
5 z                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     850             pressure             0                regular_ll
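
As the listing shows, the GRIB fields stored inside the archive are read directly, with no manual unpacking. For readers unfamiliar with archives, here is a minimal standard-library sketch of what unpacking a tar archive involves. It is not earthkit-data's implementation: it builds an in-memory archive with two hypothetical members holding placeholder bytes (not real GRIB data), then lists and reads each member back. Zip archives can be handled in the same way with Python's zipfile module.

```python
import io
import tarfile

# Hypothetical member names and placeholder payloads (not real GRIB data).
members = {"test.grib": b"placeholder 1", "test4.grib": b"placeholder 2"}

# Build a small tar archive in memory.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, payload in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Rewind, then list the regular-file members and read their bytes
# without writing anything to disk.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = [m.name for m in tar.getmembers() if m.isfile()]
    contents = {name: tar.extractfile(name).read() for name in names}

print(names)
```

Running it prints the two member names in the order they were added.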

Using multiple URLs

We can also read a list of URLs in one go. In the example below, the first file contains 2 fields and the second contains 4, giving 6 fields in total:

[5]:
fs = ekd.from_source(
    "url",
    [
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib",
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
    ],
)
fs.to_fieldlist().ls()
[5]:
  parameter.variable  time.valid_datetime  time.base_datetime   time.step  vertical.level  vertical.level_type  ensemble.member  geography.grid_type
0 2t                  2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll
1 msl                 2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll
2 t                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     500             pressure             0                regular_ll
3 z                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     500             pressure             0                regular_ll
4 t                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     850             pressure             0                regular_ll
5 z                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     850             pressure             0                regular_ll
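
The listing shows the fields of the two files concatenated in the order in which the URLs were given: the 2 fields from test.grib first, then the 4 from test4.grib. The sketch below mimics that ordering with plain Python lists. The (variable, level) tuples are stand-ins copied from the listing, not real earthkit-data field objects.

```python
# Stand-ins for the fields in each file, copied from the listing above
# as (variable, level) tuples; these are not real GRIB fields.
fields_per_url = {
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib": [
        ("2t", 0), ("msl", 0),
    ],
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib": [
        ("t", 500), ("z", 500), ("t", 850), ("z", 850),
    ],
}

# Concatenate per-file fields in URL order (dicts preserve insertion order).
combined = [field for fields in fields_per_url.values() for field in fields]

print(len(combined))
print(combined)
```

The first print gives 6, matching the number of rows in the listing.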

Using URL patterns

URLs can also be specified with URL patterns by using the “url-pattern” source. In the example below, the “id” placeholder is substituted with each value in the list [4, 6], producing two URLs that point to test4.grib and test6.grib:

[6]:
fs = ekd.from_source(
    "url-pattern", "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib", {"id": [4, 6]}
)
fs.to_fieldlist().ls()
[6]:
  parameter.variable  time.valid_datetime  time.base_datetime   time.step  vertical.level  vertical.level_type  ensemble.member  geography.grid_type
0 t                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     500             pressure             0                regular_ll
1 z                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     500             pressure             0                regular_ll
2 t                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     850             pressure             0                regular_ll
3 z                   2007-01-01 12:00:00  2007-01-01 12:00:00  0 days     850             pressure             0                regular_ll
4 t                   2018-08-01 12:00:00  2018-08-01 12:00:00  0 days     1000            pressure             0                regular_ll
5 u                   2018-08-01 12:00:00  2018-08-01 12:00:00  0 days     1000            pressure             0                regular_ll
6 v                   2018-08-01 12:00:00  2018-08-01 12:00:00  0 days     1000            pressure             0                regular_ll
7 t                   2018-08-01 12:00:00  2018-08-01 12:00:00  0 days     850             pressure             0                regular_ll
8 u                   2018-08-01 12:00:00  2018-08-01 12:00:00  0 days     850             pressure             0                regular_ll
9 v                   2018-08-01 12:00:00  2018-08-01 12:00:00  0 days     850             pressure             0                regular_ll
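
Conceptually, a URL pattern is a template that is expanded into one URL per substituted value. The sketch below illustrates this with a hypothetical expand_pattern() helper built on Python's str.format and itertools.product. It is not earthkit-data's implementation, and the assumption that several list-valued names would be combined as a Cartesian product is ours.

```python
import itertools


def expand_pattern(pattern, values):
    """Hypothetical helper: substitute every combination of values into a
    str.format-style pattern and return the resulting URLs."""
    names = list(values)
    # Wrap scalar values in a list so every name has a sequence of choices.
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in itertools.product(*choices)
    ]


urls = expand_pattern(
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
    {"id": [4, 6]},
)
for url in urls:
    print(url)
```

This prints the two URLs ending in test4.grib and test6.grib, the same files read in the cell above.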

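We can also specify a format for each placeholder in a pattern. In this example, “my_date” is the placeholder name and “:date(%Y-%m-%d)” formats its datetime value as YYYY-MM-DD, so the pattern expands to test_2020-05-13_t2.grib and test_2020-05-13_msl.grib: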

[7]:
import datetime

fs = ekd.from_source(
    "url-pattern",
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/test_{my_date:date(%Y-%m-%d)}_{name}.grib",
    {"my_date": datetime.datetime(2020, 5, 13), "name": ["t2", "msl"]},
)
fs.to_fieldlist().ls()
[7]:
  parameter.variable  time.valid_datetime  time.base_datetime   time.step  vertical.level  vertical.level_type  ensemble.member  geography.grid_type
0 2t                  2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll
1 msl                 2020-05-13 12:00:00  2020-05-13 12:00:00  0 days     0               surface              0                regular_ll
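
A “:date(...)” specifier can be read as a strftime format applied to a datetime value before it is inserted into the URL. The hypothetical render() and expand() helpers below sketch this with a regular expression that only recognises plain {name} placeholders and the :date(...) specifier. earthkit-data's own pattern parser is not reproduced here and may support more than this.

```python
import datetime
import itertools
import re

# Matches "{name}" or "{name:date(FORMAT)}" (a simplified, hypothetical
# re-implementation of the placeholder syntax used in the example above).
PLACEHOLDER = re.compile(r"\{(\w+)(?::date\(([^)]*)\))?\}")


def render(pattern, values):
    """Substitute one set of values, applying strftime for ':date(...)' specs."""

    def replace(match):
        name, date_format = match.group(1), match.group(2)
        value = values[name]
        return value.strftime(date_format) if date_format else str(value)

    return PLACEHOLDER.sub(replace, pattern)


def expand(pattern, values):
    """Return one URL per combination of the list-valued parameters."""
    names = list(values)
    choices = [v if isinstance(v, list) else [v] for v in values.values()]
    return [render(pattern, dict(zip(names, combo))) for combo in itertools.product(*choices)]


urls = expand(
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/test_{my_date:date(%Y-%m-%d)}_{name}.grib",
    {"my_date": datetime.datetime(2020, 5, 13), "name": ["t2", "msl"]},
)
for url in urls:
    print(url)
```

Under these assumptions, the example expands to two URLs ending in test_2020-05-13_t2.grib and test_2020-05-13_msl.grib.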