Reading data from URLs

Using individual URLs

We can read individual files from URLs with from_source():

[1]:

import earthkit.data as ekd

fs = ekd.from_source("url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib")

[2]:

fs.to_fieldlist().ls()

[2]:

	parameter.variable	time.valid_datetime	time.base_datetime	time.step	vertical.level	vertical.level_type	ensemble.member	geography.grid_type
0	2t	2020-05-13 12:00:00	2020-05-13 12:00:00	0 days	0	surface	0	regular_ll
1	msl	2020-05-13 12:00:00	2020-05-13 12:00:00	0 days	0	surface	0	regular_ll

Tar and zip archives can also be loaded from a URL:

[3]:

fs = ekd.from_source("url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test_gribs.tar")

[4]:

fs.to_fieldlist().ls()

[4]:

	parameter.variable	time.valid_datetime	time.base_datetime	vertical.level	vertical.level_type	geography.grid_type
0	2t	2020-05-13 12:00:00	2020-05-13 12:00:00	0	surface	regular_ll
1	msl	2020-05-13 12:00:00	2020-05-13 12:00:00	0	surface	regular_ll
2	t	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
3	z	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
4	t	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll
5	z	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll

Using multiple URLs

We can access a list of URLs in one go. In the example below, the first file contains 2 fields and the second contains 4.

[5]:

fs = ekd.from_source(
    "url",
    [
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib",
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
    ],
)
fs.to_fieldlist().ls()

[5]:

	parameter.variable	time.valid_datetime	time.base_datetime	vertical.level	vertical.level_type	geography.grid_type
0	2t	2020-05-13 12:00:00	2020-05-13 12:00:00	0	surface	regular_ll
1	msl	2020-05-13 12:00:00	2020-05-13 12:00:00	0	surface	regular_ll
2	t	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
3	z	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
4	t	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll
5	z	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll
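When many files share a base location, the URL list can be built programmatically before it is passed to from_source(). The following is a minimal sketch using only the standard library; the names base_url, file_names and urls are illustrative and do not come from earthkit-data.

```python
from urllib.parse import urljoin

# Base location shared by the example files (the trailing slash matters for urljoin)
base_url = "https://sites.ecmwf.int/repository/earthkit-data/examples/"

# File names to fetch; these are the two files used in the cell above
file_names = ["test.grib", "test4.grib"]

# Join each file name onto the base location, preserving the order of the list
urls = [urljoin(base_url, name) for name in file_names]

for url in urls:
    print(url)
```

The resulting list can then be passed as the second argument of ekd.from_source("url", ...), as in the cell above.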

Using URL patterns

URLs can also be specified with the "url-pattern" source. In the example below, substituting the values 4 and 6 for the pattern “id” yields two files, test4.grib and test6.grib:

[6]:

fs = ekd.from_source(
    "url-pattern",
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
    {"id": [4, 6]},
)
fs.to_fieldlist().ls()

[6]:

	parameter.variable	time.valid_datetime	time.base_datetime	vertical.level	vertical.level_type	geography.grid_type
0	t	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
1	z	2007-01-01 12:00:00	2007-01-01 12:00:00	500	pressure	regular_ll
2	t	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll
3	z	2007-01-01 12:00:00	2007-01-01 12:00:00	850	pressure	regular_ll
4	t	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
5	u	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
6	v	2018-08-01 12:00:00	2018-08-01 12:00:00	1000	pressure	regular_ll
7	t	2018-08-01 12:00:00	2018-08-01 12:00:00	850	pressure	regular_ll
8	u	2018-08-01 12:00:00	2018-08-01 12:00:00	850	pressure	regular_ll
9	v	2018-08-01 12:00:00	2018-08-01 12:00:00	850	pressure	regular_ll
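To make the substitution concrete, the sketch below expands a pattern into plain URLs with the standard library. It only illustrates the idea and is not how earthkit-data implements "url-pattern"; expand_pattern is a hypothetical helper, and it assumes that every combination of the supplied values produces one URL.

```python
from itertools import product


def expand_pattern(pattern, values):
    """Hypothetical helper: return one URL per combination of the given values."""
    names = list(values)
    # Wrap scalar values in a list so that each name offers at least one choice
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    # Substitute every combination of choices into the pattern
    return [pattern.format(**dict(zip(names, combo))) for combo in product(*choices)]


urls = expand_pattern(
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
    {"id": [4, 6]},
)

for url in urls:
    print(url)
```

With the values 4 and 6 this yields the URLs of test4.grib and test6.grib, in that order.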

Each pattern can carry a format. In this example “my_date” is the pattern name and “:date(%Y-%m-%d)” specifies how the date is formatted:

[7]:

import datetime

fs = ekd.from_source(
    "url-pattern",
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/test_{my_date:date(%Y-%m-%d)}_{name}.grib",
    {"my_date": datetime.datetime(2020, 5, 13), "name": ["t2", "msl"]},
)
fs.to_fieldlist().ls()

[7]:

	parameter.variable	time.valid_datetime	time.base_datetime	time.step	vertical.level	vertical.level_type	ensemble.member	geography.grid_type
0	2t	2020-05-13 12:00:00	2020-05-13 12:00:00	0 days	0	surface	0	regular_ll
1	msl	2020-05-13 12:00:00	2020-05-13 12:00:00	0 days	0	surface	0	regular_ll
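The date format uses the same directives as Python's strftime(). The sketch below shows the file names we would expect this pattern to produce, assuming the date is rendered with strftime("%Y-%m-%d") and one URL is built per value of “name”. It uses a simplified template with plain placeholders (date_text is an illustrative name), not the url-pattern syntax itself.

```python
import datetime

# Simplified template: plain str.format placeholders instead of the url-pattern syntax
template = "https://sites.ecmwf.int/repository/earthkit-data/test-data/test_{date_text}_{name}.grib"

my_date = datetime.datetime(2020, 5, 13)

# Render the date with the same directives used in the pattern
date_text = my_date.strftime("%Y-%m-%d")

# One URL per value of "name"
urls = [template.format(date_text=date_text, name=name) for name in ["t2", "msl"]]

for url in urls:
    print(url)
```

Under these assumptions the two URLs end in test_2020-05-13_t2.grib and test_2020-05-13_msl.grib.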
