Reading data from URLs

Using individual URLs

We can read an individual file from a URL by passing the “url” source name and the URL to from_source():

[1]:
import earthkit.data as ekd

fs = ekd.from_source("url",
                       "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib")
[2]:
fs.ls()
[2]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf 2t surface 0 20200513 1200 0 an 0 regular_ll
1 ecmf msl surface 0 20200513 1200 0 an 0 regular_ll
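The “url” source downloads the file for you. If you would rather keep a local copy, for example to open it later with the “file” source, the standard library is enough. The sketch below uses a hypothetical download() helper, which is not part of earthkit-data, to stream the data to disk:

```python
import os
import shutil
import urllib.parse
import urllib.request


def download(url: str, target_dir: str) -> str:
    """Save the resource at ``url`` into ``target_dir`` and return its local path.

    Hypothetical helper for illustration only; it is not part of earthkit-data.
    """
    # Name the local file after the last component of the URL path
    name = os.path.basename(urllib.parse.urlparse(url).path)
    path = os.path.join(target_dir, name)
    # Stream the response to disk in chunks instead of reading it all at once
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


# Usage (needs network access):
# path = download("https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib", ".")
# fs = ekd.from_source("file", path)
```

Streaming with shutil.copyfileobj() keeps memory use low even for large GRIB files.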

Tar and zip archives can also be loaded from a URL. The fields from all the files stored in the archive are listed together:

[3]:
fs = ekd.from_source("url",
                       "https://sites.ecmwf.int/repository/earthkit-data/examples/test_gribs.tar")
[4]:
fs.ls()
[4]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf 2t surface 0 20200513 1200 0 an 0 regular_ll
1 ecmf msl surface 0 20200513 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 500 20070101 1200 0 an 0 regular_ll
3 ecmf z isobaricInhPa 500 20070101 1200 0 an 0 regular_ll
4 ecmf t isobaricInhPa 850 20070101 1200 0 an 0 regular_ll
5 ecmf z isobaricInhPa 850 20070101 1200 0 an 0 regular_ll
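Before loading an archive it can be useful to check which files it holds. A minimal sketch using the standard tarfile and zipfile modules, with a hypothetical list_archive_members() helper (not part of earthkit-data) that works on a local copy of the archive:

```python
import tarfile
import zipfile


def list_archive_members(path: str) -> list[str]:
    """Return the names of the regular files stored in a tar or zip archive.

    Hypothetical helper for illustration only; it is not part of earthkit-data.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # Directory entries in a zip archive end with "/"; skip them
            return [n for n in zf.namelist() if not n.endswith("/")]
    if tarfile.is_tarfile(path):
        # Mode "r" auto-detects compression (gzip, bz2, xz)
        with tarfile.open(path) as tf:
            # Keep regular files only (no directories or links)
            return [m.name for m in tf.getmembers() if m.isfile()]
    raise ValueError(f"{path} is neither a tar nor a zip archive")
```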

Using multiple URLs

We can load a list of URLs in a single call. In the example below, the first file contains 2 fields and the second 4, so 6 fields are listed in total:

[5]:
fs = ekd.from_source("url",
                       ["https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib",
                        "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib"])
fs.ls()
[5]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf 2t surface 0 20200513 1200 0 an 0 regular_ll
1 ecmf msl surface 0 20200513 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 500 20070101 1200 0 an 0 regular_ll
3 ecmf z isobaricInhPa 500 20070101 1200 0 an 0 regular_ll
4 ecmf t isobaricInhPa 850 20070101 1200 0 an 0 regular_ll
5 ecmf z isobaricInhPa 850 20070101 1200 0 an 0 regular_ll
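When the files share a common location, the list of URLs can be built programmatically. The sketch below uses urllib.parse.urljoin with a hypothetical make_urls() helper; note that the base URL must end with “/”, otherwise urljoin replaces its last path component:

```python
from urllib.parse import urljoin

# Base location of the example files used above
BASE_URL = "https://sites.ecmwf.int/repository/earthkit-data/examples/"


def make_urls(base: str, names: list[str]) -> list[str]:
    """Hypothetical helper: join each file name onto ``base``.

    ``base`` should end with "/"; otherwise its last segment is replaced.
    """
    return [urljoin(base, name) for name in names]


urls = make_urls(BASE_URL, ["test.grib", "test4.grib"])
```

The resulting list can be passed directly as ekd.from_source("url", urls). As the output above shows, the fields are listed in the order of the URLs.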

Using URL patterns

URLs can also be specified with URL patterns using the “url-pattern” source. In the example below, substituting the values 4 and 6 for the pattern variable “id” produces two URLs, pointing to test4.grib and test6.grib:

[6]:
fs = ekd.from_source("url-pattern",
                        "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
                        {"id": [4, 6]})
fs.ls()
[6]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf t isobaricInhPa 500 20070101 1200 0 an 0 regular_ll
1 ecmf z isobaricInhPa 500 20070101 1200 0 an 0 regular_ll
2 ecmf t isobaricInhPa 850 20070101 1200 0 an 0 regular_ll
3 ecmf z isobaricInhPa 850 20070101 1200 0 an 0 regular_ll
4 ecmf t isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
5 ecmf u isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
6 ecmf v isobaricInhPa 1000 20180801 1200 0 an 0 regular_ll
7 ecmf t isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
8 ecmf u isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
9 ecmf v isobaricInhPa 850 20180801 1200 0 an 0 regular_ll
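To see which URLs a pattern stands for, the expansion can be emulated with str.format and itertools.product. This is a simplified illustration, not earthkit-data's implementation: expand_pattern() is hypothetical, it only handles plain {name} placeholders, and it assumes that every combination of the given values produces one URL:

```python
import itertools


def expand_pattern(pattern: str, values: dict) -> list[str]:
    """Hypothetical helper: substitute every combination of ``values`` into ``pattern``.

    Only plain "{name}" placeholders are supported; not earthkit-data's code.
    """
    names = list(values)
    # Wrap scalars in a list so each variable contributes one or more values
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in itertools.product(*choices)
    ]


urls = expand_pattern(
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
    {"id": [4, 6]},
)
```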

We can also specify a format for each pattern variable. In this example, “my_date” is the variable name and “:date(%Y-%m-%d)” specifies how its datetime value is formatted when it is substituted into the URL:

[7]:
import datetime

fs = ekd.from_source(
    "url-pattern",
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/test_{my_date:date(%Y-%m-%d)}_{name}.grib",
    {"my_date": datetime.datetime(2020,5,13), "name": ["t2","msl"]})
fs.ls()

[7]:
centre shortName typeOfLevel level dataDate dataTime stepRange dataType number gridType
0 ecmf 2t surface 0 20200513 1200 0 an 0 regular_ll
1 ecmf msl surface 0 20200513 1200 0 an 0 regular_ll
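The date formatting can be emulated with a regular expression and datetime.strftime(). expand_dated_pattern() is a hypothetical helper that supports only {name} and {name:date(FORMAT)} placeholders; it is meant to show which URLs the example above resolves to, not how earthkit-data implements patterns:

```python
import datetime
import itertools
import re

# Matches "{name}" or "{name:date(FORMAT)}" placeholders
PLACEHOLDER = re.compile(r"\{(\w+)(?::date\(([^)]*)\))?\}")


def expand_dated_pattern(pattern: str, values: dict) -> list[str]:
    """Hypothetical helper: expand ``pattern`` for every combination of ``values``,
    formatting ``date(...)`` placeholders with ``strftime``. Not earthkit-data's code.
    """
    names = list(values)
    # Wrap scalars in a list so each variable contributes one or more values
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    urls = []
    for combo in itertools.product(*choices):
        current = dict(zip(names, combo))

        def substitute(match: re.Match) -> str:
            value = current[match.group(1)]
            fmt = match.group(2)
            # Apply the date format if one was given, otherwise use str()
            return value.strftime(fmt) if fmt is not None else str(value)

        urls.append(PLACEHOLDER.sub(substitute, pattern))
    return urls


urls = expand_dated_pattern(
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/"
    "test_{my_date:date(%Y-%m-%d)}_{name}.grib",
    {"my_date": datetime.datetime(2020, 5, 13), "name": ["t2", "msl"]},
)
```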