Reading data from URLs
Using individual URLs
We can read an individual file from a URL by passing the “url” source to from_source():
[1]:
import earthkit.data as ekd
fs = ekd.from_source("url",
                     "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib")
[2]:
fs.ls()
[2]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | msl | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
Tar and zip archives can also be loaded from a URL:
[3]:
fs = ekd.from_source("url",
                     "https://sites.ecmwf.int/repository/earthkit-data/examples/test_gribs.tar")
[4]:
fs.ls()
[4]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | msl | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | z | isobaricInhPa | 500 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 4 | ecmf | t | isobaricInhPa | 850 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 5 | ecmf | z | isobaricInhPa | 850 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
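As a rough standard-library illustration of what an archive adds, the sketch below lists the regular files inside a tar or zip archive held in memory. It is not earthkit-data's implementation: the helper names (`archive_members`, `make_tar`, `make_zip`) and the member names are hypothetical, and the member contents are placeholders rather than real GRIB data.

```python
import io
import tarfile
import zipfile


def archive_members(data: bytes, name: str) -> list[str]:
    """Return the names of the regular files in an in-memory tar or zip archive."""
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return [info.filename for info in zf.infolist() if not info.is_dir()]
    # Mode "r" lets tarfile detect a plain or compressed tar transparently.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tf:
        return [member.name for member in tf.getmembers() if member.isfile()]


def make_tar(members: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory (hypothetical test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for member_name, payload in members.items():
            info = tarfile.TarInfo(member_name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def make_zip(members: dict[str, bytes]) -> bytes:
    """Build a zip archive in memory (hypothetical test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for member_name, payload in members.items():
            zf.writestr(member_name, payload)
    return buf.getvalue()


# Placeholder payloads: these are NOT valid GRIB messages.
members = {"first.grib": b"placeholder", "second.grib": b"placeholder"}
print(archive_members(make_tar(members), "example.tar"))  # ['first.grib', 'second.grib']
print(archive_members(make_zip(members), "example.zip"))  # ['first.grib', 'second.grib']
```

A reader that accepts archives would then process each listed member in turn; how earthkit-data unpacks and caches the archive internally is not shown here.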
Using multiple URLs
We can access a list of URLs in one go. In the example below, the first file contains 2 fields and the second contains 4, so the result holds 6 fields in total.
[5]:
fs = ekd.from_source("url",
                     ["https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib",
                      "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib"])
fs.ls()
[5]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | msl | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 500 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | z | isobaricInhPa | 500 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 4 | ecmf | t | isobaricInhPa | 850 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 5 | ecmf | z | isobaricInhPa | 850 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
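The listing above is consistent with the fields of each URL being chained in the order the URLs are given, under a single running index. The sketch below mimics that bookkeeping on plain Python records. `FieldInfo` and `concatenate` are hypothetical names, and the metadata values are copied from the listing above rather than decoded from the GRIB files.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class FieldInfo:
    """A hypothetical subset of the GRIB keys shown by ls()."""
    shortName: str
    typeOfLevel: str
    level: int
    dataDate: int


def concatenate(per_url: list[list[FieldInfo]]) -> list[tuple[int, FieldInfo]]:
    """Chain the fields of each URL in input order and number them from 0."""
    return list(enumerate(chain.from_iterable(per_url)))


# Metadata copied from the ls() output for test.grib and test4.grib.
test_grib = [
    FieldInfo("2t", "surface", 0, 20200513),
    FieldInfo("msl", "surface", 0, 20200513),
]
test4_grib = [
    FieldInfo("t", "isobaricInhPa", 500, 20070101),
    FieldInfo("z", "isobaricInhPa", 500, 20070101),
    FieldInfo("t", "isobaricInhPa", 850, 20070101),
    FieldInfo("z", "isobaricInhPa", 850, 20070101),
]

for index, field in concatenate([test_grib, test4_grib]):
    print(index, field.shortName, field.typeOfLevel, field.level)
```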
Using URL patterns
URLs can also be specified with a URL pattern, using the “url-pattern” source. In the example below, substituting each value of “id” into the pattern yields two files, test4.grib and test6.grib:
[6]:
fs = ekd.from_source("url-pattern",
                     "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
                     {"id": [4, 6]})
fs.ls()
[6]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | t | isobaricInhPa | 500 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | z | isobaricInhPa | 500 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 2 | ecmf | t | isobaricInhPa | 850 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 3 | ecmf | z | isobaricInhPa | 850 | 20070101 | 1200 | 0 | an | 0 | regular_ll |
| 4 | ecmf | t | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 5 | ecmf | u | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 6 | ecmf | v | isobaricInhPa | 1000 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 7 | ecmf | t | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 8 | ecmf | u | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
| 9 | ecmf | v | isobaricInhPa | 850 | 20180801 | 1200 | 0 | an | 0 | regular_ll |
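The expansion step can be sketched with the standard library: every combination of parameter values is substituted into the pattern. This is an illustration under assumptions, not earthkit-data's code. `expand_pattern` is a hypothetical helper, a scalar value is treated as a one-element list, and several list-valued parameters are assumed to combine as a Cartesian product. Only plain `{name}` placeholders are handled.

```python
import itertools


def expand_pattern(pattern: str, params: dict) -> list[str]:
    """Substitute every combination of parameter values into a {name} pattern."""
    names = list(params)
    # Wrap scalars so that each parameter contributes a list of candidate values.
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in params.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in itertools.product(*choices)
    ]


urls = expand_pattern(
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
    {"id": [4, 6]},
)
for url in urls:
    print(url)
```

Under the Cartesian-product assumption, two list-valued parameters with 2 and 3 values would give 6 URLs in this sketch.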
We can also specify a format for each pattern parameter. In this example, “my_date” is the parameter name and “:date(%Y-%m-%d)” specifies how its datetime value is written into the URL:
[7]:
import datetime
fs = ekd.from_source(
    "url-pattern",
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/test_{my_date:date(%Y-%m-%d)}_{name}.grib",
    {"my_date": datetime.datetime(2020, 5, 13), "name": ["t2", "msl"]})
fs.ls()
[7]:
| | centre | shortName | typeOfLevel | level | dataDate | dataTime | stepRange | dataType | number | gridType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ecmf | 2t | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
| 1 | ecmf | msl | surface | 0 | 20200513 | 1200 | 0 | an | 0 | regular_ll |
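The `:date(...)` specifier can be read as “format this datetime with `strftime` using the given codes”. The sketch below applies that reading with a regular expression. `render` is a hypothetical helper, not the library's pattern parser, and it only understands `{name}` and `{name:date(FORMAT)}` placeholders.

```python
import datetime
import re

# Matches {name} or {name:date(FORMAT)}; group 2 is None when no date format is given.
PLACEHOLDER = re.compile(r"\{(\w+)(?::date\(([^)]*)\))?\}")


def render(pattern: str, values: dict) -> str:
    """Replace each placeholder with its value, applying strftime for date(...) formats."""
    def substitute(match: re.Match) -> str:
        name, date_format = match.group(1), match.group(2)
        value = values[name]
        if date_format is not None:
            return value.strftime(date_format)
        return str(value)

    return PLACEHOLDER.sub(substitute, pattern)


PATTERN = (
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/"
    "test_{my_date:date(%Y-%m-%d)}_{name}.grib"
)
day = datetime.datetime(2020, 5, 13)
for name in ["t2", "msl"]:
    print(render(PATTERN, {"my_date": day, "name": name}))
# https://sites.ecmwf.int/repository/earthkit-data/test-data/test_2020-05-13_t2.grib
# https://sites.ecmwf.int/repository/earthkit-data/test-data/test_2020-05-13_msl.grib
```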