Reading data from URLs
Using individual URLs
We can read individual files from URLs with from_source():
[1]:
import earthkit.data as ekd
fs = ekd.from_source("url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib")
[2]:
fs.to_fieldlist().ls()
[2]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2t | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 1 | msl | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
Tar and zip archives can also be loaded from a URL:
[3]:
fs = ekd.from_source("url", "https://sites.ecmwf.int/repository/earthkit-data/examples/test_gribs.tar")
[4]:
fs.to_fieldlist().ls()
[4]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2t | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 1 | msl | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 2 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 3 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 4 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
| 5 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
Using multiple URLs
We can access a list of URLs in one go. In the example below, the first file contains 2 fields and the second contains 4.
[5]:
fs = ekd.from_source(
    "url",
    [
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test.grib",
        "https://sites.ecmwf.int/repository/earthkit-data/examples/test4.grib",
    ],
)
fs.to_fieldlist().ls()
[5]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2t | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 1 | msl | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 2 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 3 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 4 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
| 5 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
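When the files share a common base URL, the list can be built programmatically before it is passed to from_source(). The following is a minimal sketch: the names `base_url` and `file_names` are illustrative and not part of earthkit-data, and the final call is left commented out because it needs network access.

```python
# Illustrative variable names; only the URLs themselves come from the example above.
base_url = "https://sites.ecmwf.int/repository/earthkit-data/examples"
file_names = ["test.grib", "test4.grib"]

# Build one full URL per file name, keeping the order of the list.
urls = [f"{base_url}/{name}" for name in file_names]

for url in urls:
    print(url)

# The list can then be passed exactly as in the cell above (requires network access):
# fs = ekd.from_source("url", urls)
```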
Using URL patterns
URLs can also be specified with URL patterns, using the “url-pattern” source. In the example below, substituting the values 4 and 6 for the “id” parameter gives two files: test4.grib and test6.grib:
[6]:
fs = ekd.from_source(
    "url-pattern",
    "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib",
    {"id": [4, 6]},
)
fs.to_fieldlist().ls()
[6]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 1 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 500 | pressure | 0 | regular_ll |
| 2 | t | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
| 3 | z | 2007-01-01 12:00:00 | 2007-01-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
| 4 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 5 | u | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 6 | v | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 1000 | pressure | 0 | regular_ll |
| 7 | t | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
| 8 | u | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
| 9 | v | 2018-08-01 12:00:00 | 2018-08-01 12:00:00 | 0 days | 850 | pressure | 0 | regular_ll |
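To make the substitution concrete, the sketch below reproduces the two URLs from the example above with plain Python string formatting. It only illustrates the idea; it is not how earthkit-data implements the “url-pattern” source, and the variable names are chosen for this example.

```python
# Plain-Python illustration of the substitution; NOT earthkit-data's implementation.
pattern = "https://sites.ecmwf.int/repository/earthkit-data/examples/test{id}.grib"
values = {"id": [4, 6]}

# One URL is produced for each value of "id".
urls = [pattern.format(id=v) for v in values["id"]]

for url in urls:
    print(url)
```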
A format can be specified for each pattern parameter. In this example, “my_date” is the parameter name and “:date(%Y-%m-%d)” specifies how its value is formatted in the URL:
[7]:
import datetime
fs = ekd.from_source(
    "url-pattern",
    "https://sites.ecmwf.int/repository/earthkit-data/test-data/test_{my_date:date(%Y-%m-%d)}_{name}.grib",
    {"my_date": datetime.datetime(2020, 5, 13), "name": ["t2", "msl"]},
)
fs.to_fieldlist().ls()
[7]:
| | parameter.variable | time.valid_datetime | time.base_datetime | time.step | vertical.level | vertical.level_type | ensemble.member | geography.grid_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2t | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
| 1 | msl | 2020-05-13 12:00:00 | 2020-05-13 12:00:00 | 0 days | 0 | surface | 0 | regular_ll |
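The sketch below shows, in plain Python, which URLs the example above would correspond to, assuming that every combination of the supplied values yields one URL. It uses strftime() and itertools.product() purely as an illustration; the “:date(...)” syntax itself is handled by earthkit-data, and the variable names here are hypothetical.

```python
import datetime
import itertools

# Plain-Python illustration only; earthkit-data's own pattern syntax
# ("{my_date:date(%Y-%m-%d)}") is handled by the library, not by this code.
base = "https://sites.ecmwf.int/repository/earthkit-data/test-data"
my_dates = [datetime.datetime(2020, 5, 13)]
names = ["t2", "msl"]

# Combine every date with every name (assumption: one URL per combination),
# formatting the date with the same "%Y-%m-%d" format string.
urls = [
    f"{base}/test_{d.strftime('%Y-%m-%d')}_{n}.grib"
    for d, n in itertools.product(my_dates, names)
]

for url in urls:
    print(url)
```

With one date and two names, this gives two URLs, which matches the two fields listed in the output above.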