Configuration¶
earthkit-data maintains a global configuration.
The configuration is automatically loaded from and saved to a YAML file located at ~/.config/earthkit/data/config.yaml. An alternative path can be specified via the EARTHKIT_DATA_CONFIG_FILE environment variable, which is read only at startup.
The configuration can be accessed and modified from Python. The configuration options can also be defined as environment variables, which take precedence over the config file.
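The path lookup described above can be pictured with a small standard-library sketch. This is not the earthkit-data implementation: `resolve_config_file` is a hypothetical helper, and only the default path and the `EARTHKIT_DATA_CONFIG_FILE` variable name come from the text.

```python
import os
from pathlib import Path

# Default location quoted in the text above.
DEFAULT_CONFIG_FILE = Path("~/.config/earthkit/data/config.yaml")


def resolve_config_file(environ=None):
    """Return the config file path, honouring EARTHKIT_DATA_CONFIG_FILE if set.

    Hypothetical helper for illustration; not part of earthkit-data.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("EARTHKIT_DATA_CONFIG_FILE")
    path = Path(override) if override else DEFAULT_CONFIG_FILE
    return path.expanduser()


# No override: the default path under the home directory is used.
print(resolve_config_file({}))
# Override present: the alternative path wins.
print(resolve_config_file({"EARTHKIT_DATA_CONFIG_FILE": "/tmp/my-config.yaml"}))
```

Because the variable is only read at startup, changing it inside a running session would presumably have no effect until Python is restarted.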
Accessing configuration options¶
The earthkit-data configuration can be accessed using the Python API:
import earthkit.data
# Access one of the config options
cache_path = earthkit.data.config.get("user-cache-directory")
print(cache_path)
# If this is the last line of a Notebook cell, this
# will display a table with all the current configuration
earthkit.data.config
Warning
When an environment variable is set, it takes precedence over the config parameter, and its value is returned from get().
Changing configuration¶
Note
It is recommended to restart your Jupyter kernels after changing or resetting config options.
The earthkit-data configuration can be modified using the Python API:
import earthkit.data
# Change the location of the user defined cache:
earthkit.data.config.set("user-cache-directory", "/big-disk/earthkit-data-cache")
# Change download timeout
earthkit.data.config.set("url-download-timeout", "1m")
# Multiple values can be set together. The argument list
# can be a dictionary:
earthkit.data.config.set({"url-download-timeout": "1m", "check-out-of-date-urls": True})
# Alternatively, we can use keyword arguments. However, because
# the "-" character is not allowed in Python variable names, we have
# to replace "-" with "_" in all the keyword arguments:
earthkit.data.config.set(url_download_timeout="1m", check_out_of_date_urls=True)
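The three call styles above (a name/value pair, a dictionary, keyword arguments) can all be reduced to one dictionary of hyphenated option names. The sketch below shows one way to do that; `normalise_options` is a hypothetical helper and makes no claim about how set() is implemented internally.

```python
def normalise_options(*args, **kwargs):
    """Collect options from a (name, value) pair, a dict, or keyword arguments.

    Hypothetical helper mirroring the call styles shown above.
    """
    options = {}
    if len(args) == 1 and isinstance(args[0], dict):
        options.update(args[0])
    elif len(args) == 2:
        options[args[0]] = args[1]
    elif args:
        raise TypeError("expected a dict or a (name, value) pair")
    # Keyword names cannot contain "-", so map "_" back to "-".
    for key, value in kwargs.items():
        options[key.replace("_", "-")] = value
    return options


print(normalise_options("url-download-timeout", "1m"))
print(normalise_options({"url-download-timeout": "1m", "check-out-of-date-urls": True}))
print(normalise_options(url_download_timeout="1m", check_out_of_date_urls=True))
```

The last two calls produce the same dictionary, which is why the keyword form is interchangeable with the dictionary form.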
Warning
When an environment variable is set, the new value provided for set() is saved into the config file but get() will still return the value of the environment variable. A warning is also generated.
Temporary configuration¶
We can create a temporary configuration (as a context manager) as a copy of the original configuration. We will still refer to it as “config”, but it is completely independent from the original object and changes are not saved into the yaml file (even when config.autosave is True).
import earthkit.data
print(earthkit.data.config.get("url-download-timeout"))
with earthkit.data.config.temporary():
    earthkit.data.config.set("url-download-timeout", 5)
    print(earthkit.data.config.get("url-download-timeout"))
# Temporary config can also be created with arguments:
with earthkit.data.config.temporary("url-download-timeout", 11):
    print(earthkit.data.config.get("url-download-timeout"))
Output:
30
5
11
Warning
When an environment variable is set, the same rules apply as for set().
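To make the scoping behaviour concrete, here is a toy model of a temporary configuration written with the standard library only. `ToyConfig` is an illustrative stand-in, not the earthkit-data class, and it uses a simplification: it snapshots the values on entry and restores them on exit, rather than creating an independent copy as described above. The visible effect is the same: changes made inside the block are discarded.

```python
from contextlib import contextmanager


class ToyConfig:
    """Toy in-memory stand-in for a config object (not the earthkit-data class)."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        self._values[name] = value

    @contextmanager
    def temporary(self, name=None, value=None):
        saved = dict(self._values)  # snapshot of the current options
        try:
            if name is not None:
                self._values[name] = value
            yield self
        finally:
            self._values = saved  # discard every change made inside the block


config = ToyConfig({"url-download-timeout": 30})
seen = [config.get("url-download-timeout")]
with config.temporary():
    config.set("url-download-timeout", 5)
    seen.append(config.get("url-download-timeout"))
with config.temporary("url-download-timeout", 11):
    seen.append(config.get("url-download-timeout"))
seen.append(config.get("url-download-timeout"))
print(seen)  # [30, 5, 11, 30]
```

The final value is back to 30, showing that neither block leaked its change into the surrounding configuration.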
Resetting configuration¶
Note
It is recommended to restart your Jupyter kernels after changing or resetting the configuration.
The earthkit-data configuration can be reset using the Python API:
import earthkit.data
# Reset a named config option to its default value
earthkit.data.config.reset("user-cache-directory")
# Reset all the config options to their default values
earthkit.data.config.reset()
Warning
When an environment variable is set, the same rules apply as for set().
Environment variables¶
Each configuration parameter has a corresponding environment variable (see the full list here). When an environment variable is set, it takes precedence over the config parameter as the following examples show.
First, let us assume that the value of url-download-timeout is 30 in the config file and no environment variable is set.
>>> from earthkit.data import config
>>> config.get("url-download-timeout")
30
Then, set the environment variable EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT.
export EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT=5
>>> from earthkit.data import config
>>> config.get("url-download-timeout")
5
>>> config.env()
{'url-download-timeout': ('EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT', '5')}
>>> config.set("url-download-timeout", 10)
UserWarning: Config option 'url-download-timeout' is also set by environment variable
'EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT'.The environment variable takes precedence and
its value is returned when calling get(). Still, the value set here will be
saved to the config file.
>>> config.get("url-download-timeout")
5
Finally, unset the environment variable and check the config value again, which is now the value from the config file.
unset EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT
>>> from earthkit.data import config
>>> config.get("url-download-timeout")
10
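The session above boils down to a simple precedence rule, which the following standard-library sketch replays with plain dictionaries. `effective_value` is a hypothetical function, and the two dictionaries merely stand in for the config file and the process environment. Unlike the library, which returns 5 in the session above, this sketch returns the raw environment string unconverted.

```python
def effective_value(option, env_var, file_values, environ):
    """Return the environment value if env_var is set, else the file value."""
    if env_var in environ:
        return environ[env_var]  # raw string; no type conversion in this sketch
    return file_values[option]


ENV_VAR = "EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT"
OPTION = "url-download-timeout"
file_values = {OPTION: 30}  # stands in for the YAML config file
environ = {}                # stands in for os.environ

print(effective_value(OPTION, ENV_VAR, file_values, environ))  # 30

environ[ENV_VAR] = "5"      # like: export EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT=5
print(effective_value(OPTION, ENV_VAR, file_values, environ))  # 5

file_values[OPTION] = 10    # like set(): stored in the file, but still shadowed
print(effective_value(OPTION, ENV_VAR, file_values, environ))  # 5

del environ[ENV_VAR]        # like: unset EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT
print(effective_value(OPTION, ENV_VAR, file_values, environ))  # 10
```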
List of configuration parameters¶
This is the list of all the config parameters:
| Name | Default | Description |
|---|---|---|
| cache-policy | 'off' | Caching policy. Valid values: off, temporary and user. See /guide/caching for more information. |
| check-out-of-date-urls | True | Perform an HTTP request to check if the remote version of a cached file has changed. |
| download-out-of-date-urls | False | Re-download URLs when the remote version of a cached file has been changed. |
| grib-file-serialisation-policy | 'path' | GRIB file serialisation policy for fieldlists with data on disk. Valid values: path and memory. |
| grib-handle-cache-size | 1 | Maximum number of GRIB handles cached in memory per fieldlist with data on disk. Used when |
| grib-handle-policy | 'cache' | GRIB handle management policy for fieldlists with data on disk. Valid values: cache, persistent and temporary. See /guide/misc/grib_memory for more information. |
| maximum-cache-disk-usage | '95%' | Maximum disk usage as a percentage of the full disk capacity of the filesystem where the cache is located (e.g. 90%). When the total disk usage exceeds this limit (it is not limited to the cache usage alone), earthkit-data evicts older cached entries until the usage is below the specified limit. Can be set to None. Ignored when |
| maximum-cache-size | None | Maximum disk space used by the earthkit-data cache (e.g. 100G or 2T). When exceeded, earthkit-data evicts older cached entries until the usage is below the specified limit. Can be set to None. Ignored when |
| number-of-download-threads | 5 | Number of threads used to download data. |
| reader-type-check-bytes | 64 | Number of bytes read from the beginning of a source to identify its type. Valid when 8 <= x <= 4096. |
| temporary-cache-directory-root | None | Parent of the cache directory when |
| temporary-directory-root | None | Parent of the temporary directory when |
| url-download-timeout | '30s' | Timeout when downloading from a URL. |
| use-grib-metadata-cache | True | Use in-memory cache kept in each field for GRIB metadata access in fieldlists with data on disk. See /guide/misc/grib_memory for more information. |
| use-message-position-index-cache | False | Stores message offset index for GRIB/BUFR files in the cache. |
| use-standalone-mars-client-when-available | True | Use the standalone mars client when available instead of using the web API. |
| user-cache-directory | 'TMP/earthkit-data-${USER}' | Cache directory used when |
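The maximum-cache-disk-usage description refers to the usage of the whole filesystem, not of the cache alone. The sketch below illustrates that kind of check using the standard library. `parse_percentage` and `disk_usage_exceeds` are hypothetical helpers, the "used / total" ratio is an assumption about how the percentage might be computed, and nothing here evicts any files.

```python
import shutil


def parse_percentage(text):
    """Turn a string such as '95%' into the float 95.0."""
    if not text.endswith("%"):
        raise ValueError(f"expected a percentage like '95%', got {text!r}")
    return float(text[:-1])


def disk_usage_exceeds(used_bytes, total_bytes, limit="95%"):
    """Return True when used/total, as a percentage, is above the limit."""
    return 100.0 * used_bytes / total_bytes > parse_percentage(limit)


# Check the filesystem holding the current directory against the default limit.
usage = shutil.disk_usage(".")
print(disk_usage_exceeds(usage.used, usage.total))
```

With this rule, a nearly full disk would trigger eviction even if the cache itself is small, which matches the note in the table that the limit is not restricted to cache usage.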
List of environment variables¶
This is the list of the config environment variables:
| Config option name | Environment variable |
|---|---|
| cache-policy | EARTHKIT_DATA_CACHE_POLICY |
| check-out-of-date-urls | EARTHKIT_DATA_CHECK_OUT_OF_DATE_URLS |
| download-out-of-date-urls | EARTHKIT_DATA_DOWNLOAD_OUT_OF_DATE_URLS |
| grib-file-serialisation-policy | EARTHKIT_DATA_GRIB_FILE_SERIALISATION_POLICY |
| grib-handle-cache-size | EARTHKIT_DATA_GRIB_HANDLE_CACHE_SIZE |
| grib-handle-policy | EARTHKIT_DATA_GRIB_HANDLE_POLICY |
| maximum-cache-disk-usage | EARTHKIT_DATA_MAXIMUM_CACHE_DISK_USAGE |
| maximum-cache-size | EARTHKIT_DATA_MAXIMUM_CACHE_SIZE |
| number-of-download-threads | EARTHKIT_DATA_NUMBER_OF_DOWNLOAD_THREADS |
| reader-type-check-bytes | EARTHKIT_DATA_READER_TYPE_CHECK_BYTES |
| temporary-cache-directory-root | EARTHKIT_DATA_TEMPORARY_CACHE_DIRECTORY_ROOT |
| temporary-directory-root | EARTHKIT_DATA_TEMPORARY_DIRECTORY_ROOT |
| url-download-timeout | EARTHKIT_DATA_URL_DOWNLOAD_TIMEOUT |
| use-grib-metadata-cache | EARTHKIT_DATA_USE_GRIB_METADATA_CACHE |
| use-message-position-index-cache | EARTHKIT_DATA_USE_MESSAGE_POSITION_INDEX_CACHE |
| use-standalone-mars-client-when-available | EARTHKIT_DATA_USE_STANDALONE_MARS_CLIENT_WHEN_AVAILABLE |
| user-cache-directory | EARTHKIT_DATA_USER_CACHE_DIRECTORY |
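Every row in the table above follows the same naming pattern: the option name is upper-cased, hyphens become underscores, and the result is prefixed with EARTHKIT_DATA_. The sketch below encodes that observed pattern. `env_var_name` and `option_name` are hypothetical helpers inferred from the table, not functions provided by earthkit-data.

```python
PREFIX = "EARTHKIT_DATA_"


def env_var_name(option):
    """Map a config option name to its environment variable name."""
    return PREFIX + option.upper().replace("-", "_")


def option_name(env_var):
    """Map an environment variable name back to the config option name.

    Only valid because no option name in the table contains an underscore.
    """
    if not env_var.startswith(PREFIX):
        raise ValueError(f"{env_var!r} does not start with {PREFIX!r}")
    return env_var[len(PREFIX):].lower().replace("_", "-")


print(env_var_name("url-download-timeout"))
print(option_name("EARTHKIT_DATA_CACHE_POLICY"))
```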