Cache policies
[1]:
from earthkit.data import cache, config
earthkit-data uses a dedicated directory to store the results of remote data access and some GRIB/BUFR indexing information. By default this directory is unmanaged (its size is neither checked nor limited) and no caching is provided for the files in it, i.e. repeated calls to from_source() for remote services and URLs will download the data again!
When caching is enabled this directory also serves as a cache. This means that if we run from_source() again with the same arguments, the data is loaded from the cache instead of being downloaded again. Additionally, caching offers monitoring and disk space management. When the cache is full, cached data is deleted according to the configuration (i.e. oldest data is deleted first).
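To make the "oldest data is deleted first" idea concrete, here is a minimal standard-library sketch of size-limited eviction. It is not earthkit-data's implementation: the helper `evict_oldest`, the file names and the byte limit are all hypothetical, and "oldest" is assumed here to mean the oldest modification time.

```python
import os
import tempfile
import time


def evict_oldest(directory, max_bytes):
    """Delete the oldest files in `directory` until its total size fits `max_bytes`.

    Hypothetical helper for illustration only; not part of earthkit-data.
    """
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_mtime, path, st.st_size))

    entries.sort()  # oldest modification time first
    total = sum(size for _, _, size in entries)
    removed = []
    for _, path, size in entries:
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(os.path.basename(path))
    return removed


with tempfile.TemporaryDirectory() as d:
    now = time.time()
    for i, name in enumerate(["a.grib", "b.grib", "c.grib"]):
        path = os.path.join(d, name)
        with open(path, "wb") as f:
            f.write(b"x" * 100)
        # Make "a.grib" the oldest file and "c.grib" the newest.
        os.utime(path, (now + i, now + i))

    # 300 bytes are stored but only 150 are allowed.
    removed = evict_oldest(d, max_bytes=150)
    remaining = sorted(os.listdir(d))
```

In this sketch the two oldest files are removed and only the newest one survives. A real cache would typically also track access times and index metadata.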
In the examples below we will change the configuration multiple times. First we ensure all the changes are temporary and no options are saved into the configuration file. We also reset the configuration to the defaults.
[2]:
config.autosave = False
config.reset()
No caching (default)
The primary key to control the cache in the configuration is cache-policy. The default value is “off”, which means that no caching is available.
In this case all files are downloaded into an unmanaged temporary directory created by tempfile.TemporaryDirectory. Since caching is disabled, all calls to from_source() for remote services and URLs will download the data again! This temporary directory is unique for each earthkit-data session. When the directory object goes out of scope (at the latest on exit) the directory is cleaned up.
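The lifetime of such a directory can be illustrated with the standard library alone. The sketch below uses tempfile.TemporaryDirectory directly and calls cleanup() explicitly to stand in for the object going out of scope. The file name `download.grib` is a made-up placeholder, and nothing here touches earthkit-data itself.

```python
import os
import tempfile

# Create a uniquely named temporary directory.
tmp = tempfile.TemporaryDirectory()
path = tmp.name
existed_while_alive = os.path.isdir(path)

# Write a stand-in for a downloaded file (hypothetical name and content).
with open(os.path.join(path, "download.grib"), "wb") as f:
    f.write(b"GRIB")

# Explicit cleanup removes the directory and everything in it; the same
# happens when the object is garbage collected or the interpreter exits.
tmp.cleanup()
exists_after_cleanup = os.path.exists(path)
```

After cleanup the path no longer exists, which is why files stored there cannot be reused by a later session.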
The config tells us the current cache policy:
[3]:
config.get("cache-policy")
[3]:
'off'
The path to the temporary directory has to be queried through the cache object:
[4]:
cache.directory()
[4]:
'/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/tmpazztzx_4'
We can specify the parent directory for the temporary directory by using the temporary-directory-root config option. By default it is set to None (no parent directory specified).
[5]:
s = {"cache-policy": "off", "temporary-directory-root": "~/my_demo_tmp"}
config.set(s)
cache.directory()
[5]:
'/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/tmp6uvafm1a'
Temporary cache directory
When the cache-policy is “temporary” the cache will be active and located in a managed temporary directory created by tempfile.TemporaryDirectory. This directory will be unique for each earthkit-data session. When the directory object goes out of scope (at the latest on exit) the cache is cleaned up.
[6]:
config.set("cache-policy", "temporary")
print(config.get("cache-policy"))
temporary
The path to the cache directory has to be queried through the cache object:
[7]:
cache.directory()
[7]:
'/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/tmph5y7qwk6'
We can specify the parent directory for the temporary cache by using the temporary-cache-directory-root config option. By default it is set to None (no parent directory specified).
[8]:
s = {"cache-policy": "temporary", "temporary-cache-directory-root": "~/my_demo_cache"}
config.set(s)
cache.directory()
[8]:
'/Users/cgr/my_demo_cache/tmppwscrdw1'
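The output above shows a uniquely named directory created under the expanded root path. A rough standard-library sketch of that pattern follows. The helper `make_temporary_dir` is hypothetical, and the assumption that the root is expanded with os.path.expanduser and created on demand is ours, not something confirmed by the text. To avoid writing into the home directory, the demo places the root inside another temporary directory.

```python
import os
import tempfile


def make_temporary_dir(root=None):
    """Create a temporary directory, optionally under `root`.

    Hypothetical helper for illustration only; not part of earthkit-data.
    """
    if root is not None:
        root = os.path.expanduser(root)  # turn "~/..." into an absolute path
        os.makedirs(root, exist_ok=True)
    return tempfile.TemporaryDirectory(dir=root)


with tempfile.TemporaryDirectory() as base:
    # Stand-in for a root such as "~/my_demo_cache".
    root = os.path.join(base, "my_demo_cache")
    tmp = make_temporary_dir(root)
    parent = os.path.dirname(tmp.name)

    # Only the uniquely named subdirectory is removed on cleanup.
    tmp.cleanup()
    root_survives = os.path.isdir(root)
```

In this sketch the root directory itself is left in place after cleanup; only the session-specific subdirectory disappears.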
User defined cache directory
When the cache-policy is “user” the cache will be active and created in a managed directory defined by the user-cache-directory config option.
The user cache directory is not cleaned up on exit, so the next time you start earthkit-data it will still be there unless it is deleted manually or it is configured in such a way that a different path is assigned to it on each startup. Also, when you run multiple sessions of earthkit-data under the same user they will share the same cache.
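Why a persistent directory lets later calls (or later sessions) skip the download can be sketched with a small keyed file cache. This is an illustration under our own assumptions, not earthkit-data's internal scheme: the helper `cached_fetch`, the SHA-256 key derived from the arguments, the `fake_download` function and the example URL are all hypothetical.

```python
import hashlib
import json
import os
import tempfile


def cached_fetch(cache_dir, fetch, **kwargs):
    """Return (path, was_cached) for the data identified by `kwargs`.

    Hypothetical helper; `fetch` stands in for a remote download.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Identical arguments always map to the same file name.
    key = hashlib.sha256(json.dumps(kwargs, sort_keys=True).encode()).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        return path, True  # cache hit: nothing is downloaded
    with open(path, "wb") as f:
        f.write(fetch(**kwargs))
    return path, False


calls = []


def fake_download(**kwargs):
    """Pretend to download data and record that a download happened."""
    calls.append(kwargs)
    return b"payload"


with tempfile.TemporaryDirectory() as base:
    cache_dir = os.path.join(base, "earthkit-data-demo-cache")
    url = "https://example.org/test.grib"  # placeholder URL
    _, hit_first = cached_fetch(cache_dir, fake_download, url=url)
    _, hit_second = cached_fetch(cache_dir, fake_download, url=url)
```

The second call finds the file written by the first one, so `fake_download` runs only once. Because the lookup depends only on the directory contents, any process pointing at the same directory would see the same hit.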
The configuration tells us all the details about the cache policy and location:
[9]:
config.set("cache-policy", "user")
print(config.get("cache-policy"))
print(config.get("user-cache-directory"))
user
/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/earthkit-data-cgr
The path to the current cache directory can also be queried through the cache object:
[10]:
cache.directory()
[10]:
'/var/folders/93/w0p869rx17q98wxk83gn9ys40000gn/T/earthkit-data-cgr'
We are free to change the user cache directory to another path:
[11]:
config.set("user-cache-directory", "~/earthkit-data-demo-cache")
cache.directory()
[11]:
'/Users/cgr/earthkit-data-demo-cache'