Using split_on in retrieval requests
Most of the request-based retrieval sources support the split_on request parameter, including the cds source used in the examples on this page.
When split_on is present in the request, the source splits the request into multiple parts, which are executed independently, and the resulting data is stored in separate files. The actual storage is hidden from users, who can still work with the results as a single object.
Single key
split_on can be a single string referring to the key the request should be split on:
import earthkit.data as ekd

ds = ekd.from_source(
    "cds",
    "reanalysis-era5-single-levels",
    request=dict(
        variable=["2t", "msl"],
        product_type="reanalysis",
        area=[50, -10, 40, 10],  # N,W,S,E
        grid=[2, 2],
        date="2012-05-10",
        time=[0, 12],
        split_on="variable",
    ),
)

This will send 2 requests to the CDS: one for "2t" and another for "msl".
Sequence of keys
A sequence of keys can also be specified:
import earthkit.data as ekd

ds = ekd.from_source(
    "cds",
    "reanalysis-era5-single-levels",
    request=dict(
        variable=["2t", "msl"],
        product_type="reanalysis",
        area=[50, -10, 40, 10],  # N,W,S,E
        grid=[2, 2],
        date="2012-05-10",
        time=[0, 12],
        split_on=("variable", "time"),
    ),
)
This will send 4 requests to the CDS:
variable="2t", time=0
variable="2t", time=12
variable="msl", time=0
variable="msl", time=12
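The expansion listed above is the Cartesian product of the values of the split keys. The following is a minimal sketch of that idea in plain Python. It is not how earthkit-data implements split_on: the helper name split_request is hypothetical, and the sketch assumes that keys not named in split_on are copied unchanged into every sub-request.

```python
from itertools import product


def split_request(request, split_on):
    """Hypothetical helper: expand one request into per-value sub-requests."""
    # Accept a single key as well as a sequence of keys.
    keys = [split_on] if isinstance(split_on, str) else list(split_on)

    # Collect the values of each split key, treating a scalar as a one-item list.
    values = []
    for key in keys:
        v = request[key]
        values.append(list(v) if isinstance(v, (list, tuple)) else [v])

    # One sub-request per combination; all other keys are copied unchanged.
    parts = []
    for combo in product(*values):
        part = dict(request)
        part.update(zip(keys, combo))
        parts.append(part)
    return parts


request = dict(variable=["2t", "msl"], date="2012-05-10", time=[0, 12])
parts = split_request(request, ("variable", "time"))
for part in parts:
    print(part["variable"], part["time"])
```

Passing a single string such as "variable" to the same helper yields two sub-requests, each still carrying the full time list.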
Dictionary of keys
By specifying a dict, we can define for each key how many of its values are grouped into a single request:
import earthkit.data as ekd

ds = ekd.from_source(
    "cds",
    "reanalysis-era5-single-levels",
    request=dict(
        variable=["2t", "2d", "msl", "sstk"],
        product_type="reanalysis",
        area=[50, -10, 40, 10],  # N,W,S,E
        grid=[2, 2],
        date="2012-05-10",
        time=[0, 12],
        split_on={"variable": 2, "time": 1},
    ),
)
This will send 4 requests to the CDS:
variable=["2t", "2d"], time=0
variable=["2t", "2d"], time=12
variable=["msl", "sstk"], time=0
variable=["msl", "sstk"], time=12
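The grouping above can be pictured as cutting the values of each key into consecutive chunks of the given size and then combining the chunks across keys. The sketch below illustrates this in plain Python. It is an illustration only, not the earthkit-data implementation: the names chunked and split_request_grouped are hypothetical, and the handling of a value list that does not divide evenly (the last group is simply shorter) is an assumption.

```python
from itertools import product


def chunked(values, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [values[i : i + size] for i in range(0, len(values), size)]


def split_request_grouped(request, split_on):
    """Hypothetical helper: split a request using a {key: group_size} mapping."""
    keys = list(split_on)

    # For every split key, build the list of value groups.
    groups = []
    for key in keys:
        v = request[key]
        v = list(v) if isinstance(v, (list, tuple)) else [v]
        groups.append(chunked(v, split_on[key]))

    # One sub-request per combination of groups; other keys are copied unchanged.
    parts = []
    for combo in product(*groups):
        part = dict(request)
        part.update(zip(keys, combo))
        parts.append(part)
    return parts


request = dict(variable=["2t", "2d", "msl", "sstk"], date="2012-05-10", time=[0, 12])
parts = split_request_grouped(request, {"variable": 2, "time": 1})
for part in parts:
    print(part["variable"], part["time"])
```

In this sketch every group stays a list, so a group of size 1 appears as [0] rather than the scalar 0.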