- var
Short name of the variable to load. It should coincide with the
variable name inside the data files.
E.g.: var = 'tos', var = 'tas', var = 'prlr'.
In some cases, though, the path to the files contains the short name of
the variable twice or more times, but the actual name of the variable inside
the data files is different. In these cases it may be convenient to provide
'var' with the name that appears in the file paths (see details on the
parameters 'exp' and 'obs').
- exp
Parameter to specify which experimental datasets to load data
from.
It can take two formats: a list of lists or a vector of character strings.
Each format will trigger a different mechanism of locating the requested
datasets.
The first format is adequate for datasets you will load only once or
occasionally. The second format avoids repeatedly providing the
information on a certain dataset, but is more complex to use.
IMPORTANT: Place first the experiment with the largest number of members
and, if possible, with the largest number of leadtimes. If not possible,
the arguments 'nmember' and/or 'nleadtime' should be filled to not miss
any member or leadtime.
If 'exp' is not specified or set to NULL, observational data is loaded for
each start date up to 'leadtimemax'. If 'leadtimemax' is not provided,
Load() will retrieve data for a period of time as long as the time
period between the first specified start date and the current date.
List of lists:
A list of lists where each sub-list contains information on the location
and format of the data files of the dataset to load.
Each sub-list can have the following components:
'name': A character string to identify the dataset. Optional.
'path': A character string with the pattern of the path to the
files of the dataset. This pattern can be built up making use of some
special tags that Load() will replace with the appropriate
values to find the dataset files. The allowed tags are $START_DATE$,
$YEAR$, $MONTH$, $DAY$, $MEMBER_NUMBER$, $STORE_FREQ$, $VAR_NAME$,
$EXP_NAME$ (only for experimental datasets), $OBS_NAME$ (only for
observational datasets) and $SUFFIX$.
Example: /path/to/$EXP_NAME$/postprocessed/$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc
If 'path' is not specified and 'name' is specified, the dataset
information will be fetched with the same mechanism as when using
the vector of character strings (read below).
'nc_var_name': Character string with the actual variable name
to look for inside the dataset files. Optional. Takes, by default,
the same value as the parameter 'var'.
'suffix': Wildcard character string that can be used to build
the 'path' of the dataset. It can be accessed with the tag $SUFFIX$.
Optional. Takes '' by default.
'var_min': Character string (important: not a numeric value). Minimum
value below which read values will be deactivated (set to NA). Optional.
No deactivation is performed by default.
'var_max': Character string (important: not a numeric value). Maximum
value above which read values will be deactivated (set to NA). Optional.
No deactivation is performed by default.
The tag $START_DATE$ will be replaced with each of the starting dates
specified in 'sdates'. $YEAR$, $MONTH$ and $DAY$ will take a value for each
iteration over 'sdates'; they are simply the same as $START_DATE$ but
split in parts.
$MEMBER_NUMBER$ will be replaced by a character string with each member
number, from 1 to the value specified in the parameter 'nmember' (for
experimental datasets) or 'nmemberobs' (for observational datasets). It
will range from '01' to 'N' (or to '0N' if N < 10).
$STORE_FREQ$ will take the value specified in the parameter 'storefreq'
('monthly' or 'daily').
$VAR_NAME$ will take the value specified in the parameter 'var'.
$EXP_NAME$ will take the value specified in each component of the parameter
'exp' in the sub-component 'name'.
$OBS_NAME$ will take the value specified in each component of the parameter
'obs' in the sub-component 'name'.
$SUFFIX$ will take the value specified in each component of the parameters
'exp' and 'obs' in the sub-component 'suffix'.
Example:
list(
list(
name = 'experimentA',
path = file.path('/path/to/$EXP_NAME$/$STORE_FREQ$_mean',
'$VAR_NAME$$SUFFIX$',
'$VAR_NAME$_$START_DATE$.nc'),
nc_var_name = '$VAR_NAME$',
suffix = '_3hourly',
var_min = '-1e19',
var_max = '1e19'
)
)
This will make Load() look for, for instance, the following paths,
if 'sdates' is c('19901101', '19951101', '20001101'):
/path/to/experimentA/monthly_mean/tas_3hourly/tas_19901101.nc
/path/to/experimentA/monthly_mean/tas_3hourly/tas_19951101.nc
/path/to/experimentA/monthly_mean/tas_3hourly/tas_20001101.nc
Vector of character strings:
To avoid repeatedly specifying the same information to load the same
datasets, a vector with only the names of the datasets to load can be
specified.
Load() will then look for the information in a configuration file
whose path must be specified in the parameter 'configfile'.
Check ?ConfigFileCreate, ?ConfigFileOpen, ?ConfigEditEntry & co. to learn
how to create a new configuration file and how to add the information there.
Example: c('experimentA', 'experimentB')
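The two formats can be sketched as follows. Note that the dataset names, paths and configuration file below are hypothetical and only illustrate the shape of each format:

```r
# Hypothetical sketch of the two accepted formats for 'exp'.

# Format 1: list of lists, fully specifying where the files are.
exp_lists <- list(
  list(name = 'experimentA',
       path = file.path('/path/to/$EXP_NAME$/$STORE_FREQ$',
                        '$VAR_NAME$_$START_DATE$.nc'))
)
data <- Load('tas', exp = exp_lists, obs = NULL,
             sdates = c('19901101', '19951101'))

# Format 2: vector of dataset names, resolved via a configuration file.
data <- Load('tas', exp = c('experimentA', 'experimentB'), obs = NULL,
             sdates = c('19901101', '19951101'),
             configfile = '/path/to/configfile.conf')
```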
- obs
Argument with the same format as parameter 'exp'. See details on
parameter 'exp'.
If 'obs' is not specified or set to NULL, no observational data is loaded.
- sdates
Vector of starting dates of the experimental runs to be loaded
following the pattern 'YYYYMMDD'.
This argument is mandatory.
E.g. c('19601101', '19651101', '19701101')
- nmember
Vector with the numbers of members to load from the specified
experimental datasets in 'exp'.
If not specified, the number of members of the first experimental dataset
is detected automatically and applied to all the experimental datasets.
If a single value is specified, it is applied to all the experimental
datasets.
Data for each member is fetched from the file system. If not found, it is
filled with NA values.
An NA value in the 'nmember' list is interpreted as "fetch as many members
of each experimental dataset as the number of members of the first
experimental dataset".
Note: It is recommended to specify the number of members of the first
experimental dataset if it is stored in file per member format because
there are known issues in the automatic detection of members if the path
to the dataset in the configuration file contains Shell Globbing wildcards
such as '*'.
E.g., c(4, 9)
- nmemberobs
Vector with the numbers of members to load from the
specified observational datasets in 'obs'.
If not specified, the number of members of the first observational dataset
is detected automatically and applied to all the observational datasets.
If a single value is specified, it is applied to all the observational
datasets.
Data for each member is fetched from the file system. If not found, it is
filled with NA values.
An NA value in the 'nmemberobs' list is interpreted as "fetch as many
members of each observational dataset as the number of members of the
first observational dataset".
Note: It is recommended to specify the number of members of the first
observational dataset if it is stored in file per member format because
there are known issues in the automatic detection of members if the path
to the dataset in the configuration file contains Shell Globbing wildcards
such as '*'.
E.g., c(1, 5)
- nleadtime
Deprecated. See parameter 'leadtimemax'.
- leadtimemin
Only lead-times greater than or equal to 'leadtimemin' are
loaded. Takes the value 1 by default.
- leadtimemax
Only lead-times less than or equal to 'leadtimemax' are loaded.
Takes by default the number of lead-times of the first experimental
dataset in 'exp'.
If 'exp' is NULL this argument won't have any effect
(see the ?Load description).
- storefreq
Frequency at which the data to be loaded is stored in the
file system. Can take values 'monthly' or 'daily'.
By default it takes 'monthly'.
Note: Data stored in other frequencies with a period which is divisible by
a month can be loaded with a proper use of 'storefreq' and 'sampleperiod'
parameters. It can also be loaded if the period is divisible by a day and
the observational datasets are stored in a file per dataset format or
'obs' is empty.
- sampleperiod
To load only a subset between 'leadtimemin' and
'leadtimemax' with the period of subsampling 'sampleperiod'.
Takes by default value 1 (all lead-times are loaded).
See 'storefreq' for more information.
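For instance, a subsample of the lead-times could be requested as in this hypothetical sketch (the dataset name is an assumption):

```r
# Hypothetical sketch: of the first 12 lead-times, keep one of every 3,
# i.e. lead-times 1, 4, 7 and 10.
data <- Load('tas', exp = c('experimentA'), obs = NULL,
             sdates = c('19901101'),
             leadtimemin = 1, leadtimemax = 12, sampleperiod = 3)
```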
- lonmin
If a 2-dimensional variable is loaded, values at longitudes
lower than 'lonmin' aren't loaded.
Must take a value in the range [-360, 360] (if negative longitudes are
found in the data files these are translated to this range).
It is set to 0 if not specified.
If 'lonmin' > 'lonmax', data across Greenwich is loaded.
- lonmax
If a 2-dimensional variable is loaded, values at longitudes
higher than 'lonmax' aren't loaded.
Must take a value in the range [-360, 360] (if negative longitudes are
found in the data files these are translated to this range).
It is set to 360 if not specified.
If 'lonmin' > 'lonmax', data across Greenwich is loaded.
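As an illustrative sketch (the dataset name is hypothetical), a domain crossing the Greenwich meridian can be requested by setting 'lonmin' greater than 'lonmax':

```r
# Hypothetical sketch: load a domain from 340E across Greenwich to 20E,
# between 30N and 60N. lonmin > lonmax triggers cross-Greenwich loading.
data <- Load('tas', exp = c('experimentA'), obs = NULL,
             sdates = c('19901101'),
             lonmin = 340, lonmax = 20, latmin = 30, latmax = 60)
```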
- latmin
If a 2-dimensional variable is loaded, values at latitudes
lower than 'latmin' aren't loaded.
Must take a value in the range [-90, 90].
It is set to -90 if not specified.
- latmax
If a 2-dimensional variable is loaded, values at latitudes
higher than 'latmax' aren't loaded.
Must take a value in the range [-90, 90].
It is set to 90 if not specified.
- output
This parameter determines the format in which the data is
arranged in the output arrays.
Can take values 'areave', 'lon', 'lat', 'lonlat'.
'areave': Time series of area-averaged variables over the specified domain.
'lon': Time series of meridional averages as a function of longitudes.
'lat': Time series of zonal averages as a function of latitudes.
'lonlat': Time series of 2d fields.
Takes by default the value 'areave'. If the variable specified in 'var' is
a global mean, this parameter is forced to 'areave'.
All the loaded data is interpolated into the grid of the first experimental
dataset except if 'areave' is selected. In that case the area averages are
computed on each dataset original grid. A common grid different than the
first experiment's can be specified through the parameter 'grid'. If 'grid'
is specified when selecting 'areave' output type, all the loaded data is
interpolated into the specified grid before calculating the area averages.
- method
This parameter determines the interpolation method to be used
when regridding data (see 'output'). Can take values 'bilinear', 'bicubic',
'conservative', 'distance-weighted'.
See 'remapcells' for advanced adjustments.
Takes by default the value 'conservative'.
- grid
A common grid can be specified through the parameter 'grid' when
loading 2-dimensional data. Data is then interpolated onto this grid
whichever 'output' type is specified. If the selected output type is
'areave' and a 'grid' is specified, the area averages are calculated after
interpolating to the specified grid.
If not specified and the selected output type is 'lon', 'lat' or 'lonlat',
this parameter takes as default value the grid of the first experimental
dataset, which is read automatically from the source files.
Note that the auto-detected grid type is not guaranteed to be correct, and
it won't be correct if the netCDF file doesn't contain a global domain.
Please check the warnings carefully to ensure the detected grid type is as
expected, or assign this parameter even if regridding is not needed.
The grid must be supported by the 'cdo' tools. Currently only the rNXxNY
and tRESgrid types are supported.
Both rNXxNY and tRESgrid yield rectangular regular grids. rNXxNY yields
grids that are evenly spaced in longitudes and latitudes (in degrees).
tRESgrid refers to a grid generated with a series of spherical harmonics
truncated at the RESth harmonic. However, these spectral grids are usually
associated with a Gaussian grid, the latitudes of which are spaced with a
Gaussian quadrature (not evenly spaced in degrees). The pattern tRESgrid
will yield a Gaussian grid.
E.g., 'r96x72'
Advanced: If the output type is 'lon', 'lat' or 'lonlat' and no common
grid is specified, the grid of the first experimental or observational
dataset is detected and all data is then interpolated onto this grid.
If the data of the first experimental or observational dataset is found
shifted along the longitudes (i.e., there's no value at longitude 0 but
there is one at a longitude close to it), the data is re-interpolated to
suppress the shift.
This has to be done to make sure all the data from all the datasets is
properly aligned along longitudes, as there's no option so far in Load()
to specify grids starting at longitudes other than 0.
This issue doesn't arise when loading in 'areave' mode without a common
grid; the data is not re-interpolated in that case.
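As an illustrative sketch (dataset names are hypothetical), a common grid can be combined with the 'output' and 'method' parameters like this:

```r
# Hypothetical sketch: regrid all data onto a regular 96x72 grid with
# bilinear interpolation and return full longitude-latitude fields.
data <- Load('tas', exp = c('experimentA'), obs = c('observationX'),
             sdates = c('19901101', '19951101'),
             output = 'lonlat', grid = 'r96x72', method = 'bilinear')
```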
- maskmod
List of masks to be applied to the data of each experimental
dataset respectively, if a 2-dimensional variable is specified in 'var'.
Each mask can be defined in 2 formats:
a) a matrix with dimensions c(longitudes, latitudes).
b) a list with the components 'path' and, optionally, 'nc_var_name'.
In the format a), the matrix must have the same size as the common grid,
or the same size as the grid of the corresponding experimental dataset
if the 'areave' output type is specified and no common 'grid' is specified.
In the format b), the component 'path' must be a character string with the
path to a NetCDF mask file, also in the common grid or in the grid of the
corresponding dataset if 'areave' output type is specified and no common
'grid' is specified. If the mask file contains only a single variable,
there's no need to specify the component 'nc_var_name'. Otherwise it must
be a character string with the name of the variable inside the mask file
that contains the mask values. This variable must be defined only over 2
dimensions with length greater or equal to 1.
Whatever the mask format, a value of 1 at a point of the mask keeps the
original value at that point, whereas a value of 0 disables it (replaces
it with NA).
By default all values are kept (all ones).
The longitudes and latitudes in the matrix must be in the same order as in
the common grid or as in the original grid of the corresponding dataset
when loading in 'areave' mode. You can find out the order of the longitudes
and latitudes of a file with 'cdo griddes'.
Note that in a common CDO grid defined with the patterns 't<RES>grid' or
'r<NX>x<NY>' the latitudes and longitudes are ordered, by definition, from
-90 to 90 and from 0 to 360, respectively.
If you are loading maps ('lonlat', 'lon' or 'lat' output types) all the
data will be interpolated onto the common 'grid'. If you want to specify
a mask, you will have to provide it already interpolated onto the common
grid (you may use 'cdo' libraries for this purpose). It is not usual to
apply different masks on experimental datasets on the same grid, so all
the experiment masks are expected to be the same.
Warning: When loading maps, any masks defined for the observational data
will be ignored to make sure the same mask is applied to the experimental
and observational data.
Warning: a list() is compulsory even if only 1 experimental dataset is loaded!
E.g., list(array(1, dim = c(num_lons, num_lats)))
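A mask in format a) is a 0/1 matrix with dimensions c(longitudes, latitudes). As a sketch for a hypothetical 10x5 grid, a mask disabling the first longitude column could be built like this:

```r
# Sketch of a format a) mask for a hypothetical 10x5 grid: start from
# all ones (keep every point) and set the first longitude column to 0
# (those points will be replaced by NA when loading).
num_lons <- 10
num_lats <- 5
mask <- array(1, dim = c(num_lons, num_lats))
mask[1, ] <- 0
# One mask per experimental dataset; the enclosing list() is compulsory:
# maskmod = list(mask)
```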
- maskobs
See help on parameter 'maskmod'.
- configfile
Path to the s2dv configuration file from which to retrieve information on
the location in the file system (among other properties) of the datasets.
If not specified, the configuration file used at BSC-ES will be used
(it is included in the package).
Check the BSC's configuration file or a template configuration file in
the folder 'inst/config' in the package.
Check further information on the configuration file mechanism in
?ConfigFileOpen.
- varmin
Loaded experimental and observational data values smaller
than 'varmin' will be disabled (replaced by NA values).
By default no deactivation is performed.
- varmax
Loaded experimental and observational data values greater
than 'varmax' will be disabled (replaced by NA values).
By default no deactivation is performed.
- silent
Parameter to show (FALSE) or hide (TRUE) information messages.
Warnings will be displayed even if 'silent' is set to TRUE.
Takes by default the value 'FALSE'.
- nprocs
Number of parallel processes created to perform the fetch
and computation of data.
These processes will use shared memory in the processor in which Load()
is launched.
By default the number of logical cores in the machine will be detected
and as many processes as logical cores there are will be created.
A value of 1 won't create parallel processes.
When running in multiple processes, if an error occurs in any of the
processes, a crash message appears in the R session of the original
process but no detail is given about the error. A value of 1 will display
all error messages in the original and only R session.
Note: the parallel processes create other blocking processes each time they
need to compute an interpolation via 'cdo'.
- dimnames
Named list where the name of each element is a generic
name of the expected dimensions inside the NetCDF files. These generic
names are 'lon', 'lat' and 'member'. 'time' is not needed because it's
detected automatically by elimination.
The value associated to each name is the actual dimension name in the
NetCDF file.
The variables in the file that contain the longitudes and latitudes of
the data (if the data is a 2-dimensional variable) must have the same
name as the longitude and latitude dimensions.
By default, these names are 'longitude', 'latitude' and 'ensemble'. If any
of those is defined in the 'dimnames' parameter, it takes priority and
overwrites the default value.
E.g., list(lon = 'x', lat = 'y')
In that example, the dimension 'member' will take the default value 'ensemble'.
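For instance, assuming a hypothetical file layout where the NetCDF files name their dimensions 'x', 'y' and 'ens':

```r
# Hypothetical sketch: map the generic dimension names to the names
# actually used inside the NetCDF files of the datasets.
data <- Load('tas', exp = c('experimentA'), obs = NULL,
             sdates = c('19901101'),
             dimnames = list(lon = 'x', lat = 'y', member = 'ens'))
```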
- remapcells
When loading a 2-dimensional variable, spatial subsets can
be requested via 'lonmin', 'lonmax', 'latmin' and 'latmax'. When Load()
obtains the subset it is then interpolated if needed with the method
specified in 'method'.
The result of this interpolation can vary if the values surrounding the
spatial subset are not present. To better control this process, the width
in number of grid cells of the surrounding area to be taken into account
can be specified with 'remapcells'. A value of 0 will take into
account no additional cells but will generate less traffic between the
storage and the R processes that load data.
A value beyond the limits in the data files will be automatically truncated
to the actual limit.
The default value is 2.
- path_glob_permissive
In some cases, when specifying a path pattern
(either in the parameters 'exp'/'obs' or in a configuration file), one can
specify path patterns that contain shell globbing expressions. Too much
freedom in putting globbing expressions in the path patterns can be
dangerous and make Load() find a file in the file system for a
start date for a dataset that really does not belong to that dataset.
For example, if the file system contains two directories for two different
experiments that share a part of their path and the path pattern contains
globbing expressions:
/experiments/model1/expA/monthly_mean/tos/tos_19901101.nc
/experiments/model2/expA/monthly_mean/tos/tos_19951101.nc
And the path pattern is used as in the example right below to load data of
only the experiment 'expA' of the model 'model1' for the starting dates
'19901101' and '19951101', Load() will undesirably yield data for
both starting dates, even though in fact there is data only for the
first one:
expA <- list(path = file.path('/experiments/*/expA/monthly_mean/$VAR_NAME$',
'$VAR_NAME$_$START_DATE$.nc'))
data <- Load('tos', list(expA), NULL, c('19901101', '19951101'))
To avoid these situations, the parameter 'path_glob_permissive' is
set by default to 'partial', which forces Load() to replace
all the globbing expressions of a path pattern of a dataset by fixed
values taken from the path of the first file found for each dataset, up
to the folder right before the final files (globbing expressions in the
file name will not be replaced, only those in the path to the file).
Replacement of globbing expressions in the file name can also be triggered
by setting 'path_glob_permissive' to FALSE or 'no'. If all globbing
expressions need to be kept, 'path_glob_permissive' can be set to TRUE
or 'yes'.
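Building on the example above (the paths remain hypothetical), the globbing behaviour could be controlled as sketched here:

```r
# Hypothetical sketch: keep all globbing expressions in the pattern.
# Risky: files of other datasets sharing part of the path may match.
expA <- list(path = file.path('/experiments/*/expA/monthly_mean/$VAR_NAME$',
                              '$VAR_NAME$_$START_DATE$.nc'))
data <- Load('tos', list(expA), NULL, c('19901101', '19951101'),
             path_glob_permissive = TRUE)
# The default, 'partial', fixes the globbed folders from the first file
# found, so only files of the same model/experiment are matched.
```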