mblControl(sm = "pc", pcSelection = list("opc", 40), pcMethod = "svd", ws = if(sm == "movcor") 41, k0, returnDiss = FALSE, center = TRUE, scaled = TRUE, valMethod = c("NNv", "loc_crossval"), localOptimization = TRUE, resampling = 10, p = 0.75, range.pred.lim = TRUE, progress = TRUE, cores = 1, allowParallel = TRUE)
sm: … mbl). Options are:
"euclid": Euclidean dissimilarity.
"cosine": Cosine dissimilarity.
"sidF": Spectral information divergence computed on the spectral variables.
"sidD": Spectral information divergence computed on the density distributions of the spectra.
"cor": Correlation dissimilarity.
"movcor": Moving window correlation dissimilarity.
"pc": Principal components dissimilarity: Mahalanobis dissimilarity computed on the principal components space.
"loc.pc": Dissimilarity estimation based on local principal components.
"pls": Partial least squares dissimilarity: Mahalanobis dissimilarity computed on the partial least squares space.
"loc.pls": Dissimilarity estimation based on local partial least squares.
The "pc" spectral dissimilarity metric is the default. If "sidD" is chosen, the default parameters of the sid function are used; however, they can be modified by specifying them as additional arguments in the mbl function.
This argument can also be set to "none"; in that case, a dissimilarity matrix must be specified in the dissimilarityM argument of the mbl function.
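As a rough illustration of three of the simpler metrics listed above, the following base R sketch computes Euclidean, cosine and correlation dissimilarities for a few made-up spectra. The formulas are common textbook forms chosen for this example (angle between vectors for the cosine, `(1 - r) / 2` for the correlation); the package functions may scale or define them differently, so treat the numbers as indicative only.

```r
# Toy spectra (made-up values): 3 samples in rows, 4 spectral variables in columns
X <- rbind(
  a = c(1, 2, 3, 4),
  b = c(2, 4, 6, 8),
  c = c(4, 3, 2, 1)
)

# Euclidean dissimilarity between every pair of samples
euclid_diss <- as.matrix(dist(X, method = "euclidean"))

# Cosine dissimilarity, taken here as the angle between two spectra
norms <- sqrt(rowSums(X^2))
cos_sim <- tcrossprod(X) / outer(norms, norms)
cos_sim <- pmin(pmax(cos_sim, -1), 1)  # guard against rounding outside [-1, 1]
cosine_diss <- acos(cos_sim)

# Correlation dissimilarity, rescaled here so that it lies between 0 and 1
cor_diss <- (1 - cor(t(X))) / 2

round(euclid_diss, 3)
round(cosine_diss, 3)
round(cor_diss, 3)
```

Samples `a` and `b` have the same shape and differ only in scale, so their cosine and correlation dissimilarities are close to zero while their Euclidean dissimilarity is not.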
pcSelection: … sm = "Xu" to the centre of sm = "Xr". It also specifies the number of components in any of the following cases: sm = "pc", sm = "loc.pc", sm = "pls" and sm = "loc.pls". This list must contain two objects in the following order:
method: the method for selecting the number of components. Possible options are: "opc" (optimized pc selection based on Ramirez-Lopez et al. (2013a, 2013b); see the orthoProjection function for more details); "cumvar" (for selecting the number of principal components based on a given cumulative amount of explained variance); "var" (for selecting the number of principal components based on a given amount of explained variance); and "manual" (for specifying manually the desired number of principal components).
value: a numerical value that complements the selected method. If "opc" is chosen, it must be a value indicating the maximal number of principal components to be tested (see Ramirez-Lopez et al., 2013a, 2013b). If "cumvar" is chosen, it must be a value (higher than 0 and lower than 1) indicating the maximum amount of cumulative variance that the retained components should explain. If "var" is chosen, it must be a value (higher than 0 and lower than 1) indicating that components that explain (individually) a variance lower than this threshold must be excluded. If "manual" is chosen, it must be a value specifying the desired number of principal components to retain.
The default method for the pcSelection argument is "opc" and the maximal number of principal components to be tested is set to 40.
Optionally, the pcSelection argument admits "opc", "cumvar", "var" or "manual" as a single character string. In such a case, the default for "value" when either "opc" or "manual" is used is 40. When "cumvar" is used, the default "value" is set to 0.99, and when "var" is used, the default "value" is set to 0.01.
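The "cumvar", "var" and "manual" rules can be sketched in a few lines of base R. The helper `select_n_components` below is hypothetical (it is not part of the package) and reflects one plain reading of the descriptions above: "cumvar" keeps the leading components while their cumulative explained variance stays at or below the value, and "var" keeps the components that individually explain at least the value. The "opc" method is left out because its procedure is described in the orthoProjection function.

```r
# Hypothetical helper, not part of the package: number of components to retain
# given the proportion of variance explained by each component.
select_n_components <- function(explained,
                                method = c("cumvar", "var", "manual"),
                                value) {
  method <- match.arg(method)
  n <- switch(method,
    # leading components whose cumulative explained variance stays <= value
    cumvar = sum(cumsum(explained) <= value),
    # components that individually explain at least `value`
    var = sum(explained >= value),
    # a fixed number, capped at the number of available components
    manual = min(value, length(explained))
  )
  max(1L, as.integer(n))  # always keep at least one component
}

# Made-up proportions of explained variance for five components
explained <- c(0.60, 0.25, 0.10, 0.04, 0.01)

n_cumvar <- select_n_components(explained, "cumvar", 0.90)  # 2
n_var    <- select_n_components(explained, "var", 0.05)     # 3
n_manual <- select_n_components(explained, "manual", 4)     # 4
```

With real spectra, the `explained` vector could be obtained from a principal component analysis, for example as `sdev^2 / sum(sdev^2)` from the result of `prcomp()`.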
pcMethod: … "svd" (default) and "nipals". See orthoDiss.
ws: … sm = "movcor"). The default is 41.
k0: … sm = "loc.pc" or sm = "loc.pls") a numeric integer value. This argument controls the number of initial neighbours ($k0$) to retain in order to compute the local principal components (at each neighbourhood).
valMethod: … "NNv" and "loc_crossval". Alternatively, "none" can be used when cross-validation is not required (see details below).
localOptimization: … valMethod = "loc_crossval", it optimizes the parameters of the local pls models (i.e. pls factors for pls and minimum and maximum pls factors for wapls1).
resampling: … "loc_crossval" is selected in the valMethod argument. Default is 10.
p: … "loc_crossval" is selected in the valMethod argument. Default is 0.75 (i.e. 75%).
range.pred.lim: … FALSE, no prediction limits are imposed. Default is TRUE.
progress: … TRUE. Note: In case multicore processing is used, this progress bar will not be printed.
cores: … method in pcSelection is "opc" (which can be computationally intensive) (default = 1). See details.
allowParallel: … TRUE)
mblControl returns a list of class mbl with the specified parameters.
… "NNv"): From the group of neighbours of each sample to be predicted, the nearest sample (i.e. the most similar sample) is excluded and then a local model is fitted using the remaining neighbours. This model is then used to predict the value of the target response variable of the nearest sample. These predicted values are finally cross validated with the actual values (see Ramirez-Lopez et al. (2013a) for additional details). This method is faster than "loc_crossval".
… "loc_crossval"): The group of neighbours of each sample to be predicted is partitioned into different equal-size subsets. Each partition is selected based on a stratified random sampling which takes into account the values of the response variable of the corresponding set of neighbours. The selected local subset is used as local validation subset and the remaining samples are used for fitting a model. This model is used to predict the target response variable values of the local validation subset and the local root mean square error is computed. This process is repeated $m$ times and the final local error is computed as the average of the local root mean square error of all the $m$ iterations. In the mbl function, $m$ is controlled by the resampling argument and the size of the subsets is controlled by the p argument, which indicates the percentage of samples to be selected from the subset of nearest neighbours. The global error of the predictions is computed as the average of the local root mean square errors.
… "none"): No validation is carried out. If "none" is selected along with "NNv" and/or "loc_crossval", then it will be ignored and the respective validation(s) will be carried out.
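The two validation schemes described above can be sketched for a single neighbourhood in base R. This is only an illustration under stated assumptions: the data are simulated, an ordinary linear model (`lm`) stands in for the local models fitted by mbl, simple random sampling replaces the stratified sampling, and `p` is read here as the fraction of neighbours used for fitting (the remainder being held out). The helper `loc_crossval_rmse` is hypothetical.

```r
set.seed(1)

# Simulated data (made up): 60 reference samples, 3 predictors, one response
Xr <- matrix(rnorm(60 * 3), ncol = 3)
Yr <- drop(Xr %*% c(1, -2, 0.5)) + rnorm(60, sd = 0.1)
xu <- rnorm(3)  # one sample to be predicted
k  <- 20        # neighbourhood size

# Neighbours of xu, ordered from nearest to farthest (Euclidean distance)
d  <- sqrt(rowSums(sweep(Xr, 2, xu)^2))
nn <- order(d)[seq_len(k)]
local <- data.frame(y = Yr[nn], Xr[nn, , drop = FALSE])

# "NNv": leave the nearest neighbour out, fit on the rest, then predict it
fit_nnv   <- lm(y ~ ., data = local[-1, ])
nnv_pred  <- predict(fit_nnv, newdata = local[1, , drop = FALSE])
nnv_error <- unname(nnv_pred - local$y[1])

# "loc_crossval": m resampling iterations inside the neighbourhood
# (hypothetical helper; simple random sampling instead of stratified sampling)
loc_crossval_rmse <- function(local, m = 10, p = 0.75) {
  n_fit <- round(p * nrow(local))  # assumption: p is the fitting fraction
  rmse <- numeric(m)
  for (i in seq_len(m)) {
    fit_idx <- sample(nrow(local), n_fit)
    fit  <- lm(y ~ ., data = local[fit_idx, ])
    pred <- predict(fit, newdata = local[-fit_idx, ])
    rmse[i] <- sqrt(mean((pred - local$y[-fit_idx])^2))
  }
  mean(rmse)  # local error: average RMSE over the m iterations
}

local_rmse <- loc_crossval_rmse(local, m = 10, p = 0.75)
```

In a full run this would be repeated for every sample to be predicted: the "NNv" errors are pooled across samples, and the "loc_crossval" local errors are averaged to give the global error. The sketch also shows why "NNv" is cheaper, since it fits one model per neighbourhood instead of `m`.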
Multi-threading for the computation of dissimilarities is based on OpenMP and hence works only on Windows and Linux.
However, the loop used to iterate over the Xu samples in mbl uses the %dopar% operator of the foreach package, which can be used to parallelize this internal loop. The last example given in the mbl function illustrates how to parallelize the mbl function.
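For readers unfamiliar with %dopar%, the general pattern is to register a parallel backend before running the loop. The sketch below assumes the foreach and doParallel packages are installed and uses a two-worker cluster on a trivial loop; it does not call mbl itself, and the choice of doParallel as backend is an assumption made for this example.

```r
library(foreach)
library(doParallel)  # assumed backend; other foreach backends could be used

# Start a small cluster and register it as the %dopar% backend
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Any %dopar% loop run after registration is distributed over the workers
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# Release the workers when done
parallel::stopCluster(cl)

squares
```

If no backend is registered, %dopar% falls back to sequential execution and issues a warning, so registering the cluster first is the step that actually enables parallelism.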
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
fDiss, corDiss, sid, orthoDiss, mbl
library(resemble)  # assumed: the package that provides mblControl
# A control list with the default parameters
mblControl()
# A control list which specifies the moving correlation
# dissimilarity metric with a moving window of 31
mblControl(sm = "movcor", ws = 31)