sampks: Kennard-Stone sampling

The function splits the data \(X\) into two sets, "train" and "test", using the Kennard-Stone (KS) algorithm (Kennard & Stone, 1969). The two sets correspond to two different underlying probability distributions: the "train" set has higher dispersion than the "test" set.
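To make the selection rule concrete, here is a minimal sketch of the classical Kennard-Stone procedure in plain Python. It is not the rchemo implementation: the function name `kennard_stone` and its return format are hypothetical, and the sketch assumes the textbook variant that starts from the two most distant observations and then repeatedly adds the observation farthest from its nearest already-selected neighbor. Details such as the starting rule or tie-breaking may differ in `sampks`.

```python
import math


def kennard_stone(X, k):
    """Return (train, test) row indices; hypothetical helper, Euclidean distance."""
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of rows")
    # Pairwise Euclidean distances between all rows.
    D = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two most distant observations.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: D[ij[0]][ij[1]],
    )
    train = [i0, j0]
    # min_d[i] is the distance from row i to its nearest selected row.
    min_d = [min(D[i][i0], D[i][j0]) for i in range(n)]
    while len(train) < k:
        selected = set(train)
        # Pick the unselected row that is farthest from the selected set.
        nxt = max((i for i in range(n) if i not in selected), key=lambda i: min_d[i])
        train.append(nxt)
        min_d = [min(min_d[i], D[i][nxt]) for i in range(n)]
    selected = set(train)
    test = [i for i in range(n) if i not in selected]
    return train, test


X = [(0, 0), (1, 0), (0.5, 0.1), (10, 10), (5, 5)]
train, test = kennard_stone(X, k=3)
print(train, test)  # [0, 3, 4] [1, 2]
```

Because each step picks the observation farthest from those already chosen, the selected rows tend to cover the extremes of the data cloud, which is consistent with the "train" set having higher dispersion than the "test" set.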

Data exploration and prediction with a focus on high-dimensional data and chemometrics.
The package was initially designed around partial least squares regression and discrimination models and their variants, in particular locally weighted PLS models (LWPLS). It has since been expanded to include many other methods for analyzing high-dimensional data.
The name 'rchemo' reflects the package's orientation toward chemometrics, but most of the provided methods are fully generic and apply to other domains.
Functions such as transform(), predict(), coef() and summary() are available. Tuning the predictive models is facilitated by generic functions gridscore() (validation dataset) and gridcv() (cross-validation). Faster versions are also available for models based on latent variables (LVs) (gridscorelv() and gridcvlv()) and ridge regularization (gridscorelb() and gridcvlb()).

Marion Brandolini-Bunlon

rchemo

Dimension Reduction, Regression and Discrimination for Chemometrics

Benoit Jaillais

Jean-Michel Roger

Matthieu Lesnoff

Arguments

<dl>
<dt>X</dt>
<dd>X-data (\(n, p\)) to be sampled.</dd>
<dt>k</dt>
<dd>An integer defining the number of training observations to select.</dd>
<dt>diss</dt>
<dd>The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).</dd>
</dl>
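The two `diss` options can rank the same pairs of observations quite differently when the columns of \(X\) are on different scales. The sketch below illustrates this in plain Python. It is an illustration only: the helpers `covariance`, `solve` and `mahalanobis` are hypothetical, and it assumes the Mahalanobis distance uses the sample covariance matrix estimated from \(X\) itself, which may not match how `sampks` computes it internally.

```python
import math


def covariance(X):
    """Sample covariance matrix (divisor n - 1) of the rows of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / (n - 1)
         for b in range(p)]
        for a in range(p)
    ]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [list(A[i]) + [b[i]] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for j in range(c, p + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * p
    for i in reversed(range(p)):
        z[i] = (M[i][p] - sum(M[i][j] * z[j] for j in range(i + 1, p))) / M[i][i]
    return z


def mahalanobis(a, b, S):
    """Mahalanobis distance sqrt((a - b)' S^{-1} (a - b))."""
    d = [ai - bi for ai, bi in zip(a, b)]
    z = solve(S, d)
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


# Second column is on a much larger scale than the first.
X = [(0, 0), (0, 100), (2, 0), (2, 100)]
S = covariance(X)

eucl_col1 = math.dist(X[0], X[2])        # step along the first column
eucl_col2 = math.dist(X[0], X[1])        # step along the second column
mahal_col1 = mahalanobis(X[0], X[2], S)
mahal_col2 = mahalanobis(X[0], X[1], S)
print(eucl_col1, eucl_col2)    # 2.0 100.0
print(mahal_col1, mahal_col2)  # both approximately 1.732 (sqrt(3))
```

With the Euclidean distance, the large-scale column dominates the selection; with the Mahalanobis distance, both columns contribute equally in this example because each difference is rescaled by the corresponding variance.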

