sampdp

The function divides the data \(X\) into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977). The two sets are of equal size (\(k\) observations each). If needed, the user can afterwards (a posteriori) add any remaining observations (those in neither "train" nor "test") to "train".
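The selection described above can be illustrated with a minimal base R sketch of Duplex sampling using Euclidean distances. The helper name `duplex_sketch` and the names of the returned list elements are hypothetical, chosen for illustration only; this is not the package's implementation, whose details may differ.

```r
# Hypothetical helper: a sketch of Duplex sampling, not the rchemo code.
duplex_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, k <= n / 2)
  D <- as.matrix(dist(X))            # pairwise Euclidean distances
  remain <- seq_len(n)

  # Two candidates that are farthest apart from each other
  farthest_pair <- function(cand) {
    Dc <- D[cand, cand, drop = FALSE]
    diag(Dc) <- -Inf                 # never pair an observation with itself
    ij <- which(Dc == max(Dc), arr.ind = TRUE)[1, ]
    cand[ij]
  }
  # Candidate whose nearest already-selected observation is farthest away
  maximin <- function(cand, sel) {
    dmin <- apply(D[cand, sel, drop = FALSE], 1, min)
    cand[which.max(dmin)]
  }

  # Initialise each set with a farthest-apart pair
  train <- farthest_pair(remain); remain <- setdiff(remain, train)
  test <- farthest_pair(remain); remain <- setdiff(remain, test)

  # Alternate between the two sets until each holds k observations
  while (length(train) < k) {
    s <- maximin(remain, train); train <- c(train, s); remain <- setdiff(remain, s)
    s <- maximin(remain, test); test <- c(test, s); remain <- setdiff(remain, s)
  }
  list(train = train, test = test, remain = remain)
}

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- duplex_sketch(X, k = 4)
# Optionally fold the leftover observations into "train" afterwards
train_all <- c(res$train, res$remain)
```

With \(n = 10\) and \(k = 4\), each set receives four row indices and two observations are left over, which the last line appends to "train".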

datagen

Data exploration and prediction with a focus on high-dimensional data and chemometrics.
The package was initially designed around partial least squares (PLS) regression and discrimination models and their variants, in particular locally weighted PLS (LWPLS) models. It has since been expanded to many other methods for analyzing high-dimensional data.
The name 'rchemo' comes from the fact that the package is oriented toward chemometrics, but most of the provided methods are generic and apply to other domains.
Functions such as transform(), predict(), coef() and summary() are available. Tuning the predictive models is facilitated by generic functions gridscore() (validation dataset) and gridcv() (cross-validation). Faster versions are also available for models based on latent variables (LVs) (gridscorelv() and gridcvlv()) and ridge regularization (gridscorelb() and gridcvlb()).

Marion Brandolini-Bunlon

rchemo

Dimension Reduction, Regression and Discrimination for Chemometrics

Benoit Jaillais

Jean-Michel Roger

Matthieu Lesnoff

sampdp function

<dl>
<dt>X</dt>
<dd>X-data (\(n, p\)) to be sampled.</dd>
<dt>k</dt>
<dd>An integer defining the number of training observations to select. Must be &lt;= \(n / 2\).</dd>
<dt>diss</dt>
<dd>The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).</dd>
</dl>
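To make the two `diss` options concrete, the sketch below shows one common way to obtain pairwise Mahalanobis distances between the rows of \(X\): whiten the data with the Cholesky factor of the covariance matrix, then take ordinary Euclidean distances. Whether the package computes "mahal" this way is an assumption; the variable names are illustrative. The approach needs a non-singular covariance matrix, so it requires more observations than variables.

```r
set.seed(1)
n <- 10; p <- 3
X <- matrix(rnorm(n * p), ncol = p)

# Euclidean dissimilarities between rows
D_eucl <- as.matrix(dist(X))

# Mahalanobis dissimilarities: whiten, then use Euclidean distances
S <- cov(X)                  # p x p covariance matrix (must be invertible)
U <- chol(S)                 # upper triangular factor, S = t(U) %*% U
Z <- X %*% solve(U)          # whitened data
D_mahal <- as.matrix(dist(Z))

# Cross-check against stats::mahalanobis(), which returns squared distances
d2_ref <- mahalanobis(X[1, ], center = X[2, ], cov = S)
all.equal(unname(D_mahal[1, 2]^2), unname(d2_ref))
```

The Mahalanobis option rescales and decorrelates the variables before distances are compared, so variables with large variance do not dominate the selection.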

Arguments


Duplex sampling — sampdp


sampdp: Duplex sampling
