STEP: Stepwise selection of logratios

Description

Stepwise selection of pairwise logratios that explain maximum variance in a target matrix.

Usage

STEP(data, datatarget=data, previous=NA, previous.wt=NA, weight=TRUE, 
     random=FALSE, nsteps=min(ncol(data), ncol(datatarget))-1, top=1)

Value

names: Names of maximizing ratios in stepwise process
ratios: Indices of ratios
logratios: Matrix of logratios
R2max: Sequence of maximum cumulative explained variances
pro.cor: Corresponding sequence of Procrustes correlations
names.top: Names of "top" ratios at last step
ratios.top: Indices of "top" ratios
logratios.top: Matrix of "top" logratios
R2.top: Sequence of "top" cumulative explained variances (in descending order)
pro.cor.top: Corresponding sequence of "top" Procrustes correlations
totvar: Total logratio variance of target matrix

Arguments

data: A data frame or matrix of compositional data on which pairwise logratios are computed
datatarget: A matrix of interval-scale data, with as many rows as data, which serves as the target matrix whose variance is to be explained (by default it is the same matrix as data, in which case total logratio variance is to be explained)
previous: A vector or matrix of variables to be forced in before logratios are sought
previous.wt: Possible weights of the variable(s) forced in before logratios are sought (if not specified, weights of 1 are assumed)
weight: TRUE (default) when weights are in the data list object, FALSE for unweighted analysis, or a vector of user-defined part weights
random: TRUE if a random selection is made among tied logratios; FALSE (default) if the logratio that maximizes the Procrustes correlation is chosen
nsteps: Number of steps to take (by default, one less than the number of columns of data and of datatarget, whichever is smaller)
top: Number of top variance-explaining logratios returned after last step (by default, 1, i.e. the best)

Author

Michael Greenacre

Details

The function STEP sequentially selects the pairwise logratios in a data matrix (usually compositional) that best explain the variance in a second matrix, called the target matrix. By default, the target matrix is the data matrix itself, in which case the selected logratios are those that best explain its own logratio variance. In this case, the data matrix is weighted by default, with weights proportional to the part means of the compositional data matrix. For the unweighted logratio variance, specify weight=FALSE. User-specified weights on the columns of the data matrix (usually compositional parts) can be supplied through the same weight option.
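The selection idea described above can be sketched in base R. This is only a minimal illustration, not the code used by STEP. It assumes an unweighted analysis (in the spirit of weight=FALSE), measures explained variance by regressing the column-centred CLR-transformed data on the chosen logratios, and breaks ties by taking the first maximum rather than using Procrustes correlations. The toy matrix X and the helper explained() are hypothetical.

```r
# Hypothetical toy composition: 10 samples, 4 parts, rows closed to sum to 1
set.seed(1)
X <- matrix(rexp(40), nrow = 10, dimnames = list(NULL, c("A", "B", "C", "D")))
X <- X / rowSums(X)

# Target: unweighted centred logratios (CLR), column-centred
clr <- log(X) - rowMeans(log(X))
Y <- scale(clr, center = TRUE, scale = FALSE)

# All candidate pairwise logratios, column-centred, one column per pair
pairs <- t(combn(ncol(X), 2))
LR <- apply(pairs, 1, function(p) log(X[, p[1]] / X[, p[2]]))
LR <- scale(LR, center = TRUE, scale = FALSE)
colnames(LR) <- paste(colnames(X)[pairs[, 1]], colnames(X)[pairs[, 2]], sep = "/")

# Proportion of the target's variance explained by a set of predictors Z
# (assumed measure: fitted sum of squares over total sum of squares)
explained <- function(Z) sum(qr.fitted(qr(Z), Y)^2) / sum(Y^2)

# Greedy forward selection: at each step add the logratio that raises
# the cumulative explained variance the most
nsteps <- ncol(X) - 1
chosen <- integer(0)
R2max <- numeric(0)
for (s in seq_len(nsteps)) {
  cand <- setdiff(seq_len(ncol(LR)), chosen)
  r2 <- sapply(cand, function(j) explained(LR[, c(chosen, j), drop = FALSE]))
  chosen <- c(chosen, cand[which.max(r2)])  # first maximum wins on ties
  R2max <- c(R2max, max(r2))
}

colnames(LR)[chosen]  # selected logratios, in order of entry
R2max                 # cumulative proportion of variance explained
```

In this unweighted sketch, ncol(X)-1 linearly independent logratios reproduce all of the logratio variance, so the last element of R2max equals 1 up to rounding. This is consistent with the default value of nsteps.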

If the target matrix is a different matrix, it is the logratio variance of that matrix that is to be explained. An option for the target matrix to be any response matrix will be in the next release.

If nsteps>1 and top=1, the result is an optimal set of logratios, each adding the maximum explained variance at its step. If top>1, the ordered list of top variance-explaining logratios at the last step is also returned, which allows users to make an alternative choice of logratio based on substantive knowledge. Hence, if nsteps=1 and top=10, for example, the procedure takes only one step but lists the top 10 logratios for that step. If top=1, all results with the extension .top are omitted because they are already given.
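As an illustration of the top option, the sketch below takes a single step and inspects the .top components listed under Value. It assumes that the easyCoDa package, which provides STEP and the cups data, is installed. The object name res is arbitrary.

```r
# Assumption: easyCoDa is installed and supplies STEP() and the cups data
library(easyCoDa)
data(cups)

# One step only, but keep the 10 best candidate logratios for that step
res <- STEP(cups, nsteps = 1, top = 10)

res$names.top  # names of the top candidate logratios
res$R2.top     # their cumulative explained variances, in descending order
```

Comparing res$names.top with res$R2.top shows how much explained variance would be given up by choosing a more interpretable logratio than the best one.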

References

Van den Wollenberg, A. L. (1977), Redundancy analysis: an alternative for canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2018), Variable selection in compositional data analysis using pairwise logratios, Mathematical Geosciences, DOI: 10.1007/s11004-018-9754-x.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.

Examples


# Stepwise selection of ratios for RomanCups data set
data(cups)
# Set seed to obtain same results as in Appendix C of Greenacre (2018)
set.seed(2872)
STEP(cups, random=TRUE)
# Select best ratio, but output "top 5"
STEP(cups, nsteps=1, top=5)
