
kernDeepStackNet (version 2.0.2)

rdcVarSelSubset: Variable selection based on RDC with genetic algorithm (experimental)

Description

Selects important variables, i.e. covariates with high randomized dependence coefficient (RDC) scores with respect to the responses. A genetic algorithm is used to search the discrete space of variable subsets. Note that this function is still experimental.
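As a rough illustration of the score involved, the following R sketch computes an RDC in the way Lopez-Paz et al. (2013) describe it: a copula transform via ranks, k random projections passed through the non-linearity f, and the largest canonical correlation (stats::cancor) between the two random feature sets, averaged over repetitions with one seed pair each (mirroring the seedX, seedY and rdcRep arguments). It is not the internal code of kernDeepStackNet. The helper names rdcSketch and rdcRepSketch are hypothetical, and scaling standard normal weights by s divided by the number of input columns (including a bias column) is an assumption taken from the paper rather than from this help page, which documents s as a variance.

```r
# Hypothetical RDC sketch after Lopez-Paz et al. (2013); NOT kernDeepStackNet's code.
# Assumption: N(0, 1) weights scaled by s / (number of columns incl. bias column).
rdcSketch <- function(x, y, k = 20, s = 1/6, f = sin, seedX = 1, seedY = -1) {
  # Copula transform: empirical CDF values (rank / n) per column, plus a bias column
  copula <- function(m) {
    m <- as.matrix(m)
    cbind(apply(m, 2, function(u) rank(u) / length(u)), 1)
  }
  cx <- copula(x)
  cy <- copula(y)
  # Standard normal projection weights, reproducible via the two seeds
  # (note: set.seed() resets the global random number stream)
  set.seed(seedX)
  wx <- matrix(rnorm(ncol(cx) * k), nrow = ncol(cx))
  set.seed(seedY)
  wy <- matrix(rnorm(ncol(cy) * k), nrow = ncol(cy))
  # Assumed scaling s / ncol, then the non-linear transformation f
  fx <- f(s / ncol(cx) * cx %*% wx)
  fy <- f(s / ncol(cy) * cy %*% wy)
  # RDC: largest canonical correlation between the two random feature sets
  cancor(fx, fy)$cor[1]
}

# Hypothetical averaging over repetitions: one (seedX[j], seedY[j]) pair per repetition
rdcRepSketch <- function(x, y, seedX = 1:10, seedY = -(1:10),
                         k = 20, s = 1/6, f = sin) {
  stopifnot(length(seedX) == length(seedY))
  scores <- vapply(seq_along(seedX), function(j)
    rdcSketch(x, y, k = k, s = s, f = f, seedX = seedX[j], seedY = seedY[j]),
    numeric(1))
  mean(scores)
}

# Usage: a non-linear dependence versus independent noise
set.seed(123)
x <- rnorm(200)
noise <- rnorm(200)
rdcRepSketch(x, x^2)     # strong non-linear dependence: high score
rdcRepSketch(x, noise)   # independent noise: clearly lower score
```

Averaging several repetitions reduces the variance introduced by the random projections, which is the motivation the rdcRep argument gives for its averaging.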

Usage

rdcVarSelSubset(x, y, k=20, s=1/6, f=sin, seedX=1:10, seedY=-c(1:10), 
rdcRep=10, popSize=100, maxiter=100, nCores=1, addInfo=TRUE)

Arguments

x

Covariates data (numeric matrix).

y

Responses (numeric matrix).

k

Number of random features (integer scalar).

s

Variance of the random weights. Default is 1/6.

f

Non-linear transformation function. Default is sin.

seedX

Random number seeds of the normally distributed weights for the covariates (integer vector). The default seedX=1:10 provides one seed for each of the default rdcRep=10 repetitions.

seedY

Random number seeds of the normally distributed weights for the responses (integer vector). The default seedY=-c(1:10) provides one seed for each of the default rdcRep=10 repetitions.

rdcRep

Number of RDC repetitions. The RDC scores of all repetitions are averaged per variable to give more robust estimates. Default is 10 repetitions (rdcRep=10).

popSize

Population size of the genetic algorithm.

maxiter

Maximum number of generations of the genetic algorithm.

nCores

Number of threads used for parallelisation in ga. The default nCores=1 means no parallelisation.

addInfo

Should details of the optimization be printed? (logical scalar). The default TRUE enables the default monitoring of ga; see ga for further details.
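The subset search can be pictured as a binary genetic algorithm: each individual is a 0/1 vector over the columns of x, popSize individuals evolve for up to maxiter generations, and the fittest vector is returned as column indices. The base-R sketch below is a generic textbook GA (tournament selection, one-point crossover, bit-flip mutation, elitism). It is not the ga function referenced above, and the names gaSubsetSketch and pMut as well as the toy fitness are hypothetical.

```r
# Hypothetical binary GA for subset search; a generic textbook GA,
# NOT ga() and NOT kernDeepStackNet's internal fitness.
gaSubsetSketch <- function(fitness, nBits, popSize = 100, maxiter = 100,
                           pMut = 1 / nBits) {
  stopifnot(nBits >= 2, popSize >= 2)
  # Each row is one candidate subset: bit j = 1 means "variable j selected"
  pop <- matrix(rbinom(popSize * nBits, 1, 0.5), nrow = popSize)
  # Binary tournament: pick two individuals at random, keep the fitter one
  tournament <- function(fit) {
    i <- sample.int(popSize, 2)
    if (fit[i[1]] >= fit[i[2]]) i[1] else i[2]
  }
  for (gen in seq_len(maxiter)) {
    fit <- apply(pop, 1, fitness)
    newPop <- pop
    newPop[1, ] <- pop[which.max(fit), ]       # elitism: keep the best subset
    for (r in 2:popSize) {
      p1 <- pop[tournament(fit), ]
      p2 <- pop[tournament(fit), ]
      cut <- sample.int(nBits - 1, 1)           # one-point crossover
      child <- c(p1[1:cut], p2[(cut + 1):nBits])
      flip <- runif(nBits) < pMut               # bit-flip mutation
      child[flip] <- 1 - child[flip]
      newPop[r, ] <- child
    }
    pop <- newPop
  }
  fit <- apply(pop, 1, fitness)
  which(pop[which.max(fit), ] == 1)             # indices of selected variables
}

# Toy usage ("OneMax"): fitness = number of selected bits,
# so the optimum selects all 10 variables
set.seed(1)
gaSubsetSketch(function(b) sum(b), nBits = 10, popSize = 30, maxiter = 100, pMut = 0.1)
```

In rdcVarSelSubset the fitness is based on RDC scores (see Description); the exact fitness function, and any penalty on subset size, is not documented on this page, so plugging an RDC score of x[, bits == 1] against y into such a sketch would be an assumption.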

Value

Indices of the selected variables, i.e. the selected columns of x.

References

David Lopez-Paz, Philipp Hennig and Bernhard Schoelkopf (2013), The randomized dependence coefficient, Advances in Neural Information Processing Systems (NIPS) 26, Stateline, Nevada, USA, C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K.Q. Weinberger (eds.)

M. Wahde (2008), Biologically inspired optimization methods: An introduction, WIT Press

See Also

rdcPart, cancorRed, rdcSubset, rdcVarOrder

Examples

# Generate 10 covariates
library(mvtnorm)
library(kernDeepStackNet)
set.seed(3489)
X <- rmvnorm(n=200, mean=rep(0, 10))

# Generate responses that depend only on covariates 1, 2 and 3
set.seed(-239247)
y <- 0.5*X[, 1]^3 - 2*X[, 2]^2 + X[, 3] - 1 + rnorm(200)

# Run the variable selection (one RDC repetition, small population and
# few generations to keep the example fast)
foundVar <- rdcVarSelSubset(x=X, y=y, seedX=1, seedY=-1, rdcRep=1,
  popSize=80, maxiter=5)
# Indices of the selected columns of X
foundVar
