
kernDeepStackNet (version 2.0.2)

lossCvKDSN: Kernel deep stacking network loss function based on cross-validation

Description

Computes the average root mean squared error by cross-validation. The model is fitted length(cvIndex) times, and the left-out test cases are predicted. This procedure may be repeated to decrease the variance of the average predictive deviance; the repetitions are assumed to be included in the list cvIndex. The required format can be conveniently created with functions supplied by the package caret, e.g. createFolds.
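
As a sketch, such an index list can be built with caret; the fold and repetition counts below are illustrative:

library(caret)
y <- rnorm(100)                                    # illustrative response
cvIndex <- createFolds(y=y, k=5, returnTrain=TRUE) # one repetition: 5 training sets
cvIndexRep <- createMultiFolds(y=y, k=5, times=3)  # 5 folds repeated 3 times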

Usage

lossCvKDSN(parOpt, y, X, levels, cvIndex, seedW=NULL, lossFunc=devStandard,
varSelect=rep(FALSE, levels), varRanking=NULL, alpha=rep(0, levels),
dropHidden=rep(FALSE, levels), seedDrop=NULL, standX=TRUE, standY=FALSE,
namesParOpt=rep(c("dim", "sigma", "lambdaRel"), levels))

Arguments

parOpt

Numeric vector of the tuning parameters. Its length must be divisible by three without remainder. Within one level the order of the parameters is (D, sigma, lambda). Each additional level appends three further elements, e.g. with two levels the order is (D_1, sigma_1, lambda_1, D_2, sigma_2, lambda_2).
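
For illustration, a parameter vector for a two-level network could be assembled as follows (the values are arbitrary):

parLevel1 <- c(10, 1, 0.1)           # (D_1, sigma_1, lambda_1)
parLevel2 <- c(5, 2, 0.01)           # (D_2, sigma_2, lambda_2)
parOpt <- c(parLevel1, parLevel2)
stopifnot(length(parOpt) %% 3 == 0)  # length must be divisible by three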

y

Numeric response vector of the outcome, formatted as a one-column matrix.

X

Numeric design matrix of the covariates. All factors have to be encoded beforehand.

levels

Number of levels of the kernel deep stacking network (integer scalar).

cvIndex

List of training indices. Each list element contains the indices of one drawn training set. See, for example, the function createMultiFolds in the package caret.

seedW

Seeds for the random Fourier transformation (integer vector). Each value corresponds to the seed of one level. The length must equal the number of levels of the kernel deep stacking network.

lossFunc

Specifies the loss function used for evaluation on the test observations. It must have the arguments "preds" (predictions) and "ytest" (test observations). The default devStandard computes the predictive deviance.
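
A user-defined loss therefore has to accept exactly these argument names. A minimal sketch of a custom root mean squared error loss (not part of the package):

# Custom RMSE loss with the required argument names "preds" and "ytest"
rmseLoss <- function(preds, ytest) {
  sqrt(mean((as.numeric(ytest) - as.numeric(preds))^2))
}
# Would be supplied as lossCvKDSN(..., lossFunc=rmseLoss)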

varSelect

Should unimportant variables be excluded in each level (logical vector, one entry per level)? The default is that all available variables are used in every level.

varRanking

Defines a variable ranking in increasing order of importance (integer vector): the first variable is the least important and the last is the most important.
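
One simple way to construct such a ranking, shown here only as an assumption (any variable importance measure could be used), is to order the covariates by their absolute marginal correlation with the response:

# Sketch: rank columns of X by absolute correlation with y (ascending),
# so the least important variable comes first
X <- matrix(rnorm(200), nrow=20)
y <- rnorm(20)
varRanking <- order(abs(cor(X, y)))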

alpha

Weight parameter between lasso and ridge penalty of each level (numeric vector). The default 0 corresponds to the ridge penalty and 1 corresponds to the lasso.

dropHidden

Should dropout be applied to the random Fourier transformation (logical vector)? Each entry corresponds to one level. The default is no dropout.

seedDrop

Specifies the seeds of the random dropout in the calculation of the random Fourier transformation, one per level (integer vector). The default is random.

standX

Should the design matrix be standardized by median and median absolute deviation? Default is TRUE.

standY

Should the response be standardized by median and median absolute deviation? Default is FALSE.
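
The median/MAD standardization used by standX and standY corresponds to the following transformation of each column; this is a base R sketch, and the package implementation may differ in details:

# Sketch: robust standardization of one covariate column x
x <- rnorm(50)
xStand <- (x - median(x)) / mad(x)  # mad() applies its default consistency constant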

namesParOpt

Gives the names of the elements of the argument parOpt (character vector). It is used to map the parameters onto the correct structure for fitting the kernel deep stacking network.

Value

Root of the average predictive deviance of the model (numeric scalar). The model fit is available as an attribute.
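
Given a result calcLoss as produced in the Examples section below, the stored fit can be located by listing the attributes of the returned value (the attribute name itself is not documented here):

str(attributes(calcLoss))  # lists all attributes, including the stored fit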

Details

namesParOpt:

"select" corresponds to the relative frequency of dropping unimportant variables according to the randomized dependence coefficient.

"dim" corresponds to the dimension parameter of the random Fourier transformation.

"sigma" corresponds to the precision parameter of the Gaussian distribution used to generate random weights.

"dropProb" corresponds to the probability of dropping out cells in the calculated random Fourier transformation matrix.

"lambdaRel" corresponds to the regularization parameter of the kernel ridge regression.
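
For example, a single level with variable selection and dropout enabled might be encoded with the names below; note that the exact ordering required by the package is an assumption here and must match the tuned parameters in parOpt:

# Sketch: names for one level using variable selection and dropout
# (ordering is assumed, not taken from the package documentation)
namesParOpt <- c("select", "dim", "sigma", "dropProb", "lambdaRel")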

References

Po-Sen Huang, Li Deng, Mark Hasegawa-Johnson and Xiaodong He (2013). Random features for kernel deep convex network. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

Simon N. Wood (2006). Generalized Additive Models: An Introduction with R. Taylor & Francis Group LLC.

See Also

lossApprox, lossGCV, lossSharedCvKDSN, lossSharedTestKDSN

Examples

####################################
# Example with simple binary outcome

# Generate covariate matrix
sampleSize <- 100
X <- matrix(0, nrow=sampleSize, ncol=10)
for(j in 1:10) {
  set.seed(j)
  X[, j] <- rnorm(sampleSize)
}

# Generate response of a binary problem: 1 if rowSums(X) plus
# Gaussian noise exceeds 0, and 0 otherwise
set.seed(-1)
error <- rnorm(sampleSize)
y <- ifelse((rowSums(X) + error) > 0, 1, 0)

# Draw repeated cross-validation training index sets
library(caret)
Indices <- createMultiFolds(y=y, k=10, times=5)

# Calculate loss function with parameters (D=10, sigma=1, lambda=0)
# in one level
calcLoss <- lossCvKDSN(parOpt=c(10, 1, 0), y=y, X=X,
                       cvIndex=Indices, seedW=0, levels=1)
c(calcLoss)

# Calculate loss function with parameters 
# (D=10, sigma=1, lambda=0.1, D=5, sigma=2, lambda=0.01) in two levels
calcLoss <- lossCvKDSN(parOpt=c(10, 1, 0.1, 5, 2, 0.01),
                       y=y, X=X, cvIndex=Indices, seedW=1:2, levels=2)
c(calcLoss)
