Function running a single cross-validation by partitioning the data into training and test sets
cross_val(
vari,
outi,
c,
rule,
part,
l,
we,
vari_col,
preds,
mode,
cmode,
predm,
cutoff,
objfun,
minx = 1,
maxx = NULL,
nr = NULL,
maxw = NULL,
st = NULL,
corr = 1,
Rsq = F,
marg = 0,
n_tr,
preds_tr
)
An M x N matrix of sums of the absolute errors for each element of the test set for each feasible regression. M is the maximum feasible number of variables included in a regression, N is the maximum feasible number of regressions of a fixed size; the row index indicates the number of variables included in a regression. Therefore each row corresponds to results obtained from regressions with the same number of variables, and the columns correspond to the different subsets of predictors used (a toy illustration of this layout is given below).
An M x N matrix of sums of the relative errors for each element of the test set (only for mode = 'linear') for each feasible regression, with M, N and the row/column interpretation as for the matrix of absolute errors above.
Maximum feasible number of variables in the regression
The accuracy of always predicting the more likely outcome as suggested by the training set (only for mode = 'binary' and objfun = 'acc')
In regr and regrr, NA values are possible since for some numbers of variables there are fewer feasible regressions than for others.
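As a toy illustration of this layout (written for this page, not output of cross_val), the row and column of the smallest summed error can be read off with which(..., arr.ind = TRUE), ignoring the NA padding:

# rows: number of variables in the regression; columns: different subsets
# of that size; rows with fewer feasible regressions are padded with NA
errs <- matrix(c(2.1, 1.7, 3.0,
                 1.2, 1.5, NA), nrow = 2, byrow = TRUE)
which(errs == min(errs, na.rm = TRUE), arr.ind = TRUE)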
vari: set of predictors
outi: array of outcomes
c: set of all indices of the predictors
rule: an Events per Variable (EPV) rule, defaults to 10
part: indicates the partition of the original data-set into training and test set in a proportion (part-1):1 (illustrated in the sketch after this argument list)
l: number of observations
we: weights of the predictors
vari_col: overall number of predictors
preds: array to write predictions for the test split into, initially empty
mode: 'binary' (logistic regression), 'multin' (multinomial regression)
cmode: 'det' or ''; 'det' always predicts the more likely outcome as determined by the odds ratio; '' predicts a certain outcome with probability corresponding to its odds ratio (more conservative). Option available for multinomial/logistic regression
predm: 'exact' or ''; for logistic and multinomial regression; 'exact' computes how many times the exact outcome category was predicted, '' computes how many times either the exact outcome category or its nearest neighbour was predicted
cutoff: cut-off value for logistic regression
objfun: 'roc' for maximising the predictive power with respect to AUC, 'acc' for maximising the predictive power with respect to accuracy
minx: minimum number of predictors to be included in a regression, defaults to 1
maxx: maximum number of predictors to be included in a regression, defaults to the maximum feasible number according to the one-in-ten rule
nr: a subset of the data-set, such that 1/part of it lies in the test set and 1-1/part in the training set, defaults to the empty set
maxw: maximum weight of predictors to be included in a regression, defaults to the maximum weight according to the one-in-ten rule
st: a subset of predictors to be always included in a predictive model, defaults to the empty set
corr: maximum correlation between a pair of predictors in a model
Rsq: whether the R-squared statistic constraint is introduced
marg: margin of error for the R-squared statistic constraint
n_tr: size of the training set
preds_tr: array to write predictions for the training split into, initially empty
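For intuition only, a rough sketch of how part and the one-in-ten (EPV) rule interact; this is an illustration written for this page, not code from the package, and the exact event count used internally is an assumption:

# with l = 100 observations and part = 10, roughly 1/part of the data is held out
l <- 100
part <- 10
n_test <- floor(l/part)      # about 10 observations in the test set
n_tr <- l - n_test           # about 90 observations in the training set
# for a binary outcome, the EPV (one-in-ten) rule caps the number of predictors;
# the limiting count is assumed here to be the rarer outcome in the training set
out <- rbinom(n_tr, 1, 0.3)
events <- min(sum(out), n_tr - sum(out))
rule <- 10
floor(events/rule)           # rough upper bound on maxx under the rule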
Uses compute_max_weight, sum_weights_sub, make_numeric_sets, get_predictions_lin, get_predictions, get_probabilities, AUC, combn
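As a standalone illustration (not package code), the base-R combn call listed above is the kind of enumeration that yields the columns of the error matrices, one column per subset of predictors of a given size:

# all subsets of size 2 out of 4 candidate predictor indices;
# combn returns one subset per column
predictor_indices <- 1:4
combn(predictor_indices, 2)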
#creating variables
vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)
#creating outcomes
out<-rbinom(100,1,0.3)
#creating arrays for the test-set and training-set predictions
pr<-array(NA,c(2,2))
pr_tr<-array(NA,c(2,2))
#passing the set of indices of the predictors
c<-c(1:2)
#passing the weights of the predictors
we<-c(1,1)
#setting the mode
m<-'binary'
#running the function
cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr)
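Since, as described above, each call performs a single cross-validation, one might repeat it and keep each run's output; a hedged sketch (it is assumed here, though not stated on this page, that every call draws its own training/test partition):

#collecting the output of several cross-validation runs for later inspection
res <- lapply(1:5, function(i) cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr))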