Function running a single cross-validation by partitioning the data into training and test sets
cross_val(
vari,
outi,
c,
rule,
part,
l,
we,
vari_col,
preds,
mode,
cmode,
predm,
cutoff,
objfun,
minx = 1,
maxx = NULL,
nr = NULL,
maxw = NULL,
st = NULL,
corr = 1,
Rsq = F,
marg = 0,
n_tr,
preds_tr
)
An M x N matrix of sums of the absolute errors for each element of the test set for each feasible regression. M is the maximum feasible number of variables included in a regression, N is the maximum feasible number of regressions of a fixed size; the row index indicates the number of variables included in a regression. Therefore each row corresponds to results obtained from regressions with the same number of variables, and the columns correspond to the different subsets of predictors used (a toy illustration of this layout is given below).
An M x N matrix of sums of the relative errors for each element of the test set (only for mode = 'linear') for each feasible regression, with M, N and the row/column interpretation as for the matrix of absolute errors above.
Maximum feasible number of variables in the regression
The accuracy of always predicting the more likely outcome as suggested by the training set (only for mode = 'binary' and objfun = 'acc')
In regr and regrr, NA values are possible since for some numbers of variables there are fewer feasible regressions than for others.
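As a toy illustration of this layout (written for this page, not output of cross_val), the row and column of the smallest summed error can be read off with which(..., arr.ind = TRUE), ignoring the NA padding:

# rows: number of variables in the regression; columns: different subsets
# of that size; rows with fewer feasible regressions are padded with NA
errs <- matrix(c(2.1, 1.7, 3.0,
                 1.2, 1.5, NA), nrow = 2, byrow = TRUE)
which(errs == min(errs, na.rm = TRUE), arr.ind = TRUE)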
vari: set of predictors
outi: array of outcomes
c: set of all indices of the predictors
rule: an Events per Variable (EPV) rule, defaults to 10
part: indicates the partition of the original data-set into training and test set in a proportion (part-1):1 (illustrated in the sketch after this argument list)
l: number of observations
we: weights of the predictors
vari_col: overall number of predictors
preds: array to write predictions for the test split into, initially empty
mode: 'binary' (logistic regression), 'multin' (multinomial regression)
cmode: 'det' or ''; 'det' always predicts the more likely outcome as determined by the odds ratio; '' predicts a certain outcome with probability corresponding to its odds ratio (more conservative). Option available for multinomial/logistic regression
predm: 'exact' or ''; for logistic and multinomial regression; 'exact' computes how many times the exact outcome category was predicted, '' computes how many times either the exact outcome category or its nearest neighbour was predicted
cutoff: cut-off value for logistic regression
objfun: 'roc' for maximising the predictive power with respect to AUC, 'acc' for maximising the predictive power with respect to accuracy
minx: minimum number of predictors to be included in a regression, defaults to 1
maxx: maximum number of predictors to be included in a regression, defaults to the maximum feasible number according to the one-in-ten rule
nr: a subset of the data-set, such that 1/part of it lies in the test set and 1-1/part in the training set, defaults to the empty set
maxw: maximum weight of predictors to be included in a regression, defaults to the maximum weight according to the one-in-ten rule
st: a subset of predictors to be always included in a predictive model, defaults to the empty set
corr: maximum correlation between a pair of predictors in a model
Rsq: whether the R-squared statistic constraint is introduced
marg: margin of error for the R-squared statistic constraint
n_tr: size of the training set
preds_tr: array to write predictions for the training split into, initially empty
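For intuition only, a rough sketch of how part and the one-in-ten (EPV) rule interact; this is an illustration written for this page, not code from the package, and the exact event count used internally is an assumption:

# with l = 100 observations and part = 10, roughly 1/part of the data is held out
l <- 100
part <- 10
n_test <- floor(l/part)      # about 10 observations in the test set
n_tr <- l - n_test           # about 90 observations in the training set
# for a binary outcome, the EPV (one-in-ten) rule caps the number of predictors;
# the limiting count is assumed here to be the rarer outcome in the training set
out <- rbinom(n_tr, 1, 0.3)
events <- min(sum(out), n_tr - sum(out))
rule <- 10
floor(events/rule)           # rough upper bound on maxx under the rule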
Uses compute_max_weight, sum_weights_sub, make_numeric_sets, get_predictions_lin, get_predictions, get_probabilities, AUC, combn
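As a standalone illustration (not package code), the base-R combn call listed above is the kind of enumeration that yields the columns of the error matrices, one column per subset of predictors of a given size:

# all subsets of size 2 out of 4 candidate predictor indices;
# combn returns one subset per column
predictor_indices <- 1:4
combn(predictor_indices, 2)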
#creating variables
vari<-matrix(c(1:100,seq(1,300,3)),ncol=2)
#creating outcomes
out<-rbinom(100,1,0.3)
#creating arrays for the test-set and training-set predictions
pr<-array(NA,c(2,2))
pr_tr<-array(NA,c(2,2))
#passing the set of indices of the predictors
c<-c(1:2)
#passing the weights of the predictors
we<-c(1,1)
#setting the mode
m<-'binary'
#running the function
cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr)
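Since, as described above, each call performs a single cross-validation, one might repeat it and keep each run's output; a hedged sketch (it is assumed here, though not stated on this page, that every call draws its own training/test partition):

#collecting the output of several cross-validation runs for later inspection
res <- lapply(1:5, function(i) cross_val(vari,out,c,10,10,100,we,2,pr,m,'det','exact',0.5,'acc',nr=c(1,4),n_tr=90,preds_tr=pr_tr))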