bsgw.crossval calculates the cross-validation-based, out-of-sample log-likelihood of a bsgw model for a data set, given the supplied folds. bsgw.crossval.wrapper applies bsgw.crossval to a set of combinations of shrinkage parameters (lambda, lambdas). It returns the resulting vector of log-likelihood values together with the combination of shrinkage parameters that maximizes the log-likelihood. bsgw.generate.folds generates random partitions, while bsgw.generate.folds.eventbalanced generates random partitions with events distributed evenly across partitions. The latter is useful for cross-validation of small data sets with low event rates, because it prevents events from piling up in one or two partitions while other partitions receive none.
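The difference between the two partitioning schemes can be illustrated in base R plus the survival package. This is a minimal sketch, not the BSGW implementation: random_folds() and eventbalanced_folds() are hypothetical helpers. The event-balanced variant assumes events are identified by status == 1 in the Surv response.

```r
library(survival)

# Random partition (hypothetical helper): fold labels 1..nfold are spread
# as evenly as possible over ntot observations, then shuffled.
random_folds <- function(ntot, nfold = 5) {
  labels <- rep(seq_len(nfold), length.out = ntot)
  labels[sample.int(ntot)]
}

# Event-balanced partition (hypothetical helper): labels are dealt out
# separately to events and to censored observations, so every fold
# receives a near-equal share of events.
eventbalanced_folds <- function(formula, data, nfold = 5) {
  # The Surv response carries a "status" column; the right-hand side
  # of the formula plays no role here.
  y <- model.response(model.frame(formula, data))
  is_event <- y[, "status"] == 1
  folds <- integer(nrow(data))
  folds[is_event] <- random_folds(sum(is_event), nfold)
  folds[!is_event] <- random_folds(sum(!is_event), nfold)
  folds
}

# Usage: cross-tabulate folds against event status for the ovarian data.
set.seed(1)
folds <- eventbalanced_folds(Surv(futime, fustat) ~ 1, ovarian, nfold = 5)
table(folds, ovarian$fustat)
```

With a purely random partition of a small data set, nothing prevents a fold from containing zero events. Dealing events and non-events separately bounds the difference in event counts between any two folds to one.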
bsgw.generate.folds(ntot, nfold=5)
bsgw.generate.folds.eventbalanced(formula, data, nfold=5)
bsgw.crossval(data, folds, all=FALSE, print.level=1
, control=bsgw.control(), ncores=1, ...)
bsgw.crossval.wrapper(data, folds, all=FALSE, print.level=1
, control=bsgw.control(), ncores=1
, lambda.vec=exp(seq(from=log(0.01), to=log(100), length.out = 10)), lambdas.vec=NULL
, lambda2=if (is.null(lambdas.vec)) cbind(lambda=lambda.vec, lambdas=lambda.vec)
else as.matrix(expand.grid(lambda=lambda.vec, lambdas=lambdas.vec))
, plot=TRUE, ...)
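The default value of lambda2 in the usage above can be reproduced in base R. This shows how the grid of tested shrinkage pairs depends on lambdas.vec. make_lambda2() is a hypothetical wrapper around the default expression, written only for illustration.

```r
# Hypothetical helper wrapping the default lambda2 expression from the usage.
make_lambda2 <- function(lambda.vec, lambdas.vec = NULL) {
  if (is.null(lambdas.vec))
    # paired grid: lambda and lambdas take the same value in each row
    cbind(lambda = lambda.vec, lambdas = lambda.vec)
  else
    # full grid: every combination of the two vectors
    as.matrix(expand.grid(lambda = lambda.vec, lambdas = lambdas.vec))
}

# Default lambda.vec: 10 log-spaced values from 0.01 to 100.
lambda.vec <- exp(seq(from = log(0.01), to = log(100), length.out = 10))

dim(make_lambda2(lambda.vec))                     # 10 rows, 2 columns
dim(make_lambda2(lambda.vec, c(0.1, 1, 10)))      # 30 rows, 2 columns
```

Supplying lambdas.vec therefore multiplies the number of cross-validation runs by length(lambdas.vec). The paired default keeps the search one-dimensional.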
Functions bsgw.generate.folds and bsgw.generate.folds.eventbalanced produce integer vectors of length ntot or nrow(data), respectively. Their output can be passed directly to bsgw.crossval or bsgw.crossval.wrapper.

Function bsgw.crossval returns the log-likelihood of data under the assumed bsgw model, calculated using a cross-validation scheme with the supplied folds parameter. If all=TRUE, the estimation objects for each of the nfold estimation jobs are returned as the "estobjs" attribute of the returned value.

Function bsgw.crossval.wrapper returns a list with elements lambda and lambdas, the optimal shrinkage parameters for scale and shape coefficients, respectively. Additionally, the following attributes are attached:
Vector of log-likelihood values (loglike.vec), one for each tested combination of lambda and lambdas.
The maximum log-likelihood value in loglike.vec.
Data frame with columns lambda and lambdas. Each row contains one combination of shrinkage parameters tested by the wrapper function.
If all=TRUE, a list of length nrow(lambda2), each element of which is itself a list of nfold estimation objects, one per call to the bsgw function. These objects can be examined for diagnostic purposes, e.g. by calling plot on each of them.
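The cross-validation bookkeeping behind an out-of-sample log-likelihood can be sketched generically. Each fold is held out in turn, the model is fitted on the remaining folds, and the held-out log-likelihood is accumulated. This is not how bsgw.crossval is implemented. It swaps in a maximum-likelihood Weibull fit from survival::survreg in place of bsgw. It also assumes that per-fold held-out log-likelihoods are summed into one value. weibull_loglik() and cv_loglik() are hypothetical names.

```r
library(survival)

# Log-likelihood contributions of right-censored observations under
# survreg's Weibull parameterisation: log(T) = lp + scale * W, where W
# follows the standard (minimum) extreme-value distribution.
weibull_loglik <- function(time, status, lp, scale) {
  z <- (log(time) - lp) / scale
  # events contribute log f(t); censored observations contribute log S(t)
  ifelse(status == 1, z - exp(z) - log(scale * time), -exp(z))
}

# Sum of held-out log-likelihoods over all folds (stand-in model: survreg).
cv_loglik <- function(formula, data, folds) {
  total <- 0
  for (k in sort(unique(folds))) {
    train <- data[folds != k, , drop = FALSE]
    test  <- data[folds == k, , drop = FALSE]
    fit <- survreg(formula, data = train, dist = "weibull")
    lp  <- predict(fit, newdata = test, type = "lp")
    y   <- model.response(model.frame(formula, test))
    total <- total + sum(weibull_loglik(y[, "time"], y[, "status"], lp, fit$scale))
  }
  total
}

# Usage on the ovarian data with a random 5-fold partition.
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(ovarian)))
cv_loglik(Surv(futime, fustat) ~ ecog.ps + rx, ovarian, folds)
```

Because each observation is scored by a model that never saw it, the total penalizes overfitting. That is why it can serve as the criterion for choosing shrinkage parameters. On very small training folds, survreg may emit convergence warnings.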
ntot: Number of observations to create partitions for. It is typically set to nrow(data).
nfold: Number of folds or partitions to generate.
formula: Survival formula, used to extract the binary status field from the data. The right-hand side of the formula is ignored, so a formula of the form Surv(time,status)~1 is sufficient.
data: Data frame used in model training and prediction.
folds: An integer vector of length nrow(data), defining the fold/partition membership of each observation. For example, in 5-fold cross-validation for a data set of 200 observations, folds must be a 200-long vector with elements from the set {1,2,3,4,5}. Convenience functions bsgw.generate.folds and bsgw.generate.folds.eventbalanced can be used to generate the folds vector for a given survival data frame.
all: If TRUE, estimation objects from each cross-validation task are collected and returned for diagnostic purposes.
print.level: Verbosity of progress reporting.
control: List of control parameters, usually the output of bsgw.control.
ncores: Number of cores for parallel execution of cross-validation code.
lambda.vec: Vector of shrinkage parameters to be tested for scale-parameter coefficients.
lambdas.vec: Vector of shrinkage parameters to be tested for shape-parameter coefficients.
lambda2: A two-column matrix (or data frame) with columns lambda and lambdas, enumerating all combinations to be tested. By default, it is a matrix formed from all combinations of lambda.vec and lambdas.vec (via expand.grid). If lambdas.vec=NULL, only pairs with equal values of the two parameters, taken from lambda.vec, are tried.
plot: If TRUE, and if the lambda and lambdas entries in lambda2 are identical in every row, a plot of the log-likelihood as a function of either vector is produced.
...: Other arguments to be passed to bsgw.
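The selection step of the wrapper (evaluate every row of lambda2, keep the row with the highest log-likelihood) and the condition under which a one-dimensional plot makes sense can be sketched in base R. select_lambda() and is_paired_grid() are hypothetical helpers. The objective passed in here is a made-up toy function standing in for a call to bsgw.crossval, so its values are not real results.

```r
# Hypothetical helper: evaluate cv_objective on every (lambda, lambdas) row
# and return the maximizing pair along with all evaluated values.
select_lambda <- function(lambda2, cv_objective) {
  loglike.vec <- apply(lambda2, 1, function(row)
    cv_objective(row[["lambda"]], row[["lambdas"]]))
  best <- which.max(loglike.vec)
  list(lambda = lambda2[best, "lambda"],
       lambdas = lambda2[best, "lambdas"],
       loglike.vec = loglike.vec)
}

# Hypothetical helper: TRUE when every row has lambda == lambdas, i.e. the
# search is one-dimensional and can be plotted against either column.
is_paired_grid <- function(lambda2) {
  all(lambda2[, "lambda"] == lambda2[, "lambdas"])
}

# Usage with a toy objective that peaks at lambda = 1 (illustration only).
lv <- exp(seq(from = log(0.1), to = log(10), length.out = 5))
grid <- cbind(lambda = lv, lambdas = lv)
res <- select_lambda(grid, function(lambda, lambdas) -log(lambda)^2)
if (is_paired_grid(grid)) plot(grid[, "lambda"], res$loglike.vec, log = "x", type = "b")
```

Shrinkage parameters are usually searched on a log scale, as in the default lambda.vec, so a logarithmic x-axis is the natural choice for such a plot.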
Alireza S. Mahani, Mansour T.A. Sharabiani
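library("BSGW")
library("survival")
data(ovarian)
# event-balanced 5-fold partition, based on the status column of the survival formula
folds <- bsgw.generate.folds.eventbalanced(Surv(futime, fustat) ~ 1, ovarian, 5)
# cross-validated log-likelihood of a single model; iter=50 keeps the example fast
cv <- bsgw.crossval(ovarian, folds, formula=Surv(futime, fustat) ~ ecog.ps + rx
, control=bsgw.control(iter=50, nskip=10), print.level = 3)
# search over 3 shrinkage values, with lambda and lambdas kept equal (lambdas.vec=NULL)
cv2 <- bsgw.crossval.wrapper(ovarian, folds, formula=Surv(futime, fustat) ~ ecog.ps + rx
, control=bsgw.control(iter=50, nskip=10)
, print.level=3, lambda.vec=exp(seq(from=log(0.1), to=log(1), length.out = 3)))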