A routine for performing K-fold cross-validation for gamsel.
cv.gamsel(
x,
y,
lambda = NULL,
family = c("gaussian", "binomial"),
degrees = rep(10, p),
dfs = rep(5, p),
bases = pseudo.bases(x, degrees, dfs, parallel = parallel, ...),
type.measure = c("mse", "mae", "deviance", "class"),
nfolds = 10,
foldid,
keep = FALSE,
parallel = FALSE,
...
)
An object of class "cv.gamsel" is returned. It is a list with the ingredients of the cross-validation fit:
lambda: the values of lambda used in the fits.
cvm: the mean cross-validated error, a vector of length length(lambda).
cvsd: estimate of the standard error of cvm.
cvup: upper curve = cvm + cvsd.
cvlo: lower curve = cvm - cvsd.
nzero: number of non-zero coefficients at each lambda.
name: a text string indicating the type of measure (for plotting purposes).
gamsel.fit: a fitted gamsel object for the full data.
lambda.min: value of lambda that gives the minimum cvm.
lambda.1se: largest value of lambda such that the error is within 1 standard error of the minimum.
fit.preval: if keep=TRUE, the array of prevalidated fits. Some entries can be NA if that and subsequent values of lambda are not reached for that fold.
foldid: if keep=TRUE, the fold assignments used.
index.min: the sequence number of the minimum lambda.
index.1se: the sequence number of the 1se lambda value.
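The relationships among cvm, cvsd, the upper and lower curves, the minimum-error lambda and the one-standard-error lambda can be made concrete with a short base-R sketch. It follows the definitions above on a toy error curve; the helper summarise_cv is hypothetical and is not how the package computes these values internally.

```r
# Hypothetical helper (not part of gamsel): summarise a CV error curve
# following the definitions of the returned components.
summarise_cv <- function(lambda, cvm, cvsd) {
  index.min <- which.min(cvm)                     # position of the smallest mean CV error
  threshold <- cvm[index.min] + cvsd[index.min]   # one-standard-error bound
  within    <- which(cvm <= threshold)            # lambdas whose error is within 1 SE
  index.1se <- within[which.max(lambda[within])]  # the largest such lambda
  list(
    cvup = cvm + cvsd,
    cvlo = cvm - cvsd,
    lambda.min = lambda[index.min],
    lambda.1se = lambda[index.1se],
    index.min = index.min,
    index.1se = index.1se
  )
}

# Toy curve with lambda decreasing along the path
res <- summarise_cv(lambda = c(1, 0.5, 0.25, 0.125),
                    cvm    = c(2.0, 1.05, 1.0, 1.3),
                    cvsd   = c(0.1, 0.1, 0.1, 0.1))
res$lambda.min  # 0.25
res$lambda.1se  # 0.5
```

Because lambda.1se is the largest lambda within one standard error, it selects a more heavily penalised (sparser) model than lambda.min whenever the error curve is flat near its minimum.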
x: x matrix as in gamsel.
y: response y as in gamsel.
lambda: Optional user-supplied lambda sequence. If NULL, the default behaviour is for the gamsel routine to automatically select a good lambda sequence.
family: as in gamsel.
degrees: as in gamsel.
dfs: as in gamsel.
bases: as in gamsel.
type.measure: Loss function for the cross-validated error calculation. Currently there are four options: mse (mean squared error), mae (mean absolute error), deviance (deviance; the same as mse for family="gaussian"), and class (misclassification error, for use with family="binomial").
nfolds: Number of folds (default is 10). The maximum value is nobs. Small values of nfolds are recommended for large data sets.
foldid: Optional vector of length nobs with values between 1 and nfolds specifying which fold each observation is in.
keep: If keep=TRUE, a prevalidated array is returned containing fitted values for each observation and each value of lambda; each fit is computed with that observation and the rest of its fold omitted. The foldid vector is also returned. Default is keep=FALSE.
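parallel: If TRUE, use parallel foreach to fit each fold. A parallel backend (for example one from the doParallel package) must be registered beforehand; without one, foreach runs the folds sequentially.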
...: Other arguments that can be passed to gamsel.
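The four type.measure options can be illustrated with a small base-R helper that scores held-out predictions for one fold. This is a sketch of the definitions listed above, not cv.gamsel's internal code: the helper name cv_loss, the assumption that binomial fits are on the probability scale, the 0.5 classification threshold, the clipping constant, and averaging over the observations in the fold are all illustrative choices.

```r
# Hypothetical helper (not part of gamsel): loss of held-out predictions
# for each type.measure option. For family = "binomial", `fit` is assumed
# to be a predicted probability.
cv_loss <- function(y, fit, type = c("mse", "mae", "deviance", "class"),
                    family = c("gaussian", "binomial")) {
  type <- match.arg(type)
  family <- match.arg(family)
  if (type == "class" && family != "binomial")
    stop("type = \"class\" is only meaningful for family = \"binomial\"")
  if (type == "deviance" && family == "gaussian")
    type <- "mse"                               # Gaussian deviance is squared error
  switch(type,
    mse = mean((y - fit)^2),
    mae = mean(abs(y - fit)),
    deviance = {
      p <- pmin(pmax(fit, 1e-10), 1 - 1e-10)    # keep log() finite
      -2 * mean(y * log(p) + (1 - y) * log(1 - p))
    },
    class = mean(y != as.numeric(fit > 0.5))    # share of misclassified observations
  )
}

cv_loss(c(1, 2, 3), c(1.5, 2, 2), "mse")                       # 1.25 / 3
cv_loss(c(1, 0, 1, 1), c(0.9, 0.2, 0.4, 0.8), "class", "binomial")  # 0.25
```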
Authors: Alexandra Chouldechova and Trevor Hastie.
Maintainer: Trevor Hastie <hastie@stanford.edu>
This function has the effect of running gamsel nfolds + 1 times. The initial run uses all the data and gets the lambda sequence. The remaining runs fit the data with each of the folds omitted in turn. The error is accumulated, and the average error and standard deviation over the folds are computed.

Note that cv.gamsel does NOT search for values of gamma. A specific value should be supplied, else gamma=0.4 is assumed by default. Users who would like to cross-validate gamma as well should call cv.gamsel with a pre-computed vector foldid, and then use this same fold vector in separate calls to cv.gamsel with different values of gamma.

Note also that the results of cv.gamsel are random, since the folds are selected at random. Users can reduce this randomness by running cv.gamsel many times and averaging the error curves.
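The procedure described above (one full-data run, then one run per held-out fold, then averaging the fold errors) can be sketched in base R. To keep the sketch self-contained, closed-form ridge regression stands in for gamsel, and the lambda grid is supplied directly instead of being chosen by the full-data run. The function names ridge_fit and cv_ridge are hypothetical, and computing cvsd as the standard deviation of the per-fold errors divided by sqrt(nfolds) is a common convention that may differ in detail from cv.gamsel.

```r
# Sketch only: ridge regression stands in for gamsel so the K-fold
# bookkeeping runs without the package. Names are hypothetical.
ridge_fit <- function(x, y, lambda) {
  xm <- colMeans(x)
  ym <- mean(y)
  xc <- sweep(x, 2, xm)                 # centre columns; intercept is not penalised
  xtx <- crossprod(xc)
  xty <- crossprod(xc, y - ym)
  # One column of coefficients (intercept first) per lambda value
  sapply(lambda, function(l) {
    beta <- solve(xtx + l * diag(ncol(x)), xty)
    c(ym - sum(xm * beta), beta)
  })
}

cv_ridge <- function(x, y, lambda, foldid) {
  nfolds <- max(foldid)
  full.fit <- ridge_fit(x, y, lambda)                   # run 1: all of the data
  preval <- matrix(NA_real_, nrow(x), length(lambda))   # prevalidated fits
  fold_err <- matrix(NA_real_, nfolds, length(lambda))  # MSE per fold and lambda
  for (k in seq_len(nfolds)) {                          # runs 2 .. nfolds + 1
    test <- foldid == k
    coefs <- ridge_fit(x[!test, , drop = FALSE], y[!test], lambda)
    preval[test, ] <- cbind(1, x[test, , drop = FALSE]) %*% coefs
    fold_err[k, ] <- colMeans((y[test] - preval[test, , drop = FALSE])^2)
  }
  list(lambda = lambda,
       cvm = colMeans(fold_err),                        # average error over folds
       cvsd = apply(fold_err, 2, sd) / sqrt(nfolds),    # standard error of cvm
       full.fit = full.fit, fit.preval = preval, foldid = foldid)
}

set.seed(42)
n <- 200
x <- matrix(rnorm(n * 5), n, 5)
y <- drop(x %*% c(2, -1, 0, 0, 1)) + rnorm(n)
lambda <- 10^seq(3, -2, length.out = 30)

foldid <- sample(rep(1:10, length.out = n))   # balanced random fold assignment
cv1 <- cv_ridge(x, y, lambda, foldid)
lambda[which.min(cv1$cvm)]

# Reduce the randomness of the fold split by averaging several error curves
cvm_avg <- rowMeans(sapply(1:5, function(r)
  cv_ridge(x, y, lambda, sample(rep(1:10, length.out = n)))$cvm))
```

Passing the same foldid to every call is what makes error curves from different settings comparable; drawing a fresh foldid per call, as in the last lines, is what the averaging remedy relies on.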
See also: gamsel, and the plot method for cv.gamsel objects.
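library(gamsel)
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X = data$X   # use the list components directly instead of attach()
y = data$y
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
par(mfrow=c(1,2),mar=c(5,4,3,1))   # side-by-side panels for the summary plot
summary(gamsel.out)
# 10-fold cross-validation (the default), reusing the same bases
gamsel.cv=cv.gamsel(X,y,bases=bases)
par(mfrow=c(1,1))
plot(gamsel.cv)   # cross-validation error curve against lambda
par(mfrow=c(3,4))   # 3 x 4 grid, one panel per variable
plot(gamsel.out,newx=X,index=20)   # fitted functions at the 20th lambda value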
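As suggested above for cross-validating gamma, one fold assignment can be computed once and reused across calls to cv.gamsel. The sketch below assumes the gamsel package is installed and uses its example data; gamma is passed through ... to gamsel, and the grid of gamma values and the comparison by minimum cvm are illustrative choices, not package recommendations.

```r
library(gamsel)

# Example data shipped with the package (as in the example above)
data <- readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X <- data$X
y <- data$y
bases <- pseudo.bases(X, degree = 10, df = 6)

# One fold assignment shared by every call, so the error curves differ
# only through gamma (10 folds of roughly equal size)
set.seed(1)
foldid <- sample(rep(1:10, length.out = nrow(X)))

gammas <- c(0.2, 0.4, 0.6)   # illustrative grid, not a recommendation
cv_fits <- lapply(gammas, function(g)
  cv.gamsel(X, y, bases = bases, foldid = foldid, gamma = g))  # gamma reaches gamsel via ...

best_cvm <- sapply(cv_fits, function(cv) min(cv$cvm))  # lowest CV error for each gamma
gammas[which.min(best_cvm)]
```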