estimateCV: Cross-Validated Estimation and Prediction

Description

Functions that perform cross-validated parameter estimation and prediction for the spatio-temporal model.

Usage

estimateCV(par.init = 3, mesa.data.model, Ind.cv, type = "p",
           h = 0.001, diff.type = 1, lower = -15, upper = 15,
           hessian.all = FALSE, control =
           list(trace = 3, maxit = 1000))
predictCV(par, mesa.data.model, Ind.cv, type = "p",express=FALSE, Nmax = 1000,
          silent = TRUE)

Arguments

par.init

Vector or matrix of starting point(s) for the parameter estimation. See further fit.mesa.model.

par

Vector or matrix of parameters. Cross-validated predictions are carried out using these parameter values. If par is a vector then the same parameter values will be used for all cross-validation sets. If par is a matri

mesa.data.model

Data structure holding observations, and information regarding the observation locations. See create.data.model and mesa.data.model

Ind.cv

Ind.cv defines the cross-validation scheme. Either a (number or observations) - by - (groups) logical matrix or an integer valued vector with length equal to (number or observations). See further

type

A single character indicating the type of log-likelihood to use. Valid options are "f", "p", and "r", for full, profile or restricted maximum likelihood (REML).

express

logical: should an ``express'' version of predictions be returned as a single data frame, without variances? For more datails see under ``Value'' (defaults to FALSE).

h, diff.type

Step length and type of finite difference to use when computing gradients. See loglike.grad and gen.gradient for details.

lower, upper

Bounds on the variables, see optim.

hessian.all

If type!="f" computes hessian (and uncertainties) for both regression and log-covariance parameters, not only for log-covariance parameters. See fit.mesa.model

control

A list of control parameters for the optimisation. See optim for details.

Nmax

Limits the size of matrices constructed when computing the expectations. See cond.expectation.

silent

If FALSE outputs brief progress information. If TRUE no output.

Value

estimateCV returns a list containing:
parA (number of parameters) - by - (number of groups) matrix containing the estimated parameters for each cross-validation group. This can be used as par-input to predictCV.
res.allA list with (number of groups) elements. Each element is the best estimation result for that cross-validation group, e.g. fit.mesa.model(...)$res.best.
If ``express''=FALSE (default), predictCV returns a list containing:
pred.obsA data.frame with the observations, predictions and prediction variances. The elements in this data.frame are in the same order as those in mesa.data.model$obs.
NULL if any observation occurs in more than one cross-validation set, e.g. max(rowSums(Ind.cv))!=1.
pred.by.cvA list with (number of groups) elements. Each element contains a data.frame with the observations, predictions and prediction variances for that cross-validation group. The elements in each data.frame are in the same order as those in mesa.data.model$obs.
NULL if no observation occurs in more than one cross-validation set, e.g. max(rowSums(Ind.cv))==1.
pred.allA list with (number of groups) elements. Each element contains a (number of time points) - by - (number of locations in that group) - by - (3) 3D-array. The arrays contain observations, predictions and prediction variances for the locations in each each cross-validation group.
If ``express''=TRUE, predictCV returns a data frame with information pertaining only to observations predicted during the CV cycle (i.e., observations left out and predicted upon at least once). That data frame has vectors:
IDThe location ID, as it appears in the ``location'' component of ``mesa.data.model''
dateThe observation's date
rawThe actual observation as it appears in the ``obs'' component of ``mesa.data.model''
lurThe prediction (on the model scale) without any spatial smoothing: see ``EX.mu'' under cond.expectation
krigThe prediction (on the model scale) of the full model including spatial smoothing: see ``EX'' under cond.expectation

encoding

latin1

Details

For estimateCV the inital parameters for the optimisation can be given as a vector of single starting point, as a matrix of multiple starting point, or as a single value. If par.init is a single integer then multiple starting points will be created as a set of constant vectors with the values of each starting point taken as seq(-5, 5, length.out=par.init).

For predictCV the parameters used to compute predictions for the left out observations can be either a single vector or a matrix with dim(par)[2]=dim(Ind.cv)[2] (or dim(par)[2]=max(Ind.cv)). For a single vector the same parameter values will be used for all cross-validation predictions, for a matrix the parameters in par[,i] will be used for the predictions of the cross-validation set given by Ind.cv[,i] (or Ind.cv==i). A suitable matrix is provided in the output from estimateCV.

The cross-validation groups are given by Ind.cv. Ind.cv should be either a (number of observations) - by - (groups) logical matrix or an integer valued vector with length equal to (number of observations). If a matrix then each column defines a cross-validation set with the TRUE values marking the observations to be left out. If a vector then 1:s denote observations to be dropped in the first cross-validation set, 2:s observations to be dropped in the second set, etc. Observations marked by values <=0< code=""> are never dropped. See createCV for details.

Examples

Run this code

##Load data
data(mesa.data.model)
data(mesa.data.res)

##create the CV structure defining 10 different CV-groups
Ind.cv <- createCV(mesa.data.model, groups=10, min.dist=.1)

##estimate different parameters for each CV-group
dimm <- loglike.dim(mesa.data.model)
x.init <- as.matrix(cbind(rep(2,dimm$nparam.cov),c(rep(c(1,-3),dimm$m+1),-3)))
##This may take a while...
par.est.cv <- estimateCV(par.est$res.best$par, 
      mesa.data.model, Ind.cv)
##Get the pre-computed results instead.
par.est.cv <- mesa.data.res$par.est.cv

##let's examine estimation of the results
names(par.est.cv)
##estimated parameters for each CV-group
par.est.cv$par

##Convergence of the optimisation for each group
sapply(par.est.cv$res.all,function(x) x$conv)

##boxplot of the different estimates from the CV
par(mfrow=c(1,1), mar=c(7,2.5,2,.5), las=2)
boxplot(t(par.est.cv$par))

##Do cross-validated predictions using the newly 
##This may take a while...
pred.cv <- predictCV(par.est.cv$par, mesa.data.model,
                     Ind.cv, silent = FALSE)
##Get the pre-computed results instead.
pred.cv <- mesa.data.res$pred.cv
##examine the results
names(pred.cv)

##Alternatively, one can use the ``express'' option which should be faster and returns a compact dataset:
pred0.cv <- predictCV(par.est.cv$par, mesa.data.model,Ind.cv, silent = FALSE,express=TRUE)
#Instead of a list, a single data frame is returned:
head(pred0.cv)



##compute CV-statistics
pred.cv.stats <- summaryStatsCV(pred.cv, lta=TRUE, trans=0)
pred.cv.stats$Stats

##Plot observations with CV-predictions and prediction intervals
par(mfcol=c(4,1),mar=c(2.5,2.5,2,.5))
plotCV(pred.cv,  1, mesa.data.model)
plotCV(pred.cv, 10, mesa.data.model)
plotCV(pred.cv, 17, mesa.data.model)
plotCV(pred.cv, 22, mesa.data.model)

##Residual QQ-plots
par(mfrow=c(1,2),mar=c(3,2,1,1),pty="s")
##Raw Residual QQ-plot
CVresiduals.qqnorm(pred.cv.stats$res)
##Normalized Residual QQ-plot
CVresiduals.qqnorm(pred.cv.stats$res.norm, norm=TRUE)

Run the code above in your browser using DataLab