Learn R Programming

SpatioTemporal (version 0.9.2)

estimateCV: Cross-Validated Estimation and Prediction

Description

Functions that perform cross-validated parameter estimation and prediction for the spatio-temporal model.

Usage

estimateCV(par.init = 3, mesa.data.model, Ind.cv, type = "p",
           h = 0.001, diff.type = 1, lower = -15, upper = 15,
           hessian.all = FALSE, control =
           list(trace = 3, maxit = 1000))

predictCV(par, mesa.data.model, Ind.cv, type = "p",express=FALSE, Nmax = 1000, silent = TRUE)

Arguments

par.init
Vector or matrix of starting point(s) for the parameter estimation. See further fit.mesa.model.
par
Vector or matrix of parameters. Cross-validated predictions are carried out using these parameter values. If par is a vector then the same parameter values will be used for all cross-validation sets. If par is a matri
mesa.data.model
Data structure holding observations, and information regarding the observation locations. See create.data.model and mesa.data.model
Ind.cv
Ind.cv defines the cross-validation scheme. Either a (number or observations) - by - (groups) logical matrix or an integer valued vector with length equal to (number or observations). See further
type
A single character indicating the type of log-likelihood to use. Valid options are "f", "p", and "r", for full, profile or restricted maximum likelihood (REML).
express
logical: should an ``express'' version of predictions be returned as a single data frame, without variances? For more datails see under ``Value'' (defaults to FALSE).
h, diff.type
Step length and type of finite difference to use when computing gradients. See loglike.grad and gen.gradient for details.
lower, upper
Bounds on the variables, see optim.
hessian.all
If type!="f" computes hessian (and uncertainties) for both regression and log-covariance parameters, not only for log-covariance parameters. See fit.mesa.model
control
A list of control parameters for the optimisation. See optim for details.
Nmax
Limits the size of matrices constructed when computing the expectations. See cond.expectation.
silent
If FALSE outputs brief progress information. If TRUE no output.

Value

  • estimateCV returns a list containing:
  • parA (number of parameters) - by - (number of groups) matrix containing the estimated parameters for each cross-validation group. This can be used as par-input to predictCV.
  • res.allA list with (number of groups) elements. Each element is the best estimation result for that cross-validation group, e.g. fit.mesa.model(...)$res.best.
  • If ``express''=FALSE (default), predictCV returns a list containing:
  • pred.obsA data.frame with the observations, predictions and prediction variances. The elements in this data.frame are in the same order as those in mesa.data.model$obs.

    NULL if any observation occurs in more than one cross-validation set, e.g. max(rowSums(Ind.cv))!=1.

  • pred.by.cvA list with (number of groups) elements. Each element contains a data.frame with the observations, predictions and prediction variances for that cross-validation group. The elements in each data.frame are in the same order as those in mesa.data.model$obs.

    NULL if no observation occurs in more than one cross-validation set, e.g. max(rowSums(Ind.cv))==1.

  • pred.allA list with (number of groups) elements. Each element contains a (number of time points) - by - (number of locations in that group) - by - (3) 3D-array. The arrays contain observations, predictions and prediction variances for the locations in each each cross-validation group.
  • If ``express''=TRUE, predictCV returns a data frame with information pertaining only to observations predicted during the CV cycle (i.e., observations left out and predicted upon at least once). That data frame has vectors:
  • IDThe location ID, as it appears in the ``location'' component of ``mesa.data.model''
  • dateThe observation's date
  • rawThe actual observation as it appears in the ``obs'' component of ``mesa.data.model''
  • lurThe prediction (on the model scale) without any spatial smoothing: see ``EX.mu'' under cond.expectation
  • krigThe prediction (on the model scale) of the full model including spatial smoothing: see ``EX'' under cond.expectation

encoding

latin1

Details

For estimateCV the inital parameters for the optimisation can be given as a vector of single starting point, as a matrix of multiple starting point, or as a single value. If par.init is a single integer then multiple starting points will be created as a set of constant vectors with the values of each starting point taken as seq(-5, 5, length.out=par.init).

For predictCV the parameters used to compute predictions for the left out observations can be either a single vector or a matrix with dim(par)[2]=dim(Ind.cv)[2] (or dim(par)[2]=max(Ind.cv)). For a single vector the same parameter values will be used for all cross-validation predictions, for a matrix the parameters in par[,i] will be used for the predictions of the cross-validation set given by Ind.cv[,i] (or Ind.cv==i). A suitable matrix is provided in the output from estimateCV.

The cross-validation groups are given by Ind.cv. Ind.cv should be either a (number of observations) - by - (groups) logical matrix or an integer valued vector with length equal to (number of observations). If a matrix then each column defines a cross-validation set with the TRUE values marking the observations to be left out. If a vector then 1:s denote observations to be dropped in the first cross-validation set, 2:s observations to be dropped in the second set, etc. Observations marked by values <=0< code=""> are never dropped. See createCV for details.

See Also

See createCV for creating CV-schemes. The functions use fit.mesa.model for estimation and cond.expectation for prediction. For computing CV statistics, see also compute.ltaCV, predictNaive, and for further illustration see plotCV, CVresiduals.qqnorm, and summaryStatsCV.

Examples

Run this code
##Load data
data(mesa.data.model)
data(mesa.data.res)

##create the CV structure defining 10 different CV-groups
Ind.cv <- createCV(mesa.data.model, groups=10, min.dist=.1)

##estimate different parameters for each CV-group
dimm <- loglike.dim(mesa.data.model)
x.init <- as.matrix(cbind(rep(2,dimm$nparam.cov),c(rep(c(1,-3),dimm$m+1),-3)))
##This may take a while...
par.est.cv <- estimateCV(par.est$res.best$par, 
      mesa.data.model, Ind.cv)
##Get the pre-computed results instead.
par.est.cv <- mesa.data.res$par.est.cv

##let's examine estimation of the results
names(par.est.cv)
##estimated parameters for each CV-group
par.est.cv$par

##Convergence of the optimisation for each group
sapply(par.est.cv$res.all,function(x) x$conv)

##boxplot of the different estimates from the CV
par(mfrow=c(1,1), mar=c(7,2.5,2,.5), las=2)
boxplot(t(par.est.cv$par))

##Do cross-validated predictions using the newly 
##This may take a while...
pred.cv <- predictCV(par.est.cv$par, mesa.data.model,
                     Ind.cv, silent = FALSE)
##Get the pre-computed results instead.
pred.cv <- mesa.data.res$pred.cv
##examine the results
names(pred.cv)

##Alternatively, one can use the ``express'' option which should be faster and returns a compact dataset:
pred0.cv <- predictCV(par.est.cv$par, mesa.data.model,Ind.cv, silent = FALSE,express=TRUE)
#Instead of a list, a single data frame is returned:
head(pred0.cv)



##compute CV-statistics
pred.cv.stats <- summaryStatsCV(pred.cv, lta=TRUE, trans=0)
pred.cv.stats$Stats

##Plot observations with CV-predictions and prediction intervals
par(mfcol=c(4,1),mar=c(2.5,2.5,2,.5))
plotCV(pred.cv,  1, mesa.data.model)
plotCV(pred.cv, 10, mesa.data.model)
plotCV(pred.cv, 17, mesa.data.model)
plotCV(pred.cv, 22, mesa.data.model)

##Residual QQ-plots
par(mfrow=c(1,2),mar=c(3,2,1,1),pty="s")
##Raw Residual QQ-plot
CVresiduals.qqnorm(pred.cv.stats$res)
##Normalized Residual QQ-plot
CVresiduals.qqnorm(pred.cv.stats$res.norm, norm=TRUE)

Run the code above in your browser using DataLab