SpatialExtremes (version 2.1-0)

fitmaxstab: Fits a max-stable process to data

Description

This function fits max-stable processes to data using pairwise likelihood. Two max-stable characterisations are available: the Smith and Schlather representations.

Usage

fitmaxstab(data, coord, cov.mod, loc.form, scale.form, shape.form,
           marg.cov = NULL, temp.cov = NULL, temp.form.loc = NULL,
           temp.form.scale = NULL, temp.form.shape = NULL, iso = FALSE,
           ..., fit.marge = FALSE, warn = TRUE, method = "Nelder",
           start, control = list(), weights = NULL, corr = FALSE,
           check.grad = FALSE)

Arguments

data

A matrix representing the data. Each column corresponds to one location.

coord

A matrix that gives the coordinates of each location. Each row corresponds to one location.

cov.mod

A character string corresponding to the covariance model in the max-stable representation. Must be one of "gauss" for the Smith model; "whitmat", "cauchy", "powexp", "bessel" or "caugen" for the Whittle-Matern, Cauchy, powered exponential, Bessel and generalized Cauchy correlation families with the Schlather model; or "brown" for Brown-Resnick processes. The geometric Gaussian and extremal-t models with a Whittle-Matern correlation function can be fitted by passing "gwhitmat" or "twhitmat" respectively; other correlation families are handled in the same way.
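For instance, assuming the simulated data and locations objects constructed in the Examples section below, the naming scheme works as follows (a sketch of the strings only, not recommended model choices):

## Hedged sketch of the cov.mod naming scheme, on unit Frechet data
fitmaxstab(data, locations, "gauss", fit.marge = FALSE)    ## Smith model
fitmaxstab(data, locations, "whitmat", fit.marge = FALSE)  ## Schlather, Whittle-Matern
fitmaxstab(data, locations, "twhitmat", fit.marge = FALSE) ## Extremal-t, Whittle-Matern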

loc.form, scale.form, shape.form

R formulas defining the spatial linear model for the GEV parameters. May be missing. See section Details.

marg.cov

Matrix with named columns giving additional covariates for the GEV parameters. If NULL, no extra covariates are used.

temp.cov

Matrix with named columns giving additional *temporal* covariates for the GEV parameters. If NULL, no temporal trend is assumed for the GEV parameters --- see section Details.

temp.form.loc, temp.form.scale, temp.form.shape

R formulas defining the temporal trends for the GEV parameters. May be missing. See section Details.

iso

Logical. If TRUE an isotropic model is fitted to data. Otherwise (default), anisotropy is allowed. Currently, this is only implemented for the Smith's model.

...

Several arguments to be passed to the optim, nlm or nlminb functions. See section Details.

fit.marge

Logical. If TRUE, the GEV parameters are estimated pointwise or using the formulas given by loc.form, scale.form and shape.form. If FALSE, observations are assumed to be unit Frechet distributed. Note that when formulas are given, fit.marge is automatically set to TRUE.

warn

Logical. If TRUE (default), users are warned if the log-likelihood is infinite at starting values and/or problems arise while computing the standard errors.

method

The method used for the numerical optimisation procedure. Must be one of BFGS, Nelder-Mead, CG, L-BFGS-B, SANN, nlm or nlminb. See optim for details. Please note that passing nlm or nlminb will use the nlm or nlminb functions instead of optim.

start

A named list giving the initial values for the parameters over which the pairwise likelihood is to be minimized. If start is omitted, the routine attempts to find good starting values but may fail.
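For example, for the Schlather model with a Whittle-Matern correlation function, the parameter names match those used by rmaxstab in the Examples section below (a sketch only; sensible values are data dependent):

## Hedged sketch: starting values for a Schlather/Whittle-Matern fit on
## unit Frechet data; the names must match the model's parameter names
fit <- fitmaxstab(data, locations, "whitmat", fit.marge = FALSE,
                  start = list(nugget = 0, range = 2, smooth = 1))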

control

A list giving the control parameters to be passed to the optim function.

weights

A numeric vector specifying the weights in the pairwise likelihood; it must therefore have one element per pair of sites. If NULL (default), no weighting scheme is used.
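For example, one may give zero weight to distant pairs. The sketch below assumes the pairwise likelihood enumerates pairs in the same order as dist(), i.e. (1,2), (1,3), ..., (N-1,N); check this ordering against your package version.

## Down-weight (here: drop) pairs of sites more than 5 distance units apart;
## assumes the same pair ordering as dist()
pair.dist <- dist(locations)
w <- as.numeric(pair.dist <= 5)
fit <- fitmaxstab(data, locations, "whitmat", fit.marge = FALSE, weights = w)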

corr

Logical. If TRUE (not the default), the asymptotic correlation matrix is computed.

check.grad

Logical. If TRUE (not the default), the analytic gradient of the pairwise likelihood is compared to the numerical one. This check can be useful for diagnosing ill-conditioned situations.

Value

This function returns an object of class maxstab. Such objects are lists with the following components:

fitted.values

A vector containing the estimated parameters.

std.err

A vector containing the standard errors.

fixed

A vector containing the parameters of the model that have been held fixed.

param

A vector containing all parameters (optimised and fixed).

deviance

The (pairwise) deviance at the maximum pairwise likelihood estimates.

corr

The correlation matrix.

convergence, counts, message

Components taken from the list returned by optim - for the mle method.

data

The data analysed.

model

The max-stable characterisation used.

fit.marge

A logical that specifies if the GEV margins were estimated.

cov.fun

The estimated covariance function - for the Schlather model only.

extCoeff

The estimated extremal coefficient function.

cov.mod

The covariance model for the spatial structure.
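A sketch of how these components are typically accessed; note that the call signature shown for extCoeff is an assumption, so check the package manual:

fit <- fitmaxstab(data, locations, "whitmat", fit.marge = FALSE)
fit$fitted.values  ## estimated parameters
fit$std.err        ## sandwich-based standard errors
fit$deviance       ## pairwise deviance
fit$extCoeff(1)    ## extremal coefficient at distance h = 1 (assumed signature)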

Warning

When using response surfaces to model the GEV parameters spatially, the likelihood is pretty rough, so the general purpose optimization routines may fail. It is your responsibility to check whether the numerical optimization succeeded. I tried, as best I could, to provide warning messages when the optimizers fail, but in some cases no warning will appear!
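For a fitted object fit, a minimal post-fit check (assuming the convergence component is passed through from optim, which reports 0 on success; some versions store a short character message instead):

fit$convergence  ## 0 (or "successful") indicates convergence
fit$counts       ## number of pairwise likelihood evaluations
fit$message      ## any message returned by the optimiser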

Details

As spatial data often involve a large number of locations, it is impossible to write the joint distribution analytically. Consequently, the fitting procedure substitutes the pairwise likelihood for the "full likelihood".

Let \(L_{i,j}(x_{i,j}, \theta)\) denote the likelihood for sites \(i\) and \(j\), where \(i = 1, \dots, N-1\), \(j = i+1, \dots, N\), \(N\) is the number of sites within the region and \(x_{i,j}\) are the joint observations for sites \(i\) and \(j\). Then the pairwise likelihood \(PL(\theta)\) is defined by:

$$\ell_P = \log PL(\theta) = \sum_{i = 1}^{N-1} \sum_{j=i+1}^{N} \log L_{i,j} (x_{i,j}, \theta)$$
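In code, the double sum has the following structure (pair.loglik is a hypothetical bivariate log-likelihood standing in for the Smith or Schlather pairwise density; it is not a SpatialExtremes function):

## Illustration of the pairwise log-likelihood structure only
pairwise.loglik <- function(data, theta, pair.loglik) {
  N <- ncol(data)  ## number of sites
  ll <- 0
  for (i in 1:(N - 1))
    for (j in (i + 1):N)
      ll <- ll + pair.loglik(data[, i], data[, j], theta)
  ll
}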

As the pairwise likelihood is only an approximation of the ``full likelihood'', standard errors cannot be computed directly as the inverse of the Fisher information matrix. Instead, a sandwich estimate must be used to account for model misspecification, i.e.

$$\hat{\theta} \sim N(\theta, H^{-1} J H^{-1})$$ where \(H\) is the Fisher information matrix (computed from the misspecified model) and \(J\) is the variance of the score function.
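Given estimates of H and J, the sandwich standard errors follow directly (a sketch of what fitmaxstab does internally):

## Sandwich variance H^{-1} J H^{-1} and the implied standard errors
sandwich.se <- function(H, J) {
  H.inv <- solve(H)
  sqrt(diag(H.inv %*% J %*% H.inv))
}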

There are two different kinds of covariates: "spatial" and "temporal".

A "spatial" covariate may take different values across stations but does not depend on time. For example, the coordinates of the stations are obviously "spatial". These "spatial" covariates should be used with the marg.cov argument and the loc.form, scale.form and shape.form formulas.

A "temporal" covariate may take different values across time but does not depend on space. For example, the years in which the annual maxima were recorded are "temporal". These "temporal" covariates should be used with the temp.cov argument and the temp.form.loc, temp.form.scale and temp.form.shape formulas.

Consequently, marg.cov must have K rows (K being the number of sites) while temp.cov must have n rows (n being the number of observations).
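A hedged sketch of both kinds, reusing data, locations and the marginal formulas from the Examples section; alt and year are made-up covariates, and the exact temporal formula syntax should be checked against the package manual:

## "Spatial" covariate: one row per site (K rows)
marg.cov <- cbind(alt = runif(n.site, 0, 1000))
loc.form <- loc ~ lat + alt

## "Temporal" covariate: one row per observation (n rows)
temp.cov <- cbind(year = 1:nrow(data))
temp.form.loc <- ~ year  ## assumed syntax for the temporal trend

fit <- fitmaxstab(data, locations, "whitmat", loc.form, scale.form, shape.form,
                  marg.cov = marg.cov, temp.cov = temp.cov,
                  temp.form.loc = temp.form.loc)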

References

Cox, D. R. and Reid, N. (2004) A note on pseudo-likelihood constructed from marginal densities. Biometrika 91, 729--737.

Demarta, S. and McNeil, A. (2005) The t copula and related copulas. International Statistical Review 73, 111--129.

Gholam-Rezaee, M. (2009) Spatial extreme value: A composite likelihood. PhD Thesis. Ecole Polytechnique Federale de Lausanne.

Kabluchko, Z., Schlather, M. and de Haan, L. (2009) Stationary max-stable fields associated to negative definite functions. Annals of Probability 37:5, 2042--2065.

Padoan, S. A. (2008) Computational Methods for Complex Problems in Extreme Value Theory. PhD Thesis. University of Padova.

Padoan, S. A., Ribatet, M. and Sisson, S. A. (2010) Likelihood-based inference for max-stable processes. Journal of the American Statistical Association (Theory and Methods) 105:489, 263--277.

Schlather, M. (2002) Models for Stationary Max-Stable Random Fields. Extremes 5:1, 33--44.

Smith, R. L. (1990) Max-stable processes and spatial extremes. Unpublished.

Examples

##Define the coordinates of each location
n.site <- 30
locations <- matrix(runif(2*n.site, 0, 10), ncol = 2)
colnames(locations) <- c("lon", "lat")

##Simulate a max-stable process - with unit Frechet margins
data <- rmaxstab(40, locations, cov.mod = "whitmat", nugget = 0, range = 3,
smooth = 0.5)

##Now define the spatial model for the GEV parameters
param.loc <- -10 + 2 * locations[,2]
param.scale <- 5 + 2 * locations[,1] + locations[,2]^2
param.shape <- rep(0.2, n.site)

##Transform the unit Frechet margins to GEV
for (i in 1:n.site)
  data[,i] <- frech2gev(data[,i], param.loc[i], param.scale[i],
                        param.shape[i])

##Define a model for the GEV margins to be fitted
##shape ~ 1 means that the GEV shape parameter is constant
##over the region
loc.form <- loc ~ lat
scale.form <- scale ~ lon + I(lat^2)
shape.form <- shape ~ 1

##Fit a max-stable process using the Schlather model
fitmaxstab(data, locations, "whitmat", loc.form, scale.form,
           shape.form)

## Model without any spatial structure for the GEV parameters
## Be careful this could be *REALLY* time consuming
fitmaxstab(data, locations, "whitmat")

##  Fixing the smooth parameter of the Whittle-Matern family
##  to 0.5, i.e. the exponential family. We assume the data
##  are unit Frechet here.
fitmaxstab(data, locations, "whitmat", smooth = 0.5, fit.marge = FALSE)

##  Fitting penalized smoothing splines for the margins with the
##  Smith model
data <- rmaxstab(40, locations, cov.mod = "gauss", cov11 = 100, cov12 =
                 25, cov22 = 220)

##  And transform it to ordinary GEV margins with a non-linear
##  function
fun <- function(x)
  2 * sin(pi * x / 4) + 10
fun2 <- function(x)
  (fun(x) - 7 ) / 15

param.loc <- fun(locations[,2])
param.scale <- fun(locations[,2])
param.shape <- fun2(locations[,1])

##Transformation from unit Frechet to common GEV margins
for (i in 1:n.site)
  data[,i] <- frech2gev(data[,i], param.loc[i], param.scale[i],
                        param.shape[i])

##Defining the knots, penalty, degree for the splines
n.knots <- 5
knots <- quantile(locations[,2], prob = 1:n.knots/(n.knots+1))
knots2 <- quantile(locations[,1], prob = 1:n.knots/(n.knots+1))

##Be careful: the choice of the penalty (i.e. the smoothing parameter)
##may strongly affect the result. Here we use p-splines for each GEV
##parameter, so it is really CPU demanding, but one could use one
##p-spline and two linear models instead.
##A simple linear model would clearly be faster...
loc.form <- y ~ rb(lat, knots = knots, degree = 3, penalty = .5)
scale.form <- y ~ rb(lat, knots = knots, degree = 3, penalty = .5)
shape.form <- y ~ rb(lon, knots = knots2, degree = 3, penalty = .5)

fitted <- fitmaxstab(data, locations, "gauss", loc.form, scale.form, shape.form,
                     control = list(ndeps = rep(1e-6, 24), trace = 10),
                     method = "BFGS")
fitted
op <- par(mfrow=c(1,3))
plot(locations[,2], param.loc, col = 2, ylim = c(7, 14),
     ylab = "location parameter", xlab = "latitude")
plot(fun, from = 0, to = 10, add = TRUE, col = 2)
points(locations[,2], predict(fitted)[,"loc"], col = "blue", pch = 5)
new.data <- cbind(lon = seq(0, 10, length = 100), lat = seq(0, 10, length = 100))
lines(new.data[,1], predict(fitted, new.data)[,"loc"], col = "blue")
legend("topleft", c("true values", "predict. values", "true curve", "predict. curve"),
       col = c("red", "blue", "red", "blue"), pch = c(1, 5, NA, NA), inset = 0.05,
       lty = c(0, 0, 1, 1), ncol = 2)

plot(locations[,2], param.scale, col = 2, ylim = c(7, 14),
     ylab = "scale parameter", xlab = "latitude")
plot(fun, from = 0, to = 10, add = TRUE, col = 2)
points(locations[,2], predict(fitted)[,"scale"], col = "blue", pch = 5)
lines(new.data[,1], predict(fitted, new.data)[,"scale"], col = "blue")
legend("topleft", c("true values", "predict. values", "true curve", "predict. curve"),
       col = c("red", "blue", "red", "blue"), pch = c(1, 5, NA, NA), inset = 0.05,
       lty = c(0, 0, 1, 1), ncol = 2)

plot(locations[,1], param.shape, col = 2,
     ylab = "shape parameter", xlab = "longitude")
plot(fun2, from = 0, to = 10, add = TRUE, col = 2)
points(locations[,1], predict(fitted)[,"shape"], col = "blue", pch = 5)
lines(new.data[,1], predict(fitted, new.data)[,"shape"], col = "blue")
legend("topleft", c("true values", "predict. values", "true curve", "predict. curve"),
       col = c("red", "blue", "red", "blue"), pch = c(1, 5, NA, NA), inset = 0.05,
       lty = c(0, 0, 1, 1), ncol = 2)
par(op)