climextRemes (version 0.3.1)

calc_riskRatio_pot: Compute risk ratio and uncertainty based on peaks-over-threshold models fit to exceedances over a threshold

Description

Compute the risk ratio and its uncertainty by fitting a peaks-over-threshold model, designed specifically for climate data, to exceedance-only data, using the point process approach. The risk ratio is the ratio of the probability of exceeding a pre-specified value under the model fit to the first dataset to the probability under the model fit to the second dataset. Default standard errors are based on the usual MLE asymptotics, using a delta-method-based approximation, but standard errors based on the nonparametric bootstrap and on a likelihood ratio procedure can also be computed.

Usage

calc_riskRatio_pot(
  returnValue,
  y1,
  y2,
  x1 = NULL,
  x2 = NULL,
  threshold1,
  threshold2,
  locationFun1 = NULL,
  locationFun2 = NULL,
  scaleFun1 = NULL,
  scaleFun2 = NULL,
  shapeFun1 = NULL,
  shapeFun2 = NULL,
  nBlocks1 = nrow(x1),
  nBlocks2 = nrow(x2),
  blockIndex1 = NULL,
  blockIndex2 = NULL,
  firstBlock1 = 1,
  firstBlock2 = 1,
  index1 = NULL,
  index2 = NULL,
  nReplicates1 = 1,
  nReplicates2 = 1,
  replicateIndex1 = NULL,
  replicateIndex2 = NULL,
  weights1 = NULL,
  weights2 = NULL,
  proportionMissing1 = NULL,
  proportionMissing2 = NULL,
  xNew1 = NULL,
  xNew2 = NULL,
  declustering = NULL,
  upperTail = TRUE,
  scaling1 = 1,
  scaling2 = 1,
  ciLevel = 0.9,
  ciType,
  bootSE,
  bootControl = NULL,
  lrtControl = NULL,
  optimArgs = NULL,
  optimControl = NULL,
  initial1 = NULL,
  initial2 = NULL,
  logScale1 = NULL,
  logScale2 = NULL,
  getReturnCalcs = FALSE,
  getParams = FALSE,
  getFit = FALSE
)

Value

The primary outputs of this function are as follows: the log of the risk ratio and the standard error of that log risk ratio (logRiskRatio and se_logRiskRatio), as well as the risk ratio itself (riskRatio). The standard error is based on the usual MLE asymptotics using a delta-method-based approximation. If requested via ciType, confidence intervals will be returned, as discussed in Details.

Arguments

returnValue

numeric value giving the value for which the risk ratio should be calculated; the associated return period is the average number of blocks until the value is exceeded, and the associated probability is the probability of exceeding the value in any single block.

y1

a numeric vector of exceedance values for the first dataset (values of the outcome variable above the threshold). For better optimization performance, it is recommended that the values of y1 have magnitude around one (see Details); scaling1 can be used to achieve this.

y2

a numeric vector of exceedance values for the second dataset (values of the outcome variable above the threshold).

x1

a data frame, or object that can be converted to a data frame with columns corresponding to covariate/predictor/feature variables and each row containing the values of the variable for a block (e.g., often a year with climate data) for the first dataset. The number of rows must equal the number of blocks.

x2

analogous to x1 but for the second dataset.

threshold1

a single numeric value for constant threshold or a numeric vector with length equal to the number of blocks, indicating the threshold for each block for the first dataset.
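A minimal sketch of the two accepted forms (the values shown are hypothetical):

nBlocks1 <- 30
threshold1 <- 0.395                 # constant threshold applied to every block
threshold1 <- rep(0.395, nBlocks1)  # equivalent per-block vector of thresholds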

threshold2

analogous to threshold1 but for the second dataset.

locationFun1

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the location parameter using columns from x1 for the first dataset. x1 must be supplied if this is anything other than NULL or ~1.

locationFun2

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the location parameter using columns from x2 for the second dataset. x2 must be supplied if this is anything other than NULL or ~1.

scaleFun1

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the (potentially transformed) scale parameter using columns from x1 for the first dataset. x1 must be supplied if this is anything other than NULL or ~1. logScale1 controls whether this determines the log of the scale or the scale directly.

scaleFun2

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the (potentially transformed) scale parameter using columns from x2 for the second dataset. x2 must be supplied if this is anything other than NULL or ~1. logScale2 controls whether this determines the log of the scale or the scale directly.

shapeFun1

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the shape parameter using columns from x1 for the first dataset. x1 must be supplied if this is anything other than NULL or ~1.

shapeFun2

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the shape parameter using columns from x2 for the second dataset. x2 must be supplied if this is anything other than NULL or ~1.

nBlocks1

number of blocks (e.g., a block will often be a year with climate data) in first dataset; note this value determines the interpretation of return values/periods/probabilities; see returnPeriod and returnValue.

nBlocks2

number of blocks (e.g., a block will often be a year with climate data) in second dataset; note this value determines the interpretation of return values/periods/probabilities; see returnPeriod and returnValue.

blockIndex1

numeric vector providing the index of the block corresponding to each element of y1. Used only when x1 is provided, to match each exceedance to the covariate/predictor/feature values for its block, or when bootstrapping with resampling by block, as specified via the by element of bootControl. If firstBlock1 is not equal to one, then blockIndex1 need not have one as its smallest possible value.

blockIndex2

numeric vector providing the index of the block corresponding to each element of y2. Analogous to blockIndex1.

firstBlock1

single numeric value indicating the numeric value of the first possible block of blockIndex1. For example the values in blockIndex1 might indicate the year of each exceedance with the first year of data being 1969, in which case firstBlock1 would be 1969. Note that the first block may not have any exceedances so it may not be represented in blockIndex1. Used only to adjust blockIndex1 so that the block indices start at one and therefore correspond to the rows of x1.
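A minimal sketch of the 1969 example above (the exceedance years are hypothetical):

blockIndex1 <- c(1970, 1970, 1973, 1975)  # year of each exceedance; 1969 happens to have none
firstBlock1 <- 1969                       # first possible year of the record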

firstBlock2

single numeric value indicating the numeric value of the first possible block of blockIndex2. Analogous to firstBlock1.

index1

numeric vector providing the integer-valued index (e.g., julian day for daily climate data) corresponding to each element of y1. For example if there are 10 original observations and the third, fourth, and seventh values are exceedances, then index1 would be the vector 3,4,7. Used only when declustering is provided to determine which exceedances occur sequentially or within a contiguous set of values of a given length. The actual values are arbitrary; only the lags between the values are used.
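A minimal sketch of the ten-observation example above, assuming a hypothetical original series y:

y <- c(0.1, 0.2, 0.9, 1.1, 0.3, 0.2, 0.8, 0.1, 0.4, 0.2)  # hypothetical original data
threshold <- 0.5
index1 <- which(y > threshold)  # 3, 4, 7: positions of the exceedances
y1 <- y[index1]                 # the exceedance values passed as y1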

index2

numeric vector providing the integer-valued index (e.g., julian day for daily climate data) corresponding to each element of y2. Analogous to index1.

nReplicates1

numeric value indicating the number of replicates for the first dataset.

nReplicates2

numeric value indicating the number of replicates for the second dataset.

replicateIndex1

numeric vector providing the index of the replicate corresponding to each element of y1. Used for three purposes: (1) when bootstrapping with resampling by replicate, as specified via the by element of bootControl; (2) to avoid treating values in different replicates as potentially being sequential or within a short interval when removing values based on declustering; and (3) to match outcomes to weights or proportionMissing when either varies by replicate.

replicateIndex2

numeric vector providing the index of the replicate corresponding to each element of y2. Analogous to replicateIndex1.

weights1

a vector or matrix providing the weights by block for the first dataset. When there is only one replicate or the weights do not vary by replicate, a vector of length equal to the number of blocks. When weights vary by replicate, a matrix with rows corresponding to blocks and columns to replicates. Likelihood contribution of each block is multiplied by the corresponding weight.
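A minimal sketch of the two accepted forms, with hypothetical dimensions:

nBlocks1 <- 30; nReplicates1 <- 5
weights1 <- rep(1, nBlocks1)                   # one weight per block
weights1 <- matrix(1, nBlocks1, nReplicates1)  # weights varying by replicate (blocks x replicates)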

weights2

a vector or matrix providing the weights by block for the second dataset. Analogous to weights1.

proportionMissing1

a numeric value, vector, or matrix indicating the proportion of missing values in the original first dataset before exceedances were selected. When the proportion missing is the same for all blocks and replicates, a single value. When there is only one replicate or the proportion missing does not vary by replicate, a vector of length equal to the number of blocks. When the proportion missing varies by replicate, a matrix with rows corresponding to blocks and columns to replicates.

proportionMissing2

a numeric value, vector or matrix indicating the proportion of missing values in the original second dataset before exceedances were selected. Analogous to proportionMissing1.

xNew1

object of the same form as x1, providing covariate/predictor/feature values for which log risk ratios are desired.

xNew2

object of the same form as x2, providing covariate/predictor/feature values for which log risk ratios are desired. Must provide the same number of covariate sets as xNew1 as the risk ratio is based on contrasting return probabilities under xNew1 and xNew2.

declustering

one of NULL, 'noruns', or a number. If 'noruns' is specified, only the maximum (or minimum if upperTail = FALSE) value within a set of exceedances corresponding to successive indices is included. If a number, this should indicate the size of the interval (which will be used with the index argument) within which to allow only the largest (or smallest if upperTail = FALSE) value.
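As a sketch, the two forms might be used as follows (the window length of 5 is hypothetical):

# keep only the largest value within each run of consecutive exceedances:
# calc_riskRatio_pot(..., declustering = 'noruns', index1 = index1, index2 = index2)
# keep only the largest value within any contiguous 5-index interval:
# calc_riskRatio_pot(..., declustering = 5, index1 = index1, index2 = index2)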

upperTail

logical indicating whether one is working with exceedances over a high threshold (TRUE) or exceedances under a low threshold (FALSE); in the latter case, the function works with the negative of the values and the threshold, changing the sign of the resulting location parameters.

scaling1

positive-valued scalar used to scale the data values of the first dataset for more robust optimization performance. When multiplied by the values, it should produce values with magnitude around 1.

scaling2

positive-valued scalar used to scale the data values of the second dataset for more robust optimization performance. When multiplied by the values, it should produce values with magnitude around 1.

ciLevel

statistical confidence level for confidence intervals; in repeated experimentation, this proportion of confidence intervals should contain the true risk ratio. Note that if only one endpoint of the resulting interval is used, for example the lower bound, then the effective confidence level increases by half of one minus ciLevel. For example, a two-sided 0.90 confidence interval corresponds to a one-sided 0.95 confidence interval.

ciType

character vector indicating which type of confidence intervals to compute. See Details.

bootSE

logical indicating whether to use the bootstrap to estimate the standard error of the risk ratio

bootControl

a list of control parameters for the bootstrapping. See Details.

lrtControl

list containing a single component, bounds, which sets the range inside which the algorithm searches for the endpoints of the likelihood ratio-based confidence interval. This avoids numerical issues with endpoints converging to zero and infinity. If an endpoint is not found within the interval, it is set to NA.
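For example, one might narrow the search range (the bounds shown are hypothetical):

lrtControl <- list(bounds = c(0.01, 100))  # search for CI endpoints only within [0.01, 100]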

optimArgs

a list with named components matching exactly any arguments that the user wishes to pass to optim. See help(optim) for details. Of particular note, 'method' can be used to choose the optimization method used for maximizing the log-likelihood to fit the model and 'control=list(maxit=VALUE)' for a user-chosen VALUE can be used to increase the number of iterations if the optimization is converging slowly.
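A minimal sketch of both uses mentioned above (the method and iteration count are illustrative):

optimArgs <- list(method = 'BFGS', control = list(maxit = 2000))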

optimControl

a list with named components matching exactly any elements that the user wishes to pass as the control argument to R's optim function. See help(optim) for details. Primarily provided for the Python interface because control can also be passed as part of optimArgs.

initial1

a list with components named 'location', 'scale', and 'shape' providing initial parameter values for the first dataset, intended for use in speeding up or enabling optimization when the default initial values are resulting in failure of the optimization; note that use of scaling1, logScale1 and .normalizeX = TRUE cause numerical changes in some of the parameters. For example with logScale1 = TRUE, initial value(s) for 'scale' should be specified on the log scale.

initial2

a list with components named 'location', 'scale', and 'shape' providing initial parameter values for the second dataset, intended for use in speeding up or enabling optimization when the default initial values are resulting in failure of the optimization; note that use of scaling2, logScale2 and .normalizeX = TRUE cause numerical changes in some of the parameters. For example with logScale2 = TRUE, initial value(s) for 'scale' should be specified on the log scale.
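A sketch with hypothetical starting values; note that with logScale1 = TRUE the scale component is given on the log scale:

initial1 <- list(location = 1.4, scale = log(0.5), shape = 0.1)  # scale on log scale, assuming logScale1 = TRUE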

logScale1

logical indicating whether optimization for the scale parameter should be done on the log scale for the first dataset. By default this is FALSE when the scale is not a function of covariates and TRUE when the scale is a function of covariates (to ensure the scale is positive regardless of the regression coefficients).

logScale2

logical indicating whether optimization for the scale parameter should be done on the log scale for the second dataset. By default this is FALSE when the scale is not a function of covariates and TRUE when the scale is a function of covariates (to ensure the scale is positive regardless of the regression coefficients).

getReturnCalcs

logical indicating whether to return the estimated return values/probabilities/periods from the fitted models.

getParams

logical indicating whether to return the fitted parameter values and their standard errors for the fitted models; WARNING: parameter values for models with covariates for the scale parameter must be interpreted based on the value of logScale.

getFit

logical indicating whether to return the full fitted models (potentially useful for model evaluation and for understanding optimization problems); note that estimated parameters in the fit object for nonstationary models will not generally match the MLE provided when getParams is TRUE because covariates are normalized before fitting and the fit object is based on the normalized covariates. Similarly, parameters will not match if scaling is not 1.

Author

Christopher J. Paciorek

Details

See fit_pot for more details on fitting the peaks-over-threshold model for each dataset, including details on blocking and replication. Also see fit_pot for information on the bootControl argument.

Optimization failures:

It is not uncommon for maximization of the log-likelihood to fail for extreme value models. Please see the help information for fit_pot. Also note that if the probability in the denominator of the risk ratio is near one, one may achieve better numerical performance by swapping the two datasets and computing the risk ratio for the probability under dataset 2 relative to the probability under dataset 1.
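Since swapping the datasets simply inverts the risk ratio, the original quantity is easy to recover; a minimal sketch, assuming outSwap holds the result of the swapped call:

riskRatio12 <- 1 / outSwap$riskRatio      # invert the ratio computed from the swapped fit
logRiskRatio12 <- -outSwap$logRiskRatio   # the log risk ratio changes sign; its standard error is unchanged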

ciType can include one or more of the following: 'delta', 'lrt', 'boot_norm', 'boot_perc', 'boot_basic', 'boot_stud', 'boot_bca'. 'delta' uses the delta method to compute an asymptotic interval based on the standard error of the log risk ratio. 'lrt' inverts a likelihood-ratio test. Bootstrap-based options are the normal-based interval using the bootstrap standard error ('boot_norm'), the percentile bootstrap ('boot_perc'), the basic bootstrap ('boot_basic'), the bootstrap-t ('boot_stud'), and the bootstrap BCA method ('boot_bca'). See Paciorek et al. (2018) for more details.
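As a sketch, several interval types can be requested in one call:

# request delta-method, likelihood-ratio, and percentile bootstrap intervals together:
# calc_riskRatio_pot(..., ciLevel = 0.9, ciType = c('delta', 'lrt', 'boot_perc'))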


References

Paciorek, C.J., D.A. Stone, and M.F. Wehner. 2018. Quantifying uncertainty in the attribution of human influence on severe weather. Weather and Climate Extremes 20:69-80. arXiv preprint <https://arxiv.org/abs/1706.03388>.

Jeon, S., C.J. Paciorek, and M.F. Wehner. 2016. Quantile-based bias correction and uncertainty quantification of extreme event attribution statements. Weather and Climate Extremes 12:24-32. <DOI:10.1016/j.wace.2016.02.001>. arXiv preprint: <http://arxiv.org/abs/1602.04139>.

Examples

library(climextRemes)
data(Fort, package = 'extRemes')
threshold <- 0.395
ord <- order(Fort$year, Fort$month, Fort$day) 
Fort <- Fort[ord, ]
ind <- Fort$Prec > threshold
FortExc <- Fort[ind, ]
earlyYears <- 1900:1929
lateYears <- 1970:1999
earlyPeriod <- which(FortExc$year %in% earlyYears)
latePeriod <- which(FortExc$year %in% lateYears)
# contrast late period with early period, assuming a nonstationary fit
# within each time period and finding RR based on midpoint of each period
out <- calc_riskRatio_pot(returnValue = 3,
    y1 = FortExc$Prec[earlyPeriod], y2 = FortExc$Prec[latePeriod],
    x1 = data.frame(years = earlyYears), x2 = data.frame(years = lateYears),
    threshold1 = threshold, threshold2 = threshold,
    locationFun1 = ~years, locationFun2 = ~years,
    xNew1 = data.frame(years = mean(earlyYears)),
    xNew2 = data.frame(years = mean(lateYears)),
    blockIndex1 = FortExc$year[earlyPeriod],
    blockIndex2 = FortExc$year[latePeriod],
    firstBlock1 = earlyYears[1], firstBlock2 = lateYears[1])
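The primary components named in the Value section can then be inspected:

out$riskRatio        # estimated risk ratio
out$logRiskRatio     # log of the risk ratio
out$se_logRiskRatio  # delta-method standard error of the log risk ratio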
