RVineStructureSelect: Sequential Specification of R- and C-Vine Copula Models

Description

This function fits either an R- or a C-vine copula model to a d-dimensional copula data set. Tree structures are determined and appropriate pair-copula families are selected using BiCopSelect() and estimated sequentially (forward selection of trees).

Usage

RVineStructureSelect(
  data,
  familyset = NA,
  type = 0,
  selectioncrit = "AIC",
  indeptest = FALSE,
  level = 0.05,
  trunclevel = NA,
  progress = FALSE,
  weights = NA,
  treecrit = "tau",
  rotations = TRUE,
  se = FALSE,
  presel = TRUE,
  method = "mle",
  cores = 1
)

Value

An RVineMatrix() object with the selected structure (RVM$Matrix) and families (RVM$family) as well as sequentially estimated parameters stored in RVM$par and RVM$par2. The object is augmented by the following information about the fit:

se, se2: standard errors for the parameter estimates; note that these are only approximate since they do not account for the sequential nature of the estimation,
nobs: number of observations,
logLik, pair.logLik: log likelihood (overall and pairwise),
AIC, pair.AIC: Akaike's Information Criterion (overall and pairwise),
BIC, pair.BIC: Bayesian Information Criterion (overall and pairwise),
emptau: matrix of empirical values of Kendall's tau,
p.value.indeptest: matrix of p-values of the independence test.

Arguments

data: An N x d data matrix (with uniform margins).
familyset: An integer vector of pair-copula families to select from. The vector has to include at least one pair-copula family that allows for positive and one that allows for negative dependence. Not listed copula families might be included to better handle limit cases. If familyset = NA (default), selection among all possible families is performed. Coding of pair-copula families is the same as in BiCop().
type: Type of the vine model to be specified:
0 or "RVine" = R-vine (default)
1 or "CVine" = C-vine
C- and D-vine copula models with pre-specified order can be specified using CDVineCopSelect of the package CDVine. Similarly, R-vine copula models with pre-specified tree structure can be specified using RVineCopSelect().
selectioncrit: Character indicating the criterion for pair-copula selection. Possible choices: selectioncrit = "AIC" (default), "BIC", or "logLik" (see BiCopSelect()).
indeptest: logical; whether a hypothesis test for the independence of u1 and u2 is performed before bivariate copula selection (default: indeptest = FALSE; see BiCopIndTest()). The independence copula is chosen for a (conditional) pair if the null hypothesis of independence cannot be rejected.
level: numeric; significance level of the independence test (default: level = 0.05).
trunclevel: integer; level of truncation.
progress: logical; whether the tree-wise specification progress is printed (default: progress = FALSE).
weights: numeric; weights for each observation (optional).
treecrit: edge weight for Dissmann's structure selection algorithm, see Details.
rotations: If TRUE, all rotations of the families in familyset are included.
se: Logical; whether standard errors are estimated (default: se = FALSE).
presel: Logical; whether to exclude families before fitting based on symmetry properties of the data. Makes the selection about 30% faster (on average), but may yield slightly worse results in a few special cases.
method: indicates the estimation method: either maximum likelihood estimation (method = "mle"; default) or inversion of Kendall's tau (method = "itau"). For method = "itau", only one-parameter families and the Student t copula can be used (family = 1,2,3,4,5,6,13,14,16,23,24,26,33,34 or 36). For the t-copula, par2 is found by a crude profile likelihood optimization over the interval (2, 10].
cores: integer; if cores > 1, estimation will be parallelized within each tree (using foreach::foreach()). Note that parallelization causes substantial overhead and may be slower than single-threaded computation when dimension, sample size, or family set are small or method = "itau".
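As an illustration of how several of the arguments above combine, the following is a minimal sketch that fits a truncated R-vine with a preliminary independence test. It uses only arguments listed in Usage and the daxreturns data from the Examples; the particular family set, truncation level, and the object name fit are arbitrary choices made for this sketch.

```r
library(VineCopula)

data(daxreturns)
u <- as.matrix(daxreturns[1:250, 1:4])  # small subset to keep the fit fast

# R-vine truncated after the second tree; pairs for which independence
# cannot be rejected at the 5% level get the independence copula
fit <- RVineStructureSelect(
  u,
  familyset = c(1, 3, 4, 5),   # Gauss, Clayton, Gumbel, Frank
  type = "RVine",
  selectioncrit = "BIC",
  indeptest = TRUE,
  level = 0.05,
  trunclevel = 2,
  treecrit = "tau"
)

fit$family             # selected families (0 = independence copula)
fit$p.value.indeptest  # p-values of the independence tests
```

The family set contains families that allow for negative dependence (Gauss, Frank) as well as positive dependence, as required for familyset.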

Author

Jeffrey Dissmann, Eike Brechmann, Ulf Schepsmeier, Thomas Nagler

Details

R-vine trees are selected using maximum spanning trees w.r.t. some edge weights. The most commonly used edge weight is the absolute value of the empirical Kendall's tau, say $\hat{\tau}_{ij}$. Then, the following optimization problem is solved for each tree: $$\max \sum_{\mathrm{edges}\; e_{ij} \;\mathrm{in\; spanning\; tree}} |\hat{\tau}_{ij}|,$$ where a spanning tree is a tree on all nodes. The setting of the first tree selection step is always a complete graph. For subsequent trees, the setting depends on the R-vine construction principles, in particular on the proximity condition.
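The first-tree optimization can be sketched in base R as follows. This is an illustration only: it uses Prim's algorithm on a complete graph with weights $|\hat{\tau}_{ij}|$, which is not necessarily the routine the package uses internally, and the toy data and the helper max_spanning_tree() are hypothetical.

```r
set.seed(1)
n <- 200
z <- rnorm(n)
# hypothetical toy data: three columns sharing a factor, one independent
x <- cbind(a = z + rnorm(n, sd = 0.5),
           b = z + rnorm(n, sd = 0.5),
           c = z + rnorm(n, sd = 1),
           d = rnorm(n))
u <- apply(x, 2, function(col) rank(col) / (n + 1))  # pseudo-observations

w <- abs(cor(u, method = "kendall"))  # edge weights |tau_ij|
diag(w) <- 0

# Prim's algorithm, greedily adding the heaviest edge that leaves the tree
max_spanning_tree <- function(w) {
  d <- nrow(w)
  in_tree <- c(TRUE, rep(FALSE, d - 1))  # start from node 1
  edges <- matrix(NA_integer_, nrow = d - 1, ncol = 2)
  for (k in seq_len(d - 1)) {
    # candidate edges connect a tree node to a non-tree node
    cand <- w[in_tree, !in_tree, drop = FALSE]
    best <- which(cand == max(cand), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[best[1]]
    to <- which(!in_tree)[best[2]]
    edges[k, ] <- c(from, to)
    in_tree[to] <- TRUE
  }
  edges
}

mst <- max_spanning_tree(w)
total_weight <- sum(w[mst])  # objective value of the selected tree
data.frame(from = colnames(w)[mst[, 1]],
           to = colnames(w)[mst[, 2]],
           weight = w[mst])
```

For the second and later trees, the candidate edge set would additionally have to be restricted by the proximity condition, which this sketch does not cover.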

Some commonly used edge weights are implemented:

`"tau"`	absolute value of empirical Kendall's tau.
`"rho"`	absolute value of empirical Spearman's rho.
`"AIC"`	Akaike information criterion (multiplied by -1).
`"BIC"`	Bayesian information criterion (multiplied by -1).
`"cAIC"`	corrected Akaike information criterion (multiplied by -1).

If the data contain NAs, the edge weights in "tau" and "rho" are multiplied by the square root of the proportion of complete observations. This penalizes pairs where fewer observations are used.
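The penalty for missing values can be illustrated with a small base R sketch. The toy data are hypothetical, and the code only reproduces the arithmetic described above (absolute Kendall's tau on complete pairs, multiplied by the square root of the proportion of complete observations); it is not taken from the package source.

```r
set.seed(2)
n <- 100
u1 <- runif(n)
u2 <- (u1 + runif(n)) / 2  # hypothetical dependent pair (tau is rank-based)
u2[1:20] <- NA             # 20% of the pairs are incomplete

complete <- complete.cases(u1, u2)
prop_complete <- mean(complete)  # proportion of complete observations

tau_abs <- abs(cor(u1[complete], u2[complete], method = "kendall"))
penalized <- tau_abs * sqrt(prop_complete)  # penalized edge weight

c(tau_abs = tau_abs, penalized = penalized)
```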

The criteria "AIC", "BIC", and "cAIC" require estimation and model selection for all possible pairs. This is computationally expensive and much slower than "tau" or "rho". The user can also specify a custom function to calculate the edge weights. The function has to be of type function(u1, u2, weights) ... and must return a numeric value. The weights argument must exist, but does not have to be used. For example, "tau" (without using weights) can be implemented as follows:
function(u1, u2, weights)
abs(cor(u1, u2, method = "kendall", use = "complete.obs"))
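As a further sketch of a custom edge weight with the required signature, the function below uses the absolute value of an empirical Blomqvist's beta computed around the copula-scale median 0.5. The name blomqvist_weight and the choice of this dependence measure are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical custom edge weight: absolute empirical Blomqvist's beta
blomqvist_weight <- function(u1, u2, weights) {
  ok <- complete.cases(u1, u2)
  # 'weights' is accepted to match the required signature but is not used
  abs(mean(sign((u1[ok] - 0.5) * (u2[ok] - 0.5))))
}

set.seed(3)
u1 <- runif(500)
u2 <- runif(500)

w_indep <- blomqvist_weight(u1, u2, weights = NA)  # independent pair
w_comon <- blomqvist_weight(u1, u1, weights = NA)  # perfectly dependent pair

# The function could then be supplied as the edge weight, e.g.
# RVineStructureSelect(data, treecrit = blomqvist_weight)
```

Because the function returns a single numeric value per pair and larger values mean stronger dependence, it fits the maximum spanning tree formulation described above.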

The root nodes of C-vine trees are determined similarly by identifying the node with the strongest dependencies to all other nodes. That is, we take the node with maximum column sum in the empirical Kendall's tau matrix.
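The root selection for the first C-vine tree can be sketched in base R as below. The toy data are hypothetical, and taking absolute values before summing is an assumption made here for consistency with the "tau" edge weight.

```r
set.seed(4)
n <- 300
z <- rnorm(n)
# hypothetical toy data: 'hub' drives the other three variables
x <- cbind(hub = z,
           v2 = z + rnorm(n),
           v3 = z + rnorm(n),
           v4 = z + rnorm(n))

tau_abs <- abs(cor(x, method = "kendall"))
diag(tau_abs) <- 0  # ignore the trivial self-dependence

col_sums <- colSums(tau_abs)
root <- names(which.max(col_sums))  # node with the largest column sum
root
```

Here the variable that generates the others has the largest total dependence and is therefore picked as the root.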

Note that a possible way to determine the order of the nodes in the D-vine is to identify a shortest Hamiltonian path in terms of weights $1-|\hat{\tau}_{ij}|$. This can be established, for example, using the package TSP. Example code is shown below.
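For very small dimensions, the shortest Hamiltonian path can also be found by brute force, which makes the objective explicit. The sketch below is independent of the TSP-based code in the Examples; the Kendall's tau matrix and the helpers perms() and path_cost() are hypothetical, and enumeration over all d! orderings is only feasible for a handful of variables.

```r
# hypothetical Kendall's tau matrix with a chain-like dependence 1-2-3-4
tau <- matrix(c(1.0, 0.6, 0.3, 0.1,
                0.6, 1.0, 0.6, 0.3,
                0.3, 0.6, 1.0, 0.6,
                0.1, 0.3, 0.6, 1.0), nrow = 4, ncol = 4)
M <- 1 - abs(tau)  # path weights 1 - |tau_ij|

# all permutations of a vector, as a list
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# total weight of the path visiting the nodes in the given order
path_cost <- function(ord, m) {
  sum(m[cbind(ord[-length(ord)], ord[-1])])
}

all_orders <- perms(1:4)
costs <- vapply(all_orders, path_cost, numeric(1), m = M)
best <- all_orders[[which.min(costs)]]  # first minimizer; the reverse ties
best_cost <- min(costs)
```

The resulting order could then be passed on in the same way as the order obtained from the TSP solver.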

References

Brechmann, E. C., C. Czado, and K. Aas (2012). Truncated regular vines in high dimensions with applications to financial data. Canadian Journal of Statistics 40 (1), 68-85.

Dissmann, J. F., E. C. Brechmann, C. Czado, and D. Kurowicka (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59 (1), 52-69.

Examples

# load data set
data(daxreturns)

# select the R-vine structure, families and parameters
# using only the first 4 variables and the first 250 observations
# we allow for the copula families: Gauss, t, Clayton, Gumbel, Frank and Joe
daxreturns <- daxreturns[1:250, 1:4]
RVM <- RVineStructureSelect(daxreturns, c(1:6), progress = TRUE)

## see the object's content or a summary
str(RVM)
summary(RVM)

## inspect the fitted model using plots
if (FALSE) plot(RVM)  # tree structure
contour(RVM)  # contour plots of all pair-copulas

## estimate a C-vine copula model with only Clayton, Gumbel and Frank copulas
CVM <- RVineStructureSelect(daxreturns, c(3,4,5), "CVine")

## determine the order of the nodes in a D-vine using the package TSP
library(TSP)
d <- dim(daxreturns)[2]
M <- 1 - abs(TauMatrix(daxreturns))
hamilton <- insert_dummy(TSP(M), label = "cut")
sol <- solve_TSP(hamilton, method = "repetitive_nn")
order <- cut_tour(sol, "cut")
DVM <- D2RVine(order, family = rep(0,d*(d-1)/2), par = rep(0, d*(d-1)/2))
RVineCopSelect(daxreturns, c(1:6), DVM$Matrix)
