
copula (version 0.999-15)

C.n: The Empirical Copula

Description

Given pseudo-observations from a distribution with continuous margins and copula C, the empirical copula is the empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of C. The function C.n() computes the empirical copula.

The function dCn() approximates first-order partial derivatives of the unknown copula.

The function F.n() computes the empirical distribution function of a multivariate sample. Note that C.n(u, X, *) simply calls F.n(u, pobs(X), *) after checking u.

Usage

C.n(u, X, offset=0, method=c("C", "R"))
dCn(u, U, j.ind=1:d, b=1/sqrt(nrow(U)), ...)
F.n(x, X, offset=0, method=c("C", "R"))
Cn(x, w) ## <-- deprecated! use C.n(w, x) instead!

Arguments

u,w
an $(m, d)$-matrix with elements in $[0,1]$ whose rows contain the evaluation points of the empirical copula.
x
an $(m, d)$-matrix whose rows contain the evaluation points of the empirical distribution function.
U
for dCn() only: an $(n,d)$-matrix with elements in $[0,1]$ and with the same number $d$ of columns as u. The rows of U are the pseudo-observations based on which the empirical copula is computed.
X
(and x and U for Cn():) an $(n, d)$-matrix with the same number $d$ of columns as x. Recall that a multivariate random sample X can be transformed to an appropriate U via pobs().
j.ind
integer vector of indices $j$ between 1 and $d$ indicating the dimensions with respect to which first-order partial derivatives are approximated.
b
numeric giving the bandwidth for approximating first-order partial derivatives.
offset
used in scaling the result which is of the form sum(....)/(n+offset); defaults to zero.
method
character string indicating which method is applied to compute the empirical cumulative distribution function or the empirical copula. method="C" uses an implementation in C, method="R" uses a pure R implementation.
...
additional arguments passed to dCn().
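
The effect of offset can be illustrated with a small base-R sketch. The helper edf_sketch() below is hypothetical: it is not part of the copula package and makes no claim to mirror the package internals. It only reproduces the sum(....)/(n+offset) scaling described for offset.

```r
## Hypothetical helper (base R only): multivariate empirical distribution
## function with an 'offset' added to the denominator.
edf_sketch <- function(x, X, offset = 0) {
  stopifnot(is.matrix(x), is.matrix(X), ncol(x) == ncol(X))
  n <- nrow(X)
  ## For each evaluation point (row of x), count the rows of X that are
  ## componentwise <= that point.
  counts <- apply(x, 1, function(pt) sum(colSums(t(X) <= pt) == ncol(X)))
  counts / (n + offset)
}

X <- rbind(c(1, 1), c(2, 3), c(3, 2))  # n = 3 observations in d = 2
x <- rbind(c(2, 3), c(3, 3))           # m = 2 evaluation points
edf_sketch(x, X)                       # counts 2 and 3, divided by n = 3
edf_sketch(x, X, offset = 1)           # same counts, divided by n + 1 = 4
```

With offset = 1 the value at the componentwise maximum of the sample is n/(n+1) rather than 1.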

Value

C.n() returns the empirical copula at u (that is, the empirical distribution function of the observations U evaluated at u). The name “U” is a slight misnomer here, kept for backward compatibility, as U typically contains the original observations X.

F.n() returns the empirical distribution function of X evaluated at x.

dCn() returns a vector (if length(j.ind) is 1) or a matrix (with number of columns equal to length(j.ind)), containing the approximated first-order partial derivatives of the unknown copula at u with respect to the arguments in j.ind.

Details

There are several asymptotically equivalent definitions of the empirical copula. As mentioned above, the empirical copula C.n() is simply defined as the empirical distribution function computed from the pseudo-observations, that is, $$C_n(\bm{u})=\frac{1}{n}\sum_{i=1}^n\mathbf{1}_{\{\hat{\bm{U}}_i\le\bm{u}\}},$$ where $\hat{\bm{U}}_i$, $i=1,\dots,n$, denote the pseudo-observations (rows in U) and $n$ the sample size. Internally, C.n() is just a wrapper for F.n() and is expected to be fed with the pseudo-observations.
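
As a minimal sketch of this definition, the following base-R code builds pseudo-observations as componentwise ranks divided by n+1 and then evaluates the indicator average. The names pobs_sketch() and emp_copula_sketch() are hypothetical stand-ins for illustration, not the package's implementation.

```r
## Hypothetical helpers (base R only).
## Pseudo-observations: componentwise ranks scaled by n + 1.
pobs_sketch <- function(X) apply(X, 2, rank) / (nrow(X) + 1)

## Empirical copula: proportion of pseudo-observations that are
## componentwise <= each evaluation point (row of u).
emp_copula_sketch <- function(u, X) {
  U.hat <- pobs_sketch(X)
  vapply(seq_len(nrow(u)), function(k) {
    mean(apply(U.hat, 1, function(Ui) all(Ui <= u[k, ])))
  }, numeric(1))
}

set.seed(271)
X <- matrix(rnorm(20), ncol = 2)
X[, 2] <- X[, 1] + 0.5 * X[, 2]        # introduce positive dependence
u <- rbind(c(0.5, 0.5), c(1, 1), c(0, 0))
emp_copula_sketch(u, X)

## The result depends on X only through ranks, so a strictly increasing
## transformation of the margins leaves it unchanged.
identical(emp_copula_sketch(u, X), emp_copula_sketch(u, exp(X)))
```

The last line illustrates why the original margins do not matter: only the ranks of the data enter the estimator.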

The approximation for the $j$th partial derivative of the unknown copula $C$ is implemented as, for example, in Rémillard and Scaillet (2009), and given by $$\hat{\dot{C}}_{jn}(\bm{u})=\frac{C_n(u_1,\dots,u_{j-1},\min(u_j+b,1),u_{j+1},\dots,u_d)-C_n(u_1,\dots,u_{j-1},\max(u_j-b,0),u_{j+1},\dots,u_d)}{2b},$$ where $b$ denotes the bandwidth and $C_n$ the empirical copula.
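
The finite-difference formula can be written out in a few lines of base R. This is only a sketch under stated assumptions: emp_cop() and dCn_sketch() are hypothetical names, U is assumed to already contain pseudo-observations, and a single index j is handled rather than a vector j.ind.

```r
## Hypothetical helpers (base R only; not the package's dCn()).
## Empirical copula of the pseudo-observations U at the rows of u.
emp_cop <- function(u, U)
  apply(u, 1, function(pt) mean(colSums(t(U) <= pt) == ncol(U)))

## Central finite difference in the j-th argument, with the shifted
## coordinate clamped to [0, 1] as in the formula.
dCn_sketch <- function(u, U, j, b = 1 / sqrt(nrow(U))) {
  u.up <- u; u.up[, j] <- pmin(u[, j] + b, 1)
  u.lo <- u; u.lo[, j] <- pmax(u[, j] - b, 0)
  (emp_cop(u.up, U) - emp_cop(u.lo, U)) / (2 * b)
}

set.seed(1)
n <- 1000
U <- apply(matrix(runif(2 * n), ncol = 2), 2, rank) / (n + 1)
u <- rbind(c(0.5, 0.5), c(0.2, 0.9))
## Under independence C(u1, u2) = u1 * u2, so the derivative with respect
## to u1 equals u2; the estimates should be roughly u[, 2].
dCn_sketch(u, U, j = 1)
```

Note that the denominator stays 2b even when clamping shortens the actual step, which matches the formula as stated.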

References

Rüschendorf, L. (1976). Asymptotic distributions of multivariate rank order statistics, Annals of Statistics 4, 912--923.

Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274--292.

Deheuvels, P. (1981). A non parametric test for independence, Publ. Inst. Statist. Univ. Paris 26, 29--50.

Rémillard, B. and Scaillet, O. (2009). Testing for equality between two copulas, Journal of Multivariate Analysis 100(3), 377--386.

See Also

pobs() for computing pseudo-observations, pCopula() for evaluating a copula.

Examples

## Generate data X (from a meta-Gumbel model with N(0,1) margins)
n <- 100
d <- 3
family <- "Gumbel"
theta <- 2
cop <- onacopulaL(family, list(theta=theta, 1:d))
set.seed(1)
X <- qnorm(rCopula(n, cop)) # meta-Gumbel data with N(0,1) margins

## Random points at which to evaluate the empirical copula
u <- matrix(runif(n*d), n, d)
ec <- C.n(u, X)

## Compare the empirical copula with the true copula
mean(abs(pCopula(u, copula=cop)-ec)) # ~= 0.012 -- increase n to decrease this error

## Compare the empirical copula with F.n(pobs())
U <- pobs(X) # pseudo-observations
stopifnot(identical(ec, F.n(u, X=U))) # even identical (U already holds pobs(X))

## Compare the empirical copula based on U at U with the Kendall distribution
## Note: Theoretically, C(U) ~ K, so K(C_n(U, U=U)) should approximately be U(0,1)
plot(pK(C.n(U, X), cop=cop@copula, d=d))

## Compare the empirical copula and the true copula on the diagonal
C.n.diag <- function(u) C.n(do.call(cbind, rep(list(u), d)), X=X) # diagonal of C_n
C.diag <- function(u) pCopula(do.call(cbind, rep(list(u), d)), cop) # diagonal of C
curve(C.n.diag, from=0, to=1, # empirical copula diagonal
      main=paste("True vs empirical diagonal of a", family, "copula"),
      xlab="u", ylab=expression("True C(u,..,u) and empirical"~C[n](u,..,u)))
curve(C.diag, lty=2, add=TRUE) # add true copula diagonal
legend("bottomright", lty=2:1, bty="n", inset=0.02,
       legend=c("C", expression(C[n])))

## Approximate partial derivatives w.r.t. the 2nd and 3rd component
j.ind <- 2:3 # indices w.r.t. which the partial derivatives are computed
## Partial derivatives based on the empirical copula and the true copula
der23 <- dCn(u, U=pobs(U), j.ind=j.ind)
der23. <- copula:::dCdu(archmCopula(family, param=theta, dim=d), u=u)[,j.ind]
## Approximation error
summary(as.vector(abs(der23-der23.)))

## For an example of using F.n(), see help(mvdc)
