Compute a measure of maximum exchangable asymmetry of a copula \(\mathbf{C}_\Theta\) using exchangability (permutation symmetry) according to De Baets and De Meyer (2017) by
$$\mu_{\infty\mathbf{C}}^{\mathrm{permsym}} = \mu_\infty^{\mathrm{permsym}} = 3 \times \mathrm{max}\bigl(\,|\,\mathbf{C}_\Theta(u,v) -
\mathbf{C}_\Theta(v,u)\,|\,\bigr)$$
for \((u,v) \in \mathcal{I}^2\). De Baets and De Meyer comment that among many asymmetric metrics with copulas that \(\mu_\infty^{\mathrm{permsym}}\) is “by far the most interesting” (De Baets and De Meyer, 2017, p. 36). The 3 multiplier in the definition ensures that \(\mu_\infty^{\mathrm{permsym}} \in [0, 1]\). Those authors also conclude that exchangability of random variables, in general, is not a desired property in statistical models, and they use the \(\mu_\infty\) notation in lieu of \(L_\infty^{\mathrm{permsym}}\) (see documentation related to LpCOPpermsym
). The term “Permutation-Mu” is used for this measure in copBasic-package
and in similar contexts.
LzCOPpermsym(cop=NULL, para=NULL, n=5E4,
type=c("halton", "sobol", "torus", "runif"),
as.abs=TRUE, as.vec=FALSE, as.mat=FALSE, plot=FALSE, ...)
A scalar value for the measure is returned or other as dictated by arguments.
A copula function;
Vector of parameters, if and as needed, to pass to the copula;
The simulation size. The default seems sufficient for many practical applications but is suboptimal because the maximum operator in the definition is expected to potentially underestimate the true maximum. When a vector is returned, the default simulation size appears sufficient for many parameter estimation schemes;
The type of random number generator on \(\mathcal{I}^2\) for computing the maximum (apparent) (see argument n
) or a vector of signed differences (see Details);
A logical controlling whether the absolute value operation in the \(\mu_\infty^{\mathrm{permsym}}\) definition is used. This feature permits flexibility retaining the sign of asymmetry;
A logical to disable the maximum operation but instead return the a vector of signed differences in the exchanged variables. If this argument is set true, then as.abs
will be set false. The return of a vector of signed differences (still multiplied by 3) could be useful in parameter estimation schemes with a similar vector from an empirical copula (EMPIRcop
) (see Details);
A logical to disable the maximum operation (like as.vec
), but instead return a matrix of the \(\mathcal{I}^2\) values with third column as the vector of signed differences. If this argument is set true, then as.abs
will be set false;
A logical to create a plot of the \(\mathcal{I}^2\) domain used in the simulation with a plot title showing the type
argument setting;
Additional arguments to pass to support flexible implementation.
W.H. Asquith
EFFECT OF RANDOM NUMBER GENERATION---Package randtoolbox provides for random number generation on forms different than simply simulating uniform independent random variables for \(\mathcal{I}^2\). The Halton, Sobol, and Torus types are implemented. The plot
argument is useful for the user to see the differences in how these generators canvas the \(\mathcal{I}^2\) domain.
The default is Halton, which visually appears to better canvas \(\mathcal{I}^2\) without the clumping that simple uniform random variables does and without the larger gaps of Sobol or Torus. Testing indicates that Halton might generally require the smallest simulation size of the others with simple uniform random variables potentially being the worst and hence such is not the default. Exceptions surely exist depending on the style of the asymmetry. Nevertheless, Halton, Sobol, and Torus produce more consistent estimation behavior with each having a monotone approach towards the true maximum than simple uniform random variables.
The following example is a useful illustration of an asymmetrical Clayton copula (\(\mathbf{CL}(u,v; \Theta)\), CLcop
) by composition of a single copula (composite1COP
) with the theorical \(\mu_\infty^{\mathrm{permsym}}\) maxima computed by large sample simulation. A user might explore the effect of the random number generation by changing the type
variable.
type <- "halton"
para <- list(cop=CLcop, para=20, alpha=0.3, beta=0.1) # asymmetrical Clayton
ti <- LzCOPpermsym(cop=composite1COP, para=para, n=2E6, type=type) # large
ns <- as.integer( 10^seq(1, 4, by=0.05) ) # sequence of simulation sizes
mi <- sapply(ns, function(n) { # produce vector of maxima for simulation size
LzCOPpermsym(cop=composite1COP, para=para, n=n, type=type) })
ylim <- range(c(0.06, mi, ti)) # vertical limits to ensure visibility
plot(ns, mi, log="x", pch=21, bg=grey(0.9), ylim=ylim, main=type,
xlab="Simulation size", ylab="Maximum asymmetry measure")
abline(h=ti, lwd=3, col="seagreen") # large sample size estimate in green
COPULA PARAMETER ESTIMATION---Parameter estimation using signed permutation asymmetry vector can readily be accomplished. In the self-contained example below, we will assume a parent of Gumbel--Hougaard (\(\mathbf{GH}(u,v; \Theta)\), GHcop
) extended to asymmetry by using three parameters (\(\Theta = (10, 0.8, 0.6)\). Imagine that we unfortunately have a very small sample size (\(n = 100\)) as “hundred years of data.” The small sample size facilitates the use of the checkboard empirical copula (EMPIRcop
); the sample size is small enough that the checkerboard helps smooth through ties. The simulation size for LzCOPpermsym
is set “large” as presumed by the existing default.
para <- c(10, 0.8, 0.6) # parameters of the parent
nsam <- 100; seed <- 2; nsim <- 5000 # note a change from default
as.vec <- TRUE # set to FALSE to use just Permutation-Mu
rhoP <- rhoCOP(cop=GHcop, para=para) # parent Spearman Rho
UVsS <- simCOP(cop=GHcop, para=para, n=nsam, seed=seed) # simulate a sample
rhoS <- rhoCOP(as.sample=TRUE, para=UVsS) # sample Spearman Rho
infS <- LzCOPpermsym(cop=EMPIRcop, para=UVsS, n=nsim, type="halton",
as.vec=as.vec, ctype="checkerboard")
# empirical copula used and returning signed asymmetry vector
# transformation and re-transformation, GHcop paras >1; [0,1]; and [0,1]
tparf <- function(par) c(exp(par[1]) + 1, pnorm( par[2] ), pnorm( par[3] ))
rparf <- function(par) c(log(par[1] - 1), qnorm( par[2] ), qnorm( par[3] )) ofunc <- function(par, norho=FALSE) { # objective function
mypara <- tparf(par)
rhoT <- rhoCOP(cop=GHcop, para=mypara) # simulated Spearman Rho
infT <- LzCOPpermsym(cop=GHcop, para=mypara, n=nsim, type="halton",
as.vec=as.vec)
err <- mean( (infT - infS)^2 ) # mean squared errors
ifelse(norho, err, (rhoT - rhoS)^2 + err) # with Spearman Rho or not
}
init.par <- rparf(c(2, 0.5, 0.5)); rt <- NULL # initial parameter guess
try( rt <- optim(init.par, ofunc, norho=FALSE) ) # 3D optimization
if(is.null(rt)) stop("fatal, optim() returned NULL")
# construct GHcop parameters from optimization with re-transformation
sara <- tparf(rt$par)
rhoT <- rhoCOP(cop=GHcop, para=sara) # theoretical Spearman Rho
UVsT <- simCOP(cop=GHcop, para=sara, n=nsam, seed=seed, # same seed sim by
cex=0.3, pch=16, col="red", ploton=FALSE) # est. parameters
mara <- mleCOP(UVsS, cop=GHcop, init.para=init.par, parafn=tparf)$para
level.curvesCOP(cop=GHcop, para=para)
level.curvesCOP(cop=GHcop, para=sara, ploton=FALSE, col="red" ) # perm diffs
level.curvesCOP(cop=GHcop, para=mara, ploton=FALSE, col="blue") # mleCOP()
Comparison of level curves between the known parent, the parameter estimation using function LzCOPpermsym
, and the maximum likelihood by mleCOP
shows that signed asymmetry differences can be used for parameter estimation. One could use the maximum as in the definition, but for purposes of high-dimensional optimization, using the vector might be better to prevent local minima (less optimal solutions) being found if the \(\mu_\infty^{\mathrm{permsym}}\) was used. Because vectors of differences between empirical copula and the fitted copula are involved, measures of fit using such differences are expected to be more favorable to optimization than using LzCOPpermsym
than say maximum likelihood. The measures of fit AIC (aicCOP
), BIC (bicCOP
), and RMSE (rmseCOP
), for example, are often, smaller for the sara
fitted parameters than for the mara
fitted (maximum likelihood). Finally, setting as.vec <- FALSE
, re-running, and thus using \(\mu_\infty^{\mathrm{permsym}}\), will likely show parameter estimates, visible through the level curves, that are much less favorable.
RELATION TO ANOTHER DISTANCES---The documentation for LpCOPpermsym
lists a supremum definition \(L_\infty^{\mathrm{permsym}}\), which is like \(\mu_\infty^{\mathrm{permsym}}\) but lacks the multiplier of 3. However, that documentation mentions a ratio of 1/3 being as upper bounds and hence the De Baets and De Meyer (2017) reasoning for the 3 multiplier to rescale \(\mu_\infty^{\mathrm{permsym}} \in (0,1)\). The simple interrelations between the two functions are explored in the following example:
para <- c(30, 0.2, 0.95); n <- 5E4
p <- 1
mean(abs(LzCOPpermsym(cop=GHcop, para=para, n=n,
as.vec=TRUE)/3)^p)^(1/p) # 0.01867929
LpCOPpermsym(cop=GHcop, para=para, p=p) # 0.01867940
p <- 3
mean(abs(LzCOPpermsym(cop=GHcop, para=para, n=n,
as.vec=TRUE)/3)^p)^(1/p) # 0.02649376
LpCOPpermsym(cop=GHcop, para=para, p=p) # 0.02649317
The critical note is that LpCOPpermsym
is the integral of the absolute differences in permuted differences across \(\mathcal{I}^2\). Hence, it is an expectation. The LzCOPpermsym
is difference because of the maximum of the differences. The computations in the example above show how the same information can be extracted from the two functions. De Baets and De Meyer (2017) do not make reference to raising and then rooting by the power \(p\) as shown. The examples here provide credence to the default setting of n
(simulation size) for several significant figures of similarity.
De Baets, B., and De Meyer, H., 2017, Chapter 3---A look at copulas in a curved mirror: New York, Springer, ISBN 978--3--319--64221--5, pp. 33--47.
LpCOPpermsym
, isCOP.permsym