
sensitivity (version 1.12.1)

sensiHSIC: Sensitivity Indices based on Hilbert-Schmidt Independence Criterion (HSIC)

Description

sensiHSIC conducts a sensitivity analysis where the impact of an input variable is defined in terms of the distance between the input/output joint probability distribution and the product of their marginals when they are embedded in a Reproducing Kernel Hilbert Space (RKHS). This distance corresponds to the Hilbert-Schmidt Independence Criterion (HSIC) proposed by Gretton et al. (2005) and serves as a dependence measure between random variables, see Da Veiga (2014) for an illustration in the context of sensitivity analysis.

Usage

sensiHSIC(model = NULL, X, kernelX = "rbf", paramX = NA,
          kernelY = "rbf", paramY = NA, nboot = 0, conf = 0.95, ...)

## S3 method for class 'sensiHSIC'
tell(x, y = NULL, ...)

## S3 method for class 'sensiHSIC'
print(x, ...)

## S3 method for class 'sensiHSIC'
plot(x, ylim = c(0, 1), ...)

Arguments

model
a function, or a model with a predict method, defining the model to analyze.
X
a matrix or data.frame representing the input random sample.
kernelX
a string or a list of strings specifying the reproducing kernel to be used for the input variables. If only one kernel is provided, it is used for all input variables. Available choices are "rbf" (Gaussian), "laplace" (exponential), "dcov" (distance covariance, see details), "raquad" (rational quadratic), "invmultiquad" (inverse multiquadratic), "matern3", "matern5" and "ssanova1".
paramX
a scalar or a vector of hyperparameters to be used in the input variable kernels. If only one scalar is provided, it is replicated for all input variables. By default paramX is equal to the standard deviation of the input variable for "rbf", "laplace", "raquad", "invmultiquad", "matern3" and "matern5" and to 1 for "dcov".
kernelY
a string specifying the reproducing kernel to be used for the output variable. Available choices are "rbf" (Gaussian), "laplace" (exponential), "dcov" (distance covariance, see details), "raquad" (rational quadratic), "invmultiquad" (inverse multiquadratic), "matern3", "matern5" and "ssanova1".
paramY
a scalar to be used in the output variable kernel. By default paramY is equal to the standard deviation of the output variable for "rbf", "laplace", "raquad", "invmultiquad", "matern3" and "matern5" and to 1 for "dcov".
nboot
the number of bootstrap replicates.
conf
the confidence level for confidence intervals.
x
a list of class "sensiHSIC" storing the state of the sensitivity study (parameters, data, estimates).
y
a vector of model responses.
ylim
y-coordinate plotting limits.
...
any other arguments for model which are passed unchanged each time it is called.

Value

sensiHSIC returns a list of class "sensiHSIC", containing all the input arguments detailed before, plus the following components:

  • call: the matched call.
  • X: a data.frame containing the design of experiments.
  • y: a vector of model responses.
  • S: the estimations of the HSIC sensitivity indices.
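
Because model defaults to NULL and a tell method is provided, the analysis can be run in two steps: build the sensiHSIC object first, compute the model responses separately, then pass them back with tell. A minimal sketch of this decoupled workflow (assuming the usual scheme of the sensitivity package; here sobol.fun merely stands in for an external simulator):

  n <- 100
  X <- data.frame(matrix(runif(8 * n), nrow = n))
  sa <- sensiHSIC(model = NULL, X, kernelX = "rbf", kernelY = "rbf")
  y <- sobol.fun(as.matrix(X))   # replace by the responses of the external code
  tell(sa, y)
  sa$S      # estimations of the HSIC sensitivity indices
  sa$call   # the matched call

After tell is called, the components listed above are available in the updated object.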

Details

The HSIC sensitivity indices are obtained as a normalized version of the Hilbert-Schmidt independence criterion: $$S_i^{HSIC} = \frac{HSIC(X_i,Y)}{\sqrt{HSIC(X_i,X_i)}\sqrt{HSIC(Y,Y)}},$$ see Da Veiga (2014) for details. When kernelX="dcov" and kernelY="dcov", the kernel is given by $k(u,u')=1/2(||u||+||u'||-||u-u'||)$ and the sensitivity index is equal to the distance correlation introduced by Szekely et al. (2007) as was recently proven by Sejdinovic et al. (2013).
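
For intuition on the index above, each HSIC term can be estimated from kernel Gram matrices with the plug-in estimator of Gretton et al. (2005), $\widehat{HSIC}(X_i,Y) = (n-1)^{-2} tr(KHLH)$, where $H$ is the centering matrix. The sketch below is illustrative only (rbf_gram and hsic are ad hoc helpers, not the package internals); it computes the normalized index for a single input with Gaussian kernels:

  # Gaussian (rbf) Gram matrix of a sample u with bandwidth sigma
  rbf_gram <- function(u, sigma) {
    D <- as.matrix(dist(u))               # pairwise distances
    exp(-D^2 / (2 * sigma^2))
  }
  # plug-in HSIC estimate from two Gram matrices
  hsic <- function(K, L) {
    n <- nrow(K)
    H <- diag(n) - matrix(1 / n, n, n)    # centering matrix
    sum(diag(K %*% H %*% L %*% H)) / (n - 1)^2
  }
  n  <- 100
  xi <- runif(n)
  y  <- sin(2 * pi * xi) + 0.1 * rnorm(n)
  K  <- rbf_gram(xi, sd(xi))
  L  <- rbf_gram(y, sd(y))
  hsic(K, L) / sqrt(hsic(K, K) * hsic(L, L))   # normalized HSIC index

The normalizing constant of the estimator cancels in the ratio, so it does not affect the index itself.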

References

Da Veiga S. (2014), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, in press. http://hal.archives-ouvertes.fr/hal-00903283

Gretton A., Bousquet O., Smola A., Scholkopf B. (2005), Measuring statistical dependence with Hilbert-Schmidt norms, in Jain S., Simon H., Tomita E. (editors): Algorithmic Learning Theory, Lecture Notes in Computer Science, Vol. 3734, Berlin: Springer, 63--77.

Sejdinovic D., Sriperumbudur B., Gretton A., Fukumizu K. (2013), Equivalence of distance-based and RKHS-based statistics in hypothesis testing, Annals of Statistics, 41(5), 2263--2291.

Szekely G.J., Rizzo M.L., Bakirov N.K. (2007), Measuring and testing dependence by correlation of distances, Annals of Statistics, 35(6), 2769--2794.

See Also

kde, sensiFdiv

Examples

# Test case: the non-monotonic Sobol g-function
# Only one kernel is provided with default hyperparameter value
n <- 100
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- sensiHSIC(model = sobol.fun, X, kernelX = "raquad", kernelY = "rbf")
print(x)

# Test case: the Ishigami function
# A list of kernels is given with default hyperparameter value
n <- 100
X <- data.frame(matrix(-pi + 2 * pi * runif(3 * n), nrow = n))
x <- sensiHSIC(model = ishigami.fun, X, kernelX = c("rbf", "matern3", "dcov"),
               kernelY = "rbf")
print(x)

# A combination of kernels is given and a dummy value is passed for
# the first hyperparameter
x <- sensiHSIC(model = ishigami.fun, X, kernelX = c("ssanova1", "matern3", "dcov"),
               paramX = c(1, 2, 1), kernelY = "ssanova1")
print(x)
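
# A bootstrap sketch (illustrative, reusing the Ishigami design above):
# nboot and conf give confidence intervals, plot() displays the indices
x <- sensiHSIC(model = ishigami.fun, X, kernelX = "rbf", kernelY = "rbf",
               nboot = 100, conf = 0.95)
print(x)
plot(x, ylim = c(0, 1))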
