Learn R Programming

support (version 0.1.6)

e_dist: Computes the energy distance of a point set

Description

e_dist computes the energy distance between points D and a target distribution (or big dataset) \(F\). The cross-term \(E[||X-X'||]\), \(X,X'~F\) is NOT computed in e_dist for computational efficiency, since this is not needed for optimizing D. The target distribution or big dataset can be set using dist.str or dist.samp, respectively.

Usage

e_dist(D, dist.str=NA, dist.param=vector("list",ncol(D)),
       nsamp=1e6, dist.samp=NA)

Arguments

D

An \(n\) x \(p\) point set.

dist.str

A \(p\)-length string vector for target distribution (assuming independence). Current options include uniform, normal, exponential, gamma, lognormal, student-t, weibull, cauchy and beta. Exactly one of dist.str or dist.samp should be NA.

dist.param

A \(p\)-length list for desired distribution parameters in dist.str. The following parameters are supported:

  • Uniform: Minimum, maximum;

  • Normal: Mean, standard deviation;

  • Exponential: Rate parameter;

  • Gamma: Shape parameter, scale parameter;

  • Lognormal: Log-mean, Log-standard deviation;

  • Student-t: Degree-of-freedom;

  • Weibull: Shape parameter, scale parameter;

  • Cauchy: Location parameter, scale parameter;

  • Beta: Shape parameter 1, shape parameter 2.

nsamp

Number of samples to draw from dist.str for comparison.

dist.samp

An \(N\) x \(p\) matrix for the target big dataset (e.g., MCMC chain). Exactly one of dist.str or dist.samp should be NA.

References

Szekely, G. J. and Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8):1249-1272.

Examples

Run this code
# NOT RUN {
  
# }
# NOT RUN {
  #############################################################
  # Generate 25 SPs for the 2-d i.i.d. N(0,1) distribution
  #############################################################
  n <- 25 #number of points
  p <- 2 #dimension
  D <- sp(n,p,dist.str=rep("normal",p))
  Drnd <- matrix(rnorm(n*p),ncol=p)
  e_dist(D$sp,dist.str=rep("normal",p)) #smaller
  e_dist(Drnd,dist.str=rep("normal",p))
  
  
    #############################################################
    # Support points for big data reduction: Franke's function
    #############################################################
    \dontrun{
    library(MHadaptive)
    
    #Use modified Franke's function as posterior
    franke2d <- function(xx){
      if ((xx[1]>1)||(xx[1]<0)||(xx[2]>1)||(xx[2]<0)){
        return(-Inf)
      }
      else{
        x1 <- xx[1]
        x2 <- xx[2]
        
        term1 <- 0.75 * exp(-(9*x1-2)^2/4 - (9*x2-2)^2/4)
        term2 <- 0.75 * exp(-(9*x1+1)^2/49 - (9*x2+1)/10)
        term3 <- 0.5 * exp(-(9*x1-7)^2/4 - (9*x2-3)^2/4)
        term4 <- -0.2 * exp(-(9*x1-4)^2 - (9*x2-7)^2)
        
        y <- term1 + term2 + term3 + term4
        return(2*log(y))
      }
    }
    
    #Generate MCMC samples
    li_func <- franke2d #Desired log-posterior
    ini <- c(0.5,0.5) #Initial point for MCMc
    NN <- 1e5 #Number of MCMC samples desired
    burnin <- NN/2 #Number of burn-in runs
    mcmc_r <- Metro_Hastings(li_func, pars=ini, prop_sigma=0.05*diag(2),
                             iterations=NN, burn_in=burnin)
    
    #Generate ncur SPs
    ncur <- 50
    D <- sp(ncur,2,dist.samp=mcmc_r$trace)$sp
    Drnd <- mcmc_r$trace[sample(1:nrow(mcmc_r$trace),n,F),]
    e_dist(D,dist.samp=mcmc_r$trace) #smaller
    e_dist(Drnd,dist.samp=mcmc_r$trace)
    }
  
# }

Run the code above in your browser using DataLab