matie (version 1.2)

ma: Measure association

Description

A non-parametric measure of association between variables. The association score $A$ ranges from 0 (when the variables are independent) to 1 (when they are perfectly associated). $A$ is a kind of $R^2$ estimate and can be thought of as the proportion of variance in one variable explained by another, or by several other variables, since $A$ also works for multivariate associations.
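As a point of reference for the "proportion of variance explained" reading, the following minimal sketch computes an ordinary linear $R^2$ in base R. It does not call ma and does not reproduce how $A$ is estimated; the simulated data and variable names are assumptions made for illustration only.

```r
# Illustration only: ordinary linear R^2 on simulated data (not the matie estimate).
set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200)

fit <- lm(y ~ x)

# Proportion of variance in y explained by x: 1 - RSS / TSS
rss <- sum(residuals(fit)^2)
tss <- sum((y - mean(y))^2)
r2_manual <- 1 - rss / tss

# For a simple linear regression with an intercept, this matches both
# summary(fit)$r.squared and the squared Pearson correlation.
r2_lm  <- summary(fit)$r.squared
r2_cor <- cor(x, y)^2
```

A score such as $A$ is meant to play the same role for associations that need not be linear.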

Usage

ma(d,partition,ht,hp,hs,ufp)

Arguments

d
the n x m data frame containing n observations of m variables for which the maximal joint/marginal likelihood ratio score is required.
partition
a list of column indices specifying variable groupings. Defaults to list(c(m),c(1:m-1)), where m = ncol(d), which means the last variable is explained by all the other variables in the data set.
ht
tangent for the hyperbolic correction, default ht = 43.6978644.
hp
power for the hyperbolic correction, default hp = 0.8120818.
hs
scale for the hyperbolic correction, default hs = 6.0049711.
ufp
for debugging purposes, default FALSE.
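The default partition expression can look surprising, because in R `1:m-1` parses as `(1:m) - 1`, giving `0:(m-1)`. The sketch below only shows how base R evaluates the documented expression on a toy data frame (the data frame and its column names are made up for illustration); how ma consumes the partition internally is not shown here.

```r
# Toy data frame, assumed for illustration only.
d <- data.frame(a = 1:3, b = 4:6, c = 7:9)
m <- ncol(d)

# The documented default: explain the last column by all the others.
default_partition <- list(c(m), c(1:m-1))

# 1:m-1 is (1:m) - 1, i.e. 0, 1, ..., m-1. A zero index selects nothing
# when subsetting, so the second group still names columns 1..(m-1).
explained  <- names(d)[default_partition[[1]]]
explaining <- names(d)[default_partition[[2]]]
```

An explicit partition such as list(3, c(1, 2)) selects the same columns in this toy case and avoids the zero index.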

Value

Returns a list with the following components:
A
a score (including hyperbolic correction) estimating association for the data
rawA
the association score before hyperbolic correction
jointKW
the optimal kernel width for the joint distribution
altLL
the optimal weighted log likelihood for the alternate distribution
nullLL
the optimal log likelihood for the marginal distribution
marginalKW
the optimal kernel width for the marginal distribution
weight
the optimal weight used for the mixture
LRstat
the LR statistic, required for computing p values.
nRows
n, the number of complete samples in the data set
mCols
m, the number of variables in the data set
partition
user supplied partition for the variables in the data set
ufp
user supplied debugging flag

Details

An estimate of association (possibly nonlinear) is computed using a ratio of maximum likelihoods for the marginal distribution and maximum weighted likelihoods for the joint distribution. Before the computation is carried out, the data are ranked using the rwt function from the matie package. This estimate is usually conservative (i.e. low), so a small-sample hyperbolic correction is applied by adding an offset, os, to the joint likelihood before the likelihood ratio is re-computed. The offset is given by:

os = ( 1 - 1 / (1 + A * ht) ) * ( n ^ (hp) / hs )

Even with the hyperbolic correction, the under-estimation of A grows as the dimension of the data set increases.
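To make the offset formula concrete, here is a minimal sketch that evaluates it directly. The function name hyperbolic_offset is hypothetical and is not part of matie; the default values for ht, hp and hs are the ones listed under Arguments, and A and n are taken as they appear in the formula.

```r
# Hypothetical helper (not part of matie): evaluates the offset formula
#   os = (1 - 1 / (1 + A * ht)) * (n^hp / hs)
hyperbolic_offset <- function(A, n,
                              ht = 43.6978644,
                              hp = 0.8120818,
                              hs = 6.0049711) {
  (1 - 1 / (1 + A * ht)) * (n^hp / hs)
}

# The offset is zero when A is zero, grows with A, and is bounded
# above by n^hp / hs as A * ht becomes large.
os_zero  <- hyperbolic_offset(A = 0,   n = 100)
os_small <- hyperbolic_offset(A = 0.1, n = 100)
os_large <- hyperbolic_offset(A = 0.9, n = 100)
os_bound <- 100^0.8120818 / 6.0049711
```

Because the first factor saturates quickly for the default ht, the size of the offset is driven mostly by the sample size n once A is away from zero.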

References

Discovering general multidimensional associations, http://arxiv.org/abs/1303.1828

See Also

rwt, pd, sbd, shpd, std

Examples

    library(matie)

    # bivariate association
    d <- shpd(n=1000,m=2,Rsq=0.9)
    ma(d)$A

    # multivariate association (the proportion of variance in "Salary"
    # explained by "Hits" and "Years")
    data(baseballData)
    ma(baseballData,partition=list(11,c(2,7)))$A
