FunChisq (version 2.5.4)

fun.chisq.test: Model-Free Functional Chi-Squared and Exact Tests

Description

Asymptotic chi-squared, normalized chi-squared or exact tests on contingency tables to determine model-free functional dependency of the column variable on the row variable.

Usage

fun.chisq.test(
  x,
  method = c("fchisq", "nfchisq", "adapted",
             "exact", "exact.qp", "exact.dp", "exact.dqp",
             "default", "normalized", "simulate.p.value"),
  alternative = c("non-constant", "all"),
  log.p = FALSE,
  index.kind = c("conditional", "unconditional"),
  simulate.nruns = 2000,
  exact.mode.bound = TRUE
)

Value

A list with class "htest" containing the following components:

statistic

the functional chi-squared statistic if method = "fchisq", "default", or "exact"; or the normalized functional chi-squared statistic if method = "nfchisq" or "normalized".

parameter

degrees of freedom for the functional chi-squared statistic.

p.value

p-value of the functional test. If method = "fchisq" (or "default"), it is computed by an asymptotic chi-squared distribution; if method = "nfchisq" (or "normalized"), it is computed by the standard normal distribution; if method = "exact", it is computed by an exact hypergeometric distribution.

estimate

an estimate of the function index, a value between 0 and 1. A value of 1 indicates a strictly mathematical function. The index is asymmetric with respect to transposing the input contingency table, unlike Cramer's V, which is symmetric and based on Pearson's chi-squared test statistic. See [Zhong2019FANTOM5, KumarZSLS18FunChisq] for the definition of the function index.
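As a minimal sketch of how these components might be read off the returned object, the snippet below runs the default test on the table from Example 1 and extracts each documented field. It assumes the FunChisq package is installed; the requireNamespace() guard is added here for illustration only, so the snippet is skipped rather than failing when the package is absent.

```r
# Table from Example 1: rows are the independent variable,
# columns are the dependent variable
x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)

res <- NULL
# Guard added for illustration: only call the package if it is installed
if (requireNamespace("FunChisq", quietly = TRUE)) {
  res <- FunChisq::fun.chisq.test(x, method = "fchisq")
  print(res$statistic)  # functional chi-squared statistic
  print(res$parameter)  # degrees of freedom
  print(res$p.value)    # asymptotic p-value
  print(res$estimate)   # function index
}
```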

Arguments

x

a matrix representing a contingency table. The row variable represents the independent variable or all unique combinations of multiple independent variables. The column variable is the dependent variable.
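One possible way to build such a matrix from raw observations is sketched below in base R. The variable names (treatment, response, parent1, parent2, child) and their values are made up for illustration. The first part cross-tabulates a single independent variable against the dependent variable; the second part uses interaction() so that each row stands for one observed combination of two independent variables.

```r
# Hypothetical raw data: one independent and one dependent variable
treatment <- c("a", "a", "b", "b", "c", "c", "c")
response  <- c("low", "low", "high", "high", "low", "low", "high")

# Rows = independent variable, columns = dependent variable
tab <- table(treatment, response)
x <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))

# Hypothetical data with two independent variables: each row of x2 is one
# observed combination of parent1 and parent2
parent1 <- c(0, 0, 1, 1, 0, 1)
parent2 <- c(0, 1, 0, 1, 0, 1)
child   <- c(0, 1, 1, 0, 0, 0)
combo <- interaction(parent1, parent2, drop = TRUE)
x2 <- unclass(table(combo, child))
```

Either x or x2 could then be passed as the x argument.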

method

a character string to specify the method to compute the functional chi-squared test statistic and its p-value. The options are "fchisq" (equivalent to "default", the default), "nfchisq" (equivalent to "normalized"), "exact", "adapted", "exact.qp", "exact.dp", "exact.dqp" or "simulate.p.value". See Details.

Note: "default" and "normalized" are deprecated.

alternative

a character string to specify the alternative hypothesis. The options are "non-constant" (default, non-constant functions) and "all" (all types of functions including constant ones).

log.p

logical; if TRUE, the p-value is given as log(p). Taking the log improves the accuracy when p-value is close to zero. The default is FALSE.
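The benefit of a log-scale p-value can be illustrated with base R's pchisq(), which has an analogous log.p argument. This is only a sketch of the numerical issue, using an arbitrary statistic of 2000 on 4 degrees of freedom; it is not a description of how fun.chisq.test computes its result internally.

```r
# Arbitrary illustrative values, not taken from any data set
stat <- 2000
df   <- 4

# On the natural scale the upper-tail probability underflows to zero
p_plain <- pchisq(stat, df, lower.tail = FALSE)

# On the log scale the same tail probability stays finite (about -993)
log_p <- pchisq(stat, df, lower.tail = FALSE, log.p = TRUE)

print(p_plain)
print(log_p)
```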

index.kind

a character string to specify the kind of function index xi.f to be estimated. The options are "conditional" (default) and "unconditional". See Details.

simulate.nruns

A number to specify the number of tables generated to simulate the null distribution. Default is 2000. Only used when method="simulate.p.value".

exact.mode.bound

logical; if TRUE, a fast branch-and-bound algorithm is used for the exact functional test (method="exact"). If FALSE, a slow brute-force enumeration method is used to provide a reference for runtime analysis. Both options provide the same exact p-value. The default is TRUE.

Author

Yang Zhang, Hua Zhong, Hien Nguyen, Sajal Kumar, and Joe Song

Details

The functional chi-squared test determines whether the column variable is a function of the row variable in contingency table x [zhang2013deciphering, zhang2014nonparametricFunChisq]. This function supports several hypothesis testing methods:

When method="fchisq" (equivalent to "default", the default), the test statistic is computed as described in [zhang2013deciphering, zhang2014nonparametricFunChisq] and the p-value is computed using the chi-squared distribution.

When method="nfchisq" (equivalent to "normalized"), the test statistic is obtained by shifting and scaling the original test statistic [zhang2013deciphering, zhang2014nonparametricFunChisq], and the p-value is computed using the standard normal distribution [Box2005FunChisq]. The normalized chi-squared, more conservative on the degrees of freedom, was used by the Best Performer NMSUSongLab in HPN-DREAM (DREAM8) Breast Cancer Network Inference Challenges.
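The two asymptotic variants above can be sketched in base R as follows. This page does not spell out the formulas, so the code is an assumption based on my reading of the cited papers: the statistic sums, over rows, the deviation of each row from a uniform spread across columns, then subtracts the deviation of the column sums from uniform, with (rows - 1) * (columns - 1) degrees of freedom for the "non-constant" alternative. The normalized statistic is assumed to be (statistic - df) / sqrt(2 * df). The helper name fchisq_sketch is hypothetical and is not part of FunChisq.

```r
# Hypothetical helper (not part of FunChisq); formulas are assumptions
# based on the cited papers, for the "non-constant" alternative only
fchisq_sketch <- function(x) {
  n  <- sum(x)
  nr <- nrow(x)
  nc <- ncol(x)
  rs <- rowSums(x)
  cs <- colSums(x)

  # Row term: each row compared with a uniform spread over the columns
  row_expected <- matrix(rs / nc, nr, nc)
  keep <- rs > 0  # skip empty rows to avoid division by zero
  row_term <- sum(((x - row_expected)^2 / row_expected)[keep, ])

  # Column term: column sums compared with a uniform distribution
  col_term <- sum((cs - n / nc)^2 / (n / nc))

  stat <- row_term - col_term
  df   <- (nr - 1) * (nc - 1)
  z    <- (stat - df) / sqrt(2 * df)  # shifted and scaled statistic

  list(statistic  = stat,
       df         = df,
       p.chisq    = pchisq(stat, df, lower.tail = FALSE),
       normalized = z,
       p.normal   = pnorm(z, lower.tail = FALSE))
}

# Table from Example 1 and its transpose
x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)
forward  <- fchisq_sketch(x)
backward <- fchisq_sketch(t(x))
print(forward$statistic)
print(backward$statistic)
```

Under these assumptions the statistic is larger for x than for t(x), in line with the "strong" and "weak" labels in Example 1.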

When method is "exact", "exact.qp" (quadratic programming) [zhong2019eft, zhong2019modelfreeFunChisq], "exact.dp" (dynamic programming) [nguyen2018modelfree, Nguyen2020EFTFunChisq], or "exact.dqp" (dynamic and quadratic programming) [nguyen2018modelfree, Nguyen2020EFTFunChisq], an exact functional test is performed. The "exact" option uses "exact.dqp", the fastest method. All of these methods compute an exact p-value.

When method="adapted", the adapted functional chi-squared test [Kumar2022AFTFunChisq] is used. The test statistic is obtained by evaluating the most populous portrait or square (number of rows <= number of columns) table in the contingency table x. The p-value is computed using the chi-squared distribution. This option should be used to determine the functional direction between variables in x.

For the "exact.qp" and "exact.dp" options, if the sample size is no more than 200 or the average cell count is less than five, and the table size is no more than 10 in either row or column, the exact test will not be called and the asymptotic functional chi-squared test (method="fchisq") is used instead.

For "exact.dqp", the exact functional test will always be performed.

For 2-by-2 contingency tables, the asymptotic test options (method="fchisq" or "nfchisq") are recommended to test functional dependency, instead of the exact functional test.

When method="simulate.p.value", a simulated null distribution is used to calculate the p-value. The null distribution is a multinomial distribution whose cell probabilities are the product of the two marginal distributions. Like other Monte Carlo methods, this method is slower but may be more accurate than methods based on asymptotic distributions.
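The Monte Carlo idea can be sketched in base R as below. This is an illustration under stated assumptions, not the package's implementation: the helper fchisq_stat is hypothetical and uses the statistic as I understand it from the cited papers, the null cell probabilities are the outer product of the observed row and column proportions, and the p-value uses the common (1 + count) / (1 + runs) convention, which may differ from what the package does.

```r
# Hypothetical helper (not part of FunChisq): functional chi-squared
# statistic in an algebraically simplified form, skipping empty rows
fchisq_stat <- function(tab) {
  n  <- sum(tab)
  rs <- rowSums(tab)
  cs <- colSums(tab)
  keep <- rs > 0
  row_part <- sum(tab[keep, , drop = FALSE]^2 / rs[keep])
  ncol(tab) * (row_part - sum(cs^2) / n)
}

set.seed(1)
x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)
n <- sum(x)

# Null model: multinomial whose cell probabilities are the product
# of the observed row and column proportions
null_prob <- outer(rowSums(x), colSums(x)) / n^2

obs   <- fchisq_stat(x)
nruns <- 2000
sim <- replicate(nruns, {
  counts <- rmultinom(1, size = n, prob = as.vector(null_prob))
  fchisq_stat(matrix(counts, nrow(x), ncol(x)))
})

# Share of simulated statistics at least as large as the observed one
p_sim <- (1 + sum(sim >= obs)) / (1 + nruns)
print(p_sim)
```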

index.kind specifies the kind of function index to be computed. If the experimental design controls neither the row nor the column marginal sums, index.kind = "unconditional" is recommended; if the column marginal sums are controlled, index.kind = "conditional" is recommended. The conditional function index is the square root of Goodman-Kruskal's tau [goodman1954measuresFunChisq]. The choice of index.kind affects only the value of the function index xi.f, not the test statistic or p-value.
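Since the conditional index is described as the square root of Goodman-Kruskal's tau, a small base R sketch can show the asymmetry under transposition. The helper gk_tau is hypothetical and implements the standard textbook formula for tau when predicting the column variable from the row variable; whether it reproduces the package's xi.f value exactly has not been verified here.

```r
# Hypothetical helper (not part of FunChisq): Goodman-Kruskal tau for
# predicting the column variable from the row variable
gk_tau <- function(x) {
  n  <- sum(x)
  rs <- rowSums(x)
  cs <- colSums(x)
  keep <- rs > 0  # skip empty rows
  within   <- sum(x[keep, , drop = FALSE]^2 / rs[keep]) / n
  marginal <- sum(cs^2) / n^2
  (within - marginal) / (1 - marginal)
}

# Table from Example 1
x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)

index_xy <- sqrt(gk_tau(x))     # row variable -> column variable
index_yx <- sqrt(gk_tau(t(x)))  # column variable -> row variable
print(c(index_xy, index_yx))
```

The two values differ, which is the asymmetry that distinguishes the function index from a symmetric measure such as Cramer's V.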

See Also

For data discretization, an option is optimal univariate clustering via package Ckmeans.1d.dp. A second option is joint multivariate discretization via package GridOnClusters.

For symmetrical dependency tests on discrete data, see Pearson's chi-squared test chisq.test, Fisher's exact test fisher.test, and mutual information methods in package entropy.
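To make the contrast with a symmetric test concrete, the base R sketch below applies chisq.test to the table from Example 1 and to its transpose. Warnings about small expected counts are suppressed only to keep the illustration quiet.

```r
# Table from Example 1
x <- matrix(c(20, 0, 20, 0, 20, 0, 5, 0, 5), 3)

# Pearson's chi-squared test does not distinguish rows from columns
pearson_xy <- suppressWarnings(chisq.test(x))
pearson_yx <- suppressWarnings(chisq.test(t(x)))

print(unname(pearson_xy$statistic))
print(unname(pearson_yx$statistic))
```

Both calls give the same statistic and p-value, so Pearson's test cannot indicate a direction of dependency.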

Examples

library(FunChisq)

# Example 1. Asymptotic functional chi-squared test
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x) # strong functional dependency
fun.chisq.test(t(x)) # weak functional dependency

# Example 2. Normalized functional chi-squared test
x <- matrix(c(8,0,8,0,8,0,2,0,2), 3)
fun.chisq.test(x, method="nfchisq") # strong functional dependency
fun.chisq.test(t(x), method="nfchisq") # weak functional dependency

# Example 3. Exact functional chi-squared test
x <- matrix(c(4,0,4,0,4,0,1,0,1), 3)
fun.chisq.test(x, method="exact") # strong functional dependency
fun.chisq.test(t(x), method="exact") # weak functional dependency

# Example 4. Exact functional chi-squared test on a real data set
#            (Shen et al., 2002)
# x is a contingency table with row variable for p53 mutation and
#   column variable for CIMP
x <- matrix(c(12,26,18,0,8,12), nrow=2, ncol=3, byrow=TRUE)

# Test the functional dependency: p53 mutation -> CIMP
fun.chisq.test(x, method="exact")

# Test the functional dependency: CIMP -> p53 mutation
fun.chisq.test(t(x), method="exact")

# Example 5. Adapted functional chi-squared test
x <- matrix(c(20, 0, 1, 0, 1, 20, 3, 2, 15, 2, 5, 2), 3, 4, byrow=TRUE)
fun.chisq.test(x, method="adapted") # strong functional dependency
fun.chisq.test(t(x), method="adapted") # weak functional dependency

# Example 6. Asymptotic functional chi-squared test with simulated distribution
x <- matrix(c(20,0,20,0,20,0,5,0,5), 3)
fun.chisq.test(x, method="simulate.p.value")
fun.chisq.test(x, method="simulate.p.value", simulate.nruns = 1000)