
minerva (version 1.5.10)

mic_strength: Compute the association strength

Description

This function uses the null distribution of the tic_e statistic computed with the function mictools. Based on the available p-values and on the permutation null distribution, it identifies reliable associations between variables.

Usage

mic_strength(x, pval, alpha = NULL, C = 5, pthr = 0.05, pval.col = NULL)

Arguments

x

a numeric matrix with N samples on the rows and M variables on the columns (NxM).

pval

a data.frame with the p-value of each pair of variables (association) in the x input matrix. It should also contain two columns with the indices of the two variables of each association, according to the x input matrix.

alpha

a float in (0, 1.0] or a number >= 4. If alpha is in (0, 1], then B = max(n^alpha, 4), where n is the number of samples. If alpha is >= 4, alpha directly defines the B parameter; if it is higher than the number of samples n, it is limited to n, so B = min(alpha, n). Default value is 0.6 (see Details).
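The rule for B can be written as a small R function. This is a sketch of the rule as stated above, not minerva's own code: compute_B is a hypothetical name, the NULL/default case is not handled, and whether the package rounds B is not stated here.

```r
# Hypothetical helper (not part of minerva): derive the B parameter
# from alpha and the number of samples n, following the rules above.
compute_B <- function(alpha, n) {
  if (alpha > 0 && alpha <= 1) {
    B <- max(n^alpha, 4)   # alpha in (0, 1]: B = max(n^alpha, 4)
  } else if (alpha >= 4) {
    B <- min(alpha, n)     # alpha >= 4: alpha is B, capped at n
  } else {
    stop("alpha must be in (0, 1] or >= 4")
  }
  B
}

compute_B(0.6, 100)  # 100^0.6, about 15.85
compute_B(0.1, 10)   # 10^0.1 is below 4, so B = 4
compute_B(10, 6)     # alpha larger than n, so B = 6
```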

C

a positive integer number, the C parameter of the mine statistic. See mine function for further details.

pthr

threshold on the p-value: only associations with a p-value below pthr are considered when computing mic_e.

pval.col

an integer or character vector identifying the columns of the pval data.frame that contain, respectively, the p-value, variable 1, and variable 2 of each association in the x input matrix. See Details for further information.

Value

A data.frame with the tic_e p-value, the mic value, and the column identifiers (in the input matrix x) of the two variables whose association is computed.

Details

The method implemented here is a wrapper for the original method published by Albanese et al. (2018). The python version is available at https://github.com/minepy/mictools.

This function should be called after the estimation of the null distribution of tic_e scores based on permutations of the input data.

The mic association is computed only for the pairs of variables whose p-value in the pval data.frame is less than the threshold set with the pthr input parameter. The first column of the pval data.frame is assumed to contain the p-value; a different column can be selected with pval.col[1].
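A minimal sketch of the thresholding step in base R, assuming a pval data.frame with the p-value in the first column. The numbers are invented for illustration and this is not the package's internal code. It also shows how p.adjust, listed under See Also, can add a Benjamini-Hochberg column that could then be selected through pval.col[1].

```r
# Toy p-value table (made-up numbers); layout assumed: p-value first,
# then the indices of the two variables in x.
pval <- data.frame(pval = c(0.001, 0.20, 0.03, 0.04),
                   Var1 = c(1, 1, 2, 3),
                   Var2 = c(2, 3, 3, 4))

# Add Benjamini-Hochberg adjusted p-values with base R p.adjust()
pval$adj <- p.adjust(pval$pval, method = "BH")

pthr <- 0.05
# Pairs that pass the threshold on the nominal p-value: rows 1, 3, 4
sel_nominal <- pval[pval$pval < pthr, ]
# Pairs that pass the threshold on the adjusted p-value: row 1 only
sel_adjusted <- pval[pval$adj < pthr, ]

sel_nominal
sel_adjusted
```

Using the adjusted column is stricter: two of the three nominally significant pairs are dropped.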

By default, the pval.col parameter takes the first three columns of the pval data.frame: the first column contains the p-values of the association between the variables identified in columns pval.col[2] and pval.col[3]. If a character vector is provided, the names in pval.col are matched against the column names of the pval data.frame. If NULL is passed, the first column is assumed to contain the p-value, and the second and third columns the index or name of the variables in x. If a single value is passed, it refers to the p-value column, and the two consecutive columns are assumed to contain the variable indices.
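The column-selection rules can be sketched as follows. resolve_pval_col is a hypothetical helper written for illustration, not a minerva function, and the toy data.frame's columns other than Var1 and Var2 are made up so that the positions match the c(6, 4, 5) call used in the Examples.

```r
# Hypothetical helper (not minerva's code) mirroring the pval.col rules:
# returns the column positions of (p-value, variable 1, variable 2).
resolve_pval_col <- function(pval, pval.col = NULL) {
  if (is.null(pval.col)) return(1:3)            # default: first three columns
  idx <- if (is.character(pval.col)) {
    match(pval.col, names(pval))                # match by column name
  } else {
    pval.col
  }
  if (length(idx) == 1) idx <- idx + 0:2        # p-value column + next two
  if (length(idx) != 3 || anyNA(idx) || any(idx > ncol(pval)))
    stop("pval.col must identify three existing columns")
  as.integer(idx)
}

# Toy table with an invented layout
pv <- data.frame(pval = c(0.01, 0.2), x = 0, y = 0,
                 Var1 = c(1, 2), Var2 = c(3, 4), adj = c(0.02, 0.2))

resolve_pval_col(pv)                             # 1 2 3
resolve_pval_col(pv, c(6, 4, 5))                 # 6 4 5
resolve_pval_col(pv, c("adj", "Var1", "Var2"))   # 6 4 5
resolve_pval_col(pv, 4)                          # 4 5 6
```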

See Also

mine, mictools, p.adjust

Examples

# NOT RUN {
data(Spellman)
mydata <- as.matrix(Spellman[, 10:20])
ticenull <- mictools(mydata, nperm=1000)

## Use the nominal pvalue:
ms <- mic_strength(mydata, pval=ticenull$pval, alpha=NULL, pval.col = c(1, 4, 5))

## Use the adjusted pvalue:
ms <- mic_strength(mydata, pval=ticenull$pval, alpha=NULL, pval.col = c(6, 4, 5))

ms 

# }
# NOT RUN {
## Use qvalue
require(qvalue)
qobj <- qvalue(ticenull$pval$pval)
ticenull$pval$qvalue <- qobj$qvalue
ms <- mic_strength(mydata, pval=ticenull$pval, alpha=NULL, pval.col = c("qvalue", "Var1", "Var2"))

## Get the data from mictools repository

lnf <- "https://raw.githubusercontent.com/minepy/mictools/master/examples/datasaurus.txt"
datasaurus <- read.table(lnf, header=TRUE, row.names = 1, stringsAsFactors = FALSE)
datasaurus <- t(datasaurus)
ticenull <- mictools(datasaurus, nperm=200000)
micres <- mic_strength(datasaurus, ticenull$pval, pval.col=c(6, 4, 5))

## Plot distribution of pvalues
hist(ticenull$pval$pval, breaks=50, freq=FALSE)

## Plot distribution of tic_e values
hist(ticenull$tic)

## Correct pvalues using qvalue package
require(qvalue)
require(ggplot2)
qobj <- qvalue(ticenull$pval$pval)
ticenull$pval$qvalue <- qobj$qvalue
micres <- mic_strength(datasaurus, ticenull$pval, pval.col=c("qvalue", "Var1", "Var2"))

hist(qobj$qvalue)

df <- data.frame(pi0.lambda=qobj$pi0.lambda, lambda=qobj$lambda, pi0.smooth=qobj$pi0.smooth)
gp0 <- ggplot(df, aes(lambda, pi0.lambda)) + geom_point()
gp0 <- gp0 + geom_line(aes(lambda, pi0.smooth))
gp0 <- gp0 + geom_hline(yintercept = qobj$pi0, linetype="dashed", col="red")
print(gp0)
# }
