minerva

R package for Maximal Information-Based Nonparametric Exploration computation

Install

  • Latest CRAN release
install.packages("minerva")
  • Development version
devtools::install_github('filosi/minerva')

Usage

  • Basic usage with helper function mine.
library(minerva)

x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine(x, y, n.cores=1)
  • Compute a single measure from the MINE suite using mine_stat.
    • Available measures are: mic, mas, mev, mcn, tic, gmic
x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine_stat(x, y, measure="mic")
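  • To compute the mic-r2 measure, use the R function cor:
x <- 0:200 / 200
y <- sin(10 * pi * x) + x

## Pearson correlation coefficient (squared below)
r <- cor(x, y)
mm <- mine_stat(x, y, measure="mic")
mm - r^2

## Compare with the fifth element returned by mine:
## mine(x, y, n.cores=1)[[5]]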
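
The measure names listed above can also be looped over to collect one value per measure. The following is a minimal sketch, assuming the minerva package is installed and that each mine_stat call returns a single number; the variable names measures and stats are illustrative only.

```r
library(minerva)

x <- 0:200 / 200
y <- sin(10 * pi * x) + x

## Measure names as listed in this README
measures <- c("mic", "mas", "mev", "mcn", "tic", "gmic")

## Call mine_stat once per measure and collect the results by name
stats <- sapply(measures, function(m) mine_stat(x, y, measure = m))
stats
```

Collecting the values in a named vector makes it easy to compare measures side by side for the same pair of variables.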

Compute statistic on matrices

  • All features in a single matrix (mine_compute_pstat).
  • All possible combinations of features between two matrices (mine_compute_cstat).
    • When comparing two matrices, the function checks that they have the same number of rows. If the numbers of rows differ, an error is thrown.
## Both matrices have 100 rows; x has 10 features and y has 20
x <- matrix(rnorm(1000), ncol=10, nrow=100)
y <- matrix(rnorm(2000), ncol=20, nrow=100)

## Compare features within the same matrix
pstats(x)

## Compare features of matrix x with features of matrix y
cstats(x, y)
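
The number of feature pairs each mode compares follows directly from the matrix shapes. The sketch below uses base R only and assumes that features are stored in columns and observations in rows; the names n_within and n_between are illustrative and not part of minerva.

```r
set.seed(1)
x <- matrix(rnorm(1000), nrow = 100, ncol = 10)
y <- matrix(rnorm(2000), nrow = 100, ncol = 20)

## A two-matrix comparison requires the same number of rows
stopifnot(nrow(x) == nrow(y))

## Pairs of features within x: every unordered pair of columns
n_within <- choose(ncol(x), 2)

## Pairs between x and y: every column of x with every column of y
n_between <- ncol(x) * ncol(y)

c(within = n_within, between = n_between)
```

Checking the row counts up front gives a clearer failure point than waiting for the comparison itself to raise an error.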

Mictools pipeline

This is inspired by the original implementation by Albanese et al., available in Python here: https://github.com/minepy/mictools.

Reading the data from the mictools repository

datasaurus <- read.table("https://raw.githubusercontent.com/minepy/mictools/master/examples/datasaurus.txt", 
header=TRUE, row.names=1, as.is=TRUE, stringsAsFactors=FALSE)
datasaurus.m <- t(datasaurus)

Compute null distribution for tic_e

Automatically compute:

  • tic_e null distribution based on permutations.
  • histogram of the distribution with cumulative distribution.
  • Observed values of tic_e for each pair of variables in datasaurus.
  • Observed distribution of tic_e.
  • P-value for each variable pair association.
ticnull <- mictools(datasaurus.m, nperm=10000, seed=1234)

## Get the names of the named list
names(ticnull)
##[1]  "tic"      "nulldist" "obstic"   "obsdist"  "pval"
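
The idea of a permutation-based null distribution and an empirical p-value can be sketched in base R. This is an illustration only: it swaps in the absolute Pearson correlation for tic_e, uses toy data, and is not necessarily how mictools computes its p-values. The names stat, obs, null and pval are assumptions made for this sketch.

```r
set.seed(1234)

## Toy data: two variables with a linear association
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

## Stand-in statistic (assumption): absolute Pearson correlation, not tic_e
stat <- function(a, b) abs(cor(a, b))

obs <- stat(x, y)

## Null distribution: recompute the statistic after shuffling y,
## which breaks any association with x
nperm <- 1000
null <- replicate(nperm, stat(x, sample(y)))

## Empirical p-value: share of null values at least as large as the
## observed one (the +1 terms keep the estimate away from exactly zero)
pval <- (sum(null >= obs) + 1) / (nperm + 1)
pval
```

A larger nperm gives a finer resolution for small p-values, at the cost of computation time.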

Null Distribution
ticnull$nulldist

BinStart  BinEnd  NullCount  NullCumSum
0e+00     1e-04   0          1e+05
1e-04     2e-04   0          1e+05
2e-04     3e-04   0          1e+05
3e-04     4e-04   0          1e+05
4e-04     5e-04   0          1e+05
5e-04     6e-04   0          1e+05
...       ...     ...        ...

Observed distribution
ticnull$obsdist

BinStart  BinEnd  Count  CountCum
0e+00     1e-04   0      325
1e-04     2e-04   0      325
2e-04     3e-04   0      325
3e-04     4e-04   0      325
4e-04     5e-04   0      325
5e-04     6e-04   0      325
...       ...     ...    ...

Plot tic_e and pvalue distribution.

hist(ticnull$tic)

hist(ticnull$pval$pval, breaks=50, freq=FALSE)

Use the p.adjust.method argument to select a different p-value correction method, or use the qvalue package to compute Storey's q-value.
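
As a point of reference, the correction methods themselves can be tried with the base R function p.adjust. The sketch below uses the first six p-values of the pval table in this README purely as sample input; adjusting only six values will not reproduce the adj.P.Val column, which was computed over all variable pairs.

```r
## First six p-values from the pval table in this README (illustration only)
pvals <- c(0.5202, 0.9533, 0.0442, 0.6219, 0.8922, 0.3972)

## Benjamini-Hochberg (FDR) and Bonferroni corrections from base R
bh   <- p.adjust(pvals, method = "BH")
bonf <- p.adjust(pvals, method = "bonferroni")

data.frame(pval = pvals, BH = bh, bonferroni = bonf)
```

Bonferroni is the more conservative of the two, so its adjusted values are never smaller than the Benjamini-Hochberg ones.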

## Correct p-values using qvalue (Bioconductor package)
library(qvalue)
qobj <- qvalue(ticnull$pval$pval)

## Add column in the pval data.frame
ticnull$pval$qvalue <- qobj$qvalue
ticnull$pval

Same table as above with the qvalue column added at the end.

pval    I1   I2   Var1    Var2          adj.P.Val  qvalue
0.5202  1    2    away_x  bullseye_x    0.95       1
0.9533  1    3    away_x  circle_x      0.99       1
0.0442  1    4    away_x  dino_x        0.52       0
0.6219  1    5    away_x  dots_x        0.95       1
0.8922  1    6    away_x  h_lines_x     0.98       1
0.3972  1    7    away_x  high_lines_x  0.91       1
...     ...  ...  ...     ...           ...        ...

Strength of the association (MIC)

## Use columns of indexes and FDR adjusted pvalue 
micres <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c(6, 2, 3))

TicePval  MIC   I1   I2
0.0457    0.42  2    15
0.0000    0.63  3    16
0.0196    0.50  5    18
0.0162    0.36  9    22
0.0000    0.63  10   23
0.0000    0.57  13   26
...       ...   ...  ...

Association strength computed based on the qvalue adjusted pvalue

## Use qvalue adjusted pvalue 
micresq <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c("qvalue", "Var1", "Var2"))

TicePval  MIC   I1          I2
0.0401    0.42  bullseye_x  bullseye_y
0.0000    0.63  circle_x    circle_y
0.0172    0.50  dots_x      dots_y
0.0143    0.36  slant_up_x  slant_up_y
0.0000    0.63  star_x      star_y
0.0000    0.57  x_shape_x   x_shape_y
...       ...   ...         ...

Citing minepy/minerva and mictools

minepy2013: Davide Albanese, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman and Cesare Furlanello. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics (2013) 29(3): 407-408. First published online December 14, 2012.
mictools2018: Davide Albanese, Samantha Riccadonna, Claudio Donati, Pietro Franceschi. A practical tool for maximal information coefficient analysis. GigaScience (2018).

Version: 1.5.10
License: GPL-3
Last Published: June 17th, 2021

Functions in minerva (1.5.10)

mictools_null

A set of helper functions to compute the null distribution of tic_e and the observed tic_e distribution from a matrix.

minerva-package

The minerva package.

mic_strength

Compute the association strength.

cstats

Compute statistics (MIC and normalized TIC) between each pair of the two collections of variables (convenience function). If n and m are the number of variables in X and Y respectively, then the statistic between the (row) i (for X) and j (for Y) is stored in mic[i, j] and tic[i, j].

mine

MINE family statistics: Maximal Information-Based Nonparametric Exploration (MINE) statistics. mine computes the MINE family measures between two variables.

mictools

Function that implements the mictools pipeline. In particular, it computes the null and observed distributions of the tic_e measure.

mine_stat

A helper function to compute one MINE statistic. It takes two vectors of the same length as input.

pstats

Compute pairwise statistics (MIC and normalized TIC) between variables (convenience function).

Spellman

CDC15 Yeast Gene Expression Dataset