mimr: mIMR (minimum Interaction max Relevance) filter

Description

An information-theoretic filter that aims to prioritise direct causal relationships in feature selection problems where the number of features is large relative to the number of samples. The approach relies on the notion of interaction, which is informative about both the relevance of an input subset and its causal relationship with the target.
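
The role of interaction can be illustrated with a small base R sketch. It is not the estimator used by mimr: it assumes jointly Gaussian variables (so that mutual information follows from correlations), takes the interaction information as I(A;B;Y) = I(A;Y) - I(A;Y|B), and the helper names gauss_mi, gauss_cmi and interaction_info are hypothetical. Under these assumptions, two causes of Y tend to have negative interaction, while two effects of Y tend to have positive interaction.

```r
# Gaussian approximation (an assumption of this sketch):
# I(A;B) = -0.5 * log(1 - cor(A, B)^2)
gauss_mi <- function(a, b) -0.5 * log(1 - cor(a, b)^2)

# Conditional mutual information I(A;B|Z), computed from the partial
# correlation of A and B after regressing both on Z
gauss_cmi <- function(a, b, z) {
  ra <- residuals(lm(a ~ z))
  rb <- residuals(lm(b ~ z))
  gauss_mi(ra, rb)
}

# Interaction information I(A;B;Y) = I(A;Y) - I(A;Y|B)
interaction_info <- function(a, b, y) gauss_mi(a, y) - gauss_cmi(a, y, b)

set.seed(0)
N <- 500
X1 <- rnorm(N)
X2 <- rnorm(N)
Y <- X1 + 4 * X2 + rnorm(N, sd = 0.5)   # X1 and X2 are causes of Y
Z1 <- Y + rnorm(N, sd = 0.5)            # Z1 is an effect of Y
Z2 <- 2 * Y + rnorm(N, sd = 0.5)        # Z2 is an effect of Y

int_causes <- interaction_info(X1, X2, Y)    # pair of causes
int_effects <- interaction_info(Z1, Z2, Y)   # pair of effects
print(c(causes = int_causes, effects = int_effects))
```

The sign difference is what a minimum-interaction criterion can exploit: conditioning on one cause makes the other more informative about Y, whereas two effects carry largely redundant information about Y.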

Usage

mimr(X, Y, nmax = 5, init = FALSE, lambda = 0.5, spouse.removal = TRUE, caus = 1)

Arguments

X

: input matrix

Y

: output vector

nmax

: number of returned features

init

: if TRUE, the ranking is initialised by a search over pairs of features; otherwise the first ranked feature is the one with the highest mutual information with the output

lambda

: weight $0 \le \lambda \le 1$ of the interaction term

spouse.removal

: TRUE or FALSE. If TRUE, the spouses (the other parents of the target's children) are removed before ranking

caus

: if caus = 1 it prioritizes causes; otherwise (caus = -1) it prioritizes effects

Value

Ranked vector of nmax feature indices (column positions in X)

References

Bontempi G., Meyer P.E. (2010) Causal filter selection in microarray data. ICML10

Examples


set.seed(0)
N <- 500   # number of samples
n <- 5     # number of candidate input features
X <- array(rnorm(N * n), c(N, n))
## Y depends only on the first three columns of X
Y <- X[, 1] - 3 * X[, 3] + 4 * X[, 2] + rnorm(N, sd = 0.5)
## effect 1
Z1 <- Y + rnorm(N, sd = 0.5)
## effect 2
Z2 <- 2 * Y + rnorm(N, sd = 0.5)
## causes are in the first three columns of the feature dataset
most.probable.causes <- mimr(cbind(X, Z1, Z2), Y, nmax = 3, init = TRUE, spouse.removal = FALSE, lambda = 1)
## effects are in the last two columns of the feature dataset
most.probable.effects <- mimr(cbind(X, Z1, Z2), Y, nmax = 3, init = TRUE, spouse.removal = FALSE, lambda = 1, caus = -1)
