MiPP (version 1.44.0)

mipp: MiPP-based Classification

Description

Finds optimal sets of genes for classification

Usage

mipp(x, y, x.test = NULL, y.test = NULL, probe.ID = NULL, rule = "lda", method.cut = "t.test", percent.cut = 0.01, model.sMiPP.margin = 0.01, min.sMiPP = 0.85, n.drops = 2, n.fold = 5, p.test = 1/3, n.split = 20, n.split.eval = 100)

Arguments

x
data matrix
y
class vector
x.test
test data matrix if available
y.test
test class vector if available
probe.ID
probe set IDs; if NULL, row numbers are assigned.
rule
classification rule: "lda", "qda", "logistic", "svmlin", or "svmrbf"; the default is "lda".
method.cut
method for pre-selecting genes; "t.test" (the default) is available.
percent.cut
proportion of pre-selected genes; the default is 0.01.
model.sMiPP.margin
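margin for choosing the final model: the smallest set of genes whose sMiPP is within model.sMiPP.margin of the maximum sMiPP; the default is 0.01.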
min.sMiPP
Adding genes stops if max sMiPP is at least min.sMiPP; the default is 0.85.
n.drops
Adding genes stops if sMiPP decreases n.drops times, in addition to the min.sMiPP criterion; the default is 2.
n.fold
number of folds; the default is 5.
p.test
proportion of samples assigned to the test set when no independent test set is available; the default is 1/3.
n.split
number of splits; the default is 20.
n.split.eval
number of splits for evaluation; the default is 100.
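
The pre-selection step controlled by method.cut and percent.cut can be pictured with a short base R sketch. This is not MiPP's internal code: preselect_genes is a hypothetical helper, and the use of a Welch two-sample t-test and of ceiling() to turn the proportion into a gene count are assumptions made for illustration.

```r
# Hypothetical helper, not part of MiPP: rank genes (rows of x) by their
# two-sample t-test p-value and keep the top `percent.cut` proportion.
# Assumptions: two classes, Welch t-test, count rounded up with ceiling().
preselect_genes <- function(x, y, percent.cut = 0.01) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2, ncol(x) == length(y))
  in.first <- y == levels(y)[1]
  p.values <- apply(x, 1, function(g) {
    t.test(g[in.first], g[!in.first])$p.value
  })
  n.keep <- max(1, ceiling(percent.cut * nrow(x)))
  # Row indices of the genes with the smallest p-values
  order(p.values)[seq_len(n.keep)]
}

# Simulated data: 200 genes, 20 samples, genes 1 to 3 shifted in class "B"
set.seed(1)
x <- matrix(rnorm(200 * 20), nrow = 200)
y <- factor(rep(c("A", "B"), each = 10))
x[1:3, y == "B"] <- x[1:3, y == "B"] + 5

keep <- preselect_genes(x, y, percent.cut = 0.05)
length(keep)  # number of pre-selected genes
```

With percent.cut = 0.05 and 200 genes, the sketch keeps 10 genes, and the three shifted genes are among them.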

Value

model
candidate genes (for each split if no independent test set is available)
model.eval
optimal sets of genes for each split when no independent test set is available
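
The candidate models come from adding genes until the min.sMiPP and n.drops criteria are met. The sketch below shows one way to read those two criteria together. It assumes the definition of sMiPP given in Soukup, Cho and Lee (2005): the sum of the posterior probabilities of the true class, minus the number of misclassified samples, divided by the number of samples. Both smipp and stop_index are hypothetical helpers, and counting drops cumulatively (rather than consecutively) is an assumption; MiPP's own implementation may differ.

```r
# Hypothetical helper: standardized MiPP under the assumed definition.
# post.true: posterior probability assigned to each sample's true class
# correct:   TRUE where the sample was classified correctly
smipp <- function(post.true, correct) {
  stopifnot(length(post.true) == length(correct))
  (sum(post.true) - sum(!correct)) / length(post.true)
}

# Hypothetical stopping rule: given the sMiPP obtained after adding each
# gene, return the step at which gene addition would stop. Assumed reading:
# stop once the running maximum is at least min.sMiPP AND sMiPP has
# decreased n.drops times in total.
stop_index <- function(smipp.seq, min.sMiPP = 0.85, n.drops = 2) {
  best <- -Inf
  drops <- 0
  for (i in seq_along(smipp.seq)) {
    if (i > 1 && smipp.seq[i] < smipp.seq[i - 1]) drops <- drops + 1
    best <- max(best, smipp.seq[i])
    if (best >= min.sMiPP && drops >= n.drops) return(i)
  }
  length(smipp.seq)  # criteria never met: all steps were used
}

# Four samples, one of them misclassified
post.true <- c(0.9, 0.8, 0.3, 0.95)
s <- smipp(post.true, correct = post.true > 0.5)

# Made-up sMiPP values after adding genes one at a time
smipp.seq <- c(0.60, 0.80, 0.90, 0.88, 0.92, 0.91, 0.93)
stop.at <- stop_index(smipp.seq)
```

In this made-up sequence the maximum passes 0.85 at the third gene, but the second decrease only occurs at the sixth, so the sketch stops there.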

References

Soukup M, Cho H, and Lee JK (2005). Robust classification modeling on microarray data using misclassification penalized posterior, Bioinformatics, 21 (Suppl): i423-i430.

Soukup M and Lee JK (2004). Developing optimal prediction models for cancer classification using gene expression data, Journal of Bioinformatics and Computational Biology, 1(4): 681-694.

Examples

##########
#Example 1: When an independent test set is available

data(leukemia)

#Normalize combined data
leukemia <- cbind(leuk1, leuk2)
leukemia <- mipp.preproc(leukemia, data.type="MAS4")

#Train set
x.train <- leukemia[,1:38]
y.train <- factor(c(rep("ALL",27),rep("AML",11)))

#Test set
x.test <- leukemia[,39:72]
y.test <- factor(c(rep("ALL",20),rep("AML",14)))


#Compute MiPP
out <- mipp(x=x.train, y=y.train, x.test=x.test, y.test=y.test, probe.ID = 1:nrow(x.train), n.fold=5, percent.cut=0.05, rule="lda")

#Print candidate models
out$model



##########
#Example 2: When an independent test set is not available

data(colon)

#Normalize data
x <- mipp.preproc(colon)
y <- factor(c("T", "N", "T", "N", "T", "N", "T", "N", "T", "N", 
       "T", "N", "T", "N", "T", "N", "T", "N", "T", "N",
       "T", "N", "T", "N", "T", "T", "T", "T", "T", "T", 
       "T", "T", "T", "T", "T", "T", "T", "T", "N", "T", 
       "T", "N", "N", "T", "T", "T", "T", "N", "T", "N", 
       "N", "T", "T", "N", "N", "T", "T", "T", "T", "N", 
       "T", "N"))


#Deleting contaminated chips
x <- x[,-c(51,55,45,49,56)]
y <- y[ -c(51,55,45,49,56)]

#Compute MiPP
out <- mipp(x=x, y=y, probe.ID = 1:nrow(x), n.fold=5, p.test=1/3, n.split=5, n.split.eval=100,
            percent.cut=0.1, rule="lda")

#Print candidate models for each split
out$model

#Print optimal models and independent evaluation for each split
out$model.eval
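
To make the roles of p.test and n.split in Example 2 concrete, the following base R sketch draws repeated random train/test partitions for the 57 chips that remain after the five contaminated ones are removed. make_splits is a hypothetical helper, not a MiPP function, and simple unstratified sampling with round() is an assumption; the package may partition the samples differently.

```r
# Hypothetical helper, not part of MiPP: draw n.split random partitions,
# holding out a proportion p.test of the samples as a test set each time.
# Assumption: simple random sampling without stratifying by class.
make_splits <- function(n.samples, p.test = 1/3, n.split = 20) {
  n.test <- round(n.samples * p.test)
  lapply(seq_len(n.split), function(i) {
    test <- sort(sample(n.samples, n.test))
    list(train = setdiff(seq_len(n.samples), test), test = test)
  })
}

set.seed(42)
splits <- make_splits(n.samples = 57, p.test = 1/3, n.split = 5)

length(splits)             # one entry per split
length(splits[[1]]$test)   # held-out samples in the first split
length(splits[[1]]$train)  # training samples in the first split
```

With 57 samples and p.test = 1/3, each split in this sketch holds out 19 samples and trains on the remaining 38.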
