bapred-package: The bapred package

Description

This is a short summary of the features of bapred, a package in R for the treatment and analysis of batch effects based in high-dimensional molecular data via batch effect adjustment and addon quantile normalization. Here, a special focus is set on phenotype prediction in the presence of batch effects.

Arguments

Details

Various tools dealing with batch effects, in particular enabling the removal of discrepancies between training and test sets in prediction scenarios. Moreover, addon quantile normalization and addon RMA normalization (Kostka & Spang, 2008) is implemented to enable integrating the quantile normalization step into prediction rules. The following batch effect removal methods are implemented: FAbatch, ComBat, (f)SVA, mean-centering, standardization, Ratio-A and Ratio-G. For each of these we provide an additional function which enables a posteriori ('addon') batch effect removal in independent batches ('test data'). Here, the (already batch effect adjusted) training data is not altered. For evaluating the success of batch effect adjustment several metrics are provided. Moreover, the package implements a plot for the visualization of batch effects using principal component analysis. The main functions of the package for batch effect adjustment are ba() and baaddon() which enable batch effect removal and addon batch effect removal, respectively, with one of the seven methods mentioned above. Another important function here is bametric() which is a wrapper function for all implemented methods for evaluating the success of batch effect removal. For (addon) quantile normalization and (addon) RMA normalization the functions qunormtrain(), qunormaddon(), rmatrain() and rmaaddon() can be used.

References

Hornung, R., Boulesteix, A.-L., Causeur, D. (2016). Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment. BMC Bioinformatics 17:27, <10.1186/s12859-015-0870-z>.

Hornung, R., Causeur, D., Bernau, C., Boulesteix, A.-L. (2017). Improving cross-study prediction through addon batch effect adjustment and addon normalization. Bioinformatics 33(3):397<U+2013>404, <10.1093/bioinformatics/btw650>.

Examples

Run this code

# NOT RUN {
# Load example dataset:

data(autism)



# Subset this example dataset to reduce the
# computational burden of the toy examples:

# Random subset of 150 variables:
set.seed(1234)
Xsub <- X[,sample(1:ncol(X), size=150)]

# In cases of batches with more than 20 observations
# select 20 observations at random:
subinds <- unlist(sapply(1:length(levels(batch)), function(x) {
  indbatch <- which(batch==x)
  if(length(indbatch) > 20)
    indbatch <- sort(sample(indbatch, size=20))
  indbatch
}))
Xsub <- Xsub[subinds,]
batchsub <- batch[subinds]
ysub <- y[subinds]



# Split into training and test sets:

trainind <- which(batchsub %in% c(1,2))

Xsubtrain <- Xsub[trainind,]
ysubtrain <- ysub[trainind]
batchsubtrain <- factor(as.numeric(batchsub[trainind]), levels=c(1,2))


testind <- which(batchsub %in% c(3,4))

Xsubtest <- Xsub[testind,]
ysubtest <- ysub[testind]

batchsubtest <- as.numeric(batchsub[testind])
batchsubtest[batchsubtest==3] <- 1
batchsubtest[batchsubtest==4] <- 2
batchsubtest <- factor(batchsubtest, levels=c(1,2))



# Batch effect adjustment:

combatparams <- ba(x=Xsubtrain, y=ysubtrain, batch=batchsubtrain, 
  method = "combat")
Xsubtraincombat <- combatparams$xadj

meancenterparams <- ba(x=Xsubtrain, y=ysubtrain, batch=batchsubtrain, 
  method = "meancenter")
Xsubtrainmeancenter <- meancenterparams$xadj



# Addon batch effect adjustment:

Xsubtestcombat <- baaddon(params=combatparams, x=Xsubtest, 
  batch=batchsubtest)

Xsubtestmeancenter <- baaddon(params=meancenterparams, x=Xsubtest, 
  batch=batchsubtest)



# Metrics for evaluating the success of batch effect adjustment:

bametric(xba=Xsubtrain, batch=batchsubtrain, metric = "sep")
bametric(xba=Xsubtrainmeancenter, batch=batchsubtrain, metric = "sep")

bametric(x=Xsubtrain, batch=batchsubtrain, y=ysubtrain, 
  metric = "diffexpr", method = "meancenter")

bametric(xba=Xsubtrainmeancenter, x=Xsubtrain, metric = "cor")



# Principal component analysis plots for the visualization 
# of batch effects:

par(mfrow=c(1,3))
pcplot(x=Xsub, batch=batchsub, y=ysub, alpha=0.25, main="alpha = 0.25")
pcplot(x=Xsub, batch=batchsub, y=ysub, alpha=0.75, main="alpha = 0.75")
pcplot(x=Xsub, batch=batchsub, y=ysub, col=1:length(unique(batchsub)), 
  main="col = 1:length(unique(batchsub))")
par(mfrow=c(1,1))



# (Addon) quantile normalization:

qunormparams <- qunormtrain(x=Xsubtrain)

Xtrainnorm <- qunormparams$xnorm

Xtestaddonnorm <- qunormaddon(qunormparams, x=Xsubtest)
# }

Run the code above in your browser using DataLab