snm: Perform a supervised normalization of microarray data

Description

Implements Supervised Normalization of Microarrays (SNM) on a gene expression matrix. Requires a set of biological covariates of interest and at least one probe-specific or intensity-dependent adjustment variable.

Usage

snm(raw.dat, bio.var=NULL, adj.var=NULL, int.var=NULL, weights=NULL, spline.dim = 4, num.iter = 10, lmer.max.iter=1000, nbins=20, rm.adj=FALSE, verbose=TRUE, diagnose=TRUE)

Arguments

raw.dat

An $m$ probes by $n$ arrays matrix of expression data. To remove intensity-dependent effects, the matrix should contain the raw, log-transformed data.

bio.var

A model matrix (see model.matrix) or data frame with $n$ rows of the biological variables. If NULL, then all probes are treated as "null" in the algorithm.

adj.var

A model matrix (see model.matrix) or data frame with $n$ rows of the probe-specific adjustment variables. If NULL, a model with an intercept term is used.
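As an illustration of the shapes expected for bio.var and adj.var, the following sketch builds both with model.matrix from a small, hypothetical sample annotation. The column names (treatment, batch) are assumptions made for this example, and dropping the intercept from bio.var is one possible convention, not a documented requirement.

```r
# Hypothetical sample annotation for n = 6 arrays (names are illustrative only).
samples <- data.frame(
  treatment = factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")),
  batch     = factor(c("A", "B", "A", "B", "A", "B"))
)

# Biological variable of interest: treatment.
# Assumption: the intercept is carried by adj.var, so it is dropped here.
bio.var <- model.matrix(~ treatment, data = samples)[, -1, drop = FALSE]

# Probe-specific adjustment variable: batch, with an intercept column.
adj.var <- model.matrix(~ batch, data = samples)

# Both matrices have one row per array.
dim(bio.var)
dim(adj.var)
```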

int.var

A data frame with $n$ rows of type factor with the unique levels of intensity-dependent effects. Each column parametrizes a unique source of intensity-dependent effect (e.g., array effects for column 1 and dye effects for column 2).
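A minimal sketch of an int.var data frame for $n = 6$ arrays, with one column for array effects and one for dye effects, might look like the following. The column names and dye labels are hypothetical.

```r
n <- 6  # number of arrays (columns of raw.dat)

# Each column is a factor describing one source of intensity-dependent effect.
int.var <- data.frame(
  array = factor(seq_len(n)),                           # one level per array
  dye   = factor(rep(c("Cy3", "Cy5"), length.out = n))  # hypothetical dye labels
)

str(int.var)
```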

weights

A vector of length $m$ used to control the influence of each probe on the intensity-dependent array effects. The values are not changed by the algorithm.

spline.dim

Dimension of basis spline used for array effects.
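To give a sense of what a basis spline of dimension 4 looks like, the sketch below uses bs from the splines package on a grid of hypothetical log intensities. Whether snm builds its basis with exactly this call is an assumption; the example only shows that a dimension of 4 yields four basis columns.

```r
library(splines)

# Hypothetical grid of log2 intensities.
intensity <- seq(4, 16, length.out = 50)

# B-spline basis with 4 degrees of freedom (assumed to mirror spline.dim = 4).
basis <- bs(intensity, df = 4)

dim(basis)  # one row per intensity value, one column per basis function
```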

num.iter

Number of snm model fit iterations to run.

lmer.max.iter

Maximum number of lmer iterations permitted. Set lmer.max.iter=NULL if no maximum is desired.

nbins

Number of bins used by the binning strategy. Array effects are calculated from an $nbins \times n$ data matrix, where the $(i,j)$ value is the average intensity of bin $i$ on array $j$.
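The binned matrix described above can be sketched in base R as follows. The data are simulated, and the rule for assigning probes to bins (equal-count bins ordered by each probe's mean intensity across arrays) is an assumption made for illustration; the package may bin differently.

```r
set.seed(1)
m <- 1000; n <- 4; nbins <- 20

# Simulated log-intensity matrix: m probes by n arrays.
dat <- matrix(rnorm(m * n, mean = 8, sd = 2), nrow = m, ncol = n)

# Assumption: probes are grouped into bins by their mean intensity across arrays.
probe.mean <- rowMeans(dat)
bin <- cut(rank(probe.mean, ties.method = "first"), breaks = nbins, labels = FALSE)

# nbins x n matrix: entry (i, j) is the average intensity of bin i on array j.
bin.means <- rowsum(dat, group = bin) / as.vector(table(bin))

dim(bin.means)
```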

rm.adj

If FALSE, only the intensity-dependent effects are removed from the normalized data, so the effects of the adjustment variables are still present. If TRUE, both the adjustment variable effects and the intensity-dependent effects are removed from the returned normalized data.

verbose

A flag indicating whether to display a report after each iteration. TRUE produces the output.

diagnose

A flag indicating whether to produce diagnostic output in the form of consecutive plots. TRUE produces the plots.

Value

norm.dat: The matrix of normalized data. The default setting is rm.adj=FALSE, which means that only the intensity-dependent effects have been subtracted from the data. If the user wants the adjustment variable effects removed as well, then set rm.adj=TRUE when calling the snm function.
pvalues: A vector of p-values testing the association of the biological variables with each probe. These p-values are obtained from an ANOVA comparing a full model, which contains both the probe-specific biological and adjustment variables, with a reduced model that contains only the probe-specific adjustment variables. The data used for this comparison have the intensity-dependent effects removed. The returned p-values are calculated after the final iteration of the algorithm.
pi0: The estimated proportion of true null probes $\pi_0$, calculated after the final iteration of the algorithm.
iter.pi0s: A vector of length equal to num.iter containing the estimated $\pi_0$ values at each iteration of the snm algorithm. These values should converge; any non-convergence suggests a problem with the data, the assumed model, or both.
nulls: A vector indexing the probes utilized in estimating the intensity-dependent effects on the final iteration.
M: A matrix containing the estimated probe intensities for each array utilized in estimating the intensity-dependent effects on the final iteration. For memory parsimony, only a subset of values spanning the range is returned, currently nbins*100 values.
array.fx: A matrix of the final estimated intensity-dependent array effects. For memory parsimony, only a subset of values spanning the range is returned, currently nbins*100 values.
bio.var: The processed version of the same input variable.
adj.var: The processed version of the same input variable.
int.var: The processed version of the same input variable.
df0: Degrees of freedom of the adjustment variables.
df1: Degrees of freedom of the full model matrix, which includes the biological variables and the adjustment variables.
raw.dat: The input data.
rm.var: Same as the input (useful for later analyses).
call: Function call.
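
The full-versus-reduced model comparison behind the pvalues component can be illustrated for a single probe with lm and anova. This is a sketch on simulated data, not the package's internal code; the variable names (treatment, batch) and effect sizes are hypothetical.

```r
set.seed(42)
n <- 12

# Hypothetical design: one biological variable and one adjustment variable.
treatment <- factor(rep(c("ctrl", "trt"), each = 6))
batch     <- factor(rep(c("A", "B"), times = 6))

# Simulated expression values for one probe (intensity-dependent effects assumed removed).
y <- 8 + 1.5 * (treatment == "trt") + 0.5 * (batch == "B") + rnorm(n, sd = 0.3)

# Reduced model: adjustment variables only. Full model: adjustment plus biological variables.
reduced <- lm(y ~ batch)
full    <- lm(y ~ batch + treatment)

# F-test comparing the nested models; the p-value is in the second row.
p.value <- anova(reduced, full)[2, "Pr(>F)"]
p.value
```

A small p-value here indicates that adding the biological variable improves the fit for this probe; snm reports one such p-value per probe.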

Details

This function implements the supervised normalization of microarrays algorithm described in Mecham, Nelson, and Storey (2010).

References

Mecham BH, Nelson PS, Storey JD (2010) Supervised normalization of microarrays. Bioinformatics, 26: 1308-1315.

Examples


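# Simulate a single-channel data set with a fixed seed
singleChannel <- sim.singleChannel(12345)
# Normalize using the simulated biological, adjustment, and intensity-dependent variables
snm.obj <- snm(singleChannel$raw.data,
               singleChannel$bio.var,
               singleChannel$adj.var,
               singleChannel$int.var)
# P-values of the true null probes should be approximately uniform
ks.test(snm.obj$pvalues[singleChannel$true.nulls], "punif")
plot(snm.obj)
summary(snm.obj)
snm.fit <- fitted(snm.obj)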
