vsn2.
The function vsn remains in the package for backward
compatibility, but for new projects, please use vsn2.
vsn(intensities, lts.quantile = 0.5, verbose = interactive(), niter = 10, cvg.check = NULL, describe.preprocessing = TRUE, subsample, pstart, strata)
vsnh for details.
preprocessing slot of the returned object. See details.
If strata is not specified, one pair of parameters is fitted
for every sample (i.e. for every column of intensities). If
strata is specified, a pair of parameters is fitted for every
stratum within every sample. The strata are coded for by the different
integer values. The integer vector strata can be obtained
from a factor fac through as.integer(fac), or from
a character vector str through as.integer(factor(str)).
ExpressionSet.
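The strata coding described above can be sketched in base R. The plate labels and the variable names fac and chr are made up for illustration; only as.integer and factor are actual base R functions.

```r
## Hypothetical stratum labels, one per row of the intensity matrix
fac <- factor(c("plateA", "plateB", "plateA", "plateC"))
chr <- c("plateA", "plateB", "plateA", "plateC")

## A factor is converted directly to its integer codes
strata_from_factor <- as.integer(fac)

## A character vector has to go through factor() first
strata_from_char <- as.integer(factor(chr))

print(strata_from_factor)
print(strata_from_char)
```

Both routes give the same integer vector, because factor() sorts the distinct labels before assigning codes.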
Differences between the columns of the transformed intensities are
"generalized log-ratios", which are shrinkage estimators of the natural
logarithm of the fold change. For the transformation parameters,
please see the Details.
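The shrinkage behaviour of generalized log-ratios can be illustrated with a small base R sketch. It assumes a simplified transformation of the form asinh(a + b * x) with made-up parameters a and b; the function name glog is hypothetical, and the exact parametrization used by the package is documented with vsnh.

```r
## Simplified generalized-log transformation (an assumption for illustration)
glog <- function(x, a = 0, b = 1) asinh(a + b * x)

## A two-fold change at high intensities ...
glr_high <- glog(2000) - glog(1000)
## ... and the same two-fold change at low intensities
glr_low <- glog(2) - glog(1)

print(c(high = glr_high, low = glr_low, log2fold = log(2)))
```

At high intensities the difference is very close to the natural logarithm of the fold change, while at low intensities it is pulled towards zero.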
vsnh. The parameters are estimated through
a robust variant of maximum likelihood. This assumes that for
the majority of genes the expression levels are not much different
across the samples, i.e., that only a minority of genes (less than
a fraction 1-lts.quantile) is differentially expressed. Even if most
genes on an array are differentially expressed, it may still
be possible to use the estimator: if a set of non-differentially expressed
genes is known, e.g. because they are external controls or reliable
'house-keeping genes', the transformation parameters can be fitted with
vsn from the data of these genes, then the transformation can be
applied to all data with vsnh.
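The role of lts.quantile in the robust fit can be sketched as follows. This is a deliberately simplified illustration in base R, assuming one residual per gene; the simulated residuals and the variable names are made up, and the actual procedure inside vsn may differ in detail.

```r
set.seed(1)
lts.quantile <- 0.5

## Hypothetical per-gene residuals: 900 well-behaved genes and
## 100 genes with large residuals (e.g. differentially expressed)
resid <- c(rexp(900, rate = 10), rexp(100, rate = 0.5))

## Keep only the genes whose residual lies below the chosen quantile;
## only these would enter the next round of parameter estimation
keep <- resid <= quantile(resid, lts.quantile)

print(mean(keep))
```

With lts.quantile = 0.5, about half of the genes are retained in each round, so up to roughly half of them may behave as outliers without entering the fit.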
Format: The format of the matrix of intensities is as follows:
For the two-color printed array technology, each row
corresponds to one spot, and the columns to the different arrays
and wavelengths (usually red and green, but could be any number).
For example, if there are 10 arrays, the matrix would have 20 columns,
columns 1 to 10 containing the green intensities, and 11 to 20 the
red ones. In fact, the ordering of the columns does not matter to
vsn, but it is your responsibility to keep track of it for
subsequent analyses.
For one-color arrays, each row corresponds to a probe, and each
column to an array.
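The two-color layout described above can be sketched in base R. The intensities are simulated and the column names are an assumption made only to keep track of the ordering.

```r
set.seed(1)
n_arrays <- 10
n_spots  <- 5   # tiny number of spots, for illustration only

## Simulated green and red intensities, one column per array
green <- matrix(rexp(n_spots * n_arrays, rate = 1/500), nrow = n_spots)
red   <- matrix(rexp(n_spots * n_arrays, rate = 1/500), nrow = n_spots)

## Columns 1 to 10 hold the green channel, columns 11 to 20 the red channel
intensities <- cbind(green, red)
colnames(intensities) <- c(paste0("green", 1:n_arrays),
                           paste0("red", 1:n_arrays))

print(dim(intensities))
```

Naming the columns is one way to keep track of which column belongs to which array and channel in later analyses.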
Performance: This function is slow. That is due to the nested
iteration loops of the numerical optimization of the likelihood function
and the heuristic that identifies the non-outlying data points in the
least trimmed squares regression. For large arrays with many tens of
thousands of probes, you may want to consider random subsetting: that is,
use only a subset of, e.g., 10,000-20,000 rows of the data matrix
intensities to fit the parameters, then apply the transformation
to all the data, using vsnh. An example of this can be
seen in the function normalize.AffyBatch.vsn, whose code
you can inspect by typing normalize.AffyBatch.vsn on the R
command line.
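A minimal sketch of the random subsetting step, using simulated data in base R. The matrix size and the subset size of 10,000 rows are assumptions; the commented-out calls follow the pattern of the example on this page and require the vsn package.

```r
set.seed(123)

## Simulated intensity matrix: 50,000 probes on 4 arrays
intensities <- matrix(rexp(50000 * 4, rate = 1/1000), ncol = 4)

## Draw a random subset of rows to be used for the parameter fit
n_sub <- 10000
sel <- sample(nrow(intensities), n_sub)
sub <- intensities[sel, , drop = FALSE]

print(dim(sub))

## With the vsn package attached, the fit and the transformation of
## the full data could then look like this:
## fit    <- vsn(sub)
## params <- preproc(description(fit))$vsnParams
## hall   <- vsnh(intensities, params)
```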
Iteration control:
By default, if cvg.check is NULL, the function will run
the fixed number niter of iterations in the least trimmed sum
of squares regression. More fine-grained control can be obtained by
passing a list with elements eps and n as cvg.check. If the maximum
change between transformed data values is smaller than eps for
n subsequent iterations, then the iteration terminates.
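The stopping rule can be sketched in base R as follows. The helper converged_at, the sequence of changes, and the values eps = 1e-3 and n = 3 are all made up for illustration, and the reading of "n subsequent iterations" as n consecutive iterations is an assumption; this is not the package's internal code.

```r
## Return the first iteration at which the maximum change has been
## below eps for n consecutive iterations, or NA if that never happens
converged_at <- function(changes, eps, n) {
  run <- 0L
  for (i in seq_along(changes)) {
    run <- if (changes[i] < eps) run + 1L else 0L
    if (run >= n) return(i)
  }
  NA_integer_
}

## Hypothetical maximum changes between successive iterations
changes <- c(0.5, 0.1, 0.0005, 0.02, 0.0008, 0.0004, 0.0002, 0.0001)

## A list of the form that would be passed as cvg.check
cvg <- list(eps = 1e-3, n = 3)

stop_iter <- converged_at(changes, cvg$eps, cvg$n)
print(stop_iter)
```

In this sequence the change drops below eps at iteration 3 but rises again at iteration 4, so the count restarts and the criterion is first met at iteration 7.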
Estimated transformation parameters:
If describe.preprocessing is TRUE, the transformation
parameters are returned in the preprocessing slot of the
experimentData slot of the resulting ExpressionSet object, in the form
of a list with three elements:
vsnParams: the parameter array (see vsnh for details).
vsnParamsIter: an array with dimensions c(dim(vsnParams), niter)
that contains the parameter trajectory during the iterative fit
process (see also vsnPlotPar).
vsnTrimSelection: a logical vector that for
each row of the intensities matrix reports whether it was below
(TRUE) or above (FALSE) the trimming threshold.
If intensities has class ExpressionSet,
and its experimentData slot has class MIAME, then this list is appended to any
existing entries in the preprocessing slot. Otherwise, the
experimentData object and its preprocessing slot are created.
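The relation between the dimensions of vsnParams and vsnParamsIter can be illustrated in base R. The shape chosen here for vsnParams (4 samples by 2 parameters, no strata) is an assumption for the sake of the example; see vsnh for the actual layout of the parameter array.

```r
n_samples <- 4
niter     <- 10

## Placeholder parameter array with an assumed shape
vsnParams <- array(0, dim = c(n_samples, 2))

## One copy of the parameter array per iteration: the trajectory
## has the dimensions of vsnParams with niter appended
vsnParamsIter <- array(0, dim = c(dim(vsnParams), niter))

print(dim(vsnParamsIter))
```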
Parameter estimation for the calibration and variance stabilization of microarray data, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin Vingron; Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3.
vsnh, vsnPlotPar, ExpressionSet-class, MIAME-class, normalize.AffyBatch.vsn
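library(vsn)  ## package that provides vsn, vsnh and the kidney data
data(kidney)
## logarithm that maps non-positive intensities to NA
log.na = function(x) log(ifelse(x>0, x, NA))
plot(log.na(exprs(kidney)), pch=".", main="log-log")
vsnkid = vsn(kidney) ## transform and calibrate
plot(exprs(vsnkid), pch=".", main="h-h")
meanSdPlot(vsnkid)
## this should always hold true: applying vsnh with the fitted
## parameters reproduces the transformed data
params = preproc(description(vsnkid))$vsnParams
stopifnot(all(vsnh(exprs(kidney), params) == exprs(vsnkid)))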