RUVrinv: Remove Unwanted Variation, ridged inverse method

Description

The RUV-rinv algorithm. Estimates and adjusts for unwanted variation using negative controls.

Usage

RUVrinv(Y, X, ctl, Z=1, eta=NULL, include.intercept=TRUE,
        fullW0=NULL, invsvd=NULL, lambda=NULL, k=NULL, l=NULL,
        randomization=FALSE, iterN=100000, inputcheck=TRUE)

Arguments

Y

The data. An m by n matrix, where m is the number of samples and n is the number of features.

X

The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1. Factors and data frames are also permitted; they are converted to a matrix by design.matrix.

ctl

An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers.
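The two accepted forms of ctl select the same features. A minimal base-R sketch, assuming n = 10 features of which the first three are negative controls (both values are illustrative only):

```r
n <- 10

# Logical form: one entry per feature (column of Y); TRUE marks a negative control
ctl_lgl <- rep(FALSE, n)
ctl_lgl[1:3] <- TRUE

# Integer form: the column indices of the negative controls in Y
ctl_int <- which(ctl_lgl)

print(ctl_int)  # [1] 1 2 3
```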

Z

Any additional covariates to include in the model, typically an m by q matrix. Factors and data frames are also permitted; they are converted to a matrix by design.matrix. Alternatively, Z may simply be 1 (the default) for an intercept term, or NULL.

eta

Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. Can be (1) a matrix with n columns, (2) a matrix with n rows, (3) a data frame with n rows, (4) a vector or factor of length n, or (5) simply 1, for an intercept term.

include.intercept

Applies to both Z and eta. When Z or eta (or both) is specified (not NULL) but does not already include an intercept term, this will automatically include one. If only one of Z or eta should include an intercept, this variable should be set to FALSE, and the intercept term should be included manually where desired.
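For instance, to include an intercept in Z but not in eta, one might set include.intercept = FALSE and add the column of ones to Z by hand. A sketch with simulated data; the batch indicator and the gene-wise covariate gc are hypothetical, k = 2 is an arbitrary choice that skips the getK step, and the RUVrinv call runs only if the ruv package is installed:

```r
set.seed(1)
m <- 20; n <- 500
Y <- matrix(rnorm(m * n), m, n)               # m samples by n features
X <- matrix(rep(0:1, each = m / 2), m, 1)     # one factor of interest
ctl <- rep(FALSE, n); ctl[1:50] <- TRUE       # first 50 features are controls

# Hypothetical sample-wise covariate: a two-level batch indicator
batch <- rep(c(0, 1), times = m / 2)
# Intercept added manually, so Z is m by 2
Z <- cbind(1, batch)

# Hypothetical gene-wise covariate, deliberately without an intercept column
gc <- runif(n)
eta <- matrix(gc, ncol = 1)                   # a matrix with n rows

if (requireNamespace("ruv", quietly = TRUE)) {
  fit <- ruv::RUVrinv(Y, X, ctl, Z = Z, eta = eta,
                      include.intercept = FALSE, k = 2)
}
```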

fullW0

Can be supplied to speed up execution. It is returned by previous calls of RUV4, RUVinv, or RUVrinv (see below).

invsvd

Can be supplied to speed up execution, typically when calling RUV(r)inv many times with different values of lambda. It is returned by previous calls of RUV(r)inv (see below).
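One plausible pattern for trying several ridge parameters is to fit once with the default lambda, keep fit$invsvd, and pass it back with each new lambda. A sketch assuming the ruv package is installed; the simulated data, k = 3, and the multipliers 0.5 and 2 are illustrative choices, not recommendations:

```r
if (requireNamespace("ruv", quietly = TRUE)) {
  set.seed(2)
  m <- 30; n <- 1000; k <- 3
  ctl <- rep(FALSE, n); ctl[1:100] <- TRUE
  X <- matrix(rep(0:1, each = m / 2), m, 1)
  W <- matrix(rnorm(m * k), m, k)             # simulated unwanted factors
  alpha <- matrix(rnorm(k * n), k, n)
  Y <- W %*% alpha + matrix(rnorm(m * n), m, n)

  # First call: lambda is chosen by default (k is given, so getK is skipped)
  fit0 <- ruv::RUVrinv(Y, X, ctl, k = k)

  # Later calls scale the default lambda and reuse the returned invsvd
  fits <- lapply(c(0.5, 2), function(s)
    ruv::RUVrinv(Y, X, ctl, lambda = s * fit0$lambda, invsvd = fit0$invsvd))
}
```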

lambda

Ridge parameter. If unspecified, an appropriate default will be used.

k

When calculating the default value of lambda, a call to RUV4 is made; this parameter specifies the value of k to use in that call. If NULL, an appropriate default k will be used.

l

If lambda and k are both NULL, k is estimated using the getK routine. Because getK accepts only a single-column X, when p > 1 this parameter specifies which column of X is passed to getK.

randomization

Whether the inverse-method variances should be computed using randomly generated factors of interest (as opposed to a numerical integral).

iterN

The number of random "factors of interest" to generate (used only when randomization=TRUE).

inputcheck

Perform a basic sanity check on the inputs, and issue a warning if there is a problem.

Value

A list containing

betahat

The estimated coefficients of the factor(s) of interest. A p by n matrix.

sigma2

Estimates of the features' variances. A vector of length n.

t

t statistics for the factor(s) of interest. A p by n matrix.

p

P-values for the factor(s) of interest. A p by n matrix.

Fstats

F statistics for testing all of the factors in X simultaneously.

Fpvals

P-values for testing all of the factors in X simultaneously.

multiplier

The constant by which sigma2 must be multiplied in order to get an estimate of the variance of betahat.

df

The number of residual degrees of freedom.
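Taken together, multiplier, sigma2, and df presumably combine in the usual way: the standard error of betahat is sqrt(multiplier * sigma2), t is betahat divided by that, and p comes from a t distribution with df degrees of freedom. The sketch below shows this with ordinary least squares in base R (no unwanted factors), where the analogous multiplier is a diagonal entry of (X'X)^{-1}; RUVinv computes its own multiplier and df, so this only illustrates how the pieces fit, and it cross-checks the result against lm():

```r
set.seed(3)
m <- 12; n <- 5
X <- cbind(1, rep(0:1, each = m / 2))    # intercept + one factor of interest
Y <- matrix(rnorm(m * n), m, n)          # m samples by n features

XtXinv <- solve(crossprod(X))
B <- XtXinv %*% crossprod(X, Y)          # 2 by n coefficient matrix
resid <- Y - X %*% B
df <- m - ncol(X)                        # residual degrees of freedom
sigma2 <- colSums(resid^2) / df          # one variance estimate per feature

# OLS analogue of 'multiplier' for the factor of interest (column 2 of X)
multiplier <- XtXinv[2, 2]
betahat <- B[2, ]
tvals <- betahat / sqrt(multiplier * sigma2)
pvals <- 2 * pt(-abs(tvals), df)

# Reference values from lm() for the first feature
ref <- summary(lm(Y[, 1] ~ X[, 2]))$coefficients[2, ]
```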

W

The estimated unwanted factors.

alpha

The estimated coefficients of W.

byx

The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.

bwx

The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.
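One way to read byx: regress both Y and X on Z, keep the residuals, and regress the residual Y on the residual X. By the Frisch-Waugh-Lovell theorem this gives the same coefficients as the X columns of a joint regression of Y on cbind(X, Z). A base-R sketch of that description with simulated data; the extra covariate in Z is hypothetical, and this is not necessarily how RUVinv computes byx internally:

```r
set.seed(4)
m <- 16; n <- 4
Z <- cbind(1, rnorm(m))                    # intercept plus one hypothetical covariate
X <- matrix(rep(0:1, each = m / 2), m, 1)  # one factor of interest
Y <- matrix(rnorm(m * n), m, n)

# Residual-maker for Z: I - Z (Z'Z)^{-1} Z'
RZ <- diag(m) - Z %*% solve(crossprod(Z), t(Z))
Xa <- RZ %*% X                             # X adjusted for Z
Ya <- RZ %*% Y                             # Y adjusted for Z

# byx-style coefficients: regress adjusted Y on adjusted X (p by n)
byx <- solve(crossprod(Xa), crossprod(Xa, Ya))

# The same numbers from one joint regression on X and Z
joint <- qr.coef(qr(cbind(X, Z)), Y)[1, ]
```

bwx follows the same pattern with the estimated W in place of Y.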

X

X. Included for reference.

k

k. Included for reference.

ctl

ctl. Included for reference.

Z

Z. Included for reference.

eta

eta. Included for reference.

fullW0

Can be used to speed up future calls of RUV4.

lambda

lambda. Included for reference.

invsvd

Can be used to speed up future calls of RUV(r)inv.

include.intercept

include.intercept. Included for reference.

method

Character variable with value "RUVinv". Included for reference. (Note that RUVrinv is simply a wrapper to RUVinv, hence both return "RUVinv" as the method.)

Details

Implements the RUV-rinv algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013). This function is a thin wrapper around RUVinv; the only extra work it does is compute a default value of lambda (via a call to RUV4) when none is supplied.

References

Gagnon-Bartsch, J. A., and Speed, T. P. (2012). Using control genes to correct for unwanted variation in microarray data. Biostatistics, 13(3), 539-552. Available at: http://biostatistics.oxfordjournals.org/content/13/3/539.full.

Gagnon-Bartsch, J. A., Jacob, L., and Speed, T. P. (2013). Removing Unwanted Variation from High Dimensional Data with Negative Controls. Technical Report 820, Department of Statistics, University of California, Berkeley. Available at: http://statistics.berkeley.edu/tech-reports/820.

Examples


library(ruv)

## Create some simulated data
set.seed(1)
m = 50
n = 10000
nc = 1000
p = 1
k = 20
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0
W = matrix(rnorm(m*k),m,k)
alpha = matrix(rnorm(k*n),k,n)
epsilon = matrix(rnorm(m*n),m,n)
Y = X%*%beta + W%*%alpha + epsilon

## Run RUV-rinv
fit = RUVrinv(Y, X, ctl)

## Get adjusted variances and p-values
fit = variance_adjust(fit)
