mctrace: Compute trace of a large matrix by sample means

Description

Some matrices are too large to be represented as a matrix, even as a sparse matrix. Nevertheless, it can be possible to compute the matrix vector product fairly easy, and this is utilized to estimate the trace of the matrix.

Usage

mctrace(mat, N, tol = 0.001, maxsamples = Inf, trname = "", init)

Value

An estimate of the trace of the matrix represented by mat is returned.

Arguments

mat: square matrix, Matrix, function or list of factors.
N: integer. if mat is a function, the size of the matrix is specified here.
tol: numeric. Tolerance.
maxsamples: numeric. Maximum number of samples in the expectation estimation.
trname: character. Arbitrary name used in progress reports.
init: numeric. Initial guess for the trace.

Details

mctrace is used internally by fevcov and bccorr, but has been made public since it might be useful for other tasks as well.

For any matrix \(A\), the trace equals the sum of the diagonal elements, or the sum of the eigenvalues. However, if the size of the matrix is very large, we may not have a matrix representation, so the diagonal is not immediately available. In that case we can use the formula \(tr(A) = E(x^t A x)\) where \(x\) is a random vector with zero expectation and \(Var(x) = I\). We estimate the expectation with sample means. mctrace draws \(x\) in \(\{-1,1\}^N\), and evaluates mat on these vectors.

If mat is a function, it must be able to take a matrix of column vectors as input. Since \(x^t A x = (Ax,x)\) is evaluated, where \((\cdot,\cdot)\) is the Euclidean inner product, the function mat can perform this inner product itself. In that case the function should have an attribute attr(mat,'IP') <- TRUE to signal this.

If mat is a list of factors, the matrix for which to estimate the trace, is the projection matrix which projects out the factors. I.e. how many dimensions are left when the factors have been projected out. Thus, it is possible to estimate the degrees of freedom in an OLS where factors are projected out.

The tolerance tol is a relative tolerance. The iteration terminates when the normalized standard deviation of the sample mean (s.d. divided by absolute value of the current sample mean) goes below tol. Specify a negative tol to use the absolute standard deviation. The tolerance can also change during the iterations; you can specify tol=function(curest) {...} and return a tolerance based on the current estimate of the trace (i.e. the current sample mean).

Examples

Run this code


  A <- matrix(rnorm(25),5)
  fun <- function(x) A %*% x
  sum(diag(A))
  sum(eigen(A,only.values=TRUE)$values)
  # mctrace is not really useful for small problems.
  mctrace(fun,ncol(A),tol=0.05)
  # try a larger problem (3000x3000):
  f1 <- factor(sample(1500,3000,replace=TRUE))
  f2 <- factor(sample(1500,3000,replace=TRUE))
  fl <- list(f1,f2)
  mctrace(fl,tol=-5)
  # exact:
  length(f1) - nlevels(f1) - nlevels(f2) + nlevels(compfactor(fl))

Run the code above in your browser using DataLab