
bigstatsr (version 0.6.2)

big_tcrossprodSelf: Tcrossprod

Description

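Compute \(X_{\text{row}} X_{\text{row}}^T\) for a Filebacked Big Matrix (FBM) X after applying the scaling given by fun.scaling, where \(X_{\text{row}}\) denotes X restricted to the rows ind.row and the columns ind.col.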
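Written out entrywise, using the column means and standard deviations returned by fun.scaling, each entry of the result is the sum over the selected columns of the product of two scaled values:

$$K_{i,i'} = \sum_{j \in \text{ind.col}} \frac{(X_{i,j} - mean_j)\,(X_{i',j} - mean_j)}{sd_j^2}, \qquad i, i' \in \text{ind.row}.$$

With the default (no scaling), \(mean_j = 0\) and \(sd_j = 1\). In that case K is simply the tcrossprod of the selected submatrix, which is what the examples below check.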

Usage

big_tcrossprodSelf(X, fun.scaling = big_scale(center = FALSE, scale =
  FALSE), ind.row = rows_along(X), ind.col = cols_along(X),
  block.size = block_size(nrow(X)))

Arguments

X

An FBM.

fun.scaling

A function that returns a named list of mean and sd for every column. Each element is then scaled as follows: $$\frac{X_{i,j} - mean_j}{sd_j}.$$ The default does not apply any scaling.
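The following base-R sketch shows the quantity being computed. It assumes an ordinary in-memory matrix in place of an FBM. `tcrossprod_scaled()` is a hypothetical helper and is not part of bigstatsr. It takes the center and scale vectors directly rather than a fun.scaling function.

```r
# Hypothetical reference helper (not part of bigstatsr): scaled tcrossprod
# of an ordinary in-memory matrix, restricted to ind.row and ind.col.
tcrossprod_scaled <- function(X, center, scale,
                              ind.row = seq_len(nrow(X)),
                              ind.col = seq_len(ncol(X))) {
  # Keep only the selected rows and columns.
  Xs <- X[ind.row, ind.col, drop = FALSE]
  # Scale column j as (X[i, j] - center[j]) / scale[j];
  # MARGIN = 2 makes sweep() apply the vectors column-wise.
  Xs <- sweep(Xs, 2, center, "-")
  Xs <- sweep(Xs, 2, scale, "/")
  # Result is length(ind.row) x length(ind.row).
  tcrossprod(Xs)
}

set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20)

# No scaling (center = 0, scale = 1) reproduces tcrossprod(X).
K0 <- tcrossprod_scaled(X, center = rep(0, 6), scale = rep(1, 6))
all.equal(K0, tcrossprod(X))

# Standardizing with column means and sds matches base R's scale().
K1 <- tcrossprod_scaled(X, colMeans(X), apply(X, 2, sd))
all.equal(K1, tcrossprod(scale(X)))
```

The center and scale vectors must each have one entry per selected column, that is, length(ind.col) entries.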

ind.row

An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.

ind.col

An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.

block.size

Maximum number of columns read at once. Default uses block_size(nrow(X)).

Value

A temporary FBM, with the following two attributes:

  • a numeric vector center of column scaling,

  • a numeric vector scale of column scaling.

Matrix parallelization

Large matrix computations (crossprods) are performed block-wise and are not parallelized, so that the blocks do not have to be made smaller. Instead, you may use Microsoft R Open, which is linked against a multithreaded BLAS (Intel MKL), to accelerate these block matrix computations.
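Block-wise computation relies on the identity \(X X^T = \sum_b X_{\cdot,b} X_{\cdot,b}^T\), where \(b\) ranges over disjoint blocks of columns. Below is a minimal base-R sketch. It assumes an in-memory matrix and uses a hypothetical `tcrossprod_blockwise()` helper that is not part of bigstatsr. Only one block of at most `block.size` columns is used in each product.

```r
# Hypothetical helper (not part of bigstatsr): accumulates X %*% t(X)
# over blocks of at most `block.size` columns.
tcrossprod_blockwise <- function(X, block.size) {
  K <- matrix(0, nrow(X), nrow(X))
  # First column index of each block.
  starts <- seq(1, ncol(X), by = block.size)
  for (start in starts) {
    cols <- start:min(start + block.size - 1, ncol(X))
    # Only this block of columns is needed for the update.
    K <- K + tcrossprod(X[, cols, drop = FALSE])
  }
  K
}

set.seed(42)
X <- matrix(rnorm(10 * 23), nrow = 10)
# 23 columns in blocks of 5: four full blocks and one block of 3 columns.
all.equal(tcrossprod_blockwise(X, block.size = 5), tcrossprod(X))
```

Each block contributes a full nrow(X) x nrow(X) update. Larger blocks therefore mean fewer but bigger matrix products, which an optimized BLAS handles efficiently.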

See Also

tcrossprod

Examples

# NOT RUN {
X <- big_attachExtdata()

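# Compare with base R's tcrossprod (no scaling)
big_noscale <- big_scale(center = FALSE, scale = FALSE)
K <- big_tcrossprodSelf(X, fun.scaling = big_noscale)
class(K)          # an FBM
dim(K)            # nrow(X) x nrow(X)
K$backingfile     # path of the temporary backing file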

true <- tcrossprod(X[])
all.equal(K[], true)

# Using only half of the rows
n <- nrow(X)
ind <- sort(sample(n, n/2))
K2 <- big_tcrossprodSelf(X, fun.scaling = big_noscale, ind.row = ind)

true2 <- tcrossprod(X[ind, ])
all.equal(K2[], true2)
# }
