
bigstatsr (version 1.6.1)

big_colstats: Standard univariate statistics

Description

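Standard univariate statistics for the columns of a Filebacked Big Matrix. Currently, only the column sums (sum) and sample variances (var, with the n - 1 denominator used by var(), as the Examples check) are implemented. The mean and standard deviation are easy to deduce from them: the mean is the sum divided by the number of rows used, and the standard deviation is the square root of the variance (see Examples).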
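As a minimal sketch of that deduction, the snippet below uses base R on a small in-memory matrix m (hypothetical data). colSums() and apply(..., var) stand in for the sum and var columns that big_colstats() would return for an FBM.

```r
# Small example matrix (hypothetical data): 5 rows, 3 columns
m <- matrix(c(1, 2, 3, 4, 5,
              2, 4, 6, 8, 10,
              0, 1, 0, 1, 0), nrow = 5)

# Stand-ins for the two statistics big_colstats() returns
col_sum <- colSums(m)
col_var <- apply(m, 2, var)   # sample variance, denominator n - 1

# Deduce the mean and standard deviation of each column
n <- nrow(m)                  # number of rows used
means <- col_sum / n
sds <- sqrt(col_var)

print(means)  # 3.0 6.0 0.4
print(sds)
```

When only a subset of rows is passed through ind.row, n must be the length of that subset rather than the total number of rows.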

Usage

big_colstats(X, ind.row = rows_along(X), ind.col = cols_along(X), ncores = 1)

Value

A data.frame with two numeric columns, sum and var, holding the corresponding statistic for each selected column of X.
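To make the shape of this return value concrete, here is a hypothetical base-R reference function, colstats_ref(), for an ordinary matrix. It is not part of bigstatsr. It only mirrors the documented output: one row per selected column, with sum and var computed over the selected rows.

```r
# Hypothetical base-R reference (not part of bigstatsr) that mirrors the
# documented return value of big_colstats() for an in-memory matrix.
colstats_ref <- function(m, ind.row = seq_len(nrow(m)),
                         ind.col = seq_len(ncol(m))) {
  sub <- m[ind.row, ind.col, drop = FALSE]   # keep matrix shape
  data.frame(sum = colSums(sub), var = apply(sub, 2, var))
}

m <- matrix(1:12, nrow = 4)   # 4 x 3 example matrix
res <- colstats_ref(m, ind.row = 1:3, ind.col = c(1, 3))
str(res)
```

Here rows 1 to 3 of columns 1 and 3 are (1, 2, 3) and (9, 10, 11), so the sums are 6 and 30 and both sample variances are 1.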

Arguments

X

An object of class FBM.

ind.row

An optional vector of row indices to use. If not specified, all rows are used. Negative indices are not supported.

ind.col

An optional vector of column indices to use. If not specified, all columns are used. Negative indices are not supported.
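Because negative indices are not accepted, rows or columns to exclude have to be expressed as the positive indices that remain. A minimal base-R sketch follows. n_rows and to_drop are hypothetical values, and n_rows plays the role of nrow(X).

```r
# Instead of a negative index such as -c(2, 5), build the complement
# explicitly as positive indices.
n_rows <- 10                      # hypothetical number of rows, e.g. nrow(X)
to_drop <- c(2, 5)
ind.row <- setdiff(seq_len(n_rows), to_drop)
print(ind.row)   # 1 3 4 6 7 8 9 10

# A logical filter can be converted to positive indices with which()
keep <- rep(c(TRUE, FALSE), length.out = n_rows)
ind.row2 <- which(keep)
print(ind.row2)  # 1 3 5 7 9
```

With an FBM, the same pattern would read setdiff(rows_along(X), to_drop), and the result could be passed as ind.row. The same approach applies to ind.col.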

ncores

Number of cores to use. The default (1) does not use parallelism. You may use nb_cores() to choose a value.
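Each column's sum and variance depend only on that column, so splitting the columns across workers should not change the result. The sketch below illustrates this in base R. It assumes a simple column-block split, which is an illustration and not bigstatsr's internal implementation. n_blocks stands in for ncores, and lapply() runs the blocks serially here.

```r
set.seed(1)
m <- matrix(rnorm(200), nrow = 20)   # hypothetical 20 x 10 matrix

n_blocks <- 2                        # stands in for ncores
col_ids <- seq_len(ncol(m))
blocks <- split(col_ids, cut(col_ids, n_blocks, labels = FALSE))

# Compute the statistics block by block, then stack the pieces in order
parts <- lapply(blocks, function(cols) {
  sub <- m[, cols, drop = FALSE]
  data.frame(sum = colSums(sub), var = apply(sub, 2, var))
})
res_blocks <- do.call(rbind, unname(parts))

# Reference: all columns processed at once
res_all <- data.frame(sum = colSums(m), var = apply(m, 2, var))
all.equal(res_blocks$sum, res_all$sum)
all.equal(res_blocks$var, res_all$var)
```

This independence across columns is why the ncores argument is expected to affect only speed and not the returned values.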


Examples

set.seed(1)

X <- big_attachExtdata()

# Check the results
str(test <- big_colstats(X))

# Only with the first 100 rows
ind <- 1:100
str(test2 <- big_colstats(X, ind.row = ind))
plot(test$sum, test2$sum)
abline(lm(test2$sum ~ test$sum), col = "red", lwd = 2)

X.ind <- X[ind, ]
all.equal(test2$sum, colSums(X.ind))
all.equal(test2$var, apply(X.ind, 2, var))

# deduce mean and sd
# note that they are also implemented in big_scale()
means <- test2$sum / length(ind) # if using all rows,
                                 # divide by nrow(X) instead
all.equal(means, colMeans(X.ind))
sds <- sqrt(test2$var)
all.equal(sds, apply(X.ind, 2, sd))
