bootSVD

The R package bootSVD implements fast, exact bootstrap principal component analysis (PCA) and singular value decomposition (SVD) for high-dimensional data, where the number of measurements per subject is much larger than the number of subjects. The package is based on the methodology outlined by Fisher et al. (2014), who demonstrate the method on a dataset of 352 brain magnetic resonance images (MRIs), with approximately 3 million measurements per subject.

The primary function in this package is the bootSVD function, for which we include a documented example based on simulated sleep electroencephalogram (EEG) data. When the data is too large to store in memory, functions in this package can also be applied to objects of class ff. These ff objects have a representation in memory, but store their primary contents on disk (see the ff package).

The speed improvements come from the fact that the sample size (n) is much smaller than the dimension of each sample (p), so an n-dimensional representation of the sample is sufficient for many calculations.
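The idea behind this speed-up can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the matrix sizes, seed, and variable names are hypothetical, only base `svd` is used, and the recentering of each bootstrap sample is skipped for brevity. If the p x n data matrix has SVD Y = V D U', then any resample of its columns satisfies Y[, idx] = V (D U')[, idx], so the SVD of the resampled high-dimensional matrix can be recovered from the SVD of an n x n matrix.

```r
set.seed(1)
p <- 2000  # measurements per subject (hypothetical size)
n <- 20    # number of subjects (hypothetical size)
Y <- matrix(rnorm(p * n), nrow = p, ncol = n)  # columns are subjects

# One-time SVD of the full sample: Y = V D U'
s   <- svd(Y)
V   <- s$u                       # p x n, high-dimensional components
DUt <- diag(s$d) %*% t(s$v)      # n x n low-dimensional coordinates

# Bootstrap step: resample subjects (columns) with replacement
idx <- sample(n, n, replace = TRUE)

# SVD of the small n x n resampled matrix
s_low <- svd(DUt[, idx])
A_b   <- s_low$u                 # n-dimensional bootstrap components
d_b   <- s_low$d                 # bootstrap singular values

# Map the leading K components back to p dimensions
K   <- 3
V_b <- V %*% A_b[, 1:K]

# Check against the direct (expensive) SVD of the resampled p x n matrix
s_direct     <- svd(Y[, idx])
max_sv_diff  <- max(abs(d_b[1:K] - s_direct$d[1:K]))
# Components are only defined up to sign, so compare absolute inner products
pc_agreement <- abs(colSums(V_b * s_direct$u[, 1:K]))
```

In this sketch the per-bootstrap cost depends on n rather than p, apart from the final multiplication by V, which is only needed when the high-dimensional components themselves are required.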

To install:

## if needed
install.packages("devtools")

## main package
library(devtools)
install_github('aaronjfisher/bootSVD')

library(bootSVD)

## to access help pages
help(package=bootSVD)
?bootSVD

References:

Aaron Fisher, Brian Caffo, and Vadim Zipunnikov. Fast, Exact Bootstrap Principal Component Analysis for p>1 million. Working Paper, 2014. http://arxiv.org/abs/1405.0922

Install

install.packages('bootSVD')

Monthly Downloads

338

Version

1.1

License

GPL-2

Last Published

February 2nd, 2021

Functions in bootSVD (1.1)

qrSVD

Wrapper for svd, which uses random preconditioning to restart when svd fails to converge

os

Quickly print an R object's size

genBootIndeces

Generate a random set of bootstrap resampling indices

ffmatrixmult

Matrix multiplication with "ff_matrix" or "matrix" inputs

simEEG

Simulate functional EEG data

reindexMatricesByK

Used to calculate low-dimensional standard errors and percentiles, by re-indexing the \(A^b\) by PC index (\(k\)) rather than bootstrap index (\(b\)).

reindexVectorsByK

Used to study the bootstrap distribution of the \(k\)th singular value, by re-indexing the list of \(d^b\) vectors to be organized by PC index (\(k\)) rather than bootstrap index (\(b\)).

getMomentsAndMomentCI

Calculate bootstrap moments and moment-based confidence intervals for the PCs.

genQ

Generate random orthonormal matrix

EEG_leadingV

Leading 5 Principal Components (PCs) from EEG dataset

fastSVD

Fast SVD of a wide or tall matrix

bootSVD_LD

Calculate bootstrap distribution of \(n\)-dimensional PCs

As2Vs

Convert low-dimensional bootstrap components to high-dimensional bootstrap components

bootPCA

Quickly calculates bootstrap PCA results (wrapper for bootSVD)

EEG_mu

Functional mean from EEG dataset

EEG_score_var

Empirical variance of the first 5 score variables from EEG dataset

bootSVD

Calculates bootstrap distribution of PCA (i.e. SVD) results
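
The re-indexing by PC index mentioned above can be illustrated with a small base R sketch. It does not call the package: the list `A_by_b`, its dimensions, and the B x n orientation of the output are assumptions made for illustration, and the random matrices stand in for real bootstrap output.

```r
set.seed(2)
B <- 4  # number of bootstrap samples (hypothetical)
n <- 6  # sample size (hypothetical)
K <- 2  # number of components kept (hypothetical)

# Stand-in for bootstrap output: one n x K matrix per bootstrap sample b
A_by_b <- lapply(seq_len(B), function(b) matrix(rnorm(n * K), nrow = n, ncol = K))

# Re-index by component: one B x n matrix per component k,
# where row b holds the k-th column of the b-th bootstrap matrix
A_by_k <- lapply(seq_len(K), function(k) {
  t(sapply(A_by_b, function(A) A[, k]))
})

# With this layout, elementwise bootstrap standard errors for component 1
# are simply column standard deviations
se_k1 <- apply(A_by_k[[1]], 2, sd)
```

Grouping by component makes per-component summaries such as standard errors and percentiles a single column-wise operation.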