Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides, for instance, matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more (doi:10.1093/bioinformatics/bty185).
An object of class FBM.
An object of class FBM.code256.
Vector of responses, corresponding to ind.train.
Vector of responses, corresponding to ind.train. Must be only 0s and 1s.
An optional vector of the row indices that are used, for the training part. If not specified, all rows are used. Don't use negative indices.
An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.
An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.
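Since negative indices are not accepted, rows or columns to leave out have to be expressed as the positive indices that remain. A minimal base R sketch of that conversion follows; the plain matrix stands in for an FBM, and `n` and `excluded` are hypothetical values chosen for illustration.

```r
# Hypothetical setup: 10 rows, of which rows 2 and 5 should be left out.
n <- 10
excluded <- c(2, 5)

# Build the positive index vector instead of passing -excluded.
ind.row <- setdiff(seq_len(n), excluded)

# A base R matrix is used here as a stand-in for an FBM (assumption).
X <- matrix(seq_len(n * 3), nrow = n)
sub <- X[ind.row, , drop = FALSE]
```

The same `ind.row` vector could then be passed wherever a row-index argument is expected.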
Maximum number of columns read at once. Default uses block_size.
Number of cores used. Default doesn't use parallelism. You may use nb_cores().
A function with parameters X, ind.row and ind.col, and that returns a data.frame with $center and $scale for the columns corresponding to ind.col, to scale each of their elements as follows:
$$\frac{X_{i,j} - center_j}{scale_j}.$$ Default doesn't use any scaling.
You can also provide your own center and scale by using as_scaling_fun().
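To make the contract of such a scaling function concrete, here is a minimal sketch in base R. The function name `my_scaling` is hypothetical, and a plain matrix stands in for an FBM so that the example is self-contained; for a large FBM, reading the whole selected block at once as done here may not be appropriate.

```r
# Hypothetical scaling function following the documented contract:
# it takes X, ind.row and ind.col, and returns a data.frame with
# one `center` and one `scale` value per column in ind.col.
my_scaling <- function(X, ind.row = seq_len(nrow(X)), ind.col = seq_len(ncol(X))) {
  sub <- X[ind.row, ind.col, drop = FALSE]
  data.frame(
    center = colMeans(sub),       # column means
    scale  = apply(sub, 2, sd)    # column standard deviations
  )
}

# A small base R matrix stands in for an FBM (assumption for illustration).
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)
ind.row <- 1:4
ind.col <- c(1, 3)

stats <- my_scaling(X, ind.row, ind.col)

# Apply (X[i, j] - center_j) / scale_j to the selected block.
centered <- sweep(X[ind.row, ind.col], 2, stats$center, "-")
scaled <- sweep(centered, 2, stats$scale, "/")
```

With this choice of center and scale, each selected column of `scaled` has mean 0 and standard deviation 1 over the rows in `ind.row`.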
Matrix of covariables to be added in each model to correct for confounders (e.g. the scores of PCA), corresponding to ind.train. Default is NULL and corresponds to only adding an intercept to each model. You can use covar_from_df() to convert from a data frame.
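As an illustration of what converting a data frame of covariables into a numeric matrix can look like, here is a base R sketch using model.matrix(). This is an assumption about the general approach, not a description of how covar_from_df() is implemented; the data frame and its columns are hypothetical.

```r
# Hypothetical covariables: one numeric column and one factor.
df <- data.frame(
  age = c(31, 45, 28, 52),
  sex = factor(c("F", "M", "M", "F"))
)

# Expand factors into dummy variables and drop the intercept column,
# since an intercept is already added to each model.
covar <- model.matrix(~ ., data = df)[, -1, drop = FALSE]
```

The result is a numeric matrix with one row per sample, which is the shape expected for a covariable matrix.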
Matrix of covariables to be added in each model to correct for confounders (e.g. the scores of PCA), corresponding to ind.row. Default is NULL and corresponds to only adding an intercept to each model. You can use covar_from_df() to convert from a data frame.
Vector of the same length as ind.col, to subtract from the columns of X.
Vector of the same length as ind.col, by which to divide the columns of X.
Large matrix computations are made block-wise and won't be parallelized, in order to not have to reduce the size of these blocks. Instead, you can use MKL or OpenBLAS to accelerate these block matrix computations. You can control the number of cores used by these optimized matrix libraries with bigparallelr::set_blas_ncores().
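The idea of a block-wise matrix computation can be sketched in base R as follows. This is only an illustration of the principle, not the package's internal code: the plain matrix stands in for an FBM and `block.size` is a hypothetical value. Each block product is an ordinary matrix operation, which is where an optimized BLAS can help.

```r
# Optionally limit the number of BLAS threads, if bigparallelr is installed.
if (requireNamespace("bigparallelr", quietly = TRUE)) {
  bigparallelr::set_blas_ncores(1)
}

# A base R matrix stands in for an FBM (assumption for illustration).
set.seed(42)
X <- matrix(rnorm(6 * 10), nrow = 6, ncol = 10)
v <- rnorm(10)
block.size <- 4  # hypothetical number of columns read at once

# Compute X %*% v by accumulating the contribution of each column block.
res <- numeric(nrow(X))
for (start in seq(1, ncol(X), by = block.size)) {
  ind <- start:min(start + block.size - 1, ncol(X))
  res <- res + drop(X[, ind, drop = FALSE] %*% v[ind])
}
```

Only one block of columns is held in memory at a time, so larger blocks mean fewer but heavier matrix products.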
Maintainer: Florian Privé <florian.prive.21@gmail.com>
Other contributors:
Michael Blum [thesis advisor]
Hugues Aschard <hugues.aschard@pasteur.fr> [thesis advisor]
Useful links: