Compute a matrix product between BGEN files and a matrix. This removes the
need to read an intermediate FBM object with snp_readBGEN() to compute the
product. Moreover, when using dosages, they are not rounded to two decimal
places anymore.
snp_prodBGEN(
  bgenfiles,
  beta,
  list_snp_id,
  ind_row = NULL,
  bgi_dir = dirname(bgenfiles),
  read_as = c("dosage", "random"),
  block_size = 1000,
  ncores = 1
)
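A call might look like the following minimal sketch. The file path, the SNP IDs and the effect sizes are hypothetical placeholders, and the call is only made when bigsnpr is installed and the BGEN file and its index are actually present.

```r
# Hypothetical inputs: replace with your own BGEN file and variant IDs.
bgenfile <- "data/chr1.bgen"                   # hypothetical path
snp_ids  <- c("1_88169_C_T", "1_91536_G_T")    # hypothetical variant IDs
beta     <- matrix(c(0.1, -0.2), ncol = 1)     # one row per SNP ID

# Only run when bigsnpr is installed and both the BGEN file and its
# ".bgen.bgi" index exist next to each other.
can_run <- requireNamespace("bigsnpr", quietly = TRUE) &&
  file.exists(bgenfile) && file.exists(paste0(bgenfile, ".bgi"))

if (can_run) {
  scores <- bigsnpr::snp_prodBGEN(
    bgenfiles   = bgenfile,
    beta        = beta,
    list_snp_id = list(snp_ids),   # one vector of IDs per BGEN file
    read_as     = "dosage",
    ncores      = 1
  )
  str(scores)
}
```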
Value: The product bgen_data[ind_row, 'list_snp_id'] %*% beta.
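To make the returned product concrete, here is a small base R sketch in which an in-memory matrix stands in for the BGEN data. The matrix, the variant IDs and the effect sizes are made up for illustration; the real function reads the genotypes from the BGEN files instead.

```r
# Toy stand-in for the genotype data: 4 individuals x 3 variants (dosages).
bgen_data <- matrix(
  c(0, 1, 2,
    1, 1, 0,
    2, 0, 1,
    0, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("1_100_A_G", "1_200_C_T", "1_300_G_A"))
)
list_snp_id <- list(c("1_100_A_G", "1_300_G_A"))  # variants to use
ind_row     <- c(1, 3, 4)                          # individuals to use
beta        <- matrix(c(0.5, -1), ncol = 1)        # one row per selected variant

# Subset rows and columns, then multiply by the effect sizes.
prod_ref <- bgen_data[ind_row, unlist(list_snp_id), drop = FALSE] %*% beta
prod_ref
```

The result has one row per selected individual and one column per column of beta.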
bgenfiles: Character vector of paths to files with extension ".bgen". The corresponding ".bgen.bgi" index files must exist.
beta: A matrix (or a vector), with rows corresponding to list_snp_id.
list_snp_id: List of character vectors of SNP IDs to read, with one
vector per BGEN file. Each SNP ID should be in the form
"<chr>_<pos>_<a1>_<a2>" (e.g. "1_88169_C_T" or "01_88169_C_T").
If you have one BGEN file only, just wrap your vector of IDs with list().
This function assumes that these IDs uniquely identify variants.
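The ID format described above can be built with paste(). This is a minimal sketch on a hypothetical variant table; the column names are illustrative and not something the package requires.

```r
# Hypothetical variant table; column names are illustrative only.
variants <- data.frame(
  chr = c("1", "1", "2"),
  pos = c(88169L, 91536L, 12345L),  # integers avoid scientific notation in paste()
  a1  = c("C", "G", "A"),
  a2  = c("T", "T", "G"),
  stringsAsFactors = FALSE
)

# Build IDs in the form "<chr>_<pos>_<a1>_<a2>".
ids <- with(variants, paste(chr, pos, a1, a2, sep = "_"))

# IDs are assumed to identify variants uniquely, so check for duplicates.
stopifnot(anyDuplicated(ids) == 0)

# One vector per BGEN file, e.g. when there is one file per chromosome.
list_snp_id <- split(ids, variants$chr)

# With a single BGEN file, wrap the whole vector in list() instead.
list_single <- list(ids)
```

When splitting by chromosome, the order of the list elements should match the order of the files passed as bgenfiles.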
ind_row: An optional vector of the row indices (individuals) that
are used. If not specified, all rows are used. Don't use negative indices.
You can access the sample IDs corresponding to the genotypes from the .sample
file, and use e.g. match() to get indices corresponding to the ones you want.
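The match() approach mentioned above can be sketched as follows. The .sample file is a small temporary one written for the example, assuming the usual Oxford layout with two header lines; the sample IDs and the choice of the ID_2 column are assumptions.

```r
# Write a small hypothetical .sample file (two header lines, then one line per sample).
sample_file <- tempfile(fileext = ".sample")
writeLines(c(
  "ID_1 ID_2 missing",
  "0 0 0",
  "s1 s1 0",
  "s2 s2 0",
  "s3 s3 0",
  "s4 s4 0"
), sample_file)

# Read it and drop the second header line, which holds column types.
sample <- read.table(sample_file, header = TRUE, stringsAsFactors = FALSE)[-1, ]

# IDs of the individuals to keep (hypothetical).
wanted <- c("s3", "s1")

# Positions of the wanted individuals among the genotype rows.
ind_row <- match(wanted, sample$ID_2)
stopifnot(!anyNA(ind_row))  # every wanted ID must be present
ind_row
```

Because match() returns positive positions (or NA), checking for NA before passing the result as ind_row avoids silently requesting missing individuals.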
bgi_dir: Directory of index files. Default is the directory of bgenfiles, i.e. dirname(bgenfiles).
read_as: How to read BGEN probabilities? Currently implemented:
- as dosages, the default (not rounded to two decimal places here, unlike with snp_readBGEN()),
- as hard calls, randomly sampled based on those probabilities
  (similar to PLINK option '--hard-call-threshold random').
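The difference between the two modes can be illustrated in base R. This is only a sketch of the idea, not the package's internal code; the probability matrix is made up, and its columns are assumed to be the probabilities of carrying 0, 1 and 2 copies of the counted allele.

```r
# Hypothetical genotype probabilities for 3 individuals at one variant:
# columns are P(0 copies), P(1 copy), P(2 copies) of the counted allele.
probs <- rbind(
  c(0.90, 0.10, 0.00),
  c(0.05, 0.70, 0.25),
  c(0.00, 0.20, 0.80)
)

# Dosage: expected allele count under those probabilities.
dosage <- as.vector(probs %*% 0:2)

# Random hard call: draw one genotype per individual from its probabilities.
set.seed(1)
hard_call <- apply(probs, 1, function(p) sample(0:2, size = 1, prob = p))

dosage
hard_call
```

Dosages are deterministic given the probabilities, whereas random hard calls change from one draw to the next unless a seed is set.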
block_size: Maximum size of temporary blocks (in number of variants).
Default is 1000.
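As a rough illustration of what a block size in number of variants means, the sketch below splits variant indices into blocks of at most block_size. It is not the package's implementation, and the number of variants is hypothetical.

```r
# Hypothetical setup: 2500 variants processed in blocks of at most 1000.
n_variants <- 2500
block_size <- 1000

# Assign each variant index to a block, then split the indices accordingly.
block_id <- ceiling(seq_len(n_variants) / block_size)
blocks   <- split(seq_len(n_variants), block_id)

lengths(blocks)  # number of variants in each block
```

Smaller blocks keep less temporary data in memory at once, at the cost of more iterations.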
ncores: Number of cores used. Default doesn't use parallelism.
You may use bigstatsr::nb_cores().
See also: snp_readBGEN()