Polygenic Risk Scores for a grid of clumping and thresholding parameters.
Stacking over many Polygenic Risk Scores, corresponding to a grid of many different parameters for clumping and thresholding.
snp_grid_clumping(
G,
infos.chr,
infos.pos,
lpS,
ind.row = rows_along(G),
grid.thr.r2 = c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95),
grid.base.size = c(50, 100, 200, 500),
infos.imp = rep(1, ncol(G)),
grid.thr.imp = 1,
groups = list(cols_along(G)),
exclude = NULL,
ncores = 1
)
snp_grid_PRS(
G,
all_keep,
betas,
lpS,
n_thr_lpS = 50,
grid.lpS.thr = 0.9999 * seq_log(max(0.1, min(lpS, na.rm = TRUE)),
max(lpS, na.rm = TRUE), n_thr_lpS),
ind.row = rows_along(G),
backingfile = tempfile(),
type = c("float", "double"),
ncores = 1
)
snp_grid_stacking(
multi_PRS,
y.train,
alphas = c(1, 0.01, 1e-04),
ncores = 1,
...
)
snp_grid_PRS(): An FBM (matrix on disk) that stores the C+T scores for all parameters of the grid (and for each chromosome separately). It also stores as attributes the input parameters all_keep, betas, lpS and grid.lpS.thr, which are also needed in snp_grid_stacking().
G: An FBM.code256 (typically <bigSNP>$genotypes). It should not contain missing values. Also, remember to do quality control, e.g. some algorithms in this package won't work if you use SNPs with 0 MAF.
infos.chr: Vector of integers specifying each SNP's chromosome. Typically <bigSNP>$map$chromosome.
infos.pos: Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP. Typically <bigSNP>$map$physical.pos.
lpS: Numeric vector of -log10(p-value) associated with betas.
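As a minimal illustration of the lpS argument, p-values can be converted to the -log10 scale in base R. The vector pval below is made-up example data, not output from this package, and listing missing statistics in exclude is one possible way to handle them.

```r
# Hypothetical GWAS p-values (illustrative values only)
pval <- c(0.5, 0.01, 1e-8, NA)

# lpS is -log10 of the p-values; a missing p-value stays NA
lpS <- -log10(pval)

# Variants without a usable statistic can be collected for the `exclude` argument
exclude <- which(is.na(lpS))
```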
ind.row: An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. Don't use negative indices.
grid.thr.r2: Grid of thresholds over the squared correlation between two SNPs for clumping. Default is c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95).
grid.base.size: Grid for base window sizes. Sizes are then computed as base.size / thr.r2 (in kb). Default is c(50, 100, 200, 500).
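To make the base.size / thr.r2 rule concrete, the sketch below tabulates the window sizes implied by the two default grids, using base R only. The name window_kb is introduced here for illustration. For instance, a base size of 50 combined with an r2 threshold of 0.01 corresponds to a 5000 kb window.

```r
grid.thr.r2    <- c(0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95)
grid.base.size <- c(50, 100, 200, 500)

# One row per base size, one column per r2 threshold;
# each entry is the clumping window size in kb (base.size / thr.r2)
window_kb <- outer(grid.base.size, grid.thr.r2, "/")
dimnames(window_kb) <- list(base.size = grid.base.size, thr.r2 = grid.thr.r2)

round(window_kb)
```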
infos.imp: Vector of imputation scores. Default is all 1 if you do not provide it.
grid.thr.imp: Grid of thresholds over infos.imp (default is 1), but you should change it (e.g. c(0.3, 0.6, 0.9, 0.95)) if providing infos.imp.
groups: List of vectors of indices to define your own categories. This could be used e.g. to derive C+T scores using two different GWAS summary statistics, or to include other information such as functional annotations. Default just makes one group with all variants.
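One possible way to build such a list of index vectors from a per-variant annotation is sketched below in base R. The annotation vector annot is hypothetical example data.

```r
# Hypothetical annotation for 6 variants (illustrative only)
annot <- c("coding", "other", "coding", "other", "other", "coding")

# One vector of column indices per category;
# names are dropped so the result is an unnamed list, like the default
groups <- unname(split(seq_along(annot), annot))
```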
exclude: Vector of SNP indices to exclude anyway.
ncores: Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores().
all_keep: Output of snp_grid_clumping() (indices passing clumping).
betas: Numeric vector of weights (effect sizes from GWAS) associated with each variant (column of G). If alleles are reversed, make sure to multiply corresponding effects by -1.
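The sign flip for reversed alleles can be sketched in base R as follows. All vectors here are hypothetical example data, and the sketch only handles exact allele swaps; strand flips and other mismatches are not addressed.

```r
# Hypothetical alleles in the summary statistics (a1 = effect allele)
ss_a1 <- c("A", "C", "G", "T")
ss_a0 <- c("G", "T", "A", "C")
# Hypothetical alleles as coded in the genotype data
geno_a1 <- c("A", "T", "G", "C")
geno_a0 <- c("G", "C", "A", "T")

beta_gwas <- c(0.10, -0.20, 0.05, 0.30)

# A variant is "reversed" when effect and other alleles are exactly swapped
reversed <- ss_a1 == geno_a0 & ss_a0 == geno_a1

# Multiply the corresponding effects by -1
betas <- ifelse(reversed, -beta_gwas, beta_gwas)
```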
n_thr_lpS: Length for default grid.lpS.thr. Default is 50.
grid.lpS.thr: Sequence of thresholds to apply on lpS. Default is a grid (of length n_thr_lpS) evenly spaced on a logarithmic scale, i.e. on a log-log scale for p-values.
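The default grid from the usage section can be reproduced approximately in base R. In this sketch, log_spaced() is a local stand-in written for illustration, on the assumption that seq_log() returns values evenly spaced on a log scale; the lpS values are made up.

```r
# Local stand-in for seq_log() (assumed behaviour: evenly spaced on a log scale)
log_spaced <- function(from, to, length.out) {
  exp(seq(log(from), log(to), length.out = length.out))
}

# Hypothetical -log10(p-values), including one missing value
lpS <- c(0.5, 1.2, 3, 8, NA)
n_thr_lpS <- 50

# Same expression as the default of grid.lpS.thr, with the stand-in helper:
# from max(0.1, smallest lpS) up to the largest lpS, shrunk slightly by 0.9999
grid.lpS.thr <- 0.9999 * log_spaced(
  max(0.1, min(lpS, na.rm = TRUE)),
  max(lpS, na.rm = TRUE),
  n_thr_lpS
)
```

The 0.9999 factor keeps the largest threshold just below the largest observed lpS.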
backingfile: Prefix for the backing files in which the C+T scores are stored. Because the grid is typically large, the resulting matrix can be large too, so it is stored on disk. Default uses a temporary file.
type: Type of backingfile values. Either "float" (the default) or "double". Using "float" requires half the disk space.
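A rough back-of-the-envelope estimate of the backing file size is sketched below. It assumes one column per combination of chromosome, clumping parameter set and lpS threshold, and one row per individual in ind.row; that layout is an interpretation of the description above, not a documented guarantee. It also assumes 4 bytes per "float" value and 8 bytes per "double" value. The counts (400 individuals, 22 chromosomes) are hypothetical.

```r
n_ind   <- 400      # hypothetical number of individuals in ind.row
n_chr   <- 22       # hypothetical number of chromosomes
n_clump <- 7 * 4    # default grid.thr.r2 x grid.base.size (one imputation threshold, one group)
n_thr   <- 50       # default n_thr_lpS

# Assumed number of columns: every parameter combination, per chromosome
n_col <- n_chr * n_clump * n_thr

# Bytes per stored value for each type
bytes_per_value <- c(float = 4, double = 8)

# Approximate size on disk in megabytes
size_mb <- n_ind * n_col * bytes_per_value / 1024^2
```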
multi_PRS: Output of snp_grid_PRS().
y.train: Vector of phenotypes. If there are two levels (binary 0/1), it uses bigstatsr::big_spLogReg() for stacking, otherwise bigstatsr::big_spLinReg().
alphas: Vector of values for grid-search. See bigstatsr::big_spLogReg(). Default for this function is c(1, 0.01, 0.0001).
...: Other parameters to be passed to bigstatsr::big_spLogReg(). For example, using covar.train, you can add covariates in the model with all C+T scores. You can also use pf.covar if you do not want to penalize these covariates.
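The three functions are meant to be chained. The following is a sketch of one possible workflow, not a reference analysis. It assumes the example data returned by snp_attachExtdata(), a toy GWAS computed with bigstatsr::big_univLogReg() on the same individuals used for training (done here only to keep the example short), and deliberately small grids. The split into 400 training individuals and the choice K = 4 (passed through ... to bigstatsr::big_spLogReg()) are arbitrary.

```r
library(bigsnpr)

# Example data shipped with bigsnpr (assumed to have no missing genotypes)
obj <- snp_attachExtdata()
G   <- obj$genotypes
CHR <- obj$map$chromosome
POS <- obj$map$physical.pos
y   <- obj$fam$affection - 1          # recode 1/2 as 0/1

set.seed(1)
ind.train <- sort(sample(nrow(G), 400))

# Toy GWAS, for illustration only
gwas  <- bigstatsr::big_univLogReg(G, y[ind.train], ind.train = ind.train)
betas <- gwas$estim
lpS   <- -predict(gwas)               # predict() gives log10(p-values) by default

# Step 1: clumping over a (small) grid of parameters
all_keep <- snp_grid_clumping(
  G, CHR, POS, lpS = lpS, ind.row = ind.train,
  grid.thr.r2 = c(0.05, 0.5), grid.base.size = c(50, 200),
  exclude = which(is.na(lpS))
)

# Step 2: C+T scores for all clumping sets and 10 p-value thresholds
multi_PRS <- snp_grid_PRS(G, all_keep, betas, lpS,
                          ind.row = ind.train, n_thr_lpS = 10)

# Step 3: stacking of all C+T scores (binary phenotype, so logistic)
final_mod <- snp_grid_stacking(multi_PRS, y[ind.train], K = 4)
```

In a real analysis the summary statistics would normally come from a sample independent of the individuals used for stacking.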