Calculates the likelihood intervals for genetic association in a genomic region of interest. Covariates can be accommodated.
evian(data, bim, xcols = NULL, ycol = NULL, covariateCol = NULL,
      formula = NULL, robust = FALSE, model = 'additive', m = 200,
      bse = 5, lolim = NULL, hilim = NULL, kcutoff = c(8, 32, 100, 1000),
      multiThread = 1, family = 'gaussian', plinkCC = FALSE)
a data frame that includes a column for the response variable, one or more columns of genotype data (coded as 0, 1, 2, or NA), and optionally columns for covariates. Headers are assumed. If the data are from related individuals, an additional column named 'FID' must be included to specify the relatedness structure. Running the PLINK toolkit with the option --recodeA produces a file in the required format and is recommended.
a data frame with six columns representing chromosome, SNP ID, genetic distance, base-pair position, effective allele, and reference allele, i.e. data read from a PLINK binary-format variant file (bim). No header is assumed, but the columns must follow the standard bim file order.
numeric; the index of the column in the data data frame that holds the response variable.
numeric vector; the range of columns in data where genotype information is stored. Note that although a range of columns is required, only one SNP is analyzed at a time.
numeric or numeric vector; optional argument specifying which columns represent covariates. If left as NULL, no covariates are included and the model Y~snp is used.
string; an alternative way of specifying the model instead of using the xcols and ycol arguments. The formula follows the same format as in the glm function (e.g. Y~snp1+age+sex). Note that when multiple SNPs are included, only one SNP is considered (e.g. given Y~snp1+snp2, the function treats snp1 as the parameter of interest). The function automatically identifies SNPs with an rsID as the X variables and treats all other predictors as covariates.
logical; default FALSE. If TRUE, a robust adjustment is applied to the likelihood function to account for clustering in the data; see robust_forCluster.
a string that specifies the mode of inheritance parameterization: additive, dominant, recessive, or overdominance. Default additive.
numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function.
numeric; the number of beta standard errors used to constrain the beta grid limits. The beta grid is evaluated at \(\beta\) +/- bse*s.e.
numeric; the lower limit for the grid or the minimum value of the regression parameter \(\beta\) used to calculate the likelihood function.
numeric; the upper limit for the grid or the maximum value of the regression parameter \(\beta\) used to calculate the likelihood function.
numeric or numeric vector; default = c(8,32,100,1000). The strength of evidence criterion k. The function calculates the 1/k standardized likelihood intervals for each value provided.
numeric; number of threads to use for parallel computing.
string; the error distribution and link function used by glm. Default 'gaussian'.
logical; specifies how case/control status is coded: cases/controls are coded 1/0 if FALSE, and 2/1 if TRUE.
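The four modes of inheritance listed above are conventionally expressed as recodings of the 0/1/2 genotype. The sketch below illustrates that conventional coding; recode_genotype is a hypothetical helper written for this illustration, and the recoding evian applies internally may differ.

```r
# Hypothetical helper, not part of evian: recode 0/1/2 allele counts
# under each mode of inheritance (conventional coding, assumed here).
recode_genotype <- function(g, model = c("additive", "dominant",
                                         "recessive", "overdominance")) {
  model <- match.arg(model)
  switch(model,
    additive      = g,                    # allele count: 0, 1, 2
    dominant      = as.numeric(g >= 1),   # at least one copy of the allele
    recessive     = as.numeric(g == 2),   # two copies of the allele
    overdominance = as.numeric(g == 1)    # heterozygotes only
  )
}

g <- c(0, 1, 2, NA)  # NA genotypes stay NA under every coding
codings <- sapply(c("additive", "dominant", "recessive", "overdominance"),
                  function(mod) recode_genotype(g, mod))
codings
```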
This function outputs the row-combined results from calculateEvianMLE for each of the SNPs included in the data/bim files. The exact output for each SNP is described in the calculateEvianMLE documentation.
evian is the main function called to calculate the 1/k likelihood intervals for the additive, dominant, recessive, or overdominance genotypic models. This function calls calculateEvianMLE in parallel to calculate the likelihood for each SNP. The calculation details can be found in calculateEvianMLE.
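To make the idea of a 1/k likelihood interval concrete, here is a minimal base-R sketch on simulated data. It is not the calculateEvianMLE implementation: it assumes a Gaussian linear model, profiles out the intercept and residual variance with lm and an offset, and reuses the names m and bse only to mirror the arguments described above.

```r
set.seed(1)
n   <- 200
snp <- rbinom(n, 2, 0.3)        # simulated additive genotype (0/1/2)
y   <- 0.2 * snp + rnorm(n)     # simulated quantitative trait

fit      <- lm(y ~ snp)
beta_hat <- unname(coef(fit)["snp"])
se_hat   <- sqrt(vcov(fit)["snp", "snp"])

# Beta grid: m points spanning beta_hat +/- bse * s.e.
m <- 200
bse <- 5
grid <- seq(beta_hat - bse * se_hat, beta_hat + bse * se_hat, length.out = m)

# Profile log-likelihood: fix beta through an offset, maximise over the rest
prof_ll <- sapply(grid, function(b) {
  as.numeric(logLik(lm(y ~ 1 + offset(b * snp))))
})

# Standardise so that the maximum over the grid equals 1
std_lik <- exp(prof_ll - max(prof_ll))

# 1/k likelihood interval: grid values with standardised likelihood >= 1/k
lik_interval <- function(k) range(grid[std_lik >= 1 / k])
intervals <- sapply(c(8, 32), lik_interval)
intervals
```

Larger values of k give wider intervals, since more of the grid clears the lower 1/k threshold.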
The input for the data and bim arguments can be obtained from PLINK files; data is expected to follow the PLINK format produced with the --recodeA option, and bim can be read directly from a PLINK binary format file. Note that if covariates are to be included, they are expected to be appended to the data file with a header for each covariate.
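As a rough sketch of how the two inputs might be read into R, the example below writes two tiny PLINK-style files to temporary locations and loads them with read.table. The file contents, the SNP names, and the age covariate are all made up for illustration; only the header conventions (a header in the recoded genotype file, none in the bim file) come from the description above.

```r
# Hypothetical file contents, written to temporary files so the sketch
# runs on its own; real files would come from PLINK.
raw_file <- tempfile(fileext = ".raw")
bim_file <- tempfile(fileext = ".bim")
writeLines(c("FID IID PAT MAT SEX PHENOTYPE rs123_A rs456_G",
             "1 1 0 0 1 0.52 0 1",
             "2 2 0 0 2 -1.10 2 NA"), raw_file)
writeLines(c("1 rs123 0 1000 A C",
             "1 rs456 0 2000 G T"), bim_file)

raw_data <- read.table(raw_file, header = TRUE)   # genotype file has a header
bim      <- read.table(bim_file, header = FALSE)  # bim file has no header

# Append a (made-up) covariate as an extra named column
raw_data$age <- c(34, 51)

# Column indices of the kind the xcols, ycol and covariateCol arguments take
ycol         <- which(names(raw_data) == "PHENOTYPE")
xcols        <- grep("^rs", names(raw_data))
covariateCol <- which(names(raw_data) == "age")
```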
The statistical model can be specified in two ways. Column indices can be provided through the xcols, ycol, and covariateCol arguments, or the model can be given through the formula argument, which accepts a formula specified in the same way as the formula argument of the R glm function. We recommend using the xcols, ycol, and covariateCol arguments in most scenarios, as they are easier to supply and work for all the cases that we have considered so far. The alternative formula argument cannot detect non-rsID variants as parameters of interest, and is only suggested when a single variant is of interest and its rsID is known in advance. Because profileLikelihood can only accommodate a scalar parameter, if multiple rsID variants are supplied through the formula option, only the first one is taken as the parameter of interest.
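The behaviour described for the formula argument can be pictured with a few lines of base R. This is only an illustration under the assumption that rsID variants are recognised by an "rs" prefix followed by digits; evian's own parsing may differ, and the variable names in the formula are invented.

```r
# Illustration only: split the predictors of a formula string into the
# first rsID variant and the non-rsID terms.
f <- as.formula("Y ~ rs123 + rs456 + age + sex")
predictors <- all.vars(f)[-1]                  # drop the response Y

is_rsid <- grepl("^rs[0-9]+", predictors)      # assumed rsID pattern
snp_of_interest <- predictors[is_rsid][1]      # only the first rsID is used
covariates <- predictors[!is_rsid]             # non-rsID terms

snp_of_interest
covariates
```

A variant without an rsID (for example one named by chromosome and position) would not match the pattern, which is why the column-index arguments are the safer choice in that case.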
Parallel computing is available through the multiThread argument. This parallelization uses the foreach and doMC packages and will typically reduce computation time significantly. Due to this dependency, parallelization is not available on Windows OS, as doMC is not supported on Windows.
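One way to choose a value for multiThread that respects this limitation is sketched below. It is an assumption-laden convenience, not something evian does itself: it falls back to a single thread on Windows or when the core count cannot be determined, and otherwise leaves one core free.

```r
# Assumption: use one thread on Windows, otherwise all but one core.
cores <- parallel::detectCores()   # may return NA on some systems

n_threads <- if (.Platform$OS.type == "windows" || is.na(cores)) {
  1L
} else {
  max(1L, cores - 1L)
}
n_threads
```

The resulting n_threads could then be passed as the multiThread argument.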
# NOT RUN {
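library(evian)  # assumes the evian package is installed
# Example data sets shipped with the package
data(evian_linear_raw)
data(evian_linear_bim)
# Likelihood intervals for every SNP column, adjusting for covariates
rst1 <- evian(data = evian_linear_raw, bim = evian_linear_bim,
              xcols = 10:ncol(evian_linear_raw), ycol = 6,
              covariateCol = c(5, 7:9), robust = FALSE, model = "additive",
              m = 200, lolim = -0.4, hilim = 0.4, kcutoff = c(32, 100),
              multiThread = 1, family = "gaussian", plinkCC = FALSE)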
# }