calculateEvianMLE: Profile likelihood calculation using regression models

Description

This is the function that calculates profileLikelihood for a single SNP. The main function evian calls this function repeatedly to obtain results for multiple SNPs.

Usage

calculateEvianMLE(snp, formula_tofit, model, data, bim, lolim, hilim,
                    m, bse, k, robust, family, plinkCC)

Arguments

snp

a string specifying the SNP of interests to be calculated.

formula_tofit

a formula object of the genetic model. The model should be formatted as y~nuisance parameters. The parameter of interest should not be included here.

model

a string specifying the mode of inheritance parameterization: additive, dominant, recessive, or overdominance. See details.

data

data frame; read from the argument data in the main function evian. It should contain the SNP ID specified in the snp argument as a column name.

bim

data frame; read from from the argument bim in the main function evian. Provides allele information (base pair, effect/reference alleles) for the SNP of interest.

lolim

numeric; the lower limit for the grid or the minimum value of the regression parameter $\beta$ used to calculate the likelihood function.

hilim

numeric; the upper limit for the grid or the maximum value of the regression parameter $\beta$ used to calculate the likelihood funciton.

numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function.

bse

numeric; the number of beta standard errors to utilize in constraining the beta grid limits. Beta grid is evaluated at $\beta$ +/- bse*s.e.

numeric or numeric vector; The strength of evidence criterion k. Reads from the input of kcutoff from the main evian function

robust

logical; if TRUE, then a robust adjustment is applied to the likelihood function to account for the cluster nature in the data. See robust_forCluster.

family

the link function for glm.

plinkCC

A boolean type that specifies how case/control are coded. case/control were coded 1/0 if it is FALSE, and were coded 2/1 if TRUE.

Value

This function outputs a list containg 4 elements that can be directly accessed using '$' operator.

theta

numeric vector; It stores all m $\beta$ values that used to estimate the standardized profile likelihood.

profile.lik.norm

numeric vector; the corresponding m standardized profile likelihood value at each of the $\beta$ values in theta. If robust=TRUE, then the values will be adjusted by the robust factor.

k_cutoff

numeric vector; It specifies which k-cutoff had been used in the calculation, ordered from the smallest k to the largest k.

SummaryStats

data frame; contains the summary statistics of the profile likelihood calculation. It contains the following columns:

mle: the estimates for SNP effect with respect to the effective allele
maxlr: maximum likelihood ratio in the beta grid defined by lolim and hilim
AF: allele frequency for the effective allele
SNP: SNP ID
bp: base pair position from the bim input
effect, ref: the effective allele and the other allele from the bim input
robustFactor: robust factor calculated, set to 1 if robust=FALSE.
lo_1, hi_1, lo_2, hi_2...: the lower and upper bound of the likelihood intervals for the kth cut-off in k_cutoff.

Details

calculateEvianMLE calculates the profile likelihood for a single SNP. A proper grid range is first established for $\beta$ then the standardized profile likelihood is evaluated at each of the m cuts uniformly spread across the grid. Based on the standardized profile likelihood, the MLE for $\beta$ is computed as well as the likelihood intervals for each value of k provided. For different genetic models, their coding schemes are shown as below:

Additive AA 0 AB 1 BB 2

Dominant AA 0 AB 1 BB 1

Recessive AA 0 AB 0 BB 1

Overdominance model A D AA 0 0 AB 1 1 BB 2 0

Specifically for the overdominance model, the column of interest is the D column.

References

Strug, L. J., Hodge, S. E., Chiang, T., Pal, D. K., Corey, P. N., & Rohde, C. (2010). A pure likelihood approach to the analysis of genetic association data: an alternative to Bayesian and frequentist analysis. Eur J Hum Genet, 18(8), 933-941. doi:10.1038/ejhg.2010.47

Strug, L. J., & Hodge, S. E. (2006). An alternative foundation for the planning and evaluation of linkage analysis. I. Decoupling "error probabilities" from "measures of evidence". Hum Hered, 61(3), 166-188. doi:10.1159/000094709

Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. London, Chapman and Hall.