trioGxE
estimates statistical interaction (GxE) between a single nucleotide polymorphism (SNP)
and a continuous environmental or non-genetic attributes
in case-parent trio data by fitting a generalized additive
model (GAM) using a penalized iteratively re-weighted least squares algorithm.trioGxE(data, pgenos, cgeno, cenv,
penmod = c("codominant","dominant","additive","recessive"),
k = NULL, knots = NULL, sp = NULL, lsp0 = NULL, lsp.grid = NULL,
control = list(maxit = 100, tol = 1e-07, trace = FALSE),
testGxE = FALSE, return.data = TRUE, ...)
data
that hold parental genotypes.data
that holds
the child genotypes.data
that holds
the non-genetic attribute being examined for interaction with genotype."codominant"
(default),
"dominant"
, "additive"
, or "recessive"
.penmod="codominant"
, a length-2 vector with positive integers must bpenmod="codominant"
, a list of two vectors must be provided.
For the other penetrance modes, a single vector must be provided.
When NULL
(default), knots will be placpenmod="codominant"
, a vector with two non-negative
numbers must be provided.
Otherwise, a single non-negative number must be provided.
When NULL
(default), a double (undeNULL
} (default), trioGxE
takes the log of spenmod=
"codominant"
, a list of two vectors of lengths $\geq 2$
must be provided. As the vector is longer, the grid becomes more refmaxit
: positive integer giving the maximal number of PIRLS iterations
tol
: positiFALSE
. User should not modify this argument.TRUE
(default), the data is returned.control
, which includes maxit
, tol
or trace
.trioGxE()
as an argument: returned when return.data=TRUE
NULL
) when smoothing parameters were not estimated but provided by the user.qr
for the list of values.model.mat
: The design matrix of the problem. The number of rows is equal to
$n_1+n_2+2n_3$, where $n_m$ is the number of $m$th informative
mating types. The number of columns is equal to the size of coefficient
.
pen.mat
: penalty matrix with size equal to the size of coefficient
.
bs.dim
: number of knots used for basis construction.
knots
: knot positions used for basis construction.
}FALSE
, sp
contains the smoothing parameter values estimated by the UBRE optimization.data
corresponding to
the child genotypes, parental genotypes and child's non-genetic attributes.test.trioGxE
function.lsp.grid
.coefficient
.trioGxE
fits data from case-parent trios to a GAM with smooth functions
representing gene-environment interaction ($G \times E$). The data
input must be a data frame containing the following 4 columns (of any order):
trioGxE
does basic error checking to ensure that
only the trios that are consistent with Mendelian segregation law
with complete genotype, environment and parental genotype information.
The function determines which trios are from informative parental mating types.
An informative parental pair has at least one heterozygote; such parental pair can
have offspring that are genetically different.
Under the assumption that the parents transmit the alleles to their child under
Mendel's law, with no mutation, there are three types of informative mating types
$G_p=1,2,3$:
Since GxE occurs when genotype relative risks vary with non-genetic attribute values $E=e$, GxE inference is based on the attribute-specific genotype relative risks, GRR$_h(e)$, expressed as $${\rm GRR}_h(e) = \frac{P(D=1|G=h,E=e)}{P(D=1|G=h-1,E=e)} = \exp{(\gamma_h + f_h(e))}, ~~h=1,2,$$ where $D=1$ indicates the affected status, $\gamma_1$ and $\gamma_2$ represent genetic main effect, and $f_1(e)$ and $f_2(e)$ are unknown smooth functions. The functions $f_h(e)$ represent GxE since GRRs vary with $E=e$ only when $f_1(e)\neq 0$ or $f_2(e)\neq 0$ vary with $E$. The expressions are followed by assuming a log-linear model of disease risk in Shin et al. (2010).
Under the co-dominant penetrance mode (i.e., penetrance="codominant"
),
GRR$_1(e)$ and GRR$_2(e)$ are estimated using the information
in the trios from the informative mating types $G_p=1$, 3 and those from $G_p=2$, 3, respectively.
Under a non-co-dominant penetrance mode, only one GRR function, GRR$(e)=\gamma+f(e)$,
is estimated from an appropriate set of informative trios.
Under the dominant penetrance mode (i.e., penetrance="dominant"
),
because GRR$_2(e)$ is 1 (i.e., $\gamma_2=0$ and $f_2(e)=0$),
GRR$(e)\equiv$ GRR$_1(e)$ is estimated based on the trios from $G_p=1$ and 3.
Under the recessive penetrance mode (i.e., penetrance="recessive"
),
because GRR$_1(e)$ is 1 (i.e., $\gamma_1=0$ and $f_1(e)=0$),
GRR$(e)\equiv$ GRR$_2(e)$ is estimated based on the trios from $G_p=2$ and 3.
Under the multiplicative or log-additive penetrance model (penetrance="additive"
),
since GRR$_1(e)$ = GRR$_2(e)$ (i.e., $\gamma_1=\gamma_2$ and $f_1=f_2$),
GRR$(e)\equiv$ GRR$_1(e) \equiv$ GRR$_2(e)$ is estimated based on all informative trios.
The interaction functions are approximated by cubic regression spline functions defined by
the knots specified through the arguments k
and knots
.
For each interaction function, at least three knots are chosen within the range of
the observed non-genetic attribute values.
Under the co-dominant mode, k[1]
and k[2]
knots, respectively, located at
knots[[1]]
and knots[[2]]
are used to construct the basis for
$f_1(e)$ and $f_2(e)$, respectively.
By default, a total of 5 knots are placed for each interaction function:
three interior knots at the 25th, 50th and 75th quantiles and two boundary knots
at the endpoints of the data in trios from $G_p=1$ or $3$, for $f_1(e)$,
and in trios from $G_p=2$ or $3$, for $f_2(e)$.
Similarly, under a non-co-dominant penetrance mode, when knots=NULL
,
k
knots are chosen based on the data in trios from
$G_p=1$ and $3$ (dominant mode);
in trios from all informative mating types (log-additive mode);
and in trios from $G_p=2$ or $3$ (recessive mode).
A standard model identifiability constraint is imposed on each interaction function,
which involves the sum of the interaction function over all observed attribute values of cases
in the appropriate set of informative trios.
For smoothing parameter estimation, trioGxE
finds the optimal values using either a double (if co-dominant) or
a single 1-dimensional grid search constructed based on the arguments lsp0
and lsp.grid
.
When lsp0 = NULL
, trioGxE
takes the log smoothing parameter estimates obtained
from a likelihood approach that makes inference of GxE conditional
on parental mating types, non-genetic attribute and partial information on child genotypes (Duke, 2007).
When lsp.grid = NULL
, trioGxE
takes the following 6 numbers to be the trial values of
the log-smoothing parameter for each interaction function:
-20 and 20, lsp0
and the quartiles of the truncated normal distributions
constructed based on lsp0
.
At each trial value in lsp.grid
, the prediction error criterion function,
UBRE (un-biased risk estimator, is minimized to find the optimal smoothing parameter.
For more details on how to estimate the smoothing parameters, see Appendix B.3 in Shin (2012).
For variance estimation, trioGxE
uses a Bayesian approach (Wood, 2006);
the resulting Bayesian credible intervals have been shown to have good frequentist coverage probabilities
as long as the bias is a relatively small fraction of the mean squared error (Marra and Wood, 2012)
trioSim
, plot.trioGxE
, test.trioGxE
## fitting a co-dominant model
data(hypoTrioDat)
simfit <- trioGxE(data=hypoTrioDat,pgenos=c("parent1","parent2"),cgeno="child",cenv="attr",
k=c(5,5),knots=NULL,sp=NULL)
## fitting a dominant model to the hypothetical data
simfit.dom <- trioGxE(data=hypoTrioDat,pgenos=c("parent1","parent2"),cgeno="child",cenv="attr",
penmod="dom",k=5,knots=NULL,sp=NULL)
Run the code above in your browser using DataLab