stepwiseqtl(cross, chr, pheno.col=1, qtl, formula, max.qtl=10, covar=NULL, method=c("imp", "hk"), model=c("normal", "binary"), incl.markers=TRUE, refine.locations=TRUE, additive.only=FALSE, scan.pairs=FALSE, penalties, keeplodprofile=TRUE, keeptrace=FALSE, verbose=TRUE, tol=1e-4, maxit=1000, require.fullrank=FALSE)
cross
. See
read.cross
for details.-
to have all
chromosomes but those considered. A logical (TRUE/FALSE) vector may
also be used."qtl"
, as created by
makeqtl
) to use as a starting point.refineqtl
to
refine the QTL locations after each step of forward and backward
selection.verbose
is an integer > 1, even more information is printed."qtl"
, as produced by
makeqtl
), with attributes "formula"
,
indicating the model formula, and "pLOD"
indicating the
penalized LOD score.If keeplodprofile=TRUE
, LOD profiles from the last pass through
the refinement algorithm are retained as an attribute,
"lodprofile"
, to the object. These may be plotted with
plotLodProfile
.If keeptrace=TRUE
, the output will contain an attribute
"trace"
containing information on the best model at each step
of forward and backward elimination. This is a list of objects of
class "compactqtl"
, which is similar to a QTL object (as
produced by makeqtl
) but containing just
a vector of chromosome IDs and positions for the QTL. Each will also
have attributes "formula"
(containing the model formula) and
"pLOD"
(containing the penalized LOD score.
imp
: multiple imputation is used, as described by Sen
and Churchill (2001). hk
: Haley-Knott regression is used (regression of the
phenotypes on the multipoint QTL genotype probabilities), as described
by Haley and Knott (1992).We consider QTL models allowing pairwise interactions among QTL but with an enforced hierarchy in which inclusion of a pairwise interaction requires the inclusion of both of the corresponding main effects. Additive covariates may be included, but currently we do not explore QTL:covariate interactions. Also, the penalized LOD score criterion is currently defined only for autosomal loci, and results with the X chromosome should be considered with caution.
The penalized LOD score is of the form $pLOD(g) = LOD(g)
- Tm pm - Th ph - Tl pl$ where $g$ denotes a model,
$pm$ is the number of QTL in the model ("main effects"),
$ph$ is the number of pairwise interactions that will be
given a heavy interaction penalty, $pl$ is the number of pairwise
interactions that will be given a light interaction penalty,
$Tm$ is the penalty on main effects, $Th$ is the heavy
interaction penalty, and $Tl$ is the light interaction
penalty. The penalties
argument is the vector $(Tm, Th, Tl)$. If $Tl$ is missing (penalties
has a vector of length 2), we assume $Tl = Th$, and so
all pairwise interactions are assigned the same penalty.
The "heavy" and "light" interaction penalties can be a bit confusing. Consider the clusters of QTL that are connected via one or more pairwise interactions. To each such cluster, we assign at most one "light" interaction penalty, and give all other pairwise interactions the heavy interaction penalty. In other words, if $pi$ is the total number of pairwise interactions for a QTL model, we let $pl$ be the number of clusters of connected QTL with at least one pairwise interaction, and then let $ph = pi - pl$.
Let us give an explicit example. Consider a model with 6 QTL, and
with interactions between QTL 2 and 3, QTL 4 and 5 and QTL 4 and 6
(so we have the model formula
y ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q2:Q3 + Q4:Q5 + Q4:Q6
).
There are three clusters of connected QTL: (1), (2,3) and (4,5,6). We
would assign 6 main effect penalties ($Tm$), 2 light
interaction penalties ($Tl$), and 1 heavy interaction penalty
($Th$).
Manichaikul et al. (2009) described a system for deriving the
three penalties on the basis of permutation results from a
two-dimensional, two-QTL genome scan (as calculated with
scantwo
). These may be calculated with the
function calc.penalties
.
A forward/backward search method is used, with the aim to optimize the penalized LOD score criterion. That is, we seek to identify the model with maximal the penalized LOD score. The search algorithm was based closely on an algorithm described by Zeng et al. (1999).
We use forward selection to a model of moderate size (say 10 QTL),
followed by backward elimination all the way to the null model. The
chosen model is that which optimizes the penalized LOD score
criterion, among all models visited. The detailed algorithm is as
follows. Note that if additive.only=TRUE
, no pairwise
interactions are considered.
scan.pairs=TRUE
, start with a two-dimensional, two-QTL genome
scan instead. If an initial QTL model were defined through the
arguments qtl
and formula
, start with this model and
jump immediately to step 2.
scan.pairs=TRUE
perform a two-dimensional, two-QTL
scan, seeking to add a pair of novel QTL, either additive or
interacting.
refine.locations=TRUE
).
In this forward/backward algorithm, it is likely best to build up to an overly large model and then prune it back. Note that there is no "stopping rule"; the chosen model is that which optimizes the model comparison criterion, among all models visited. The search can be time consuming, particularly if a two-dimensional scan is performed at each forward step. Such two-dimensional scans may be useful for identifying QTL linked in repulsion (having effects of opposite sign) or interacting QTL with limited marginal effects, but our limited experience suggests that they are not necessary; important linked or interacting QTL pairs can be picked up in the forward selection to a large model, and will be retained in the backward elimination phase.
Broman, K. W. and Speed, T. P. (2002) A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J Roy Stat Soc B 64, 641--656, 731--775.
Haley, C. S. and Knott, S. A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315--324.
Sen, Ś. and Churchill, G. A. (2001) A statistical framework for quantitative trait mapping. Genetics 159, 371--387.
Zeng, Z.-B., Kao, C.-H. and Basten, C. J. (1999) Estimating the genetic architecture of quantitative traits. Genetical Research, 74, 279--289.
calc.penalties
,
plotModel
, makeqtl
,
fitqtl
, refineqtl
,
addqtl
, addpair
data(fake.bc)
## Not run: fake.bc <- calc.genoprob(fake.bc, step=2.5)
outsw <- stepwiseqtl(fake.bc, max.qtl=3, method="hk", keeptrace=TRUE)
# best model
outsw
plotModel(outsw)
# path through model space
thetrace <- attr(outsw, "trace")
# plot of these
par(mfrow=c(3,3))
for(i in seq(along=thetrace))
plotModel(thetrace[[i]], main=paste("pLOD =",round(attr(thetrace[[i]],"pLOD"), 2)))
Run the code above in your browser using DataLab