Perform a two-dimensional genome scan with a two-QTL model, with possible allowance for covariates.
scantwo(cross, chr, pheno.col=1, model=c("normal","binary"),
method=c("em","imp","hk","mr","mr-imp","mr-argmax"),
addcovar=NULL, intcovar=NULL, weights=NULL,
use=c("all.obs", "complete.obs"),
incl.markers=FALSE, clean.output=FALSE,
clean.nmar=1, clean.distance=0,
maxit=4000, tol=1e-4,
verbose=TRUE, n.perm, perm.Xsp=FALSE, perm.strata=NULL,
assumeCondIndep=FALSE, batchsize=250, n.cluster=1)
If n.perm
is missing, the function returns a list with class
"scantwo"
and containing three components. The first component
is a matrix of dimension [tot.pos x tot.pos]; the upper triangle
contains the LOD scores for the additive model, and the lower triangle
contains the LOD scores for the full model. The diagonal contains the
results of scanone
. The second component of the
output is a data.frame indicating the locations at which the two-QTL
LOD scores were calculated. The first column is the chromosome
identifier, the second column is the position in cM, the third column
is a 1/0 indicator for ease in later pulling out only the equally
spaced positions, and the fourth column indicates whether the position
is on the X chromosome or not. The final component is a version of
the results of scanone
including sex and/or cross
direction as additive covariates, which is needed for a proper
calculation of conditional LOD scores.
If n.perm
is specified, the function returns a list with six
different LOD scores from each of the permutation replicates.
First, the maximum LOD score for the full model (two QTLs plus an
interaction). Second, for each pair of
chromosomes, we take the difference between the full LOD and the
maximum single-QTL LOD for those two chromosomes, and then maximize
this across chromosome pairs. Third, for each pair of chromosomes we
take the difference between the maximum full LOD and the maximum
additive LOD, and then maximize this across chromosome pairs. Fourth,
the maximum LOD score for the additive QTL model. Fifth, for each
pair of chromosomes, we take the difference between the additive LOD
and the maximum single-QTL LOD for those two chromosomes, and then
maximize this across chromosome pairs. Finally, the maximum
single-QTL LOD score (that is, from a single-QTL scan). The latter is
not used in summary.scantwo
, but does get
calculated at each permutation, so we include it for the sake of
completeness.
If n.perm
is specified and perm.Xsp=TRUE
, the result is
a list with the permutation results for the regions A:A, A:X, and X:X,
each of which is a list with the six different LOD scores. Independent
permutations are performed in each region, n.perm
is the number
of permutations for the A:A region; additional permutations are are
used for the A:X and X:X parts, as estimates of quantiles farther out
into the tails are needed.
An object of class cross
. See
read.cross
for details.
Optional vector indicating the chromosomes for which LOD
scores should be calculated. This should be a vector of character
strings referring to chromosomes by name; numeric values are
converted to strings. Refer to chromosomes with a preceding -
to have all chromosomes but those considered. A logical (TRUE/FALSE)
vector may also be used.
Column number in the phenotype matrix which should be
used as the phenotype. This can be a vector of integers; for methods
"hk"
and "imp"
this can be considerably faster than doing
them one at a time. One may also give character strings matching
the phenotype names. Finally, one may give a numeric vector of
phenotypes, in which case it must have the length equal to the number
of individuals in the cross, and there must be either non-integers or
values < 1 or > no. phenotypes; this last case may be useful for studying
transformations.
The phenotype model: the usual normal model or a model for binary traits.
Indicates whether to use the
the EM algorithm, imputation, Haley-Knott regression, or marker
regression. Marker regression is performed either by dropping
individuals with missing genotypes ("mr"
), or by first filling
in missing data using a single imputation ("mr-imp"
) or by the
Viterbi algorithm ("mr-argmax"
).
Additive covariates.
Interactive covariates (interact with QTL genotype).
Optional weights of individuals. Should be either NULL
or a vector of length n.ind containing positive weights. Used only
in the case model="normal"
.
In the case that multiple phenotypes are selected to be scanned, this argument indicates whether to use all individuals, including those missing some phenotypes, or just those individuals that have data on all selected phenotypes.
If FALSE, do calculations only at points on an
evenly spaced grid. If calc.genoprob
or
sim.geno
were run with
stepwidth="variable"
or stepwidth="max"
, we force incl.markers=TRUE
.
If TRUE, clean the output with
clean.scantwo
, replacing LOD scores for pairs of
positions that are not well separated with 0. In permutations, this
will be done for each permutation replicate. This can be important
for the case of method="em"
, as there can be difficulty with
algorithm convergence in these regions.
If clean.output=TRUE
, this is the number of
markers that must separate two positions.
If clean.output=TRUE
, this is the cM distance
that must separate two positions.
Maximum number of iterations; used
only with method "em"
.
Tolerance value for determining convergence; used only with
method "em"
.
If TRUE, display information about the progress of
calculations. For method "em"
, if verbose
is an integer
above 1, further details on the progress of the algorithm will be
displayed.
If specified, a permutation test is performed rather than an analysis of the observed data. This argument defines the number of permutation replicates.
If n.perm
> 0, so that a permutation test will
be performed, this indicates whether separate permutations should be
performed for the autosomes and the X chromosome, in order to get an
X-chromosome-specific LOD threshold. In this case, additional
permutations are performed for the X chromosome.
If n.perm
> 0, this may be used to perform a
stratified permutation test. This should be a vector with the same
number of individuals as in the cross data. Unique values indicate
the individual strata, and permutations will be performed within the
strata.
If TRUE, assume conditional independence of QTL genotypes given marker genotypes. This is an approximation, but it may speed things up.
The number of phenotypes (or permutations) to be run
as a batch; used only for methods "hk"
and "imp"
.
If the package snow
is available and
n.perm
> 0, permutations are run in parallel using this number
of nodes.
The X chromosome must be treated specially in QTL mapping.
As in scanone
, if both males and females are
included, male hemizygotes are allowed to be different from female
homozygotes, and the null hypothesis must be changed in order to ensure
that sex- or pgm-differences in the phenotype do not results in spurious
linkage to the X chromosome. (See the help file for
scanone
.)
If n.perm
is specified and perm.Xsp=TRUE
,
X-chromosome-specific permutations are performed, to obtain separate
thresholds for the regions A:A, A:X, and X:X.
Karl W Broman, broman@wisc.edu; Hao Wu
Standard interval mapping (method="em"
) and Haley-Knott
regression (method="hk"
) require that multipoint genotype probabilities are
first calculated using calc.genoprob
. The
imputation method uses the results of sim.geno
.
The method "em"
is standard interval mapping by the EM algorithm
(Dempster et al. 1977; Lander and Botstein 1989). Marker regression
(method="mr"
) is simply linear regression of phenotypes on
marker genotypes (individuals with missing genotypes are
discarded). Haley-Knott regression (method="hk"
) uses the
regression of phenotypes on multipoint genotype probabilities. The
imputation method (method="imp"
) uses the pseudomarker
algorithm described by Sen and Churchill (2001).
Individuals with missing phenotypes are dropped.
In the presence of covariates, the full model is $$y = \mu + \beta_{q_1} + \beta_{q_2} + \beta_{q_1 \times q_2} + A \gamma + Z \delta_{q_1} + Z \delta_{q_2} + Z \delta_{q_1 \times q_2} + \epsilon$$ where \(q_1\) and \(q_2\) are the unknown QTL genotypes at two locations, A is a matrix of covariates, and Z is a matrix of covariates that interact with QTL genotypes. The columns of Z are forced to be contained in the matrix A.
The above full model is compared to the additive QTL model, $$y = \mu + \beta_{q_1} + \beta_{q_2} + A \gamma + Z \delta_{q_1} + Z \delta_{q_2} + \epsilon$$ and also to the null model, with no QTL, $$y = \mu + A \gamma + \epsilon$$
In the case that n.perm
is specified, the R function
scantwo
is called repeatedly.
For model="binary"
, a logistic regression model is used.
Churchill, G. A. and Doerge, R. W. (1994) Empirical threshold values for quantitative trait mapping. Genetics 138, 963--971.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39, 1--38.
Haley, C. S. and Knott, S. A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315--324.
Lander, E. S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185--199.
Sen, Ś. and Churchill, G. A. (2001) A statistical framework for quantitative trait mapping. Genetics 159, 371--387.
Soller, M., Brody, T. and Genizi, A. (1976) On the power of experimental designs for the detection of linkage between marker loci and quantitative loci in crosses between inbred lines. Theor. Appl. Genet. 47, 35--39.
plot.scantwo
, summary.scantwo
,
scanone
, max.scantwo
,
summary.scantwoperm
,
c.scantwoperm
data(fake.f2)
fake.f2 <- subset(fake.f2, chr=18:19)
fake.f2 <- calc.genoprob(fake.f2, step=5)
out.2dim <- scantwo(fake.f2, method="hk")
plot(out.2dim)
# permutations
permo.2dim <- scantwo(fake.f2, method="hk", n.perm=2)
if (FALSE) permo.2dim <- scantwo(fake.f2, method="hk", n.perm=1000)
summary(permo.2dim, alpha=0.05)
# summary with p-values
summary(out.2dim, perms=permo.2dim, pvalues=TRUE,
alphas=c(0.05, 0.10, 0.10, 0.05, 0.10))
# covariates
data(fake.bc)
fake.bc <- subset(fake.bc, chr=16:17)
fake.bc <- calc.genoprob(fake.bc, step=10)
ac <- pull.pheno(fake.bc, c("sex","age"))
ic <- pull.pheno(fake.bc, "sex")
out <- scantwo(fake.bc, method="hk", pheno.col=1,
addcovar=ac, intcovar=ic)
plot(out)
Run the code above in your browser using DataLab