h.types(dat, response.var, snp.vars, adj.vars, types.lab, cntl.lab, subset=NULL, method=NULL, side=2, logit=FALSE, test.type="Score", zmax.args=NULL, pval.args=NULL, p.bound = 1, NSAMP=5000, NSAMP0=50000)
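types.lab: Disease subtypes (values of response.var) to be included in the analysis. If NULL, then all subtypes will be included. No default.
cntl.lab: Label identifying the controls in response.var. No default.
subset: Indicates the subset of rows of the data frame dat to be included in the analysis. Default is NULL, in which case all rows are used.
method: "case-control", "case-complement", or NULL (both analyses). In the case-complement analysis, for each disease subtype i, the set of control subjects is formed by taking the complement of disease subtype i, i.e., the original controls together with the cases that do not belong to disease subtype i.
zmax.args: Optional arguments for z.max, supplied as a named list. This option can be useful if the user wants to restrict subset searches in some structured way, for example, by incorporating ordering constraints.
pval.args: Optional arguments for p.dlm, supplied as a named list. As with zmax.args, this can be useful for restricting subset searches in some structured way, for example, by incorporating ordering constraints.
NSAMP0: Number of importance-sampling replicates used when p.bound < 1. The default is 50000. See the details below.
(1) The first component of the returned list is non-null when logit is TRUE and contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars.
(2) "Subset.Case.Control" (output for the subset-based case-control analysis):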
This list is non-null when method is NULL or "case-control". It contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars,
and a logical matrix named "pheno" with one row for each SNP and one column for each disease subtype. For a given SNP and disease subtype, the
corresponding entry is TRUE if that subtype is included in the best subset of disease subtypes identified as associated with
the SNP in the subset-based case-control analysis. The reported p-value is automatically adjusted for the multiple testing
incurred by the subset search. beta and sd are the estimated log odds ratio and its standard error for the SNP from a logistic regression involving
the cases of the identified disease subtypes and the controls.
(3) "Subset.Case.Complement" (output for the subset-based case-complement analysis):
This list is non-null when method is NULL or "case-complement". It contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars,
and a logical matrix named "pheno" with one row for each SNP and one column for each disease subtype. For a given SNP and disease subtype, the
corresponding entry is TRUE if that subtype is included in the best subset of disease subtypes identified as associated with
the SNP in the subset-based case-complement analysis. The reported p-value is automatically adjusted for the multiple testing
incurred by the subset search. beta and sd are the estimated log odds ratio and its standard error for the SNP from a logistic regression involving
the cases of the selected disease subtypes and the whole complement set of subjects, i.e., the original controls plus the cases of the unselected disease subtypes.
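To make the two comparison groups concrete, here is a minimal base-R sketch, assuming a single label vector coded as in the example below (controls labelled "SUBTYPE_0"). The helpers case_control_rows and case_complement_rows and the toy vector type are hypothetical and not part of ASSET; they only show which subjects would enter each logistic regression for a chosen subset of subtypes.

```r
# Hypothetical helpers (not ASSET functions): for a selected subset of subtypes,
# return the row indices used as cases and as the comparison group.

# Case-control: cases of the selected subtypes versus the original controls only;
# cases of unselected subtypes are left out.
case_control_rows <- function(type, selected, cntl.lab) {
  list(cases = which(type %in% selected),
       comparison = which(type == cntl.lab))
}

# Case-complement: cases of the selected subtypes versus everyone else,
# i.e. the original controls plus the cases of the unselected subtypes.
case_complement_rows <- function(type, selected) {
  list(cases = which(type %in% selected),
       comparison = which(!(type %in% selected)))
}

# Toy data: 3 controls and 5 cases spread over three subtypes.
type <- c("SUBTYPE_0", "SUBTYPE_1", "SUBTYPE_2", "SUBTYPE_0",
          "SUBTYPE_3", "SUBTYPE_1", "SUBTYPE_0", "SUBTYPE_3")
selected <- c("SUBTYPE_1", "SUBTYPE_3")

cc <- case_control_rows(type, selected, "SUBTYPE_0")
ccomp <- case_complement_rows(type, selected)
cc$comparison     # rows 1, 4, 7
ccomp$comparison  # rows 1, 3, 4, 7 (controls plus the SUBTYPE_2 case)
```

The only difference between the two analyses is the comparison group: the SUBTYPE_2 case is dropped in the case-control comparison but counted among the "controls" in the case-complement comparison.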
See h.summary and h.forestPlot for summarizing and plotting these results.
For a particular SNP, if any genotypes are missing, the corresponding subjects are removed from the analysis of that SNP.
Currently ASSET calculates p-values by a stochastic approximation to the DLM formula, as described in
Bhattacharjee et al. (In Preparation). The method simulates truncated multivariate normal
variates by importance sampling to estimate the probability term appearing in the DLM formula.
Since version 2.0.0, the earlier meth.pval="DLM" option for computing upper-bound
p-values (as in Bhattacharjee et al. 2012) has been dropped, because the current stochastic
approximation is expected to be more accurate in all cases, although slightly slower.
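Why the subset search needs a multiplicity adjustment can be seen in a small simulation. The sketch below is not ASSET's algorithm: it replaces the importance-sampling DLM approximation with plain Monte Carlo and, as simplifying assumptions, treats the subtype-specific Z statistics as independent standard normals under the null and combines a subset S with equal weights, Z_S = sum over k in S of Z_k / sqrt(|S|). The function name subset_max_pvalue is hypothetical.

```r
# Plain Monte Carlo (NOT ASSET's importance-sampling DLM approximation).
# Estimates P(max over non-empty subsets S of |Z_S| >= |z_obs|) under the null,
# assuming K independent N(0, 1) subtype statistics combined with equal weights:
# Z_S = sum_{k in S} Z_k / sqrt(|S|).
subset_max_pvalue <- function(z_obs, K, nsamp = 5000) {
  # Rows of 'subsets' are the 2^K - 1 non-empty subsets of 1..K as 0/1 indicators.
  subsets <- as.matrix(expand.grid(rep(list(0:1), K)))[-1, , drop = FALSE]
  # Column j of W holds the weights that turn Z into the j-th subset statistic.
  W <- t(subsets / sqrt(rowSums(subsets)))
  # nsamp null draws of the K subtype statistics.
  Z <- matrix(rnorm(nsamp * K), nrow = nsamp, ncol = K)
  # Largest |Z_S| over all subsets, for each draw.
  zmax <- apply(abs(Z %*% W), 1, max)
  mean(zmax >= abs(z_obs))
}

set.seed(1)
naive <- 2 * pnorm(-3)                   # unadjusted two-sided p-value for |Z| = 3
adjusted <- subset_max_pvalue(3, K = 5)  # accounts for searching all 31 subsets
c(naive = naive, adjusted = adjusted)
```

The adjusted probability is larger than the naive one because the best of many subsets is reported. This sketch ignores correlation between subtype statistics (for example, from shared controls) and any unequal weighting, both of which a real analysis must handle.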
The new p-value method also enables pre-screening of traits through the p.bound
argument.
Specifying a p-value upper bound through p.bound speeds up the code when the number of traits or subtypes is relatively large. For example, if p.bound=0.25
is chosen, then on average (under the null) only a quarter of the traits will be used in the subset search,
allowing more traits to be analyzed in a computationally feasible manner.
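The "quarter of the traits" figure follows from p-values being uniformly distributed under the null, so the fraction at or below p.bound is about p.bound. A minimal simulation, assuming independent null traits with two-sided Z-test p-values and assuming the screen keeps traits whose own p-value is at most p.bound (the variable names are illustrative only):

```r
set.seed(42)
n_traits <- 10000
p.bound <- 0.25
# Under the null, two-sided p-values from N(0, 1) statistics are Uniform(0, 1).
pvals <- 2 * pnorm(-abs(rnorm(n_traits)))
# Assumed screen: keep traits whose own p-value is at most p.bound.
kept <- pvals <= p.bound
mean(kept)  # close to p.bound
```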
The arguments NSAMP
and NSAMP0
give the number of importance-sampling replicates to be generated.
Either can be increased to gain accuracy at the cost of speed, or decreased to gain speed at the cost of accuracy.
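For a plain Monte Carlo estimate of a probability p from N independent replicates, the standard error is sqrt(p(1 - p)/N), so a tenfold increase in replicates shrinks it by about sqrt(10), roughly 3.2. ASSET's importance-sampling estimator has a different variance, so the following is only an illustration of the accuracy/speed tradeoff under that plain Monte Carlo assumption, using the default values NSAMP = 5000 and NSAMP0 = 50000 from the usage line; mc_se and estimate are hypothetical helpers.

```r
# Standard error of a plain Monte Carlo estimate of a probability p from n draws.
mc_se <- function(p, n) sqrt(p * (1 - p) / n)

p <- 0.01                       # illustrative tail probability
se_5000  <- mc_se(p, 5000)      # about 0.0014
se_50000 <- mc_se(p, 50000)     # about 0.00044
se_5000 / se_50000              # sqrt(10), about 3.16

# Empirical check: spread of 200 repeated estimates of P(Z > qnorm(1 - p)).
set.seed(7)
estimate <- function(n) mean(rnorm(n) > qnorm(1 - p))
emp_5000  <- sd(replicate(200, estimate(5000)))
emp_50000 <- sd(replicate(200, estimate(50000)))
```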
See also: h.summary, h.forestPlot
# Load ASSET and the example data (this creates the data frame 'data' used below)
library(ASSET)
data(ex_types, package="ASSET")
# Display the first 10 rows of the data and a table of the subtypes
data[1:10, ]
table(data[, "TYPE"])
# Define the input arguments to h.types.
snps <- paste("SNP_", 1:3, sep="")
adj.vars <- c("CENTER_1", "CENTER_2", "CENTER_3")
types <- paste("SUBTYPE_", 1:5, sep="")
# SUBTYPE_0 will denote the controls
res <- h.types(data, "TYPE", snps, adj.vars, types, "SUBTYPE_0", subset=NULL,
method="case-control", side=2, logit=FALSE, test.type="Score",
zmax.args=NULL, pval.args=NULL)
h.summary(res)