h.traits(snp.vars, traits.lab, beta.hat, sigma.hat, ncase=NULL, ncntl=NULL, cor=NULL, cor.numr=FALSE, search=NULL, side=2, meta=FALSE, zmax.args=NULL, pval.args=NULL, p.bound=1, NSAMP=5000, NSAMP0=50000)
traits.lab: A vector of labels for the k studies/traits being analyzed. The order of this vector must match the columns of beta.hat and sigma.hat. No default.
beta.hat: A matrix of dimension length(snp.vars) by (k) (or a vector of length k when 1 SNP is passed). Each row gives the coefficients obtained from the analysis of that SNP across the k studies/traits. No default.
ncase: The number of cases in each of the k studies. This can be the same for each SNP, in which case ncase is a vector of length k. Alternatively, if the number of non-missing cases analyzed for each SNP is known, then ncase can be a length(snp.vars) by (k) matrix. If NULL, then ncase will be set to 1/sigma.hat^2. The default is NULL.
ncntl: Same as ncase (above) for controls. The default is NULL.
cor: Either a k by k matrix of inter-study phenotypic correlations or a list containing three case/control overlap matrices (for case-control studies) named N11, N00 and N10. See details. Default is NULL, so that studies are assumed to be independent. The rows and columns of all matrices need to be in the same order as traits.lab and the columns of beta.hat and sigma.hat.
side: For two-tailed tests, side should be 2. For one-tailed tests, side should be 1 (positive tail is assumed). Default is 2. The option is ignored when search=2, since the two-sided subset search is automatically a two-tailed test.
zmax.args: Optional arguments passed to z.max as a named list. This option can be useful if the user wants to restrict subset searches in some structured way, for example by incorporating some ordering constraints.
pval.args: Optional arguments passed to p.dlm as a named list. This option can be useful if the user wants to restrict subset searches in some structured way, for example by incorporating some ordering constraints.
NSAMP0: Number of importance sampling replicates used when p.bound < 1. The default is 50000. See details.
(1) Meta (standard fixed-effect meta-analysis): This list is non-null when meta is TRUE and contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars.
(2) Subset.1sided (one-sided subset search):
This list is non-null when search is NULL or 1 and contains 4 vectors named (pval, beta, sd, sd.meta), each of the same length as snp.vars, and a logical matrix named "pheno" with one row for each SNP and one column for each phenotype. For a particular SNP and phenotype, the entry is TRUE if this phenotype was in the selected subset for that SNP. In the output, the p-value is automatically adjusted for multiple testing due to subset search. The beta and sd correspond to the standard fixed-effect meta-analysis estimate and corresponding standard error estimate for the regression coefficient of a SNP, based only on those studies/traits that are included in the identified subset. The vector sd.meta gives the meta-analysis standard errors for estimates of beta based on studies in the identified subset, ignoring the randomness of the subset.
(3) Subset.2sided (two-sided subset search):
This list is non-null when search is NULL or 2 and contains 9 vectors named (pval, pval.1, pval.2, beta.1, sd.1, beta.2, sd.2, sd.1.meta, sd.2.meta), each of the same length as snp.vars, and two matrices named "pheno.1" and "pheno.2" giving logical indicators of a phenotype being among the positively or negatively associated subsets (respectively) as identified by the two-sided subset search. In the output, pval provides the significance of the overall test statistic that combines association signals from the two directions, while pval.1 and pval.2 return the corresponding level of significance for each of the component one-sided test statistics in the positive and negative directions. The values (beta.1, sd.1, beta.2, sd.2) denote the corresponding meta-analysis estimates of regression coefficients and standard errors for the identified subsets of traits/studies that show association in the positive and negative directions, respectively. The vectors sd.1.meta and sd.2.meta give the meta-analysis standard errors for estimates of beta based on studies in the identified subset, ignoring the randomness of the subset, in the positive and negative directions, respectively.
The other objects in the list are the input arguments passed into h.traits.
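The beta and sd.meta values described above are standard fixed-effect (inverse-variance weighted) meta-analysis quantities computed over the selected subset. The base R sketch below illustrates that calculation under the simplifying assumption of independent studies. The helper name fixed.effect.meta and all numbers are hypothetical, the subset is supplied by hand rather than found by a search, and the p-value it returns is the naive one, not the subset-search-adjusted p-value that h.traits reports.

```r
# Inverse-variance (fixed-effect) meta-analysis over a chosen subset.
# Assumes independent studies; this is an illustration, not ASSET's code.
fixed.effect.meta <- function(beta, se, subset = rep(TRUE, length(beta))) {
  w         <- 1 / se[subset]^2                 # inverse-variance weights
  beta.meta <- sum(w * beta[subset]) / sum(w)   # pooled estimate
  sd.meta   <- sqrt(1 / sum(w))                 # pooled standard error
  z         <- beta.meta / sd.meta
  list(beta = beta.meta, sd = sd.meta, pval = 2 * pnorm(-abs(z)))
}

# Made-up estimates for one SNP across four traits
beta  <- c(0.10, 0.12, -0.01, 0.09)
se    <- c(0.03, 0.04, 0.03, 0.05)
pheno <- c(TRUE, TRUE, FALSE, TRUE)   # hypothetical selected subset

fit <- fixed.effect.meta(beta, se, pheno)
fit
```

Because the subset is treated as fixed, the resulting standard error plays the role of sd.meta; it ignores the randomness introduced by selecting the subset.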
k studies/traits.
The one-sided method automatically guarantees identification of studies/traits that have associations in the same direction and thus is useful in applications where it is desirable to identify SNPs that show effects in the same direction across multiple traits/studies. The two-sided subset search applies the one-sided subset search separately for positively and negatively associated traits for a given SNP and then combines the association signals from the two directions into a single combined chi-square type statistic. The method is sensitive in detecting SNPs that may be associated with different traits in different directions.
The methods allow for accounting for correlation among studies/subjects that might arise due to shared subjects across distinct studies or due to correlation among related traits in the same study. For application of the method to meta-analysis of case-control studies, the matrices N11, N10 and N00 denote the number of subjects that are shared between studies by case-control status. By definition, the diagonals of the matrices N11 and N00 contain the number of cases and controls, respectively, in the k studies. Also, by definition, the diagonal of N10 is zero, since cases cannot serve as controls and vice versa in the same study. The most common situation may involve shared controls across studies, i.e. non-zero off-diagonal elements of the matrix N00.
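To make the overlap matrices concrete, here is a minimal base R sketch for the shared-controls situation described above: three hypothetical studies with distinct cases that all draw on one common pool of controls. The study labels and counts are invented for illustration.

```r
# Three hypothetical case-control studies sharing one pool of 1000 controls
# (all counts are made up for illustration)
traits.lab <- c("Study_1", "Study_2", "Study_3")
n.cases    <- c(500, 300, 400)
n.shared   <- 1000
dn         <- list(traits.lab, traits.lab)

# N11: case/case overlap; cases are distinct, so only the diagonal is non-zero
N11 <- diag(n.cases)
dimnames(N11) <- dn

# N00: control/control overlap; every pair of studies shares all controls
N00 <- matrix(n.shared, nrow = 3, ncol = 3, dimnames = dn)

# N10: case/control overlap; no subject is a case in one study and a
# control in another here, and the diagonal is zero by definition
N10 <- matrix(0, nrow = 3, ncol = 3, dimnames = dn)

cor   <- list(N11 = N11, N00 = N00, N10 = N10)
ncase <- diag(N11)   # cases per study
ncntl <- diag(N00)   # controls per study
```

The rows and columns follow the order of traits.lab, and ncase and ncntl are read off the diagonals of N11 and N00 in the same way as in the package example.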
The output standard errors are approximate (based on inverting p-values) and are used for constructing confidence intervals in h.summary and h.forestPlot.
Currently ASSET calculates p-values by a stochastic approximation to the DLM formula as described in Bhattacharjee et al. (In Preparation). The method works by simulating truncated multivariate normal variates by importance sampling to estimate the probability term appearing in the DLM formula. Since version 2.0.0, the previous meth.pval="DLM" option to calculate upper-bound p-values (as in Bhattacharjee et al. 2012) has been dropped, as the current stochastic approximation is expected to be more accurate in all cases, although slightly slower. The new p-value method also enables pre-screening of traits by the p.bound argument.
Specifying a p-value upper bound through p.bound helps speed up the code when the number of traits or subtypes is relatively large. For example, if p.bound=0.25 is chosen, on average (under the null) only a quarter of the traits will be included in the subset search, allowing more traits to be analyzed in a computationally feasible manner. Note that the studies being maximized over will vary from SNP to SNP, and appropriate multiple-testing adjustment is done internally to account for this pre-selection.
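The "quarter of the traits" figure can be checked with a short simulation. The sketch below is not ASSET's internal code: it assumes that pre-screening compares a two-sided per-trait p-value against p.bound and that the traits are independent, and it uses simulated null z-scores for a single SNP.

```r
# Pre-screening sketch: keep only traits whose two-sided p-value is below
# p.bound. One null SNP, many independent traits (an assumption).
set.seed(42)
k       <- 2000
p.bound <- 0.25

z    <- rnorm(k)               # null z-scores, i.e. beta.hat / sigma.hat
pval <- 2 * pnorm(-abs(z))     # two-sided per-trait p-values
keep <- pval < p.bound         # traits that would enter the subset search

frac.kept <- mean(keep)
frac.kept                      # close to p.bound under the null
```

Since null p-values are uniform on (0, 1), the expected fraction retained equals p.bound, which is why a smaller bound shrinks the search space roughly in proportion.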
The arguments NSAMP and NSAMP0 give the number of importance sampling replicates to be generated. Either of these can be increased to achieve more accuracy at the cost of computational speed, or decreased for the reverse.
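The accuracy versus speed trade-off is a general property of importance sampling. As a toy illustration only, the base R sketch below estimates a univariate normal tail probability by sampling from a proposal centered on the tail. It is not the DLM formula or the truncated multivariate normal sampler used by ASSET; the function name is.tail.prob is hypothetical, and the replicate counts 5000 and 50000 are borrowed from the NSAMP and NSAMP0 defaults purely for scale.

```r
# Importance-sampling estimate of P(Z > cutoff) for a standard normal Z,
# drawing from a proposal N(cutoff, 1) that is centered on the tail.
is.tail.prob <- function(n, cutoff = 4) {
  x <- rnorm(n, mean = cutoff)               # draws from the proposal
  w <- dnorm(x) / dnorm(x, mean = cutoff)    # importance weights
  mean(w * (x > cutoff))                     # weighted tail indicator
}

set.seed(123)
exact   <- pnorm(4, lower.tail = FALSE)      # reference value
est.5k  <- is.tail.prob(5000)
est.50k <- is.tail.prob(50000)

# Relative errors of the two estimates
rel.err <- abs(c(est.5k, est.50k) - exact) / exact
rel.err
```

On average the Monte Carlo error shrinks in proportion to one over the square root of the number of replicates, so a tenfold increase in replicates buys roughly a threefold gain in precision at about ten times the cost.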
Bhattacharjee et al. Pre-screening and Meta-analysis based on Subset: A Fast and Powerful Approach to Pleiotropic Analysis Across a Large Number of Traits. (In Preparation)
h.summary, h.forestPlot
# Use the example data
data(ex_trait, package="ASSET")
# Display the data, and case/control overlap matrices
data
N00
N11
N10
# Define the input arguments to h.traits
snps <- as.vector(data[, "SNP"])
traits.lab <- paste("Trait_", 1:6, sep="")
beta.hat <- as.matrix(data[, paste(traits.lab, ".Beta", sep="")])
sigma.hat <- as.matrix(data[, paste(traits.lab, ".SE", sep="")])
cor <- list(N11=N11, N00=N00, N10=N10)
ncase <- diag(N11)
ncntl <- diag(N00)
# Now let us call h.traits on these summary data.
res <- h.traits(snps, traits.lab, beta.hat, sigma.hat, ncase=ncase,
                ncntl=ncntl, cor=cor, cor.numr=FALSE, search=NULL,
                side=2, meta=TRUE, zmax.args=NULL)
h.summary(res)