p.dlm: Discrete Local Maxima approximate p-value.

Description

Function to obtain Discrete Local Maxima based estimates of p-values for z-scores maximized over subsets (of traits or subtypes), with possible restrictions and weights. Should not be called directly. See details.

Usage

p.dlm(t.vec, z.sub, search, side, cor.def=NULL, cor.args=NULL, sizes=NULL, p.bound=1, sub.def=NULL, sub.args=NULL, NSAMP=5000, NSAMP0=5e4)

Arguments

t.vec

Numeric vector of (positive) points for which to calculate p-values, i.e. general observed Z-max values. No default.

z.sub

Integer vector of the studies (traits) or subtypes being analyzed. No default.

search

0, 1 or 2. Search option: 0 indicates subtype analysis, while 1 and 2 denote one-sided and two-sided subset search, respectively. No default.

side

Either 1 or 2. For two-tailed tests (where absolute values of Z-scores are maximized), side should be 2. For one-tailed tests, side should be 1 (positive tail assumed). No default. Ignored when search is 2.

cor.def

A function with at least 3 arguments which calculates correlation between its first argument (a subset) and its second argument (subsets such as its neighbors). The third argument is the number of traits/subtypes and the function should return a vector of correlations with the neighbors. If NULL or a non-function value is specified, internal default functions for the corresponding search option are used.

cor.args

Other arguments to be passed to cor.def. These can include sample sizes and overlaps of different studies or subtypes, and analysis options such as case-control or case-complement that affect the correlation structure. If cor.def is NULL, then ncase and ncntl must be specified in this list.

sizes

Sizes of equivalence classes of traits. By default, no two traits or studies are equivalent. This argument is for internal use.

p.bound

Maximum p-value above which studies are not considered in the maximization. Default is 1. See details.

sub.def

A function to restrict subsets, e.g., order restrictions in subtype analysis. Should accept a subset (a logical vector of size k) as its first argument and should return TRUE if the subset satisfies restrictions and FALSE otherwise. Default is NULL implying all (2^k - 1) subsets are considered in the maximum.

sub.args

Other arguments to be passed to sub.def as a list. Default is NULL (i.e. none).
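
As a minimal sketch of the sub.def and sub.args interface described above, the hypothetical function below restricts the search to subsets with at most max.size members. It assumes, as in the Examples section, that the contents of sub.args reach sub.def as a list in its second argument. The names max.size.sub and max.size are illustrative and not part of the package.

```r
# Hypothetical restriction: allow only subsets with at most `max.size` members.
# Assumption: sub.args arrives as a list in the second argument.
max.size.sub <- function(logicalVec, args) {
  sum(logicalVec) <= args$max.size
}

sub.args <- list(max.size = 2)

# The function can be checked on its own before being handed to the package.
max.size.sub(c(TRUE, FALSE, TRUE, FALSE), sub.args)  # subset with two members
max.size.sub(c(TRUE, TRUE, TRUE, FALSE), sub.args)   # subset with three members
```

Checking the restriction directly on a few logical vectors, as done here, is a cheap way to confirm it returns a single TRUE or FALSE for every subset.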

NSAMP

Number of samples from a truncated multivariate normal distribution used to compute the DLM p-value. The default is 5000.

NSAMP0

Number of samples from a truncated multivariate normal distribution used to calculate the probability of the truncation region in the DLM p-value calculation. For 1-sided subset search this is ignored unless p.bound < 1. The default is 50000.

Value

A numeric vector of estimated p-values.

Details

The function is vectorized to handle blocks of SNPs at a time. This is a helper function that is called internally by h.traits and h.types and should not be called directly. The arguments of this function that have defaults, e.g. sub.def, can be customized using the argument pval.args in h.traits and h.types.

Specifying a p-value upper bound through p.bound helps speed up the code when the number of traits or subtypes is relatively large. For example, if p.bound=0.25 is chosen, then on average (under the null) only a quarter of the traits enter the maximization, allowing more traits to be analyzed in a computationally feasible manner.
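
The "quarter of the traits" figure follows from p-values being uniformly distributed under the null. The short simulation below is an illustration only and does not reproduce any internal step of p.dlm. The number of simulated traits and the two-sided p-value calculation are assumptions chosen for the sketch.

```r
# Illustration only: simulate null Z-scores for k traits and compute the
# fraction whose two-sided p-value is at or below a bound of 0.25.
set.seed(1)
k       <- 10000                   # hypothetical number of traits
p.bound <- 0.25
z       <- rnorm(k)                # Z-scores under the null
pvals   <- 2 * pnorm(-abs(z))      # two-sided p-values
frac    <- mean(pvals <= p.bound)  # share of traits that pass the bound
frac
```

The resulting fraction should be close to 0.25, so a smaller p.bound shrinks the set of traits entering the search roughly in proportion.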

Note that currently the DLM p-values are stochastic in nature (based on importance sampling). To get replicable results, set.seed can be used.
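
The effect of set.seed on a sampling-based estimate can be sketched with a stand-in function. The mc.tail function below is hypothetical: it uses plain Monte Carlo rather than the importance sampling used for DLM p-values, and only demonstrates that fixing the seed before each call makes a stochastic estimate repeat exactly.

```r
# Stand-in for a stochastic estimate (NOT the DLM algorithm): a plain
# Monte Carlo estimate of the upper-tail probability P(Z > t).
mc.tail <- function(t, nsamp = 5000) {
  mean(rnorm(nsamp) > t)
}

set.seed(123)
first <- mc.tail(2)

set.seed(123)
second <- mc.tail(2)

identical(first, second)  # same seed before each call, same estimate
```

In an actual analysis the same idea would apply by calling set.seed immediately before h.traits or h.types.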

Examples


  # A function to define the correlations between a subset and its neighbors
  # Returned values should not exceed the value of 1
  cor.def <- function(subset, neighbors, k, ncase, ncntl) {
    n <- ncol(neighbors)
    mat <- matrix(subset, nrow=k, ncol=n, byrow=FALSE)
    cor <- (mat + neighbors)*(1:k)/(k^2)
    cor <- colSums(cor)
    cor <- cor/max(cor)
    dim(cor) <- c(n, 1)

    cor  
  }

  # Subset definition
  sub.def <- function(logicalVec, args) {
    # Only allow the cumulative subsets:
    # TRUE FALSE FALSE FALSE ...
    # TRUE TRUE FALSE FALSE ...
    # TRUE TRUE TRUE FALSE ...
    # etc
    # A subset with n.true members is cumulative when its first n.true entries are all TRUE
    n.true <- sum(logicalVec)
    ret <- all(logicalVec[1:n.true])

    ret
  }

  k     <- 5
  t.vec <- 1:k
  z.sub <- rep(1, k)

  p.dlm(t.vec, z.sub, 1, 2, cor.def=cor.def, sub.def=sub.def,
         cor.args=list(ncase=rep(1000, k), ncntl=rep(1000,k)))
