Diagnostic plots for globaltest: Global Test diagnostic plots

Description

Plots to visualize the result of a Global Test in terms of the contributions of the covariates and the subjects.

Usage

covariates(object, what = c("p-value", "statistic", "z-score", "weighted"), cluster = "average", alpha = 0.05, sort = TRUE, zoom = FALSE, legend = TRUE, plot = TRUE, colors, alias, help.lines = FALSE, cex.labels = 0.6, pdf, trace)
features(...)
subjects(object, what = c("p-value", "statistic", "z-score", "weighted"), cluster = "average", sort = TRUE, mirror = TRUE, legend = TRUE, colors, alias, help.lines = FALSE, cex.labels = 0.6, pdf)

Arguments

object

A gt.object, usually created by a call to gt. The object must contain only a single test result, unless the pdf argument is used. See the help page of gt.object on reducing such an object in case it contains more than one test.

what

Gives a choice between various presentations of the same plot. See below under details.

cluster

The type of hierarchical clustering performed for the dendrogram. Default is average linkage clustering. For other options, see hclust. Setting cluster = "none" or cluster = FALSE suppresses the dendrogram altogether.

alpha

Parameter between 0 and 1. Sets the level of family-wise error control in the multiple testing procedure performed on the dendrogram. See below under details.

sort

If TRUE, sorts the bars so that the most significant covariates or subjects appear to the left, as far as the constraints of the dendrogram (if present) allow.

zoom

If TRUE, discards non-significant branches from the dendrogram with the corresponding covariates. This is especially useful for large sets to "zoom" in on the significant results. If no dendrogram is requested, zoom = TRUE discards all covariates that are not significant after Holm multiple testing correction.

legend

If TRUE, draws a legend in the plot. To override the default labels of the legend, legend may also be given as a character vector with the labels of the legend.

plot

If FALSE, suppresses all plotting.

colors

The colors to be used for the bars. See rgb for details on color specification.

alias

Optional alternative labels for the bars in the plots. Should be a character vector of the same length as the number of covariates or subjects, respectively.

help.lines

If TRUE, prints grey dotted lines that help connect the dendrogram to the bars.

cex.labels

Magnification factor for the x-axis labels.

pdf

Optional filename (character) of the pdf file to which the plots are to be written. If a filename is provided in pdf, a single call to covariates or subjects can produce the plots for multiple tests, writing all of them to that pdf file.

trace

If TRUE, prints progress information. Note that printing progress information involves printing of backspace characters, which is not compatible with use of Sweave. Defaults to gt.options()$trace.

mirror

If TRUE, plots the reverse of the scores for the subjects with negative residual response, so that "good" scores are positive for all subjects.

...

All arguments of features are identical to those of covariates.
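As a rough illustration of the Holm correction mentioned under zoom, the sketch below applies base R's p.adjust to a hypothetical vector of per-covariate p-values and keeps those that remain significant at level alpha. The p-values and the names p, p_holm and kept are made up for illustration; this shows the Holm adjustment itself, not the package's internal code.

```r
# Hypothetical per-covariate p-values (illustrative only)
p <- c(A = 0.001, B = 0.004, C = 0.03, D = 0.2, E = 0.6)
alpha <- 0.05

# Holm step-down adjustment from base R (stats package)
p_holm <- p.adjust(p, method = "holm")

# Covariates that stay significant after the correction
kept <- p_holm[p_holm <= alpha]
print(round(p_holm, 3))
print(names(kept))
```

Without a dendrogram, zoom = TRUE would retain only covariates selected in this way.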

Value

The covariates function returns an object of class gt.object. Several methods are available to access this object: see gt.object. The subjects function returns a matrix. If called to make multiple plots, both functions return NULL.

Details

These two diagnostic plots decompose the test statistics into the contributions of the different covariates and subjects to make the influence of these covariates and subjects visible.

The covariates plot exploits the fact that the global test statistic for a set of alternative covariates can be written as a weighted sum of the global test statistics for each single contributing covariate. By displaying these component global test results in a bar plot the covariates plot gives insight into the subset of covariates that is most responsible for the significant test result. The plot can show the p-values of the component tests on a reversed log scale (the default); their test statistics, with stripes showing their mean and standard deviation under the null hypothesis; the z-scores of these test statistics, standardized to mean zero and standard deviation one; or the weighted test statistics, where the test statistics are multiplied by the relative weight that each covariate carries in the overall test. See the Vignette for more details.
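The weighted-sum decomposition can be checked numerically with a simplified quadratic-form statistic. The sketch below assumes an intercept-only null model and a normalization by the total sum of squares of the centered covariates; this scaling and the names r, Xc, Q, Qi and w are illustrative choices and may differ from the scaling used inside gt.

```r
set.seed(1)
n <- 20
Y <- rnorm(n)
X <- matrix(rnorm(n * 10), n, 10)
X[, 1:3] <- X[, 1:3] + Y
colnames(X) <- LETTERS[1:10]

r  <- Y - mean(Y)                             # residuals under the null
Xc <- scale(X, center = TRUE, scale = FALSE)  # centered covariates
total <- sum(Xc^2)                            # trace of Xc %*% t(Xc)

# Overall statistic (illustrative scaling)
Q <- drop(t(r) %*% Xc %*% t(Xc) %*% r) / total

# Single-covariate statistics and their relative weights
Qi <- drop(crossprod(Xc, r))^2 / colSums(Xc^2)
w  <- colSums(Xc^2) / total

# The overall statistic equals the weighted sum of the components
print(all.equal(Q, sum(w * Qi)))
```

In this sketch, w * Qi plays the role of the "weighted" presentation: the contributions add up to the overall statistic.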

The dendrogram of the covariates plot is based on correlation distance if the directional argument was set to TRUE in the call to gt, and uses absolute correlation distance otherwise. The coloring of the dendrogram is based on the multiple testing procedure of Meinshausen (2008): this procedure controls the family-wise error rate on all 2n-1 hypotheses (n being the number of covariates) associated with the subsets of covariates induced by the clustering graph. All significant subsets are colored black; non-significant ones remain grey. This coloring serves as an additional aid to find the subsets of the covariates that contribute most to a significant test result.
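A minimal base R sketch of such a clustering is given below. It assumes that correlation distance means 1 - cor and absolute correlation distance means 1 - |cor|; the exact definition used by the package is not stated here, so treat these formulas as assumptions. The sketch also confirms that a dendrogram on n covariates has 2n-1 nodes (n leaves plus n-1 merges).

```r
set.seed(1)
Y <- rnorm(20)
X <- matrix(rnorm(200), 20, 10)
X[, 1:3] <- X[, 1:3] + Y
colnames(X) <- LETTERS[1:10]

# Assumed distance definitions (illustrative)
d_dir <- as.dist(1 - cor(X))       # correlation distance
d_abs <- as.dist(1 - abs(cor(X)))  # absolute correlation distance

# Average linkage, matching the default cluster = "average"
hc <- hclust(d_abs, method = "average")

# Leaves plus internal nodes: one hypothesis per node
n_nodes <- ncol(X) + nrow(hc$merge)
print(n_nodes)
```

Calling plot(hc) draws the resulting tree; other linkage methods accepted by hclust can be substituted for "average".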

The features function is a synonym for covariates, using exactly the same arguments.

The subjects plot exploits the fact that the global test statistic can be written as a sum of contributions of the individual subjects. Each of these contributions is itself a test statistic for the same null hypothesis as the full global test, but one which puts a greater weight on the observed information of a specific subject. The test statistic of subject i is significant if, for the other subjects, similarity in the alternative covariates to subject i tends to coincide with similarity in residual response to subject i. Like the covariates plot, the subjects plot can show the p-values of these component tests on a reversed log scale (the default); their test statistics, with stripes showing their mean and standard deviation under the null hypothesis; the z-scores of these test statistics, standardized to mean zero and standard deviation one; or the weighted test statistics, where the test statistics are multiplied by the relative weight that each subject carries in the overall test. Setting mirror = FALSE reverses the bars of subjects with a negative residual response relative to the default mirrored display (not applicable if p-values are plotted). The resulting statistic values have the additional interpretation that they are proportional to the first order estimates of the linear predictors of each subject under the alternative, i.e. subjects with positive values have higher means under the alternative than under the null, and subjects with negative values have lower means under the alternative than under the null. See the Vignette for more details.
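The idea of per-subject contributions and of mirroring can be sketched with an unscaled quadratic form. The code below assumes an intercept-only null model and ignores the package's scaling. The names K, lp, contrib and mirrored are hypothetical, and the sign convention shown is one plausible reading of the mirror argument rather than the package's implementation.

```r
set.seed(1)
n <- 20
Y <- rnorm(n)
X <- matrix(rnorm(n * 10), n, 10)
X[, 1:3] <- X[, 1:3] + Y

r  <- Y - mean(Y)                             # residual response under the null
Xc <- scale(X, center = TRUE, scale = FALSE)
K  <- Xc %*% t(Xc)                            # similarity between subjects

Q  <- drop(t(r) %*% K %*% r)                  # unscaled overall statistic
lp <- drop(K %*% r)                           # proportional to a first-order linear predictor

# The overall statistic is the sum of per-subject contributions
contrib <- r * lp
print(all.equal(sum(contrib), Q))

# Mirroring: flip the sign for subjects with negative residual response,
# so that agreement between lp and r shows up as a positive score
mirrored <- sign(r) * lp
```

In this sketch, lp corresponds to the unmirrored display and mirrored to the default one.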

The dendrogram of the subjects plot is always based on correlation distance. There is no analogue to Meinshausen's multiple testing method for this dendrogram, so multiple testing is not performed.

References

General theory and properties of the global test are described in

Goeman, Van de Geer and Van Houwelingen (2006) Journal of the Royal Statistical Society, Series B 68 (3) 477-493.

Meinshausen's method for multiple testing is described in

Meinshausen (2008) Biometrika 95 (2) 265-278.

For more references related to applications of the test, see the vignette GlobalTest.pdf included with this package.

Examples


    # Simple examples with random data here
    # Real data examples in the Vignette
    library(globaltest)

    # Random data: covariates A,B,C are correlated with Y
    set.seed(1)
    Y <- rnorm(20)
    X <- matrix(rnorm(200), 20, 10)
    X[,1:3] <- X[,1:3] + Y
    colnames(X) <- LETTERS[1:10]

    # Preparation: test
    res <- gt(Y,X)

    # Covariates
    covariates(res)
    covariates(res, what = "w")
    covariates(res, zoom = TRUE)

    # Subjects
    subjects(res)
    subjects(res, what = "w", mirror = FALSE)

    # Change legend, colors or labels
    covariates(res, legend = c("upregulated", "downregulated"))
    covariates(res, colors = rainbow(2))
    covariates(res, alias = letters[1:10])

    # Extract data from the plot
    out <- covariates(res)
    result(out)
    extract(out)
