MaxstatTest: Maximally Selected Statistics

Description

Testing the independence of a set of ordered or numeric covariates and a response of arbitrary measurement scale against cutpoint alternatives.

Usage

## S3 method for class 'formula':
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem':
maxstat_test(object, 
    distribution = c("asymptotic", "approximate"), 
    teststat = c("max", "quad"),
    minprob = 0.1, maxprob = 1 - minprob, ...)

Arguments

formula

a formula of the form y ~ x1 + ... + xp | block where y and covariates x1 to xp can be variables measured at arbitrary scales; block is an optional factor for stratification.

data

an optional data frame containing the variables in the model formula.

subset

an optional vector specifying a subset of observations to be used.

weights

an optional formula of the form ~ w defining integer valued weights for the observations.

object

an object inheriting from class IndependenceProblem.

distribution

a character string specifying how the null distribution of the test statistic is approximated: by its asymptotic distribution (asymptotic) or via Monte Carlo resampling (approximate). Alternatively, the functions

teststat

a character, the type of test statistic to be applied: a maximum type statistic (max) or a quadratic form (quad).

minprob

a fraction between 0 and 0.5; consider only cutpoints greater than the minprob * 100 % quantile of x.

maxprob

a fraction between 0.5 and 1; consider only cutpoints smaller than the maxprob * 100 % quantile of x.

...

further arguments to be passed to or from methods.
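
As an illustration of how minprob and maxprob narrow the set of candidate cutpoints, the following base R sketch computes the two bounding quantiles of a covariate and keeps the observed values between them. The data vector is made up, and both the quantile definition (R's default) and the strict inequalities at the bounds are assumptions made for illustration, not a description of the package internals.

```r
# Hypothetical illustration in base R; not the package's internal code.
x <- c(2, 5, 7, 8, 11, 13, 14, 18, 21, 30)
minprob <- 0.1
maxprob <- 1 - minprob

# Empirical quantiles that bound the admissible cutpoints
bounds <- quantile(x, probs = c(minprob, maxprob), names = FALSE)

# Candidate cutpoints: unique observed values strictly inside the bounds
# (whether the bounds themselves are admissible is an assumption here)
candidates <- sort(unique(x))
candidates <- candidates[candidates > bounds[1] & candidates < bounds[2]]
candidates
```

With the defaults, the smallest and largest observations are excluded, so no candidate split leaves a group that is nearly empty.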

Value

An object inheriting from class IndependenceTest with methods show, statistic, expectation, covariance and pvalue. The null distribution can be inspected by pperm, dperm, qperm and support methods.

Details

The null hypothesis of independence between all covariates and the response y is tested against simple cutpoint alternatives.
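
To make the idea of a cutpoint alternative concrete, here is a minimal base R sketch for a single numeric covariate. For every candidate cutpoint it splits the observations into two groups, standardizes the sum of the responses in the lower group by its expectation and variance under random permutation of the response, and reports the largest absolute standardized value. The function name max_selected and the toy data are hypothetical, and the sketch does not compute a p-value; it is not the package's implementation.

```r
# Sketch only: a maximally selected statistic for one numeric covariate.
# max_selected() is a hypothetical helper, not part of any package.
max_selected <- function(x, y, minprob = 0.1, maxprob = 1 - minprob) {
  n <- length(y)
  # Restrict candidate cutpoints to the central part of x (assumed rule)
  bounds <- quantile(x, probs = c(minprob, maxprob), names = FALSE)
  cuts <- sort(unique(x))
  cuts <- cuts[cuts > bounds[1] & cuts < bounds[2]]
  z <- vapply(cuts, function(cp) {
    g <- x <= cp                      # indicator of the lower group
    m <- sum(g)                       # size of the lower group
    stat <- sum(y[g])                 # linear statistic
    mu <- m * mean(y)                 # permutation expectation
    sigma2 <- m * (n - m) / (n * (n - 1)) * sum((y - mean(y))^2)
    abs(stat - mu) / sqrt(sigma2)     # standardized statistic
  }, numeric(1))
  list(statistic = max(z), cutpoint = cuts[which.max(z)])
}

# Toy data with a perfect split at x <= 8
x <- 1:20
y <- c(rep(0, 8), rep(1, 12))
res <- max_selected(x, y)
res$cutpoint
res$statistic
```

Because the same data are used to choose the cutpoint and to evaluate the statistic, the maximum must be referred to the null distribution of the maximum itself rather than to that of a single two-sample statistic.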

For an unordered covariate x, all possible partitions into two groups are evaluated. The cutpoint is then a set of levels defining one of the two groups.
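
The number of binary partitions of an unordered covariate grows quickly with the number of levels: k levels give 2^(k - 1) - 1 distinct splits. The base R sketch below enumerates them for illustration. The helper binary_partitions is hypothetical; it returns, for each split, the levels forming one of the two groups, with the first level always placed in that group so that mirror-image splits are not counted twice.

```r
# Sketch only: enumerate all splits of a set of levels into two
# non-empty groups. binary_partitions() is a hypothetical helper.
binary_partitions <- function(lev) {
  k <- length(lev)
  stopifnot(k >= 2)
  powers <- 2^(seq_len(k - 1) - 1)
  # j runs over 0, ..., 2^(k - 1) - 2; its binary digits decide which of
  # the remaining k - 1 levels join the first level in the left group
  lapply(seq_len(2^(k - 1) - 1) - 1, function(j) {
    in_left <- c(TRUE, (j %/% powers) %% 2 == 1)
    lev[in_left]
  })
}

parts <- binary_partitions(c("a", "b", "c"))
length(parts)
parts
```

Each returned set of levels plays the role of the cutpoint described above; the complementary levels form the second group.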

References

Rupert Miller & David Siegmund (1982). Maximally Selected Chi Square Statistics. Biometrics 38, 1011--1016.

Berthold Lausen & Martin Schumacher (1992). Maximally Selected Rank Statistics. Biometrics 48, 73--85.

Torsten Hothorn & Berthold Lausen (2003). On the Exact Distribution of Maximally Selected Rank Statistics. Computational Statistics & Data Analysis 43, 121--137.

Berthold Lausen, Torsten Hothorn, Frank Bretz & Martin Schumacher (2004). Optimally Selected Prognostic Factors. Biometrical Journal 46, 364--374.

Jörg Müller & Torsten Hothorn (2004). Maximally Selected Two-Sample Statistics as a new Tool for the Identification and Assessment of Habitat Factors with an Application to Breeding Bird Communities in Oak Forests. European Journal of Forest Research 123, 218--228.

Torsten Hothorn & Achim Zeileis (2008). Generalized Maximally Selected Statistics. Biometrics 64(4), 1263--1269.

Examples

### load the package providing maxstat_test() and the treepipit data
library("coin")

### analysis of the tree pipit data in Mueller and Hothorn (2004)
  maxstat_test(counts ~ coverstorey, data = treepipit)

  ### and for all possible covariates (simultaneously)
  mt <- maxstat_test(counts ~ ., data = treepipit)
  show(mt)$estimate

  ### reproduce applications in Sections 7.2 and 7.3 
  ### of Hothorn & Lausen (2003) with limiting distribution

  maxstat_test(Surv(time, event) ~  EF, data = hohnloser, 
      ytrafo = function(data) trafo(data, surv_trafo = function(x) 
         logrank_trafo(x, ties = "HL")))

  data("sphase", package = "TH.data")
  maxstat_test(Surv(RFS, event) ~  SPF, data = sphase,
      ytrafo = function(data) trafo(data, surv_trafo = function(x)
         logrank_trafo(x, ties = "HL")))
