
coin (version 1.3-0)

Transformations: Functions for Data Transformation

Description

Transformations for factors and numeric variables.

Usage

id_trafo(x)
rank_trafo(x, ties.method = c("mid-ranks", "random"))
normal_trafo(x, ties.method = c("mid-ranks", "average-scores"))
median_trafo(x, mid.score = c("0", "0.5", "1"))
savage_trafo(x, ties.method = c("mid-ranks", "average-scores"))
consal_trafo(x, ties.method = c("mid-ranks", "average-scores"), a = 5)
koziol_trafo(x, ties.method = c("mid-ranks", "average-scores"), j = 1)
klotz_trafo(x, ties.method = c("mid-ranks", "average-scores"))
mood_trafo(x, ties.method = c("mid-ranks", "average-scores"))
ansari_trafo(x, ties.method = c("mid-ranks", "average-scores"))
fligner_trafo(x, ties.method = c("mid-ranks", "average-scores"))
logrank_trafo(x, ties.method = c("mid-ranks", "Hothorn-Lausen",
                                 "average-scores"),
              weight = logrank_weight, ...)
logrank_weight(time, n.risk, n.event,
               type = c("logrank", "Gehan-Breslow", "Tarone-Ware", "Prentice",
                        "Prentice-Marek", "Andersen-Borgan-Gill-Keiding",
                        "Fleming-Harrington", "Gaugler-Kim-Liao", "Self"),
               rho = NULL, gamma = NULL)
f_trafo(x)
of_trafo(x, scores = NULL)
zheng_trafo(x, increment = 0.1)
maxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
fmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
ofmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
trafo(data, numeric_trafo = id_trafo, factor_trafo = f_trafo,
      ordered_trafo = of_trafo, surv_trafo = logrank_trafo,
      var_trafo = NULL, block = NULL)
mcp_trafo(...)

Arguments

x

an object of class "numeric", "factor", "ordered" or "Surv".

ties.method

a character, the method used to handle ties. The score generating function either uses the mid-ranks ("mid-ranks", default) or, in the case of rank_trafo, randomly broken ties ("random"). Alternatively, the average of the scores resulting from applying the score generating function to randomly broken ties is used ("average-scores"). See logrank_test for a detailed description of the methods used in logrank_trafo.

mid.score

a character, the score assigned to observations exactly equal to the median: either 0 ("0", default), 0.5 ("0.5") or 1 ("1"); see median_test.

a

a numeric vector, the values taken as the constant \(a\) in the Conover-Salsburg scores. Defaults to 5.

j

a numeric, the value taken as the constant \(j\) in the Koziol-Nemec scores. Defaults to 1.

weight

a function where the first three arguments must correspond to time, n.risk, and n.event given below. Defaults to logrank_weight.

time

a numeric vector, the ordered distinct time points.

n.risk

a numeric vector, the number of subjects at risk at each time point specified in time.

n.event

a numeric vector, the number of events at each time point specified in time.

type

a character, one of "logrank" (default), "Gehan-Breslow", "Tarone-Ware", "Prentice", "Prentice-Marek", "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test.

rho

a numeric vector, the \(\rho\) constant when type is "Tarone-Ware", "Fleming-Harrington" or "Self"; see logrank_test. Defaults to NULL, implying 0.5 for type = "Tarone-Ware" and 0 otherwise.

gamma

a numeric vector, the \(\gamma\) constant when type is "Fleming-Harrington" or "Self"; see logrank_test. Defaults to NULL, implying 0.

scores

a numeric vector or list, the scores corresponding to each level of an ordered factor. Defaults to NULL, implying 1:nlevels(x).

increment

a numeric, the score increment between the order-restricted sets of scores. A fraction greater than 0, but smaller than or equal to 1. Defaults to 0.1.

minprob

a numeric, a fraction between 0 and 0.5; see maxstat_test. Defaults to 0.1.

maxprob

a numeric, a fraction between 0.5 and 1; see maxstat_test. Defaults to 1 - minprob.

data

an object of class "data.frame".

numeric_trafo

a function to be applied to elements of class "numeric" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to id_trafo.

factor_trafo

a function to be applied to elements of class "factor" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to f_trafo.

ordered_trafo

a function to be applied to elements of class "ordered" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to of_trafo.

surv_trafo

a function to be applied to elements of class "Surv" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to logrank_trafo.

var_trafo

an optional named list of functions to be applied to the corresponding variables in data. Defaults to NULL.

block

an optional factor whose levels are interpreted as blocks. trafo is applied to each level of block separately. Defaults to NULL.

...

logrank_trafo(): further arguments to be passed to weight. mcp_trafo(): factor name and contrast matrix (as matrix or character) in a tag = value format for multiple comparisons based on a single unordered factor; see mcp in package multcomp.
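The blockwise behaviour described for the block argument can be pictured with a small base-R sketch. It assumes that applying a transformation "to each level of block separately" amounts to computing the scores within each block, here with rank as the transformation. The data are made up, and ave is used as a stand-in rather than coin's own mechanism.

```r
# Hypothetical data: 10 observations in 2 blocks of 5.
y     <- c(5, 3, 9, 1, 7, 2, 8, 4, 6, 10)
block <- gl(2, 5)

# Rank the observations within each block, not across the whole sample.
blockwise_ranks <- ave(y, block, FUN = rank)

print(cbind(y = y, block = as.integer(block), rank = blockwise_ranks))
```

Each block receives its own set of ranks 1 to 5, which is the kind of scoring used by Friedman-type procedures.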

Value

A numeric vector or matrix with nrow(x) rows and an arbitrary number of columns. For trafo, a named matrix with nrow(data) rows and an arbitrary number of columns.

Details

The utility functions documented here are used to define specialized test procedures.

id_trafo is the identity transformation.

rank_trafo, normal_trafo, median_trafo, savage_trafo, consal_trafo and koziol_trafo compute rank scores, normal scores, median scores, Savage scores, Conover-Salsburg scores (see neuropathy) and Koziol-Nemec scores, respectively, for location problems.
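As a rough illustration of two of these location scores, the following base-R sketch computes rank scores with mid-ranks and the corresponding normal (van der Waerden) scores. It assumes normal scores are the standard normal quantiles of rank / (n + 1). The vector y is made-up data, and the code does not call coin itself.

```r
# Made-up sample with one tie (the two values 3.4).
y <- c(2.1, 3.4, 3.4, 5.0, 1.2)
n <- length(y)

# Mid-ranks: tied observations share the average of the ranks they occupy.
mid_ranks <- rank(y, ties.method = "average")

# Normal scores (assumption): standard normal quantiles of rank / (n + 1).
normal_scores <- qnorm(mid_ranks / (n + 1))

print(mid_ranks)
print(round(normal_scores, 3))
```

Tied observations receive identical scores under mid-ranks, which is the default behaviour described for ties.method.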

klotz_trafo, mood_trafo, ansari_trafo and fligner_trafo compute Klotz scores, Mood scores, Ansari-Bradley scores and Fligner-Killeen scores, respectively, for scale problems.
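The scale scores can be sketched in base R from their usual textbook definitions. The formulas below are assumptions about what the corresponding functions compute (Mood: squared deviation of the rank from its mean; Ansari-Bradley: distance of the rank from the nearer end; Klotz: squared normal score). The data are made up and contain no ties.

```r
# Made-up sample without ties.
y <- c(0.3, -1.2, 2.5, 0.9, -0.4, 1.7)
n <- length(y)
r <- rank(y)

# Mood scores: squared distance of each rank from the mean rank (n + 1) / 2.
mood_scores <- (r - (n + 1) / 2)^2

# Ansari-Bradley scores: rank counted from whichever end of the sample is nearer.
ansari_scores <- pmin(r, n - r + 1)

# Klotz scores: squared normal scores.
klotz_scores <- qnorm(r / (n + 1))^2

print(rbind(rank = r, mood = mood_scores, ansari = ansari_scores,
            klotz = round(klotz_scores, 3)))
```

Mood and Klotz scores are largest for extreme observations, while Ansari-Bradley scores are smallest there. All three depend only on how far a rank lies from the centre, so they are sensitive to differences in spread rather than location.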

logrank_trafo computes weighted logrank scores for right-censored data, allowing for a user-defined weight function through the weight argument (see GTSG).
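A user-defined weight function only needs to accept time, n.risk and n.event as its first three arguments and return one weight per time point. The sketch below defines a hypothetical function, sqrt_risk_weight, that reproduces the Tarone-Ware form with rho = 0.5 (the square root of the number at risk), and evaluates it on made-up inputs. The name and the data are illustrative assumptions, not part of coin.

```r
# Hypothetical weight function: square root of the number at risk,
# i.e. the Tarone-Ware form n.risk^rho with rho = 0.5.
sqrt_risk_weight <- function(time, n.risk, n.event, ...) {
    sqrt(n.risk)
}

# Made-up ordered distinct event times with numbers at risk and events.
time    <- c(2, 5, 7, 11)
n.risk  <- c(10, 8, 5, 2)
n.event <- c(1, 2, 1, 1)

w <- sqrt_risk_weight(time, n.risk, n.event)
print(round(w, 3))

# With coin and survival loaded, such a function would be supplied through
# the weight argument, e.g. logrank_trafo(x, weight = sqrt_risk_weight)
# for a "Surv" object x.
```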

f_trafo computes dummy matrices for factors and of_trafo assigns scores to ordered factors. For ordered factors with two levels, the scores are normalized to the \([0, 1]\) range. zheng_trafo computes a finite collection of order-restricted scores for ordered factors (see jobsatisfaction, malformations and vision).
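The idea behind dummy matrices and ordered scores can be sketched in base R. The code below builds a full indicator matrix for a three-level factor and maps the levels of the ordered version to scores. It is a simplified stand-in: it does not reproduce the single-column result for two-level factors or the normalization to \([0, 1]\) mentioned above.

```r
x  <- gl(3, 2)          # factor with 3 levels, 2 observations each
ox <- as.ordered(x)

# Indicator (dummy) matrix: one column per level, a single 1 in each row.
dummy <- 1 * outer(as.integer(x), seq_len(nlevels(x)), "==")
colnames(dummy) <- levels(x)

# Default scores 1:nlevels(x), and user-supplied scores per level.
default_scores <- as.integer(ox)
custom_scores  <- c(1, 3, 4)[as.integer(ox)]

print(dummy)
print(cbind(default = default_scores, custom = custom_scores))
```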

maxstat_trafo, fmaxstat_trafo and ofmaxstat_trafo compute scores for cutpoint problems (see maxstat_test).
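For a numeric variable, cutpoint scores can be pictured as a matrix of indicators, one column per candidate cutpoint. The sketch below assumes that candidates are the observed values lying between the minprob and maxprob sample quantiles (excluding the maximum) and that each column indicates whether an observation is less than or equal to the cutpoint. These are assumptions for illustration, so see maxstat_test for the actual definition.

```r
# Made-up data: a permutation of 1..10.
y <- c(8, 1, 5, 3, 9, 6, 2, 7, 4, 10)
minprob <- 0.1
maxprob <- 1 - minprob

# Sample quantiles bounding the admissible cutpoints (assumed convention).
q <- quantile(y, probs = c(minprob, maxprob), type = 1, names = FALSE)

# Candidate cutpoints: observed values within the bounds, excluding the maximum.
cutpoints <- sort(unique(y))
cutpoints <- cutpoints[cutpoints >= q[1] & cutpoints <= q[2] & cutpoints < max(y)]

# One 0/1 column per cutpoint: is the observation at or below the cutpoint?
ind <- 1 * outer(y, cutpoints, "<=")
colnames(ind) <- paste0("y <= ", cutpoints)

print(ind)
```

Each column splits the sample into two groups, and a maximally selected statistic is then the largest standardized two-sample statistic over the columns.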

trafo applies its arguments to the elements of data according to the classes of the elements. A trafo function with modified default arguments is usually supplied to independence_test via the xtrafo or ytrafo arguments. Fine tuning, i.e., different transformations for different variables, is possible by supplying a named list of functions to the var_trafo argument.
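The class-based dispatch can be sketched with a small base-R helper. The function apply_by_class below is hypothetical and much simpler than trafo: it handles only numeric and factor columns, ignores ordered factors, "Surv" objects, var_trafo and block, and assumes each transformation returns one value per row.

```r
# Hypothetical helper: pick a transformation for each column by its class
# and bind the results column-wise into one matrix.
apply_by_class <- function(data, numeric_fun = identity,
                           factor_fun = function(f) as.integer(f)) {
    cols <- lapply(data, function(v) {
        if (is.factor(v)) {
            factor_fun(v)
        } else if (is.numeric(v)) {
            numeric_fun(v)
        } else {
            stop("unsupported column class: ", class(v)[1])
        }
    })
    do.call(cbind, cols)
}

# Made-up data: a two-level factor and a numeric variable.
dat <- data.frame(g = gl(2, 3), y = c(4.2, 1.1, 3.3, 2.7, 5.8, 0.9))

# Replace the default numeric transformation by ranks.
scores <- apply_by_class(dat, numeric_fun = rank)
print(scores)
```

Passing a different numeric_fun here plays the role that changing numeric_trafo plays in trafo: the choice of scores is made once, per class, and applied to every matching column.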

mcp_trafo computes contrast matrices for factors.

Examples

# NOT RUN {
## Dummy matrix, two-sample problem (only one column)
f_trafo(gl(2, 3))

## Dummy matrix, K-sample problem (K columns)
x <- gl(3, 2)
f_trafo(x)

## Score matrix
ox <- as.ordered(x)
of_trafo(ox)
of_trafo(ox, scores = c(1, 3:4))
of_trafo(ox, scores = list(s1 = 1:3, s2 = c(1, 3:4)))
zheng_trafo(ox, increment = 1/3)

## Normal scores
y <- runif(6)
normal_trafo(y)

## All together now
trafo(data.frame(x = x, ox = ox, y = y), numeric_trafo = normal_trafo)

## The same, but allows for fine-tuning
trafo(data.frame(x = x, ox = ox, y = y), var_trafo = list(y = normal_trafo))

## Transformations for maximally selected statistics
maxstat_trafo(y)
fmaxstat_trafo(x)
ofmaxstat_trafo(ox)

## Apply transformation blockwise (as in the Friedman test)
trafo(data.frame(y = 1:20), numeric_trafo = rank_trafo, block = gl(4, 5))

## Multiple comparisons
dta <- data.frame(x)
mcp_trafo(x = "Tukey")(dta)

## The same, but useful when specific contrasts are desired
K <- rbind("2 - 1" = c(-1,  1, 0),
           "3 - 1" = c(-1,  0, 1),
           "3 - 2" = c( 0, -1, 1))
mcp_trafo(x = K)(dta)
# }
