ace: Alternating Conditional Expectations

Description

Uses the alternating conditional expectations algorithm to find the transformations of y and x that maximize the proportion of variation in y explained by x. When x is a matrix, it is transformed so that its columns are equally weighted when predicting y.
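
The criterion being optimized can be sketched in the notation of Breiman and Friedman (1985). The symbols below are not used elsewhere on this page: theta is the transformation of y and phi_j is the transformation of column j of x, both taken to have mean zero. This summarizes the standard formulation and is not a description of the package internals.

```latex
e^2(\theta, \phi_1, \ldots, \phi_p)
  = \frac{E\left[\left(\theta(Y) - \sum_{j=1}^{p} \phi_j(X_j)\right)^2\right]}
         {E\left[\theta^2(Y)\right]},
\qquad
R^2 = 1 - e^2
```

Minimizing e^2 over the transformations is equivalent to maximizing R^2, the proportion of variation in the transformed y explained by the transformed x.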

Usage

ace(...)
# S3 method for default
ace(
  x,
  y,
  wt = NULL,
  cat = NULL,
  mon = NULL,
  lin = NULL,
  circ = NULL,
  delrsq = 0.01,
  control = NULL,
  on.error = warning,
  ...
)
# S3 method for formula
ace(
  formula,
  data = NULL,
  subset = NULL,
  na.action = getOption("na.action"),
  ...
)
# S3 method for ace
summary(object, ...)
# S3 method for ace
print(x, ..., digits = 4)
# S3 method for ace
plot(
  x,
  ...,
  which = 1:(x$p + 1),
  caption = c(list("Response Y ACE Transformation"), as.list(paste("Carrier",
    rownames(x$x), "ACE Transformation"))),
  xlab = "Original",
  ylab = "Transformed",
  ask = prod(par("mfcol")) < length(which) && dev.interactive()
)

Value

A structure with the following components:

x: the input x matrix.
y: the input y vector.
tx: the transformed x values.
ty: the transformed y values.
rsq: the multiple R-squared value for the transformed values.
l: the codes for cat, mon, ...
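
As a rough illustration of these components, the sketch below fits ace() to simulated data and inspects the result. It assumes the acepack package is installed. The simulated data and the object name fit are made up for illustration.

```r
library(acepack)

set.seed(1)                       # reproducible simulated data
n <- 200
x <- runif(n, 0, 2 * pi)
y <- exp(sin(x) + rnorm(n) / 2)

fit <- ace(x, y)

fit$rsq                           # R-squared on the transformed scale
length(fit$ty)                    # one transformed response per observation

# Squared correlation between the transformed response and the transformed
# carrier; it is expected (not guaranteed) to be close to fit$rsq.
cor(fit$ty, as.vector(fit$tx))^2
```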

Arguments

...: additional arguments. They are ignored by ace itself and are included for S3 dispatch consistency. print passes them on to cat, and plot passes them on to plot.
x: matrix; A matrix containing the independent variables.
y: numeric; A vector containing the response variable.
wt: numeric; An optional vector of weights.
cat: integer; An optional integer vector specifying which variables assume categorical values. Positive values in cat refer to columns of the x matrix and zero to the response variable. Variables must be numeric, so a character variable should first be transformed with as.numeric() and then specified as categorical.
mon: integer; An optional integer vector specifying which variables are to be transformed by monotone transformations. Positive values in mon refer to columns of the x matrix and zero to the response variable.
lin: integer; An optional integer vector specifying which variables are to be transformed by linear transformations. Positive values in lin refer to columns of the x matrix and zero to the response variable.
circ: integer; An integer vector specifying which variables assume circular (periodic) values. Positive values in circ refer to columns of the x matrix and zero to the response variable.
delrsq: numeric(1); termination threshold. Iteration stops when R-squared changes by less than delrsq in 3 consecutive iterations (default 0.01).
control: named list; control parameters to set. Documented at set_control.
on.error: function; callback invoked when ierr is not equal to zero. Defaults to warning.
formula: formula; an object of class "formula": a symbolic description of the model to be smoothed.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which ace is called.
subset: an optional vector specifying a subset of observations to be used in the fitting process. Only used when a formula is specified.
na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
object: an S3 ace object
digits: rounding digits for summary/print
which: which plots to produce when plotting an ace object.
caption: a list of captions for a plot.
xlab: the x-axis label when plotting.
ylab: the y-axis label when plotting.
ask: whether the user should be prompted for input between plots.
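
The indexing convention for cat, mon, lin and circ (zero for the response, positive integers for columns of x) can be sketched as below. This is a minimal example assuming the acepack package is installed. The simulated data and the name fit_restricted are hypothetical, and delrsq = 0.001 is just an arbitrary tighter threshold than the default.

```r
library(acepack)

set.seed(2)
n <- 150
X <- cbind(x1 = runif(n), x2 = runif(n))
Y <- exp(2 * X[, "x1"] + X[, "x2"]^2 + rnorm(n) / 10)

# 0 refers to the response, positive integers to columns of X:
# request a monotone transformation of Y and a linear one for column 1.
fit_restricted <- ace(X, Y, mon = 0, lin = 1, delrsq = 0.001)

# With lin = 1, the transformed first column should be a straight-line
# function of the original column, so the absolute correlation is near 1.
abs(cor(X[, "x1"], fit_restricted$tx[, 1]))
```

Restricting a transformation in this way trades some flexibility for interpretability. Leaving all four arguments at NULL lets every variable be transformed by an unrestricted smooth.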

References

Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80, 580-598.

The R code is adapted from S code for avas() by Tibshirani, in the Statlib S archive; the FORTRAN is a double-precision version of FORTRAN code by Friedman and Spector in the Statlib general archive.

Examples

TWOPI <- 8*atan(1)
x <- runif(200,0,TWOPI)
y <- exp(sin(x)+rnorm(200)/2)
a <- ace(x,y)
par(mfrow=c(3,1))
plot(a$y,a$ty)  # view the response transformation
plot(a$x,a$tx)  # view the carrier transformation
plot(a$tx,a$ty) # examine the linearity of the fitted model

# example when x is a matrix
X1 <- 1:10
X2 <- X1^2
X <- cbind(X1,X2)
Y <- 3*X1+X2
a1 <- ace(X,Y)
par(mfrow=c(1,1))
plot(rowSums(a1$tx),a1$y)
(lm(a1$y ~ a1$tx)) # shows that the columns of X are equally weighted

# From D. Wang and M. Murphy (2005), Identifying nonlinear relationships
# in regression using the ACE algorithm.  Journal of Applied Statistics,
# 32, 243-258.
X1 <- runif(100)*2-1
X2 <- runif(100)*2-1
X3 <- runif(100)*2-1
X4 <- runif(100)*2-1

# Original equation of Y:
Y <- log(4 + sin(3*X1) + abs(X2) + X3^2 + X4 + .1*rnorm(100))

# Transformed version so that Y, after transformation, is a
# linear function of transforms of the X variables:
# exp(Y) = 4 + sin(3*X1) + abs(X2) + X3^2 + X4

a1 <- ace(cbind(X1,X2,X3,X4),Y)

# For each variable, show its transform as a function of
# the original variable and of the transform that created it,
# showing that the transform is recovered.
par(mfrow=c(2,1))

plot(X1,a1$tx[,1])
plot(sin(3*X1),a1$tx[,1])

plot(X2,a1$tx[,2])
plot(abs(X2),a1$tx[,2])

plot(X3,a1$tx[,3])
plot(X3^2,a1$tx[,3])

plot(X4,a1$tx[,4])
plot(X4,a1$tx[,4]) # X4 enters the model untransformed, so this repeats the plot above

plot(Y,a1$ty)
plot(exp(Y),a1$ty)