Fit a polychotomous regression and multiple classification using linear splines and selected tensor products.
polyclass(data, cov, weight, penalty, maxdim, exclude, include,
additive = FALSE, linear, delete = 2, fit, silent = TRUE,
normweight = TRUE, tdata, tcov, tweight, cv, select, loss, seed)
The output is an object of class polyclass, organized to serve as input for plot.polyclass, beta.polyclass, summary.polyclass, ppolyclass (fitted probabilities), cpolyclass (fitted classes) and rpolyclass (random classes).
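A minimal sketch of this workflow, assuming the polspline package and the iris example from the end of this page; the helper-function argument names used here (cov, fit) are assumptions, so consult the individual help pages:

library(polspline)
data(iris)
fit <- polyclass(iris[, 5], iris[, 1:4])              # fit, as in the example below
summary(fit)                                          # dispatches to summary.polyclass
probs <- ppolyclass(cov = iris[, 1:4], fit = fit)     # fitted class probabilities
pred  <- cpolyclass(cov = iris[, 1:4], fit = fit)     # fitted (most likely) classes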
The function returns a list with the following members (a short inspection sketch follows the list):
the command that was executed.
number of covariates.
number of dimensions of the fitted model.
number of classes.
number of basis functions.
number of possible actions that are considered.
matrix of size nbas x (nclass + 4). Each row is a basis function. First element: first covariate involved (NA = constant); second element: which knot (NA means: constant or linear); third element: second covariate involved (NA means: this is a function of one variable); fourth element: knot involved (if the third element is NA, of no relevance); fifth, sixth, ... elements: beta (coefficient) for class one, two, ...
a matrix with ncov rows. Covariate i has row i+1, time has row 1. First column: number of knots in this dimension; other columns: the knots, appended with NAs to make it a matrix.
in how many sets was the data divided for cross-validation. Only provided if method = 2.
the loss matrix used in cross-validation and test set. Only provided if method = 1 or method = 2.
the parameter used in the AIC criterion. Only provided if method = 0.
0 = AIC, 1 = test set, 2 = cross-validation.
column i gives the range of the i-th covariate.
matrix with eight or eleven columns; summarizes the fits. Column one indicates the dimension, column two the AIC or loss value, whichever was used during the model selection; columns three, four and five give the training set log-likelihood, (misclassification) loss and squared error loss; columns six to eight give the same information for the test set; column nine (or column six if method = 0 or method = 2) indicates whether the model was fitted during the addition stage (1) or during the deletion stage (0); columns ten and eleven (or seven and eight) give the minimum and maximum penalty parameter for which AIC would have selected this model.
sample size.
the sample size of the test set. Only provided if method = 1.
sum of the case weights.
names of the covariates.
(numerical) names of the classes.
the penalty value that was determined optimal by cross-validation. Only provided if method = 2.
table with three columns. Columns one and two indicate the penalty parameter range for which the cv-loss in column three would be realized. Only provided if method = 2.
the random seed that was used to determine the order of the cases for cross-validation. Only provided if method = 2.
were complete basis functions deleted at once (2), were only individual dimensions deleted (1) or was only the addition stage of the model selection carried out (0)?
moments of the basis functions. Needed for beta.polyclass.
if a test set is provided, or if the model is selected using cross validation, was the model selected that minimized (misclassification) loss (0), that maximized test set log-likelihood (1) or that minimized test set squared error loss (2)?
matrix with three columns. The first two elements in a line indicate the subspace to which the line refers. The third element indicates the percentage of variance explained by that subspace.
sum of the test set case weights (only if method = 1).
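To match these descriptions to the actual component names, the returned list can be inspected directly; a short sketch, assuming the iris fit from the example at the end of this page:

library(polspline)
data(iris)
fit.iris <- polyclass(iris[, 5], iris[, 1:4])
names(fit.iris)               # names of the list members described above
str(fit.iris, max.level = 1)  # size and type of each member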
vector of classes: data should range over consecutive integers with 0 or 1 as the minimum value.
covariates: matrix with as many rows as the length of data.
optional vector of case weights. Should have the same length as data.
the parameter to be used in the AIC criterion if the model selection is carried out by AIC. The program chooses the number of knots that minimizes -2 * loglikelihood + penalty * (dimension). The default is to use penalty = log(length(data)), as in BIC. If the model selection is carried out by cross-validation or using a test set, the program uses the number of knots that minimizes loss + penalty * dimension * (loss for smallest model). In this case the default of penalty is 0.
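For illustration, a hedged sketch of how penalty shifts the AIC-type criterion on the iris example (the default corresponds to a BIC-like log(n) penalty; penalty = 2 is the classical AIC constant in the formula above):

library(polspline)
data(iris)
fit.bic <- polyclass(iris[, 5], iris[, 1:4])               # default: penalty = log(length(data))
fit.aic <- polyclass(iris[, 5], iris[, 1:4], penalty = 2)  # classical AIC constant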
maximum dimension (default is \(\min(n, 4 n^{1/3} (cl - 1))\), where \(n\) is length(data) and \(cl\) is the number of classes).
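A worked example of this default for the iris data (the rounding of the non-integer value to an actual dimension is an assumption; the formula itself is as above):

n  <- 150                       # length(data) for the iris example
cl <- 3                         # number of classes
min(n, 4 * n^(1/3) * (cl - 1))  # about 42.5, so the default maxdim is roughly 42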
combinations to be excluded; this should be a matrix with 2 columns. If, for example, exclude[1, 1] = 2 and exclude[1, 2] = 3, no interaction between covariates 2 and 3 is included. 0 represents time.
those combinations that can be included. Should have the same format as exclude. Only one of exclude and include can be specified.
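A sketch of excluding one interaction, using the matrix layout described above and the iris example (the choice of covariates 2 and 3 is arbitrary):

library(polspline)
data(iris)
excl <- matrix(c(2, 3), ncol = 2)   # row 1: forbid the covariate 2 x covariate 3 interaction
fit.ex <- polyclass(iris[, 5], iris[, 1:4], exclude = excl)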
should the model selection be restricted to additive models?
vector indicating for which of the variables no knots should be entered. For example, if linear = c(2, 3) no knots for either covariate 2 or 3 are entered. 0 represents time.
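A sketch combining the additive and linear restrictions on the iris example: an additive fit in which covariates 2 and 3 enter only linearly:

library(polspline)
data(iris)
fit.lin <- polyclass(iris[, 5], iris[, 1:4], additive = TRUE, linear = c(2, 3))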
should complete basis functions be deleted at once (2), should only individual dimensions be deleted (1) or should only the addition stage of the model selection be carried out (0)?
polyclass object. If fit is specified, polyclass adds basis functions starting with those in fit.
suppresses the printing of diagnostic output about basis functions added or deleted, Rao-statistics, Wald-statistics and log-likelihoods.
should the weights be normalized so that they average to one? This option has only an effect if the model is selected using AIC.
test set. Should satisfy the same requirements as data, cov and weight. If all test set weights are one, tweight can be omitted. If tdata and tcov are specified, the model selection is carried out using this test set, irrespective of the input for penalty or cv.
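A sketch of test-set model selection with a random split of the iris data (the 100/50 split is arbitrary):

library(polspline)
data(iris)
set.seed(1)
idx <- sample(nrow(iris), 100)
fit.ts <- polyclass(iris[idx, 5], iris[idx, 1:4],
                    tdata = iris[-idx, 5], tcov = iris[-idx, 1:4])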
in how many subsets should the data be divided for cross-validation? If cv is specified and tdata is omitted, the model selection is carried out by cross-validation.
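A sketch of cross-validated model selection, using the documented seed argument so the ordering of the cases is reproducible:

library(polspline)
data(iris)
fit.cv <- polyclass(iris[, 5], iris[, 1:4], cv = 5, seed = 37)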
if a test set is provided, or if the model is selected using cross validation, should the model be selected that minimizes (misclassification) loss (0), maximizes test set log-likelihood (1), or minimizes test set squared error loss (2)?
a rectangular matrix specifying the loss function, whose size is the number of classes times the number of actions. Used for cross-validation and test set model selection. loss[i, j] contains the loss for assigning action j to an object whose true class is i. The default is 1 minus the identity matrix. loss does not need to be square.
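A sketch of a non-default loss matrix for the three iris classes, used together with cross-validation (the weight 5 is arbitrary):

library(polspline)
data(iris)
L <- 1 - diag(3)   # the default: 0 on the diagonal, 1 off the diagonal
L[1, 2] <- 5       # make assigning action 2 to an object of true class 1 five times as costly
fit.ls <- polyclass(iris[, 5], iris[, 1:4], cv = 5, loss = L)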
optional seed for the random number generator that determines the sequence of the cases for cross-validation. If the seed has length 12 or more, the first twelve elements are assumed to be .Random.seed, otherwise the function set.seed is used. If seed is 0 or rep(0, 12), it is assumed that the user has already provided a (random) ordering. If seed is not provided, while a fit with an element fit$seed is provided, .Random.seed is set using set.seed(fit$seed). Otherwise the present value of .Random.seed is used.
Charles Kooperberg clk@fredhutch.org.
Charles Kooperberg, Smarajit Bose, and Charles J. Stone (1997). Polychotomous regression. Journal of the American Statistical Association, 92, 117--127.
Charles J. Stone, Mark Hansen, Charles Kooperberg, and Young K. Truong. The use of polynomial splines and their tensor products in extended linear modeling (with discussion) (1997). Annals of Statistics, 25, 1371--1470.
polymars, plot.polyclass, summary.polyclass, beta.polyclass, cpolyclass, ppolyclass, rpolyclass.
data(iris)
fit.iris <- polyclass(iris[,5], iris[,1:4])