The object returned by the earth
function.
This is an S3
model of class
"earth"
.
It is a list with the components listed below.
Term refers to a term created during the
forward pass (each line of the output from format.earth
is a term).
Term number 1 is always the intercept.
rss
Residual sum-of-squares (RSS) of the model (summed over all responses,
if y
has multiple columns).
rsq
1-rss/tss
.
R-Squared of the model (calculated over all responses,
and calculated using the weights
argument if it was supplied).
A measure of how well the model fits the training data.
Note that tss
is the total sum-of-squares, sum((y - mean(y))^2)
.
gcv
Generalized Cross Validation (GCV) of the model (summed over all responses).
The GCV is calculated using the penalty
argument.
For details of the GCV calculation, see
equation 30 in Friedman's MARS paper and earth:::get.gcv
.
grsq
1-gcv/gcv.null
.
An estimate of the predictive power of the model (calculated over all responses,
and calculated using the weights
argument if it was supplied).
gcv.null
is the GCV of an intercept-only model.
See “Can GRSq
be negative?” in the vignette.
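As a rough check of the rsq and grsq definitions above, the following sketch fits a model to the built-in trees data and recomputes both values by hand. It assumes the earth package is installed, that no weights are used, and that gcv.null can be read from gcv.per.subset[1] (documented further down as the GCV of an intercept-only model).

```r
library(earth)

# Single-response model on the built-in trees data (no weights).
mod <- earth(Volume ~ ., data = trees)

# Total sum-of-squares, as defined above.
tss <- sum((trees$Volume - mean(trees$Volume))^2)

# R-Squared recomputed from rss and tss.
rsq.by.hand <- 1 - mod$rss / tss

# GRSq recomputed from gcv and the intercept-only GCV
# (assumed here to be gcv.per.subset[1]).
gcv.null <- mod$gcv.per.subset[1]
grsq.by.hand <- 1 - mod$gcv / gcv.null

print(c(rsq = unname(mod$rsq), rsq.by.hand = unname(rsq.by.hand)))
print(c(grsq = unname(mod$grsq), grsq.by.hand = unname(grsq.by.hand)))
```

With weights or multiple responses the hand calculation would need adjusting, so treat this as a check for the simplest case only.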
bx
Matrix of basis functions applied to x
.
Each column corresponds to a selected term.
Each row corresponds to a row in the input matrix x
,
after taking subset
.
See model.matrix.earth
for an example of bx
handling.
Example bx
:
     (Intercept) h(Girth-12.9) h(12.9-Girth) h(Girth-12.9)*h(...
[1,]           1           0.0           4.6                   0
[2,]           1           0.0           4.3                   0
[3,]           1           0.0           4.1                   0
...
dirs
Matrix with one row per MARS term, and with ij-th element equal to
 0  if predictor j is not in term i
-1  if an expression of the form h(const - xj) is in term i
 1  if an expression of the form h(xj - const) is in term i
 2  if predictor j should enter term i linearly (either because specified by the linpreds argument or because earth discovered that a knot was unnecessary).
This matrix includes all terms generated by the forward pass,
including those not in selected.terms
.
Note that here the terms may not all be in pairs, because
although the forward pass adds terms as hinged pairs (so both sides of
the hinge are available as building blocks for further terms), it also
deletes linearly dependent terms before handing control to the pruning pass.
Example dirs
:
                           Girth Height
(Intercept)                    0      0  # intercept
h(12.9-Girth)                 -1      0  # 2nd term uses Girth
h(Girth-12.9)                  1      0  # 3rd term uses Girth
h(Girth-12.9)*h(Height-76)     1      1  # 4th term uses Girth and Height
...
cuts
Matrix with ij-th element equal to the cut point (hinge value)
for predictor j in term i.
This matrix includes all terms generated by the forward pass,
including those not in selected.terms
.
Note for programmers: the precedent is to use dirs
for term names etc. and to only use cuts
where cut information is needed.
Example cuts
:
                           Girth Height
(Intercept)                    0      0  # intercept, no cuts
h(12.9-Girth)               12.9      0  # 2nd term has cut at 12.9
h(Girth-12.9)               12.9      0  # 3rd term has cut at 12.9
h(Girth-12.9)*h(Height-76)  12.9     76  # 4th term has two cuts
...
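To illustrate how dirs and cuts fit together, here is a minimal base R sketch that rebuilds the term descriptions from the two example matrices shown above. The matrices are typed in by hand from those listings (not taken from a live fit), and describe.term is a hypothetical helper, not part of earth; the package's own term naming may format numbers differently.

```r
# Example matrices copied from the dirs and cuts listings above.
term.names <- c("(Intercept)", "h(12.9-Girth)", "h(Girth-12.9)",
                "h(Girth-12.9)*h(Height-76)")
dirs <- matrix(c( 0, 0,
                 -1, 0,
                  1, 0,
                  1, 1),
               ncol = 2, byrow = TRUE,
               dimnames = list(term.names, c("Girth", "Height")))
cuts <- matrix(c( 0,    0,
                 12.9,  0,
                 12.9,  0,
                 12.9, 76),
               ncol = 2, byrow = TRUE, dimnames = dimnames(dirs))

# Hypothetical helper: dirs decides the structure of term i,
# cuts is consulted only for the knot values.
describe.term <- function(i) {
    used <- which(dirs[i, ] != 0)
    if (length(used) == 0)
        return("(Intercept)")
    factors <- vapply(used, function(j) {
        x <- colnames(dirs)[j]
        switch(as.character(dirs[i, j]),
               "-1" = sprintf("h(%s-%s)", format(cuts[i, j]), x),
               "1"  = sprintf("h(%s-%s)", x, format(cuts[i, j])),
               "2"  = x)   # predictor enters linearly, no hinge
    }, character(1))
    paste(factors, collapse = "*")
}

rebuilt <- vapply(seq_len(nrow(dirs)), describe.term, character(1))
print(rebuilt)
```

This follows the precedent noted above: term structure comes from dirs, and cuts is read only where a knot value is needed.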
prune.terms
A matrix specifying which terms appear in which pruning pass subsets.
The row index of prune.terms
is the model size.
(The model size is the number of terms in the model.
The intercept is counted as a term.)
Each row is a vector of term numbers for the best model of that size.
An element is 0 if the term is not in the model; thus prune.terms
is a
lower triangular matrix, with dimensions nprune x nprune
.
The model selected by the pruning pass is at row number length(selected.terms)
.
Example prune.terms
:
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    0    0    0    0    0  # intercept-only model
[2,]    1    2    0    0    0    0    0  # best 2 term model uses terms 1,2
[3,]    1    2    4    0    0    0    0  # best 3 term model uses terms 1,2,4
[4,]    1    2    6    9    0    0    0  # and so on
...
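The following base R sketch shows one way to read a row of prune.terms. It uses only the four rows printed in the example above (the real matrix would have nprune rows), and terms.for.size is a hypothetical helper name.

```r
# First four rows of the example prune.terms above, typed in by hand.
prune.terms <- rbind(c(1, 0, 0, 0, 0, 0, 0),
                     c(1, 2, 0, 0, 0, 0, 0),
                     c(1, 2, 4, 0, 0, 0, 0),
                     c(1, 2, 6, 9, 0, 0, 0))

# Hypothetical helper: term numbers of the best model of a given size.
# The row index is the model size; zeros are padding and are dropped.
terms.for.size <- function(size) {
    row <- prune.terms[size, ]
    row[row != 0]
}

print(terms.for.size(3))
print(terms.for.size(4))
```

On a fitted model, the same indexing with size set to length(selected.terms) would pick out the row for the model chosen by the pruning pass.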
selected.terms
Vector of term numbers in the selected model.
Can be used as a row index vector into cuts
and dirs
.
The first element selected.terms[1]
is always 1, the intercept.
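A short sketch of using selected.terms as a row index, assuming the earth package is installed and using the built-in trees data:

```r
library(earth)
mod <- earth(Volume ~ ., data = trees)

sel <- mod$selected.terms

# Rows of dirs and cuts for the terms kept by the pruning pass.
# drop = FALSE keeps the result a matrix even if only the intercept is selected.
sel.dirs <- mod$dirs[sel, , drop = FALSE]
sel.cuts <- mod$cuts[sel, , drop = FALSE]

print(sel)
print(sel.dirs)
print(sel.cuts)
```

The subsetted matrices have one row per selected term, so they line up with the columns of bx.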
fitted.values
Fitted values.
A matrix with dimensions nrow(y) x ncol(y)
after factors in y
have been expanded.
residuals
Residuals.
A matrix with dimensions nrow(y) x ncol(y)
after factors in y
have been expanded.
coefficients
Regression coefficients.
A matrix with dimensions length(selected.terms) x ncol(y)
after factors in y
have been expanded.
Each column holds the least squares coefficients from regressing that
column of y
on bx
.
The first row holds the intercept coefficient(s).
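Because the coefficients come from a linear regression of y on bx, the fitted values and residuals can be rebuilt from those two components. The sketch below checks this for an ordinary (non-GLM, unweighted) single-response model on the trees data; it assumes the earth package is installed.

```r
library(earth)
mod <- earth(Volume ~ ., data = trees)

# Fitted values as a linear combination of the basis functions.
fit.by.hand <- mod$bx %*% mod$coefficients

# Residuals as observed minus fitted.
res.by.hand <- trees$Volume - fit.by.hand

print(all.equal(fit.by.hand, mod$fitted.values, check.attributes = FALSE))
print(all.equal(res.by.hand, mod$residuals, check.attributes = FALSE))
```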
rss.per.response
A vector of the RSS for each response.
Length is the number of responses, i.e., ncol(y)
after factors in y
have been expanded.
The rss
component above is equal to sum(rss.per.response)
.
rsq.per.response
A vector of the R-Squared for each response
(where R-Squared is calculated using the weights
argument if it was supplied).
Length is the number of responses.
gcv.per.response
A vector of the GCV for each response.
Length is the number of responses.
The gcv
component above is equal to sum(gcv.per.response)
.
grsq.per.response
A vector of the GRSq for each response
(calculated using the weights
argument if it was supplied).
Length is the number of responses.
rss.per.subset
A vector of the RSS
for each model subset generated by the pruning pass.
Length is nprune
.
For multiple responses, the RSS is summed over all responses for each subset.
The rss
above is
rss.per.subset[length(selected.terms)]
.
The RSS of an intercept-only model is rss.per.subset[1]
.
gcv.per.subset
A vector of the GCV for each model in prune.terms
.
Length is nprune
.
For multiple responses, the GCV is summed over all responses for each subset.
The gcv
above is gcv.per.subset[length(selected.terms)]
.
The GCV of an intercept-only model is gcv.per.subset[1]
.
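The relationship between the whole-model statistics and the per-subset vectors can be inspected as follows. This is a sketch assuming the earth package is installed and the default pruning method is used.

```r
library(earth)
mod <- earth(Volume ~ ., data = trees)

n.sel <- length(mod$selected.terms)   # size of the selected model

# Compare the whole-model statistics with the per-subset entries.
print(c(rss = unname(mod$rss), from.subsets = unname(mod$rss.per.subset[n.sel])))
print(c(gcv = unname(mod$gcv), from.subsets = unname(mod$gcv.per.subset[n.sel])))

# RSS and GCV against model size, one row per pruning pass subset.
print(data.frame(nterms = seq_along(mod$gcv.per.subset),
                 rss = mod$rss.per.subset,
                 gcv = mod$gcv.per.subset))
```

Printing the two vectors side by side shows how quickly the RSS falls as terms are added and where the GCV penalty starts to outweigh that gain.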
leverages
Diagonal of the hat matrix (from the linear regression of the response on bx
).
penalty,nk,thresh
Copies of the corresponding arguments to earth
.
pmethod,nprune
Copies of the corresponding arguments to earth
.
weights,wp
Copies of the corresponding arguments to earth
.
termcond
Reason the forward pass terminated (an integer).
call
The call used to invoke earth
.
terms
Model frame terms.
This component exists only if the model was built using earth.formula
.
modvars
A matrix specifying which input variables
are used in each column of the model matrix.
(This field is new in earth 5.2.0.)
Columns correspond to columns of the model matrix (same as cols of dirs
, see above).
Rows correspond to variables in the formula.
For example, the formula:
survived ~ age + pclass + sqrt(age) - sex
results in:
attr(terms,"factors")
:
          age pclass sqrt(age)
survived    0      0         0  # the response will be dropped
age         1      0         0
pclass      0      1         0
sqrt(age)   0      0         1  # sqrt(age) will be merged with age
sex         0      0         0  # sex is unused and will be dropped
modvars
:
       age pclass2nd pclass3rd sqrt(age)
age      1         0         0         1  # age and sqrt(age) use "age"
pclass   0         1         1         0  # pclass2nd and pclass3rd use "pclass"
Note that for models built with earth.default
(x,y
models),
“derived variables” are not combined in modvars
as they are for formula models.
The row names of modvars
match the column names of x
,
after factor expansion.
Columns in x
named "age"
and "sqrt(age)"
will be treated as two separate variables.
namesx
Variable names in the input data. Deprecated (subsumed by modvars
).
xlevels
This component exists only if the model was built using earth.formula
.
Same as lm
. A record of the levels of the factors used in fitting,
needed under certain conditions by predict.earth
.
levels
This component exists only if the model was built using earth.default
.
Levels of y
if y
is a factor
,
c(FALSE,TRUE)
if y
is logical
,
else NULL
.
The following fields appear only if earth
's argument keepxy
is TRUE
.
x
,y
,data
,subset
Copies of the corresponding arguments to earth
.
Only exist if keepxy=TRUE
.
The following fields appear only if earth
's glm
argument is used.
glm.list
List of GLM models. Each element is the value returned by earth
's
internal call to glm
for each response.
Thus if there is a single response (or a single binomial pair, see
“Binomial pairs” in the vignette)
this will be a one element list and you access the GLM model with
earth.mod$glm.list[[1]]
.
glm.coefficients
GLM regression coefficients.
Analogous to the coefficients
field described above but for the GLM model(s).
A matrix with dimensions length(selected.terms) x ncol(y)
after factors in y
have been expanded.
Each column holds the coefficients from the GLM regression of that
column of y
on bx
.
This duplicates, for convenience, information buried in glm.list
.
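A minimal sketch of reaching the GLM results, assuming the earth package is installed. It uses the etitanic data that ships with earth and a binomial family; the variable name g is just for illustration.

```r
library(earth)

# etitanic ships with the earth package; survived is a 0/1 response.
mod <- earth(survived ~ ., data = etitanic, glm = list(family = binomial))

# Single response, so glm.list has one element: the fitted glm object.
g <- mod$glm.list[[1]]
print(class(g))

# The same coefficients, reached two ways.
print(cbind(from.glm.list    = unname(coef(g)),
            glm.coefficients = unname(mod$glm.coefficients[, 1])))
```

Because g is the object returned by glm, the usual glm methods such as summary can be applied to it directly.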
glm.stats
GLM summary statistics such as devratio
, AIC
, and iters
.
glm.bpairs
Is NULL
unless there are paired binomial columns.
Else a logical vector c(TRUE, FALSE)
.
See “Binomial pairs” in the vignette.
Retained for backwards compatibility with old versions of earth.
The following fields appear only if the nfold
argument is greater than 1.
cv.list
List of earth
models, one model for each fold (ncross * nfold
models).
The fold models have two extra fields,
icross
(an integer from 1
to ncross
)
and ifold
(an integer from 1
to nfold
).
To save memory, lengthy fields
in the fold models are removed unless you use keepxy=TRUE
.
The “lengthy fields” are $bx
, $fitted.values
, and $residuals
.
cv.nterms
Vector of length ncross * nfold + 1
.
Number of MARS terms in the model generated at each cross-validation fold,
with the final element being the mean of these.
cv.nvars
Vector of length ncross * nfold + 1
.
Number of predictors in the model generated at each cross-validation fold,
with the final element being the mean of these.
cv.groups
Specifies which cases went into which folds.
Matrix with two columns and number of rows equal to the number of cases nrow(x).
Elements of the first column specify the cross-validation number, 1:ncross
.
Elements of the second column specify the fold number, 1:nfold
.
cv.rsq.tab
Matrix with ncross * nfold + 1
rows and nresponse+1
columns,
where nresponse
is the number of responses,
i.e., ncol(y)
after factors in y
have been expanded.
The first nresponse
elements of a row are the cv.rsq
's on
the out-of-fold data for each response of the model generated at that row's fold.
(A cv.rsq
is calculated from predictions on the out-of-fold data
using the best model built from the in-fold data;
where “best” means the model was selected using the in-fold GCV.
The R-Squareds are calculated using the weights
argument if it was supplied.
The final column holds the row mean (a weighted mean if wp
is specified)).
The final row holds the column means.
The values in this final row are the mean cv.rsq
printed by summary.earth
.
Example for a single response model (where the mean
column
is redundant but included for uniformity with multiple response models):
          y  mean
fold1 0.909 0.909
fold2 0.869 0.869
fold3 0.952 0.952
fold4 0.157 0.157
fold5 0.961 0.961
mean  0.769 0.769
Example for a multiple response model:
         y1    y2    y3  mean
fold1 0.915 0.951 0.944 0.937
fold2 0.962 0.970 0.970 0.968
fold3 0.914 0.940 0.942 0.932
fold4 0.907 0.929 0.925 0.920
fold5 0.947 0.987 0.979 0.971
mean  0.929 0.955 0.952 0.946
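The layout of the final column and final row can be reproduced in base R from the per-fold values. The sketch below types in the per-fold numbers from the multiple response example above and assumes wp was not specified (so plain row means are used). Since the printed inputs are already rounded, the recomputed means may differ from the table in the last digit.

```r
# Per-fold values copied from the multiple response example above (rounded).
cv.rsq <- matrix(c(0.915, 0.951, 0.944,
                   0.962, 0.970, 0.970,
                   0.914, 0.940, 0.942,
                   0.907, 0.929, 0.925,
                   0.947, 0.987, 0.979),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("fold", 1:5), c("y1", "y2", "y3")))

# Final column: unweighted row means (assumes wp was not specified).
tab <- cbind(cv.rsq, mean = rowMeans(cv.rsq))

# Final row: column means, including the mean of the "mean" column.
tab <- rbind(tab, mean = colMeans(tab))

print(round(tab, 3))
```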
cv.class.rate.tab
Like cv.rsq.tab
but is the classification rate at each fold,
i.e., the fraction of classes correctly predicted.
Models with discrete response only.
Calculated with thresh=.5
for binary responses.
For responses with more than two
levels, the final row is the overall classification rate. The other
rows are the classification rates for each level (the level
versus not-the-level), which are usually higher than the overall
classification rate (predicting the level versus not-the-level is
easier than correctly predicting one of many levels).
The weights
argument is ignored for all cross-validation stats except R-Squareds.
cv.maxerr.tab
Like cv.rsq.tab
but is the MaxErr
at each fold.
This is the signed max absolute value at each fold.
Results are aggregated for the final column and final row
using the signed max absolute value.
The signed max absolute value is defined
as the maximum of the absolute difference
between the predicted and observed response values, multiplied
by -1
if the sign of that difference is negative.
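The definition of the signed max absolute value can be sketched in a few lines of base R. The helper name signed.max.abs is hypothetical (not part of earth), the sign convention predicted minus observed is an assumption, and the numbers are made up for illustration.

```r
# Hypothetical helper, not part of the earth API.
signed.max.abs <- function(predicted, observed) {
    diffs <- predicted - observed      # assumed convention: predicted minus observed
    diffs[which.max(abs(diffs))]       # largest-magnitude difference, sign kept
}

# Made-up predictions and observations.
predicted <- c(10.0, 12.5, 9.0)
observed  <- c(10.5, 15.0, 8.0)

# Differences are -0.5, -2.5, 1.0; the largest in magnitude is -2.5.
print(signed.max.abs(predicted, observed))
```

Returning the largest-magnitude difference with its sign intact is equivalent to taking the maximum absolute difference and multiplying by -1 when that difference is negative.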
cv.auc.tab
Like cv.rsq.tab
but is the AUC
at each fold.
Binomial models only.
cv.cor.tab
Like cv.rsq.tab
but is the cor
at each fold.
Poisson models only.
cv.deviance.tab
Like cv.rsq.tab
but is the MeanDev
at each fold.
Binomial models only.
cv.calib.int.tab
Like cv.rsq.tab
but is the CalibInt
at each fold.
Binomial models only.
cv.calib.slope.tab
Like cv.rsq.tab
but is the CalibSlope
at each fold.
Binomial models only.
cv.oof.rsq.tab
Generated only if keepxy=TRUE
or pmethod="cv"
.
A matrix with ncross * nfold + 1
rows and max.nterms
columns.
Each element holds an out-of-fold RSq (oof.rsq
),
calculated from predictions from the out-of-fold observations using
the model built with the in-fold data. The final row is the mean over
all folds.
The R-Squareds are calculated using the weights
argument if it was supplied.
cv.infold.rsq.tab
Generated only if keepxy=TRUE
.
Like cv.oof.rsq.tab
but from predictions made on the in-fold observations.
cv.oof.fit.tab
Generated only if the varmod.method
argument is used.
Predicted values on the out-of-fold data.
Dataframe with nrow(data)
rows and ncross
columns.
The following field appears only if the varmod.method
is specified.
varmod
An object of class "varmod"
.
See the varmod
help page for a description.
Only appears if the varmod.method
argument is used.