generateFunctionalANOVAData: Generate a functional ANOVA decomposition

Description

Decompose a learned prediction function as a sum of components estimated via partial dependence.
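The idea can be illustrated in base R. This is a minimal sketch, not mlr's implementation: `partial_dependence()` and the toy function `f` are hypothetical. It estimates the partial dependence of a prediction function on one feature by fixing that feature at each grid value, averaging the predictions over the training data, and then subtracting the overall mean f0 to obtain a centred first-order component.

```r
# Hypothetical helper (not part of mlr): partial dependence of the prediction
# function `f` at each row of `grid`, a data frame of fixed feature values.
partial_dependence <- function(f, data, grid, fun = mean) {
  vapply(seq_len(nrow(grid)), function(i) {
    newdata <- data
    for (feat in names(grid)) newdata[[feat]] <- grid[[feat]][i]  # fix the grid point
    fun(f(newdata))  # aggregate predictions over the remaining training data
  }, numeric(1))
}

# Toy prediction function standing in for a trained model: f = 2 * x1 + x2^2
f <- function(d) 2 * d$x1 + d$x2^2
set.seed(1)
train_df <- data.frame(x1 = runif(50), x2 = runif(50))

f0 <- mean(f(train_df))                              # constant (zeroth-order) term
grid1 <- data.frame(x1 = c(0, 0.5, 1))
f1 <- partial_dependence(f, train_df, grid1) - f0    # centred first-order component
```

For this additive `f`, `f1` equals `2 * x1 - 2 * mean(train_df$x1)` at each grid value, i.e. the x1 part of `f` shifted to have mean zero over the training data.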

Usage

generateFunctionalANOVAData(obj, input, features, depth = 1L, fun = mean,
  bounds = c(qnorm(0.025), qnorm(0.975)), resample = "none", fmin, fmax,
  gridsize = 10L, ...)

Arguments

obj

[WrappedModel] Result of train.

input

[data.frame | Task] Input data.

features

[character] A vector of feature names contained in the training data. If not specified, all features in the input are used.

depth

[integer(1)] The order of the interaction effects to compute among the features: 1 for effects of single features, 2 for pairwise interactions, and so on. Default 1.
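With a depth of 2 the components describe pairwise interactions. One common way to estimate a second-order component from partial dependence, following the inclusion-exclusion idea behind Hooker's decomposition, is f12(v, w) = PD12(v, w) - f1(v) - f2(w) - f0. The sketch below assumes this construction (it is not taken from mlr's source), and `partial_dependence()` and `interaction_component()` are hypothetical helpers. For a purely additive function the estimated interaction is zero; for a product it is not.

```r
# Hypothetical helper (not part of mlr): partial dependence at each row of a
# (possibly multi-feature) grid, averaging predictions over the training data.
partial_dependence <- function(f, data, grid, fun = mean) {
  vapply(seq_len(nrow(grid)), function(i) {
    newdata <- data
    for (feat in names(grid)) newdata[[feat]] <- grid[[feat]][i]
    fun(f(newdata))
  }, numeric(1))
}

set.seed(1)
train_df <- data.frame(x1 = runif(50), x2 = runif(50))
grid12 <- expand.grid(x1 = c(0, 1), x2 = c(0, 1))   # 2 x 2 joint grid

# Second-order component by inclusion-exclusion on partial dependence
interaction_component <- function(f) {
  f0 <- mean(f(train_df))
  f1 <- partial_dependence(f, train_df, grid12["x1"]) - f0
  f2 <- partial_dependence(f, train_df, grid12["x2"]) - f0
  partial_dependence(f, train_df, grid12) - f1 - f2 - f0
}

additive <- function(d) 2 * d$x1 + d$x2^2   # no interaction between x1 and x2
product  <- function(d) d$x1 * d$x2          # pure interaction
f12_add  <- interaction_component(additive)  # zero up to rounding error
f12_prod <- interaction_component(product)   # clearly non-zero
```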

fun

[function] A function that accepts a numeric vector and returns either a single number, such as a measure of location like the mean, or three numbers giving a lower bound, a measure of location, and an upper bound. If three numbers are returned, they must be in this order. Two variables, `data` and `newdata`, are made available to `fun` internally via a wrapper: `data` is the training data from `input`, and `newdata` contains a single point from the prediction grid for the features in `features`, along with the training data for all other features. This allows weights to be computed by comparing the prediction grid to the training data. The default is the mean.
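As an illustration of the three-number form, the hypothetical `quantile_band()` below returns the 10th percentile, the median, and the 90th percentile, in the required (lower, location, upper) order. It only relies on base R's `quantile()`; it does not use the `data` or `newdata` variables described above.

```r
# Hypothetical `fun` returning three numbers in the required order:
# a lower bound, a measure of location, and an upper bound.
quantile_band <- function(x) {
  q <- quantile(x, probs = c(0.1, 0.5, 0.9), names = FALSE)
  c(lower = q[1], center = q[2], upper = q[3])
}

preds <- c(3, 1, 4, 1, 5, 9, 2, 6)   # toy predictions at one grid point
band <- quantile_band(preds)         # median of these predictions is 3.5
```

Such a function would be supplied as `fun = quantile_band`.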

bounds

[numeric(2)] The values (lower, upper) by which the estimated standard error is multiplied to estimate the bounds of a confidence region for a partial dependence. Ignored if predict.type != "se" for the learner. Default is the 2.5% and 97.5% quantiles of the standard normal distribution, approximately (-1.96, 1.96).
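Assuming each bound is formed as the estimate plus the corresponding multiplier times the standard error, the default `bounds` give an approximate 95% normal interval. The estimates and standard errors below are hypothetical numbers chosen only to show the arithmetic.

```r
bounds <- c(qnorm(0.025), qnorm(0.975))   # default, approximately c(-1.96, 1.96)

# Hypothetical partial dependence estimates and their standard errors
estimate <- c(22.1, 24.8, 27.3)
se       <- c(1.2, 0.9, 1.5)

lower <- estimate + bounds[1] * se   # bounds[1] is negative
upper <- estimate + bounds[2] * se
```

A narrower 90% region would correspond to `bounds = c(qnorm(0.05), qnorm(0.95))`.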

resample

[character(1)] Defines how the prediction grid for each feature is created. If “bootstrap”, values are sampled with replacement from the training data. If “subsample”, values are sampled without replacement from the training data. If “none”, an evenly spaced grid is created between either the empirical minimum and maximum or the minimum and maximum given by fmin and fmax. Default is “none”.
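The three grid strategies for a single numeric feature can be sketched in base R. `make_grid()` is a hypothetical helper, not an mlr function, and it assumes `gridsize` does not exceed the number of training values when subsampling.

```r
# Hypothetical helper (not part of mlr) mirroring the three grid strategies
make_grid <- function(x, resample = c("none", "bootstrap", "subsample"),
                      gridsize = 10L, fmin = min(x), fmax = max(x)) {
  resample <- match.arg(resample)
  switch(resample,
    none      = seq(fmin, fmax, length.out = gridsize),  # evenly spaced
    bootstrap = sample(x, gridsize, replace = TRUE),     # with replacement
    subsample = sample(x, gridsize, replace = FALSE)     # without replacement
  )
}

set.seed(42)
x <- rnorm(100)                       # toy training values for one feature
g_none <- make_grid(x)                # 10 points from min(x) to max(x)
g_boot <- make_grid(x, "bootstrap")   # may contain repeated values
g_sub  <- make_grid(x, "subsample")   # distinct values drawn from x
```

Resampled grids follow the empirical distribution of the feature, whereas the evenly spaced grid gives equal coverage across its range, including sparsely populated regions.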

fmin

[numeric] The minimum value that each element of features can take. This argument is only applicable if resample = "none" and when the empirical minimum is higher than the theoretical minimum for a given feature. This only applies to numeric features, and an NA should be inserted into the vector if the corresponding feature is a factor. Default is the empirical minimum of each numeric feature and NA for factor features.

fmax

[numeric] The maximum value that each element of features can take. This argument is only applicable if resample = "none" and when the empirical maximum is lower than the theoretical maximum for a given feature. This only applies to numeric features, and an NA should be inserted into the vector if the corresponding feature is a factor. Default is the empirical maximum of each numeric feature and NA for factor features.
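The sketch below builds fmin and fmax vectors aligned with a hypothetical `features` vector that mixes numeric and factor columns. It first reproduces the default described above (empirical range, NA for factors) and then widens `x1` to an assumed theoretical range of [0, 1]. The data frame and its column names are illustrative only.

```r
# Toy training data (hypothetical): two numeric features and one factor
train_df <- data.frame(
  x1 = c(0.2, 0.5, 0.9),
  g  = factor(c("a", "b", "a")),
  x2 = c(10, 20, 15)
)
features <- c("x1", "g", "x2")

# Default behaviour: empirical minimum/maximum for numerics, NA for factors
empirical <- function(col, f) if (is.numeric(col)) f(col) else NA_real_
fmin_default <- vapply(train_df[features], empirical, numeric(1), f = min)
fmax_default <- vapply(train_df[features], empirical, numeric(1), f = max)

# Widen x1 to its (assumed) theoretical range [0, 1]; g keeps its NA placeholder
fmin <- replace(fmin_default, "x1", 0)
fmax <- replace(fmax_default, "x1", 1)
```

These vectors would be passed as `fmin = fmin, fmax = fmax` together with `resample = "none"`.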

gridsize

[integer(1)] The length of the prediction grid created for each feature. If resample = "bootstrap" or resample = "subsample", this defines the number of (possibly non-unique) values resampled. If resample = "none", it defines the length of the evenly spaced grid created. Default 10.

...

Additional arguments passed to predict.

Value

[FunctionalANOVAData]. A named list, which contains the computed effects of the specified depth amongst the features. Object members:

data

[data.frame] Contains the prediction columns: one column for regression, plus two more for the lower and upper bounds if bounds are used. The “effect” column identifies the feature or features each prediction corresponds to.

task.desc

[TaskDesc] Task description.

target

The target feature for regression.

features

[character] Features argument input.

interaction

[logical(1)] Whether or not the depth is greater than 1.

References

Giles Hooker, “Discovering additive structure in black box functions.” Proceedings of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining (2004): 575-580.

Examples


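library(mlr)
# Fit a regression tree to the Boston Housing task
fit = train("regr.rpart", bh.task)
# Pairwise (depth 2) effect of lstat and crim
fa = generateFunctionalANOVAData(fit, bh.task, c("lstat", "crim"), depth = 2L)
plotPartialDependence(fa)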
