mfe (version 0.1.5)

relative: Relative Landmarking Meta-features

Description

Relative Landmarking measures are landmarking measures transformed by a ranking strategy: instead of reporting the raw performance of each landmarker, the landmarkers are ranked against each other, so the meta-features capture their relative performance.
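
A minimal sketch of the ranking idea in base R (illustrative values only; not the mfe implementation):

# Suppose three landmarkers produced these accuracies on the same fold:
acc <- c(bestNode = 0.93, randomNode = 0.71, worstNode = 0.55)

# Relative landmarking replaces the raw scores with their ranks, so the
# meta-features capture the relative order of the learners:
rank(-acc)  # rank 1 = best performing landmarker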

Usage

relative(...)

# S3 method for default
relative(
  x,
  y,
  features = "all",
  summary = c("mean", "sd"),
  size = 1,
  folds = 10,
  score = "accuracy",
  ...
)

# S3 method for formula
relative(
  formula,
  data,
  features = "all",
  summary = c("mean", "sd"),
  size = 1,
  folds = 10,
  score = "accuracy",
  ...
)

Arguments

...

Further arguments passed to the summarization functions.

x

A data.frame containing only the input attributes.

y

A factor response vector with one label for each row/component of x.

features

A list of feature names, or "all" to include all of them.

summary

A list of summarization functions, or empty for all values. See the post.processing method for more information. (Default: c("mean", "sd"))

size

The proportion of examples to be subsampled. Values different from 1 generate the subsampling-based relative landmarking meta-features; see the sketch after this argument list. (Default: 1.0)

folds

The number of equal-sized subsamples (folds) used in k-fold cross-validation. (Default: 10)

score

The evaluation measure used to score the classification performance. One of c("accuracy", "balanced.accuracy", "kappa"); balanced accuracy is sketched after this argument list. (Default: "accuracy")

formula

A formula to define the class column.

data

A data.frame containing the input attributes and the class column. The details section describes the valid values for this argument.
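
Two brief sketches of what the size and score arguments mean, in plain R (assumed behavior for illustration, not mfe internals):

## size: values below 1 fit the landmarkers on a random subsample of
## the examples, e.g. 70% of the rows:
idx <- sample(nrow(iris), 0.7 * nrow(iris))
sub <- iris[idx, ]

## score = "balanced.accuracy": taken here to be the mean of the
## per-class recalls, sketched on a small confusion table:
cm <- table(truth = c("a", "a", "b", "b", "b"),
            pred  = c("a", "b", "b", "b", "a"))
mean(diag(cm) / rowSums(cm))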

Value

A list named by the requested meta-features.
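
A quick way to inspect the returned structure (a sketch; the element names follow the requested features and summary functions, here the defaults "mean" and "sd"):

res <- relative(Species ~ ., iris, features = c("bestNode", "worstNode"))
names(res)  # one element per requested meta-feature
str(res)    # each element holds the summarized ranks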

Details

The following features are allowed for this method (a sketch of a single-node landmarker follows the list):

"bestNode"

Construct a single decision tree node model induced by the most informative attribute to establish the linear separability (multi-valued).

"eliteNN"

Elite nearest neighbor uses the most informative attribute in the dataset to induce a 1-nearest neighbor model. With this subset of informative attributes, the model is expected to be noise tolerant (multi-valued).

"linearDiscr"

Apply the Linear Discriminant classifier to construct a linear split (not axis-parallel) in the data to establish the linear separability (multi-valued).

"naiveBayes"

Evaluate the performance of the Naive Bayes classifier. It assumes that the attributes are independent and assigns each example to the most probable class using Bayes' theorem (multi-valued).

"oneNN"

Evaluate the performance of the 1-nearest neighbor classifier. It uses the Euclidean distance to the nearest neighbor to determine how noisy the data is (multi-valued).

"randomNode"

Construct a single decision tree node model induced by a random attribute. Combined with the "bestNode" measure, it can establish the linear separability (multi-valued).

"worstNode"

Construct a single decision tree node model induced by the least informative attribute. Combined with the "bestNode" measure, it can establish the linear separability (multi-valued).
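
A rough sketch of a single-node ("decision stump") landmarker in the spirit of "bestNode" and "worstNode", using rpart (an assumed construction; mfe's internal code may differ):

library(rpart)

# Grow a tree restricted to a single split (one decision node):
stump <- rpart(Species ~ ., data = iris,
               control = rpart.control(maxdepth = 1, minsplit = 2))

# Training accuracy of the stump; relative landmarking would rank this
# score against the scores of the other landmarkers:
mean(predict(stump, iris, type = "class") == iris$Species)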

References

Johannes Fürnkranz, Johann Petrak, Pavel Brazdil, and Carlos Soares. On the use of Fast Subsampling Estimates for Algorithm Recommendation. Technical Report, pages 1-9, 2002.

See Also

Other meta-features: clustering(), complexity(), concept(), general(), infotheo(), itemset(), landmarking(), model.based(), statistical()

Examples

## Extract all meta-features using formula
relative(Species ~ ., iris)

## Extract some meta-features
relative(iris[1:4], iris[5], c("bestNode", "randomNode", "worstNode"))

## Use another summarization function
relative(Species ~ ., iris, summary=c("min", "median", "max"))

## Use 2 folds and balanced accuracy
relative(Species ~ ., iris, folds=2, score="balanced.accuracy")

## Extract the subsampling-based relative landmarking
relative(Species ~ ., iris, size=0.7)