Statistical meta-features are standard statistical measures that describe the numerical properties of a data distribution. Because they require numerical attributes, categorical data are transformed to numerical.
statistical(...)
# S3 method for default
statistical(
x,
y,
features = "all",
summary = c("mean", "sd"),
by.class = FALSE,
transform = TRUE,
...
)
# S3 method for formula
statistical(
formula,
data,
features = "all",
summary = c("mean", "sd"),
by.class = FALSE,
transform = TRUE,
...
)
...: Further arguments passed to the summarization functions.
x: A data.frame containing only the input attributes.
y: A factor response vector with one label for each row/component of x.
features: A list of feature names or "all" to include all of them. The details section describes the valid values for this group.
summary: A list of summarization functions or empty for all values. See the post.processing method for more information. (Default: c("mean", "sd"))
by.class: A logical value indicating whether the meta-features must be computed for each group of samples belonging to different output classes. (Default: FALSE)
transform: A logical value indicating whether the categorical attributes should be transformed. If FALSE, they will be ignored. (Default: TRUE)
formula: A formula to define the class column.
data: A data.frame dataset containing the input attributes and the class.
A list named by the requested meta-features.
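The interaction between multi-valued measures, the summary functions, and by.class can be illustrated with base R alone. The sketch below is not the package's implementation: it takes one multi-valued measure (the standard deviation of each attribute), collapses it with the default "mean" and "sd" summaries, and then repeats the computation within each class. How the package pools the per-class values before summarizing is an assumption here.

```r
# Illustrative only: base-R imitation of summary and by.class behaviour.
x <- iris[1:4]      # predictive attributes
y <- iris$Species   # class labels

# A multi-valued measure: one standard deviation per attribute
sd_values <- vapply(x, sd, numeric(1))

# Collapse the values with the default summary functions
summarised <- c(mean = mean(sd_values), sd = sd(sd_values))

# by.class-style computation: the same measure within each class
per_class <- lapply(split(x, y), function(part) vapply(part, sd, numeric(1)))

# Assumption: per-class values are pooled before being summarised
per_class_values <- unlist(per_class)
summarised_by_class <- c(mean = mean(per_class_values),
                         sd = sd(per_class_values))
```

Passing an empty summary, as in the examples further down, would correspond to keeping sd_values as is instead of reducing it to two numbers.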
The following features are allowed for this method:
Canonical correlations between the predictive attributes and the class (multi-valued).
Center of gravity, which is the distance between the center instance of the majority class and the center instance of the minority class.
Absolute attributes correlation, which measures the correlation between each pair of numeric attributes in the dataset (multi-valued). This measure accepts an extra argument called method = c("pearson", "kendall", "spearman"). See cor for more details.
Absolute attributes covariance, which measures the covariance between each pair of numeric attributes in the dataset (multi-valued).
Number of the discriminant functions.
Eigenvalues of the covariance matrix (multi-valued).
Geometric mean of attributes (multi-valued).
Harmonic mean of attributes (multi-valued).
Interquartile range of attributes (multi-valued).
Kurtosis of attributes (multi-valued).
Median absolute deviation of attributes (multi-valued).
Maximum value of attributes (multi-valued).
Mean value of attributes (multi-valued).
Median value of attributes (multi-valued).
Minimum value of attributes (multi-valued).
Number of attribute pairs with high correlation (multi-valued when by.class=TRUE).
Number of attributes with normal distribution. The Shapiro-Wilk normality test is used to assess whether an attribute is normally distributed (multi-valued only when by.class=TRUE).
Number of attributes with outlier values. Tukey's boxplot algorithm is used to determine whether an attribute has outliers (multi-valued only when by.class=TRUE).
Range of attributes (multi-valued).
Standard deviation of the attributes (multi-valued).
Statistical test for homogeneity of covariances.
Skewness of attributes (multi-valued).
Attributes sparsity, which represents the degree of discreteness of each attribute in the dataset (multi-valued).
Trimmed mean of attributes (multi-valued). It is the arithmetic mean computed after excluding the 20% lowest and highest instances.
Attributes variance (multi-valued).
Wilks' Lambda.
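A few of the measures listed above can be approximated with base R functions, which may help when checking what a value means. This is a minimal sketch, not the package's source: the 0.05 significance level for the Shapiro-Wilk test and the reading of the 20% trim as applying to each tail are assumptions.

```r
# Illustrative base-R versions of a few measures; not the mfe source code.
x <- iris[1:4]  # numeric predictive attributes only

# Absolute pairwise correlation (multi-valued), then summarised by mean and sd
cors <- abs(cor(x, method = "pearson"))
abs_cor <- cors[upper.tri(cors)]
cor_summary <- c(mean = mean(abs_cor), sd = sd(abs_cor))

# Attributes for which Shapiro-Wilk does not reject normality
# (alpha = 0.05 is an assumed threshold)
alpha <- 0.05
p_values <- vapply(x, function(col) shapiro.test(col)$p.value, numeric(1))
nr_norm <- sum(p_values > alpha)

# Attributes with at least one point outside Tukey's boxplot whiskers
has_outlier <- vapply(
  x,
  function(col) length(boxplot.stats(col)$out) > 0,
  logical(1)
)
nr_outliers <- sum(has_outlier)

# Trimmed mean, dropping 20% of the observations from each tail (assumed)
t_mean <- vapply(x, mean, numeric(1), trim = 0.2)
```

The method argument of cor can be switched to "kendall" or "spearman" in the same way the description of the correlation measure suggests.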
This method uses simple binarization to transform the categorical attributes when transform=TRUE.
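To make the idea of simple binarization concrete, the sketch below turns a categorical attribute into one 0/1 indicator column per level using model.matrix from base R. The data frame df and its columns are hypothetical, and the package's own transformation may differ in column naming or other details.

```r
# Hypothetical data frame with one numeric and one categorical attribute
df <- data.frame(
  size  = c(1.2, 3.4, 2.2, 5.1),
  color = factor(c("red", "green", "blue", "green"))
)

# Dropping the intercept gives the factor one indicator column per level.
# Note: with several factors, only the first one is fully expanded this way.
binarized <- model.matrix(~ . - 1, data = df)

# Numeric attributes pass through unchanged; each row has exactly one
# indicator set among the columns derived from `color`
indicator_cols <- binarized[, -1]
```

After this step every attribute is numeric, so measures such as correlation or variance can be computed on the indicator columns as well.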
Ciro Castiello, Giovanna Castellano, and Anna M. Fanelli. Meta-data: Characterization of input features for meta-learning. In 2nd International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pages 457-468, 2005.
Shawkat Ali and Kate A. Smith. On learning algorithm selection for classification. Applied Soft Computing, volume 6, pages 119-138, 2006.
Other meta-features: clustering(), complexity(), concept(), general(), infotheo(), itemset(), landmarking(), model.based(), relative()
# NOT RUN {
## Extract all meta-features
statistical(Species ~ ., iris)
## Extract some meta-features
statistical(iris[1:4], iris[5], c("cor", "nrNorm"))
## Extract all meta-features without summarizing the results
statistical(Species ~ ., iris, summary=c())
## Use another summarization function
statistical(Species ~ ., iris, summary=c("min", "median", "max"))
## Extract statistical measures using by.class approach
statistical(Species ~ ., iris, by.class=TRUE)
## Do not transform the data (categorical attributes are ignored)
statistical(Species ~ ., iris, transform=FALSE)
# }