ECoL (version 0.3.0)

correlation: Measures of feature correlation

Description

Correlation measures for regression tasks. These measures quantify how strongly the values of the features are correlated with the output. If at least one feature is highly correlated with the output, a simpler function can be fitted to the data.

Usage

correlation(...)

# S3 method for default
correlation(x, y, measures = "all", summary = c("mean", "sd"), ...)

# S3 method for formula
correlation(formula, data, measures = "all", summary = c("mean", "sd"), ...)

Arguments

...

Not used.

x

A data.frame containing only the input attributes.

y

A response vector with one value for each row/component of x.

measures

A list of measure names, or "all" to include all of them.

summary

A list of summarization functions, or empty to return all values. See the summarization method for more information. (Default: c("mean", "sd"))

formula

A formula to define the output column.

data

A data.frame containing both the input and output attributes.

Value

A list named by the requested correlation measure.

Details

The following measures are allowed for this method:

"C1"

Maximum feature correlation to the output (C1) calculates the maximum absolute value of the Spearman correlation between each feature and the output.

"C2"

Average feature correlation to the output (C2) computes the average of the Spearman correlations of all features to the output.

"C3"

Individual feature efficiency (C3) calculates, for each feature, the number of examples that must be removed from the dataset until a high Spearman correlation value to the output is achieved.

"C4"

Collective feature efficiency (C4) computes the ratio of examples removed from the dataset based on an iterative process of linear fitting between the features and the target attribute.
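
As a rough illustration of the quantities behind C1 and C2, the sketch below computes per-feature Spearman correlations with base R's `cor()`. It is not the package's implementation: `feature_correlations`, `c1_sketch`, and `c2_sketch` are hypothetical names, the choice of `mtcars` columns is arbitrary, and averaging absolute values for C2 is an assumption, since the description above does not say whether signs are kept. The values returned by `correlation()` may also be rescaled or summarized differently.

```r
# Hypothetical helper, not part of ECoL: Spearman correlation of each
# feature column with the output vector.
feature_correlations <- function(x, y) {
  vapply(x, function(col) stats::cor(col, y, method = "spearman"), numeric(1))
}

data(mtcars)
x <- mtcars[, c("wt", "hp", "disp")]  # input attributes (arbitrary choice)
y <- mtcars$mpg                       # output attribute

rho <- feature_correlations(x, y)

# C1-style quantity: largest absolute correlation over all features.
c1_sketch <- max(abs(rho))

# C2-style quantity: average correlation over all features.
# Assumption: absolute values are averaged.
c2_sketch <- mean(abs(rho))

print(rho)
print(c(C1 = c1_sketch, C2 = c2_sketch))
```

Because a maximum can never be smaller than a mean of the same values, `c1_sketch` is always at least `c2_sketch`; a large gap between the two suggests that one feature dominates the relationship with the output.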

References

Ana C. Lorena, Aron I. Maciel, Pericles B. C. Miranda, Ivan G. Costa, and Ricardo B. C. Prudencio (2018). Data complexity meta-features for regression problems. Machine Learning, 107(1), 209-246.

See Also

Other complexity-measures: balance, dimensionality, linearity, neighborhood, network, overlapping, smoothness

Examples

library(ECoL)

## Extract all correlation measures for regression task
data(cars)
correlation(speed ~ ., cars)
