Regression task. In regression problems, the smoother the function to be fitted to the data, the simpler the problem is. Larger variations in the inputs and/or outputs, on the other hand, usually indicate more intricate relationships between them.
smoothness(...)
# S3 method for default
smoothness(x, y, measures = "all",
summary = c("mean", "sd"), ...)
# S3 method for formula
smoothness(formula, data, measures = "all",
summary = c("mean", "sd"), ...)
...: Not used.
x: A data.frame containing only the input attributes.
y: A response vector with one value for each row/component of x.
measures: A list of measure names, or "all" to include all of them.
summary: A list of summarization functions, or empty for all values. See the summarization method for more information. (Default: c("mean", "sd"))
formula: A formula to define the output column.
data: A data.frame dataset containing the input and output attributes.
A list named by the requested smoothness measure.
The following measures are allowed for this method:
Output distribution (S1) monitors whether the examples joined in the MST have similar output values. Lower values indicate simpler problems, where the outputs of similar examples in the input space are also next to each other.
Input distribution (S2) measures how similar in the input space data items with similar outputs are, based on distance.
Error of a nearest neighbor regressor (S3) calculates the mean squared error of a 1-nearest neighbor regressor using leave-one-out.
Non-linearity of nearest neighbor regressor (S4) calculates the mean squared error of a 1-nearest neighbor regressor on new, randomly interpolated points.
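To make the S3 description concrete, here is a minimal base-R sketch of a leave-one-out 1-nearest neighbor error. It is an illustration only, not the package's implementation: the helper name `loo_1nn_mse`, the min-max scaling of the inputs, and the use of Euclidean distance are all assumptions, so the value it returns may differ from what `smoothness` reports.

```r
# Sketch of the idea behind S3: leave-one-out MSE of a 1-NN regressor.
# Assumptions (not taken from the package): numeric inputs, min-max
# scaling of each input column, Euclidean distance, first match on ties.
loo_1nn_mse <- function(x, y) {
  x <- as.matrix(x)
  # Rescale each column to [0, 1]; constant columns are left at 0
  mins <- apply(x, 2, min)
  rng <- apply(x, 2, function(col) diff(range(col)))
  x <- sweep(x, 2, mins, "-")
  x <- sweep(x, 2, ifelse(rng > 0, rng, 1), "/")
  # Pairwise distances; a point may not be its own neighbor
  d <- as.matrix(dist(x))
  diag(d) <- Inf
  nn <- apply(d, 1, which.min)
  # Predict each output from its nearest neighbor's output
  mean((y[nn] - y)^2)
}

data(cars)
loo_1nn_mse(cars["dist"], cars$speed)
```

Lower values mean that examples close together in the input space also have close outputs, which matches the reading of S3 as an error measure.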
Ana C Lorena and Aron I Maciel and Pericles B C Miranda and Ivan G Costa and Ricardo B C Prudencio. (2018). Data complexity meta-features for regression problems. Machine Learning, 107, 1, 209--246.
Other complexity-measures: balance, correlation, dimensionality, linearity, neighborhood, network, overlapping
# NOT RUN {
## Extract all smoothness measures for regression task
data(cars)
smoothness(speed ~ ., cars)
# }
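The `measures` and `summary` arguments can narrow the output. The sketch below assumes that the function is provided by the ECoL package and that measure names follow the labels used in the list of measures ("S1" to "S4"); neither is stated on this page, so treat both as assumptions. The call is guarded so the snippet does nothing if the package is not installed.

```r
# Hedged usage sketch: request two measures and a single summary function.
# Assumptions: the package is ECoL, and measures are named "S1" ... "S4".
data(cars)
res <- NULL
if (requireNamespace("ECoL", quietly = TRUE)) {
  res <- ECoL::smoothness(speed ~ ., cars,
                          measures = c("S1", "S3"),
                          summary = "mean")
  print(res)
}
```

Per the return value described above, `res` should be a list named by the requested measures.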