rule_single: Outlying univariate continuous association rule finder

Description

This function searches for association rules on outlying univariate continuous features against a binary label. The rules predict label 0, and the risk of overfitting is very high (see Kaggle's Santander Customer Satisfaction competition). It can be used to score outliers first and build rules afterwards if needed. Verbose output is always on and cannot be disabled. If you need the function without it, remove the verbose messages from the source and recompile the package.

Usage

rule_single(data, label, train_rows = length(label), iterations = 1000,
  minimal_score = 25, minimal_node = 5, false_negatives = 2,
  seed = 11111, scoring = TRUE, ruling = TRUE)

Arguments

data

The data.frame containing the features to make association rules on, or the scoring matrix. Missing values are not allowed.

label

The target label as an integer vector (each value must be either 0 or 1). 1 must be the minority label.

train_rows

The number of rows used for training the association rules. These rows must be your training set, so the value must equal length(label). Defaults to length(label).

iterations

The number of iterations allowed for limited-memory gradient descent. Defaults to 1000.

minimal_score

The association rule finder will not accept any node whose outlying score is below this value. Defaults to 25.

minimal_node

The association rule finder will not accept any node containing fewer than this number of samples. Defaults to 5.

false_negatives

The association rule will allow at most (false_negatives - 1) false negatives. A higher value makes the algorithm more permissive; a lower value makes it very difficult to converge (or to find any rule at all). Defaults to 2.

seed

The random seed for reproducibility. Defaults to 11111.

scoring

Whether to score features before computing the association rules. Defaults to TRUE.

ruling

Whether to compute association rules on the features (set this to FALSE when you only want the scores). Defaults to TRUE.
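
To make the interaction of minimal_node and false_negatives more concrete, here is a minimal sketch in base R. It is not the package's implementation: accept_node is a hypothetical helper, and it assumes that a node is the set of rows whose feature value lies above a threshold. Because the rule predicts 0 inside the node, every row labelled 1 in the node counts as a false negative.

```r
# Hypothetical helper, not part of the package: checks a candidate node
# against criteria analogous to minimal_node and false_negatives.
# Assumption: the node is every row with a feature value above `threshold`.
accept_node <- function(feature, label, threshold,
                        minimal_node = 5, false_negatives = 2) {
  in_node <- feature > threshold        # rows treated as outlying
  node_size <- sum(in_node)             # number of samples in the node
  # The rule predicts 0 in the node, so each label 1 is a false negative.
  fn_count <- sum(label[in_node] == 1)
  node_size >= minimal_node && fn_count < false_negatives
}

# Made-up data for illustration only.
feature <- c(1, 2, 3, 50, 60, 70, 80, 90, 100, 4)
label   <- c(0, 1, 0,  0,  0,  0,  0,  1,   0, 1)

a <- accept_node(feature, label, threshold = 40)                      # 6 rows, 1 false negative
b <- accept_node(feature, label, threshold = 40, false_negatives = 1) # no false negative allowed
c3 <- accept_node(feature, label, threshold = 85)                     # only 2 rows in the node
```

With the defaults, the first node passes (six rows, one false negative), the second fails because false_negatives = 1 tolerates none, and the third fails because it holds fewer than minimal_node rows.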

Value

A list with one to three elements: "scores", the outlying scores for features; "parsed_scores", the association rule result on specific features; and "output", the general association rule result per observation.
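
Since the returned list does not always hold all three elements, it may help to check for an element before using it. The sketch below uses a mock list with made-up values rather than a real call. Which elements appear for which combination of scoring and ruling is an assumption here, not something this page states.

```r
# Mock result standing in for a scoring-only call; the values are made up.
res <- list(scores = matrix(c(0, 30, 5, 0), nrow = 2))

# `[[` matches names exactly, whereas `$` allows partial matching on lists,
# so `[[` is the safer way to test for an element that may be absent.
has_scores <- !is.null(res[["scores"]])
has_output <- !is.null(res[["output"]])

if (has_output) {
  flagged <- which(res[["output"]] == 0)   # observations covered by a rule
} else {
  flagged <- integer(0)                    # nothing to flag without rule output
}
```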

Examples

## Not run: ------------------------------------
# scored_data <- rule_single(data = data, label = NA, scoring = TRUE, ruling = FALSE)
# rules <- rule_single(data = scored_data, label = target,
# iterations = 100, scoring = FALSE, ruling = TRUE)
# preds[rules$output[(length(target) + 1):nrow(data)] == 0] <- 0
## ---------------------------------------------
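
The last step of the example above relies on the training rows coming first in data, followed by the test rows. The sketch below walks through that indexing with made-up values and without calling the package. mock_output stands in for rules$output, and reading a 0 as "a rule predicts label 0 for this row" is an assumption taken from the example.

```r
# Made-up sizes: 4 training rows followed by 3 test rows in one table.
target  <- c(0, 1, 0, 0)    # labels exist for the training rows only
n_total <- 7                # stands in for nrow(data)

# Mock of rules$output, one value per row of data (assumed meaning: 0 = rule fired).
mock_output <- c(1, 1, 0, 1, 0, 1, 0)

# Predictions from some other model, one per test row.
preds <- c(0.80, 0.35, 0.60)

# Parentheses matter: `:` binds tighter than `+`, so length(target) + 1:n_total
# would produce a different vector.
test_idx <- (length(target) + 1):n_total

# Zero out the predictions for test rows where the mock output is 0.
preds[mock_output[test_idx] == 0] <- 0
preds
```

Here the first and third test rows are overridden to 0, while the second keeps its original prediction.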
