# NOT RUN {
# Continuous features with continuous target, categorical target,
# and neighbor ranking:
library(neighbr)
data(iris)
# Add an ID column to the data for neighbor ranking. seq_len(nrow(iris))
# adapts to the actual row count instead of hard-coding 150:
iris$ID <- seq_len(nrow(iris))
# Train set contains all predicted variables, features, and ID column:
train_set <- iris[1:140, ]
# Omit predicted variables (Petal.Width, Species) and the ID column from
# the test set. Select by name rather than by position so the code
# survives column reordering:
test_set <- iris[141:150, !names(iris) %in% c("Petal.Width", "Species", "ID")]
# Fit a 3-nearest-neighbor model that predicts both targets and returns
# the 3 nearest neighbor IDs for each test row:
fit <- knn(
  train_set = train_set, test_set = test_set,
  k = 3,
  categorical_target = "Species",
  continuous_target = "Petal.Width",
  comparison_measure = "squared_euclidean",
  return_ranked_neighbors = 3,
  id = "ID"
)
# Export the fitted model as PMML:
fit_pmml <- pmml(fit)
# Logical features with categorical target and neighbor ranking:
library(neighbr)
data("houseVotes84")
# Remove any rows with N/A elements:
dat <- houseVotes84[complete.cases(houseVotes84), ]
# Feature columns are everything except the target and (later-added) ID:
feature_names <- names(dat)[!names(dat) %in% c("Class", "ID")]
# Recode each {no, yes} factor to numeric {0, 1} in a single pass per
# column: relabel the levels, then convert the factor to its numeric
# level values (the columns are independent, so one loop suffices):
for (n in feature_names) {
  levels(dat[, n])[levels(dat[, n]) == "n"] <- 0
  levels(dat[, n])[levels(dat[, n]) == "y"] <- 1
  dat[, n] <- as.numeric(levels(dat[, n]))[dat[, n]]
}
# Add an ID column for neighbor ranking. seq_len() is safe even for
# zero-row data, unlike 1:nrow(dat):
dat$ID <- seq_len(nrow(dat))
# Train set contains features, predicted variable, and ID
# (complete.cases leaves 232 rows; the first 225 are used for training):
train_set <- dat[1:225, ]
# Test set contains features only:
test_set <- dat[226:232, !names(dat) %in% c("Class", "ID")]
# Fit a 5-nearest-neighbor classifier using the Jaccard measure, which
# is appropriate for the binary 0/1 features created above:
fit <- knn(
  train_set = train_set, test_set = test_set,
  k = 5,
  categorical_target = "Class",
  comparison_measure = "jaccard",
  return_ranked_neighbors = 3,
  id = "ID"
)
# Export the fitted model as PMML:
fit_pmml <- pmml(fit)
# }
# NOT RUN {
# }
# Run the code above in your browser using DataLab