pmml.xgb.Booster: Generate PMML for an xgb.Booster object from the xgboost package

Description

Generate PMML for an xgb.Booster object from the xgboost package.

Usage

# S3 method for xgb.Booster
pmml(model, model.name = "xboost_Model",
  app.name = "R", description = "Extreme Gradient Boosting Model",
  copyright = NULL, transforms = NULL, inputFeatureNames = NULL,
  outputLabelName = NULL, outputCategories = NULL,
  xgbDumpFile = NULL, unknownValue = NULL,
  parentInvalidValueTreatment = "returnInvalid",
  childInvalidValueTreatment = "asIs", ...)

Arguments

model

an object created by the 'xgboost' function

model.name

optional; the model name.

app.name

optional; name of the application in which the model was created.

description

optional; description of the model.

copyright

optional; a copyright statement.

transforms

optional; any pre-processing information from the pmmlTransformations package.

inputFeatureNames

input variable names used in training the model

outputLabelName

name of the predicted field

outputCategories

possible values of the predicted field, for classification models.

xgbDumpFile

name of the file saved using the 'xgb.dump' function.

unknownValue

optional; a missing value replacement.

parentInvalidValueTreatment

invalid value treatment at the top MiningField level.

childInvalidValueTreatment

invalid value treatment at the model segment MiningField level.

...

further arguments passed to other methods.

Value

PMML representation of the xgb.Booster object.

Details

The xgboost function takes as its input either an xgb.DMatrix object or a numeric matrix. The input field information is not stored in the R model object, so it must be passed to pmml as arguments. This enables the PMML to specify field names in its model representation. The R model object does not store information about the fitted tree structure either. However, this information can be extracted using the xgb.model.dt.tree function and the file saved using the xgb.dump function. The xgboost library is therefore needed in the environment, and the saved file must be supplied as an input as well.

The following objectives are currently supported: multi:softprob, multi:softmax, binary:logistic.

The pmml exporter will throw an error if the xgboost model has only one tree.
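
Because a single-tree model is rejected, it can be useful to count the trees before exporting. The following is a minimal sketch, not part of the pmml package: count_trees is a hypothetical helper, and it assumes that the table returned by xgb.model.dt.tree has a Tree column identifying each tree. The model is trained with xgb.train on an xgb.DMatrix purely for illustration.

```r
library(xgboost)

# Hypothetical helper (not part of pmml or xgboost): count the trees in a
# fitted booster by reading the per-node table from xgb.model.dt.tree().
count_trees <- function(model) {
  tree_table <- xgb.model.dt.tree(model = model)
  length(unique(tree_table$Tree))
}

# Small binary model on the four iris measurements; with a binary objective,
# each boosting round adds one tree.
x <- as.matrix(iris[, 1:4])
y <- as.numeric(iris$Species == "setosa")
dtrain <- xgb.DMatrix(x, label = y)
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2),
  data = dtrain,
  nrounds = 2
)

# Stop early with a clear message instead of failing inside the exporter.
n_trees <- count_trees(bst)
if (n_trees < 2) {
  stop("Model has a single tree; increase nrounds before calling pmml().")
}
```

For multi-class objectives the number of trees is typically nrounds times num_class, so this check mainly matters for binary models trained with a single round.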

The exporter only works with numeric matrices. Sparse matrices must be converted to matrix objects before training an xgboost model for the export to work correctly.
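
The conversion described above can be sketched as follows. This is an illustration only: the small sparse matrix and its column names (f1, f2, f3) are made up, and the Matrix package is assumed to be installed.

```r
library(Matrix)

# A small sparse feature matrix (dgCMatrix), standing in for real training data.
sparse_x <- sparseMatrix(
  i = c(1, 2, 3, 4),
  j = c(1, 2, 1, 3),
  x = c(1.5, 2.0, 0.5, 3.0),
  dims = c(4, 3),
  dimnames = list(NULL, c("f1", "f2", "f3"))
)

# Convert to a base R numeric matrix; implicit zeros become explicit 0 values.
dense_x <- as.matrix(sparse_x)

# Confirm the result is an ordinary numeric matrix before training on it.
stopifnot(is.matrix(dense_x), is.numeric(dense_x))

# The column names survive the conversion and can be used as inputFeatureNames.
feature_names <- colnames(dense_x)
```

The dense matrix, rather than the sparse original, is then what gets passed to xgboost for training. Note that a dense copy can be much larger in memory than the sparse one.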

Examples

# NOT RUN {
# Standard example using the xgboost package example model
# make the xgboost model using xgb.DMatrix object as inputs
# }
# NOT RUN {
library(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
model1 <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = 1, nthread = 2,
                  nrounds = 2, objective = "binary:logistic")
# }
# NOT RUN {
# The input feature names can be extracted as colnames(train$data).
# The output field name and categories must be inferred. Looking at train$label shows
# that the output categories are either 0 or 1. The name cannot be inferred, so it is
# set to "prediction1". Save the tree information required in an external file.
# }
# NOT RUN {
xgb.dump(model1, "model1.dumped.trees")
# }
# NOT RUN {
# Now all required input parameters are known:
# }
# NOT RUN {
pmml(model1, inputFeatureNames = colnames(train$data), outputLabelName = "prediction1",
     outputCategories = c("0", "1"), xgbDumpFile = "model1.dumped.trees")
# }
# NOT RUN {
# use iris dataset to make a multinomial model
# input data as a matrix
# }
# NOT RUN {
model2 <- xgboost(data = as.matrix(iris[,1:4]), label = as.numeric(iris[,5])-1, 
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "multi:softprob",
               num_class=3)
# }
# NOT RUN {
# The field names are easily extracted from the column names, and the categories are converted to
# numeric format by xgboost.
# Save the tree information file.
# }
# NOT RUN {
xgb.dump(model2, "model2.dumped.trees")

pmml(model2, inputFeatureNames = colnames(as.matrix(iris[, 1:4])), outputLabelName = "Species",
     outputCategories = c(1, 2, 3), xgbDumpFile = "model2.dumped.trees")
# }