Automated training, tuning and validation of machine learning models. Models are tuned, resampled and validated on an experimental dataset, then trained on the full dataset and validated/tested on external datasets. Classification models tune the probability threshold automatically and return the results. Each model contains information on performance, the model object and evaluation plots.
autoMLmodel(
  train,
  test = NULL,
  score = NULL,
  target = NULL,
  testSplit = 0.2,
  tuneIters = 10,
  tuneType = "random",
  models = "all",
  perMetric = "auc",
  varImp = 10,
  liftGroup = 50,
  maxObs = 10000,
  uid = NULL,
  pdp = FALSE,
  positive = 1,
  htmlreport = FALSE,
  seed = 1991,
  verbose = FALSE
)
Value: a list output containing the trained models and results.
Arguments:

train - [data.frame | Required] training set

test - [data.frame | Optional] testing set to validate the models on. If none is provided, one will be created internally. Default of NULL

score - [data.frame | Optional] dataset to score using the best trained model (selected on AUC). If none is provided, the score list will be NULL. Default of NULL

target - [character | Required] name of the target variable. If a target is provided, classification or regression models will be trained; if left as NULL, unsupervised models will be trained. Default of NULL

testSplit - [numeric | Optional] percentage of data to allocate to the test set; stratified sampling is used. Default of 0.2

tuneIters - [integer | Optional] number of tuning iterations to search for optimal hyperparameters. Default of 10

tuneType - [character | Optional] tuning method applied; the options are:
"random" - random search hyperparameter tuning
"frace" - iterated F-racing algorithm for the best solution, from the irace package

models - [character | Optional] which models to train. Default of "all". The method names are listed below (a usage sketch covering these options follows the argument list):
randomForest - random forests using the randomForest package
ranger - random forests using the ranger package
xgboost - gradient boosting using the xgboost package
rpart - decision tree classification using the rpart package
glmnet - regularised regression from the glmnet package
logreg - logistic regression from the stats package

perMetric - [character | Optional] model validation metric. Default of "auc". The options are:
auc - area under the curve; mlr::auc
accuracy - accuracy; mlr::acc
balancedAccuracy - balanced accuracy; mlr::bac
brier - Brier score; mlr::brier
f1 - F1 measure; mlr::f1
meanPrecRecall - geometric mean of precision and recall; mlr::gpr
logloss - logarithmic loss; mlr::logloss

varImp - [integer | Optional] number of important features to plot. Default of 10

liftGroup - [integer | Optional] lift value used to validate test model performance. Default of 50

maxObs - [numeric | Optional] number of observations in the experiment training dataset on which models are trained, tuned and resampled. Default of 10,000. If the training dataset has fewer observations, all of them will be used

uid - [character | Optional] unique identifier variables to keep in the test output data. Default of NULL

pdp - [logical | Optional] whether to draw partial dependence plots for important variables. Default of FALSE

positive - [character | Optional] positive class of the target variable. Default of 1

htmlreport - [logical | Optional] whether to view the model outcome in HTML format. Default of FALSE

seed - [integer | Optional] random number seed for reproducible results. Default of 1991

verbose - [logical | Optional] whether to display execution steps on the console. Default of FALSE
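To illustrate the tuning and selection arguments above, here is a minimal sketch of a non-default call. It assumes the heart data frame with a binary target_var column used in the example below, and it assumes that models accepts a character vector of the method names listed above.

# Sketch: non-default tuning, model subset and validation metric.
# `heart` and `target_var` are taken from the package example below.
mods <- autoMLmodel(
  train     = heart,
  target    = "target_var",
  testSplit = 0.3,                     # hold out 30% via stratified sampling
  tuneIters = 20,                      # larger hyperparameter search budget
  tuneType  = "frace",                 # iterated F-racing (irace) instead of random search
  models    = c("xgboost", "glmnet"),  # assumed: a vector of method names is accepted
  perMetric = "logloss",               # validate on logarithmic loss rather than AUC
  pdp       = TRUE,                    # partial dependence plots for important variables
  seed      = 1991
)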
All models are trained using the mlr train function, so all of the functionality in the mlr package can be applied to the autoMLmodel outcome.

autoMLmodel provides the following information for the various machine learning classification models (a sketch of accessing these components follows this list):
trainedModels - model-level list output containing the trained model object, hyperparameters, tuned data, test data, performance and model plots
results - summary of all trained model results, such as AUC, Precision, Recall and F1 score
modelexp - model gain chart
predicted_score - predicted score
datasummary - summary of the input data
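A minimal sketch of inspecting these components, assuming mymodel is the output of the example call shown below; the exact layout inside each trainedModels entry varies by model.

# Sketch: inspecting the autoMLmodel output list (`mymodel` from the example below).
mymodel$results           # summary table of AUC, Precision, Recall, F1 per model
mymodel$datasummary       # summary of the input data
mymodel$modelexp          # model gain chart
mymodel$predicted_score   # predicted score (populated when a score set is supplied)
mymodel$trainedModels     # per-model objects, tuned/test data, performance and plots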
See also: the mlr train, makeLearner and tuneParams functions from the mlr package.
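Because the models are fitted through mlr::train, standard mlr tooling can be applied to the stored model objects. A sketch follows, assuming each trainedModels entry keeps its fitted mlr WrappedModel in a $model slot; that slot name is an assumption, not confirmed by this page.

# Sketch: applying mlr functions to a model trained by autoMLmodel.
# `mymodel` is an autoMLmodel output (see the example below); the $model
# slot holding the fitted mlr WrappedModel is assumed.
library(mlr)
wrapped <- mymodel$trainedModels$logreg$model
getLearnerModel(wrapped)                   # underlying stats::glm fit
pred <- predict(wrapped, newdata = heart)  # mlr predict on a data.frame
performance(pred, measures = acc)          # evaluate with an mlr measure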
# Run only the logistic regression model
mymodel <- autoMLmodel(train = heart, test = NULL, target = 'target_var',
                       testSplit = 0.2, tuneIters = 10, tuneType = "random",
                       models = "logreg", varImp = 10, liftGroup = 50,
                       maxObs = 4000, uid = NULL, seed = 1991)