This method is a member function of class "RecoSys" that uses cross validation to tune the model parameters.
The common usage of this method is
r = Reco()
r$tune(train_data, opts = list(dim      = c(10L, 20L),
                               costp_l1 = c(0, 0.1),
                               costp_l2 = c(0.01, 0.1),
                               costq_l1 = c(0, 0.1),
                               costq_l2 = c(0.01, 0.1),
                               lrate    = c(0.01, 0.1))
)
A list with two components:

min
Parameter values with the minimum cross validated loss. This is a list that can be passed to the opts argument in $train().

res
A data frame giving the supplied candidate values of the tuning parameters, with one column showing the loss function value associated with each combination.
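For example, a minimal sketch of how the returned list might be used, assuming train_src is a data source built as described under train_data below:

res = r$tune(train_src, opts = list(dim = c(10L, 20L)))
res$res                              # all candidate combinations and their loss values
res$min                              # the best combination, as a named list
r$train(train_src, opts = res$min)   # retrain with the best combination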
r
Object returned by Reco().
train_data
An object of class "DataSource" that describes the source of the training data, typically returned by data_file(), data_memory(), or data_matrix(). A brief sketch of constructing such an object follows the argument descriptions below.
opts
A list of candidate tuning parameter values and extra options for the model tuning procedure. See the Parameters and Options section below for details.
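As an illustration, a training data source can be created in any of the following ways before calling $tune(). The file name and the vectors user_index, item_index, and rating below are placeholders, not objects provided by the package:

library(recosystem)
train_src = data_file("ratings.txt")                     # ratings stored in a plain text file
train_src = data_memory(user_index, item_index, rating)  # index and rating vectors already in memory
train_src = data_matrix(rating_mat)                      # a sparse rating matrix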
The opts argument should be a list that provides the candidate values of the tuning parameters and some other options. For the tuning parameters (dim, costp_l1, costp_l2, costq_l1, costq_l2, and lrate), users can provide a numeric vector for each one, and the model will be evaluated on every combination of the candidate values. For the other, non-tuning options, users should give a single value. If a parameter or option is not set by the user, the program uses its default. The available parameters and options are listed below, followed by a short example of a complete opts list:
dim
Tuning parameter, the number of latent factors. Can be specified as an integer vector, with default value c(10L, 20L).

costp_l1
Tuning parameter, the L1 regularization cost for user factors. Can be specified as a numeric vector, with default value c(0, 0.1).

costp_l2
Tuning parameter, the L2 regularization cost for user factors. Can be specified as a numeric vector, with default value c(0.01, 0.1).

costq_l1
Tuning parameter, the L1 regularization cost for item factors. Can be specified as a numeric vector, with default value c(0, 0.1).

costq_l2
Tuning parameter, the L2 regularization cost for item factors. Can be specified as a numeric vector, with default value c(0.01, 0.1).

lrate
Tuning parameter, the learning rate, which can be thought of as the step size in gradient descent. Can be specified as a numeric vector, with default value c(0.01, 0.1).

loss
Character string, the loss function. Default is "l2"; see the Parameters and Options section of $train() for details.

nfold
Integer, the number of folds in cross validation. Default is 5.

niter
Integer, the number of iterations. Default is 20.

nthread
Integer, the number of threads for parallel computing. Default is 1.

nbin
Integer, the number of bins. Must be greater than nthread. Default is 20.

nmf
Logical, whether to perform non-negative matrix factorization. Default is FALSE.

verbose
Logical, whether to show detailed information. Default is FALSE.

progress
Logical, whether to show a progress bar. Default is TRUE.
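For instance, a sketch of an opts list that tunes dim, costp_l2, and lrate over candidate vectors while fixing the remaining options to single values; the candidate values here are illustrative only, and train_data is assumed to be a "DataSource" object as described above:

r = Reco()
res = r$tune(
    train_data,
    opts = list(dim      = c(10L, 20L, 30L),  # tuning parameters: vectors of candidates
                costp_l2 = c(0.01, 0.1),
                lrate    = c(0.05, 0.1),
                loss     = "l2",              # non-tuning options: single values
                nfold    = 5,
                niter    = 20,
                nthread  = 2,
                nmf      = FALSE,
                verbose  = FALSE,
                progress = TRUE)
)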
Yixuan Qiu <https://statr.me>
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.
W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.
$train()
if (FALSE) {
  library(recosystem)
  train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
  train_src = data_file(train_set)
  r = Reco()
  set.seed(123) # This is a randomized algorithm
  res = r$tune(
      train_src,
      opts = list(dim = c(10, 20, 30),
                  costp_l1 = 0, costq_l1 = 0,
                  lrate = c(0.05, 0.1, 0.2), nthread = 2)
  )
  r$train(train_src, opts = res$min)
}