This method is a member function of class "RecoSys" that uses cross validation to tune the model parameters.
The common usage of this method is
r = Reco()
r$tune(train_data, opts = list(dim      = c(10L, 20L),
                               costp_l1 = c(0, 0.1),
                               costp_l2 = c(0.01, 0.1),
                               costq_l1 = c(0, 0.1),
                               costq_l2 = c(0.01, 0.1),
                               lrate    = c(0.01, 0.1))
)
A list with two components:

min
Parameter values with the minimum cross validated loss. This is a list that can be passed to the opts argument in $train().

res
A data frame giving the supplied candidate values of the tuning parameters, with one column showing the loss function value associated with each combination.
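For example, a minimal sketch of how the returned list might be used, assuming train_src is a data source built as described under train_data below:

res = r$tune(train_src, opts = list(dim = c(10L, 20L)))
res$res                              # all candidate combinations and their loss values
res$min                              # the best combination, as a named list
r$train(train_src, opts = res$min)   # retrain with the best combination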
r
Object returned by Reco().
train_data
An object of class "DataSource" that describes the source of the training data, typically returned by data_file(), data_memory(), or data_matrix(). A brief sketch of constructing such an object follows the argument descriptions below.
opts
A list of candidate tuning parameter values and extra options for the model tuning procedure. See the Parameters and Options section below for details.
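As an illustration, a training data source can be created in any of the following ways before calling $tune(). The file name and the vectors user_index, item_index, and rating below are placeholders, not objects provided by the package:

library(recosystem)
train_src = data_file("ratings.txt")                     # ratings stored in a plain text file
train_src = data_memory(user_index, item_index, rating)  # index and rating vectors already in memory
train_src = data_matrix(rating_mat)                      # a sparse rating matrix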
The opts argument should be a list that provides the candidate values of the tuning parameters and some other options. For the tuning parameters (dim, costp_l1, costp_l2, costq_l1, costq_l2, and lrate), users can provide a numeric vector for each one, and the model will be evaluated on every combination of the candidate values. For the other, non-tuning options, users should give a single value. If a parameter or option is not set by the user, the program uses its default. The available parameters and options are listed below, followed by a short example of a complete opts list:
dim
Tuning parameter, the number of latent factors. Can be specified as an integer vector, with default value c(10L, 20L).

costp_l1
Tuning parameter, the L1 regularization cost for user factors. Can be specified as a numeric vector, with default value c(0, 0.1).

costp_l2
Tuning parameter, the L2 regularization cost for user factors. Can be specified as a numeric vector, with default value c(0.01, 0.1).

costq_l1
Tuning parameter, the L1 regularization cost for item factors. Can be specified as a numeric vector, with default value c(0, 0.1).

costq_l2
Tuning parameter, the L2 regularization cost for item factors. Can be specified as a numeric vector, with default value c(0.01, 0.1).

lrate
Tuning parameter, the learning rate, which can be thought of as the step size in gradient descent. Can be specified as a numeric vector, with default value c(0.01, 0.1).

loss
Character string, the loss function. Default is "l2"; see the Parameters and Options section of $train() for details.

nfold
Integer, the number of folds in cross validation. Default is 5.

niter
Integer, the number of iterations. Default is 20.

nthread
Integer, the number of threads for parallel computing. Default is 1.

nbin
Integer, the number of bins. Must be greater than nthread. Default is 20.

nmf
Logical, whether to perform non-negative matrix factorization. Default is FALSE.

verbose
Logical, whether to show detailed information. Default is FALSE.

progress
Logical, whether to show a progress bar. Default is TRUE.
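For instance, a sketch of an opts list that tunes dim, costp_l2, and lrate over candidate vectors while fixing the remaining options to single values; the candidate values here are illustrative only, and train_data is assumed to be a "DataSource" object as described above:

r = Reco()
res = r$tune(
    train_data,
    opts = list(dim      = c(10L, 20L, 30L),  # tuning parameters: vectors of candidates
                costp_l2 = c(0.01, 0.1),
                lrate    = c(0.05, 0.1),
                loss     = "l2",              # non-tuning options: single values
                nfold    = 5,
                niter    = 20,
                nthread  = 2,
                nmf      = FALSE,
                verbose  = FALSE,
                progress = TRUE)
)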
Yixuan Qiu <https://statr.me>
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.
W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.
$train()
if (FALSE) {
  library(recosystem)
  train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
  train_src = data_file(train_set)
  r = Reco()
  set.seed(123) # This is a randomized algorithm
  res = r$tune(
      train_src,
      opts = list(dim = c(10, 20, 30),
                  costp_l1 = 0, costq_l1 = 0,
                  lrate = c(0.05, 0.1, 0.2), nthread = 2)
  )
  r$train(train_src, opts = res$min)
}