The functions on this page specify the source of data in the recommender system.
They are intended to provide the input argument of functions such as $tune(), $train(), and $predict().
Currently three data formats are supported: a data file (via function data_file()),
data in memory as R objects (via function data_memory()), and data stored as a
sparse matrix (via function data_matrix()).
data_file(path, index1 = FALSE, ...)
data_memory(user_index, item_index, rating = NULL, index1 = FALSE, ...)
data_matrix(mat, ...)
Each function returns an object of class "DataSource", as required by $tune(), $train(), and $predict().
path: Path to the data file.
index1: Whether the user indices and item indices start with 1 (index1 = TRUE) or 0 (index1 = FALSE).
...: Currently unused.
user_index: An integer vector giving the user indices of rating scores.
item_index: An integer vector giving the item indices of rating scores.
rating: A numeric vector of the observed entries in the rating matrix. Can be specified as NULL for testing data, in which case it is ignored.
mat: A dgTMatrix (if it has ratings/values) or ngTMatrix (if it is binary) sparse matrix, with users corresponding to rows and items corresponding to columns.
Yixuan Qiu <https://statr.me>
In $tune() and $train(), functions in this page are used to specify the source of training data.
data_file() expects a text file that describes a sparse matrix
in triplet form, i.e., each line in the file contains three numbers
row col value
representing an entry of the rating matrix together with its location. In real applications, it typically looks like
user_index item_index rating
The smalltrain.txt file in the dat directory of this package
is an example of a training data file.
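As a minimal sketch of the triplet format, the following writes a few made-up ratings to a temporary file and wraps it with data_file(). The ratings and the temporary path are illustrative assumptions, not data shipped with the package; the bundled example file is located with system.file().

```r
library(recosystem)

# Hypothetical ratings in triplet form: user_index item_index rating (0-based)
ratings <- data.frame(
    user   = c(0L, 0L, 1L, 2L),
    item   = c(0L, 2L, 1L, 2L),
    rating = c(5, 3, 4, 1)
)

# Write one triplet per line, without header or row names
toy_path <- tempfile(fileext = ".txt")
write.table(ratings, file = toy_path, sep = " ",
            row.names = FALSE, col.names = FALSE)

# Data source backed by the file just written
toy_src <- data_file(toy_path)

# Data source backed by the example file bundled with the package
pkg_path <- system.file("dat", "smalltrain.txt", package = "recosystem")
pkg_src  <- data_file(pkg_path)
```

Either object can then be passed as the data argument of $tune() or $train().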
If the sparse matrix is given as a dgTMatrix or ngTMatrix object
(triplets/COO format defined in the Matrix package), then the function
data_matrix() can be used to specify the data source.
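A small sketch of the sparse-matrix route, assuming the Matrix package is installed. The matrix entries are made up for illustration; the coercion through as(..., "TsparseMatrix") is one way to obtain the triplet representation and is not prescribed by the package.

```r
library(Matrix)
library(recosystem)

# Hypothetical 3 users x 3 items rating matrix; rows are users, columns are items
m <- sparseMatrix(
    i = c(1, 1, 2, 3),
    j = c(1, 3, 2, 3),
    x = c(5, 3, 4, 1),
    dims = c(3, 3)
)

# sparseMatrix() returns a column-compressed matrix by default,
# so coerce it to triplet (COO) form before wrapping it
mat <- as(m, "TsparseMatrix")

mat_src <- data_matrix(mat)
```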
If user index, item index, and ratings are stored as R vectors in memory,
they can be passed to data_memory() to form the training data source.
By default the user index and item index start at zero, and the option
index1 = TRUE can be set if they start at one.
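The sketch below illustrates the in-memory route with one-based indices. The user and item labels are hypothetical; coding them through factor() is just one convenient way to obtain consecutive integer indices starting at 1.

```r
library(recosystem)

# Hypothetical raw data with character labels
users   <- c("ann", "ann", "bob", "cat")
items   <- c("x", "z", "y", "z")
ratings <- c(5, 3, 4, 1)

# Integer codes produced by factor() start at 1
user_index <- as.integer(factor(users))
item_index <- as.integer(factor(items))

# Tell recosystem that the indices are one-based
mem_src <- data_memory(user_index, item_index, rating = ratings, index1 = TRUE)
```

Leaving index1 at its default of FALSE would instead treat the same vectors as zero-based indices.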
From version 0.4 recosystem supports two special types of matrix factorization: binary matrix factorization (BMF) and one-class matrix factorization (OCMF). BMF requires ratings to take values in \(\{-1, 1\}\), and OCMF requires all ratings to be positive.
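As a rough sketch of preparing ratings that satisfy these constraints, the code below recodes made-up star ratings. The threshold of 3 for the BMF labels and the constant 1 for OCMF are assumptions chosen for illustration, not rules of the package; selecting the factorization type itself is done through the options of $train() and is not shown here.

```r
library(recosystem)

# Hypothetical star ratings with zero-based indices
user  <- c(0L, 0L, 1L, 2L, 2L)
item  <- c(0L, 1L, 1L, 0L, 2L)
stars <- c(5, 2, 4, 1, 3)

# BMF: recode to labels in {-1, 1} (threshold chosen for illustration only)
label   <- ifelse(stars >= 3, 1, -1)
bmf_src <- data_memory(user, item, rating = label)

# OCMF: every observed interaction gets the same positive value
ones     <- rep(1, length(user))
ocmf_src <- data_memory(user, item, rating = ones)
```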
In $predict(), functions in this page provide the source of
testing data. The testing data have the same format as training data, except
that the value (rating) column is not required, and will be ignored if it is
provided. The smalltest.txt file in the dat directory of this
package is an example of a testing data file.
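The following sketch puts the two roles together using the example files bundled with the package. The option values (dim, niter, nthread) are arbitrary small settings chosen to keep the run short, not recommendations, and out_memory() is used so that predictions come back as an R vector.

```r
library(recosystem)

train_path <- system.file("dat", "smalltrain.txt", package = "recosystem")
test_path  <- system.file("dat", "smalltest.txt",  package = "recosystem")

r <- Reco()

# Training data source: the rating column is required here
r$train(data_file(train_path),
        opts = list(dim = 10, niter = 10, nthread = 1, verbose = FALSE))

# Testing data source: only user and item indices are used
pred <- r$predict(data_file(test_path), out_memory())
```

The result holds one predicted rating for each line of the testing file.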
See also: $tune(), $train(), $predict()