sar: Fit a SAR model

Description

Fit a SAR model

Usage

sar(...)
# S3 method for data.frame
sar(
  x,
  user = "user",
  item = "item",
  time = "time",
  event = "event",
  weight = "weight",
  ...
)
# S3 method for default
sar(
  user,
  item,
  time,
  event = NULL,
  weight = NULL,
  support_threshold = 1,
  allowed_items = NULL,
  allowed_events = c(Click = 1, RecommendationClick = 2, AddShopCart = 3,
    RemoveShopCart = -1, Purchase = 4),
  by_user = TRUE,
  similarity = c("jaccard", "lift", "count"),
  half_life = 30,
  catalog_data = NULL,
  catalog_formula = item ~ .,
  cold_to_cold = FALSE,
  cold_item_model = NULL,
  ...
)
# S3 method for sar
print(x, ...)

Value

An S3 object representing the SAR model. This is essentially the item-to-item similarity matrix in sparse format, along with the original transaction data used to fit the model.

Arguments

...: For sar(), further arguments to pass to the cold-items feature model.
x: A data frame. For the print method, a SAR model object.
user, item, time, event, weight: For the default method, vectors to use as the user IDs, item IDs, timestamps, event types, and transaction weights for SAR. For the data.frame method, the names of the columns in the data frame x to use for these variables.
support_threshold: The SAR support threshold. Items that do not occur at least this many times in the data will be considered "cold".
allowed_items: A character or factor vector of allowed item IDs to use in the SAR model. If supplied, this will be used to categorise the item IDs in the data.
allowed_events: The allowed values for events, if that argument is supplied. Other values will be discarded.
by_user: Should the analysis be by user ID, or by user ID and timestamp? Defaults to user ID only.
similarity: Similarity metric to use; defaults to Jaccard.
half_life: The decay period to use when weighting transactions by age.
catalog_data: A dataset to use for building the cold-items feature model.
catalog_formula: A formula for the feature model used to compute similarities for cold items.
cold_to_cold: Whether the cold-items feature model should include the cold items themselves in the training data, or only warm items.
cold_item_model: The type of model to use for cold item features.
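
To illustrate what half_life controls, the following is a minimal sketch of exponential half-life weighting in base R. It is not the package's internal code: the helper decay_weight() is hypothetical, and both the formula weight * 2^(-age / half_life) and the assumption that age is measured in days relative to the most recent transaction are assumptions made for illustration.

```r
# Hypothetical helper, not part of the SAR package.
# Assumption: a transaction's weight halves every `half_life` days of age,
# with age measured back from a reference time (here the latest transaction).
decay_weight <- function(time, weight = 1, half_life = 30, ref_time = max(time)) {
  age_days <- as.numeric(difftime(ref_time, time, units = "days"))
  weight * 2^(-age_days / half_life)
}

# Three transactions: 60 days old, 30 days old, and the most recent one.
times <- as.POSIXct(c("2024-01-01", "2024-01-31", "2024-03-01"), tz = "UTC")
w <- decay_weight(times, half_life = 30)
print(w)
```

Under these assumptions, a transaction that is one half-life old counts half as much as the most recent one, and a transaction two half-lives old counts a quarter as much.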

Cold items

SAR can handle cold items, meaning those which have not been seen by any user, or which have been seen by fewer users than support_threshold. This is done by using item features to predict similarities. The method used for this is set by the cold_item_model argument:

If this is NULL (the default), a manual algorithm is used that correlates each feature in turn with similarity, and produces a predicted similarity based on which features two items have in common.
If this is the name of a modelling function, such as "lm" or "randomForest", a model of that type is fit on the features and used to predict similarity. In particular, use "lm" to get a model that is (approximately) equivalent to that used by the Azure web service API.

The data frame and features used for cold items are given by the catalog_data and catalog_formula arguments. catalog_data should be a data frame whose first column is item ID. catalog_formula should be a one-sided formula (no LHS).
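
As a rough illustration of which items count as cold, the sketch below flags items in base R. It assumes the reading given above, that an item is cold when fewer distinct users than support_threshold have transacted it or when it appears in the catalog but never in the transactions. The toy data and the variable names (tx, catalog_items) are made up, and the package may count occurrences differently.

```r
# Toy transaction data; names and values are illustrative only.
tx <- data.frame(
  user = c("u1", "u2", "u3", "u1", "u2", "u3"),
  item = c("a",  "a",  "a",  "b",  "b",  "c"),
  stringsAsFactors = FALSE
)
# Hypothetical catalog: item "d" has never been transacted.
catalog_items <- c("a", "b", "c", "d")
support_threshold <- 2

# Number of distinct users who transacted each item.
users_per_item <- tapply(tx$user, tx$item, function(u) length(unique(u)))

# Cold items: below the threshold, or in the catalog but absent from the data.
below_threshold <- names(users_per_item)[users_per_item < support_threshold]
never_seen <- setdiff(catalog_items, names(users_per_item))
cold <- sort(union(below_threshold, never_seen))
print(cold)
```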

This feature is currently experimental, and subject to change.

Details

Smart Adaptive Recommendations (SAR) is a fast, scalable, adaptive algorithm for personalized recommendations based on user transaction history and item descriptions. It produces easily explainable/interpretable recommendations and handles "cold item" and "semi-cold user" scenarios.

Central to how SAR works is an item-to-item co-occurrence matrix, which is based on how many times two items occur for the same users. For example, if a given user buys items \(i_1\) and \(i_2\), then the cell \((i_1, i_2)\) is incremented by 1. From this, an item similarity matrix can be obtained by rescaling the co-occurrences according to a given metric. Options for the metric include Jaccard (the default), lift, and counts (which means no rescaling).
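
The co-occurrence and rescaling steps can be sketched in a few lines of base R. This is an illustration on dense matrices with made-up data, not the package's implementation, which stores the matrix in sparse form. The Jaccard and lift formulas used here, c_ij / (c_ii + c_jj - c_ij) and c_ij / (c_ii * c_jj), are assumed definitions and should be checked against the package source before relying on them.

```r
# Toy transactions, one row per (user, item) event. Names are illustrative.
tx <- data.frame(
  user = c("u1", "u1", "u2", "u2", "u2", "u3"),
  item = c("i1", "i2", "i1", "i2", "i3", "i3"),
  stringsAsFactors = FALSE
)

# Binary user-by-item incidence matrix: 1 if the user transacted the item.
counts <- unclass(table(tx$user, tx$item))
inc <- (counts > 0) + 0

# Item-by-item co-occurrence: cell (i, j) is the number of users with both
# items; the diagonal holds the number of users per item.
cooc <- crossprod(inc)
occ <- diag(cooc)

# Rescale the co-occurrences (assumed formulas, see the note above).
jaccard <- cooc / (outer(occ, occ, "+") - cooc)
lift <- cooc / outer(occ, occ)

print(cooc)
print(round(jaccard, 3))
```

Here i1 and i2 are always bought by the same users, so their Jaccard similarity is 1, while i1 and i3 share only one of three users and score 1/3.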

Note that the similarity matrix in SAR thus only includes information on which users transacted which items. It does not include any other information such as item ratings or features, which may be used by other recommender algorithms.

The SAR implementation in R should be usable on datasets with up to a few million rows and several thousand items. The main constraint is the size of the similarity matrix, which in turn depends (quadratically) on the number of unique items. The implementation has been successfully tested on the MovieLens 20M dataset, which contains about 138,000 users and 27,000 items. For larger datasets, it is recommended to use the Azure web service API.
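
To make the quadratic growth concrete, the following back-of-the-envelope calculation gives an upper bound for the MovieLens 20M item count. It assumes a fully dense matrix of 8-byte doubles; because the model stores similarities in sparse format, actual memory use should be well below this figure.

```r
# Upper bound on similarity matrix size, assuming dense storage of doubles.
n_items <- 27000
cells <- n_items^2                 # number of item pairs, including the diagonal
dense_gib <- cells * 8 / 1024^3    # 8 bytes per double, converted to GiB
cat(format(cells, big.mark = ","), "cells, about", round(dense_gib, 2), "GiB if dense\n")
```

Doubling the number of items quadruples this bound, which is why the item count rather than the row count is the main constraint.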

Examples

data(ms_usage)

## all of these fit the same model:

# fit a SAR model from a series of vectors
mod1 <- sar(user=ms_usage$user, item=ms_usage$item, time=ms_usage$time)

# fit a model from a data frame, naming the variables to use
mod2 <- sar(ms_usage, user="user", item="item", time="time")

# fit a model from a data frame, using default variable names
mod3 <- sar(ms_usage)
