Contextual: Multi-Armed Bandits in R

Overview

contextual is an R package that facilitates the simulation and evaluation of both context-free and contextual Multi-Armed Bandit policies.

The package has been developed to:

  • Ease the implementation, evaluation and dissemination of both existing and new contextual Multi-Armed Bandit policies.
  • Introduce a wider audience to the advanced sequential decision strategies offered by contextual bandit policies.

Installation

To install contextual from CRAN:

install.packages('contextual')

To install the development version (requires the devtools package):

install.packages("devtools")
devtools::install_github('Nth-iteration-labs/contextual')

When working on or extending the package, clone its GitHub repository, then do:

install.packages("devtools")
devtools::install_deps(dependencies = TRUE)
devtools::build()
devtools::reload()

Then clean and rebuild the package.

Overview of core classes

Contextual consists of six core classes. Of these, the Bandit and Policy classes are subclassed and extended when implementing custom (synthetic or offline) bandits and policies. The other four classes (Agent, Simulator, History, and Plot) are the workhorses of the package, and generally need not be adapted or subclassed.
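
To illustrate how these classes fit together, here is a minimal simulation sketch along the lines of the package's introductory example (constructor arguments follow the package's documentation; adjust to taste):

library(contextual)

# a three-armed Bernoulli bandit with known reward probabilities
bandit  <- BasicBernoulliBandit$new(weights = c(0.1, 0.5, 0.9))

# an epsilon-greedy policy that explores 10% of the time
policy  <- EpsilonGreedyPolicy$new(epsilon = 0.1)

# an Agent binds one Policy to one Bandit
agent   <- Agent$new(policy, bandit)

# a Simulator runs the agent for 100 steps, repeated over 100 simulations,
# and returns a History object
history <- Simulator$new(agent, horizon = 100, simulations = 100)$run()

# History and Plot summarize and visualize the results
plot(history, type = "cumulative")
summary(history)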

Documentation

See the demo directory for practical examples and replications of both synthetic and offline (contextual) bandit policy evaluations.

When seeking to extend contextual, it may also be useful to review "Extending Contextual: Frequently Asked Questions" before diving into the source code.
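
To give a flavor of that process, the sketch below subclasses Policy to implement a bare-bones epsilon-greedy policy, following the R6 subclassing pattern described in the package's documentation. The class and field names here are illustrative; which_max_list and inc<- are helper functions exported by contextual.

library(contextual)

MyEpsilonGreedyPolicy <- R6::R6Class("MyEpsilonGreedyPolicy",
  inherit = Policy,
  public = list(
    epsilon    = NULL,
    class_name = "MyEpsilonGreedyPolicy",
    initialize = function(epsilon = 0.1) {
      super$initialize()
      self$epsilon <- epsilon
    },
    # per-arm parameters: a pull counter and a running mean reward
    set_parameters = function(context_params) {
      self$theta_to_arms <- list("n" = 0, "mean" = 0)
    },
    # explore a random arm with probability epsilon, otherwise exploit
    # the arm with the highest mean reward so far
    get_action = function(t, context) {
      if (runif(1) < self$epsilon) {
        self$action$choice <- sample.int(context$k, 1)
      } else {
        self$action$choice <- which_max_list(self$theta$mean)
      }
      self$action
    },
    # incrementally update the chosen arm's counter and mean reward
    set_reward = function(t, context, action, reward) {
      arm <- action$choice
      inc(self$theta$n[[arm]])    <- 1
      inc(self$theta$mean[[arm]]) <- (reward$reward - self$theta$mean[[arm]]) /
                                      self$theta$n[[arm]]
      self$theta
    }
  )
)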

Further documentation and examples include:

  • Replications of figures from two introductory context-free Multi-Armed Bandit texts.
  • Basic, context-free multi-armed bandit examples.
  • Examples of both synthetic and offline contextual multi-armed bandit evaluations.
  • An example of how to make use of the optional theta log to create interactive context-free bandit animations.
  • Some more extensive vignettes to get you started with the package.
  • A paper offering a general overview of the package's structure and API.

Policies and Bandits

Overview of contextual's growing library of contextual and context-free bandit policies:

  • General: Random, Oracle, Fixed
  • Context-free: Epsilon-Greedy, Epsilon-First, UCB1, UCB2, Thompson Sampling, BootstrapTS, Softmax, Gradient, Gittins
  • Contextual: CMAB Naive Epsilon-Greedy, Epoch-Greedy, LinUCB (General, Disjoint, Hybrid), Linear Thompson Sampling, ProbitTS, LogitBTS, GLMUCB
  • Other: Lock-in Feedback (LiF)

Overview of contextual's bandit library:

  • Basic Synthetic: Basic Bernoulli Bandit, Basic Gaussian Bandit
  • Contextual Synthetic: Contextual Bernoulli, Contextual Logit, Contextual Hybrid, Contextual Linear, Contextual Wheel
  • Offline: Replay Evaluator, Bootstrap Replay, Propensity Weighting, Direct Method, Doubly Robust
  • Continuous: Continuum
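
Policies and bandits can be mixed and matched: every Agent pairs one policy with one bandit, and a single Simulator can evaluate several agents side by side. As a sketch (constructors as documented in the package; default parameters assumed for UCB1 and Thompson Sampling):

library(contextual)

bandit <- BasicBernoulliBandit$new(weights = c(0.1, 0.5, 0.9))

agents <- list(Agent$new(EpsilonGreedyPolicy$new(epsilon = 0.1), bandit),
               Agent$new(UCB1Policy$new(), bandit),
               Agent$new(ThompsonSamplingPolicy$new(), bandit))

history <- Simulator$new(agents, horizon = 500, simulations = 100)$run()

# average cumulative regret per policy over all simulations
plot(history, type = "cumulative", regret = TRUE)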

Alternative parallel backends

By default, contextual uses R's built-in parallel package to facilitate the parallel evaluation of multiple agents over repeated simulations. See the demo/alternative_parallel_backends directory for several alternative parallel backends.
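
Parallel evaluation itself is configured through Simulator's constructor; a brief sketch, assuming the do_parallel and worker_max arguments from the Simulator documentation:

# parallel evaluation via R's built-in parallel package (the default),
# capped here at four worker processes
simulator <- Simulator$new(agents,
                           horizon     = 500,
                           simulations = 100,
                           do_parallel = TRUE,
                           worker_max  = 4)
history   <- simulator$run()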

Maintainers

  • Robin van Emden: author, maintainer*
  • Maurits Kaptein: supervisor*

* Tilburg University / Jheronimus Academy of Data Science.

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

Package metadata:

  • Version: 0.9.8.4
  • License: GPL-3
  • Last published: July 25th, 2020

Functions in contextual (0.9.8.4)

  • BootstrapTSPolicy - Policy: Thompson sampling with the online bootstrap
  • ContextualBinaryBandit - Bandit: ContextualBinaryBandit
  • BasicBernoulliBandit - Bandit: BasicBernoulliBandit
  • ContextualEpsilonGreedyPolicy - Policy: ContextualEpsilonGreedyPolicy with unique linear models
  • Agent - Agent
  • ContextualBernoulliBandit - Bandit: Naive Contextual Bernoulli Bandit
  • ContextualHybridBandit - Bandit: ContextualHybridBandit
  • ContextualEpochGreedyPolicy - Policy: A Time and Space Efficient Algorithm for Contextual Linear Bandits
  • BasicGaussianBandit - Bandit: BasicGaussianBandit
  • Bandit - Bandit: Superclass
  • LinUCBDisjointOptimizedPolicy - Policy: LinUCB with unique linear models
  • LinUCBDisjointPolicy - Policy: LinUCB with unique linear models
  • EpsilonGreedyPolicy - Policy: Epsilon Greedy
  • LinUCBHybridOptimizedPolicy - Policy: LinUCB with hybrid linear models
  • Policy - Policy: Superclass
  • EpsilonFirstPolicy - Policy: Epsilon First
  • LinUCBGeneralPolicy - Policy: LinUCB with unique linear models
  • Plot - Plot
  • ContextualPrecachingBandit - Bandit: ContextualPrecachingBandit
  • ContextualTSProbitPolicy - Policy: ContextualTSProbitPolicy
  • Exp3Policy - Policy: Exp3
  • dec<- - Decrement
  • ContextualLinTSPolicy - Policy: Linear Thompson Sampling with unique linear models
  • ContextualLogitBandit - Bandit: ContextualLogitBandit
  • ContextualLogitBTSPolicy - Policy: ContextualLogitBTSPolicy
  • History - History
  • ContextualLinearBandit - Bandit: ContextualLinearBandit
  • GittinsBrezziLaiPolicy - Policy: Gittins approximation algorithm for choosing arms in a MAB problem
  • formatted_difftime - Format difftime objects
  • is_rstudio - Check if in RStudio
  • GradientPolicy - Policy: Gradient
  • mvrnorm - Simulate from a Multivariate Normal Distribution
  • which_max_list - Get maximum value in list
  • plot.history - Plot Method for Contextual History
  • print.history - Print Method for Contextual History
  • LifPolicy - Policy: Continuum Bandit Policy with Lock-in Feedback
  • set_external - Change Default Graphing Device from RStudio
  • SoftmaxPolicy - Policy: Softmax
  • OfflineBootstrappedReplayBandit - Bandit: Offline Bootstrapped Replay
  • LinUCBHybridPolicy - Policy: LinUCB with hybrid linear models
  • ThompsonSamplingPolicy - Policy: Thompson Sampling
  • set_global_seed - Set .Random.seed to a pre-saved value
  • OfflinePropensityWeightingBandit - Bandit: Offline Propensity Weighted Replay
  • UCB1Policy - Policy: UCB1
  • OfflineLookupReplayEvaluatorBandit - Bandit: Offline Replay with lookup tables
  • FixedPolicy - Policy: Fixed Arm
  • ContextualWheelBandit - Bandit: ContextualWheelBandit
  • OfflineDirectMethodBandit - Bandit: Offline Direct Methods
  • ContinuumBandit - Bandit: ContinuumBandit
  • OfflineDoublyRobustBandit - Bandit: Offline Doubly Robust
  • get_global_seed - Lookup .Random.seed in global environment
  • which_max_tied - Get maximum value randomly breaking ties
  • UCB2Policy - Policy: UCB2
  • inv - Inverse from Choleski (or QR) Decomposition
  • ind - On-the-fly indicator function for use in formulae
  • one_hot - One Hot Encoding of data.table columns
  • ones_in_zeroes - A vector of zeroes and ones
  • sum_of - Sum of list
  • summary.history - Summary Method for Contextual History
  • invgamma - The Inverse Gamma Distribution
  • data_table_factors_to_numeric - Convert all factor columns in data.table to numeric
  • clipr - Clip vectors
  • OfflineReplayEvaluatorBandit - Bandit: Offline Replay
  • OraclePolicy - Policy: Oracle
  • invlogit - Inverse Logit Function
  • sim_post - Binomial Posterior Simulator
  • sherman_morrisson - Sherman-Morrison inverse
  • prob_winner - Binomial Win Probability
  • RandomPolicy - Policy: Random
  • get_arm_context - Return context vector of an arm
  • get_full_context - Get full context matrix over all arms
  • sample_one_of - Sample one element from vector or list
  • Simulator - Simulator
  • value_remaining - Potential Value Remaining
  • inc<- - Increment
  • var_welford - Welford's variance