# Michael's Sleepy Tiger Problem is like the POMDP Tiger problem, but
# has completely observable states because the tiger is sleeping in front
# of the door. This makes the problem an MDP.
library("pomdp")

STiger <- MDP(
  name = "Michael's Sleepy Tiger Problem",
  discount = .9,
  states = c("tiger-left", "tiger-right"),
  actions = c("open-left", "open-right", "do-nothing"),
  start = "uniform",

  # opening a door resets the problem; the expanded matrices are
  # inspected below
  transition_prob = list(
    "open-left"  = "uniform",
    "open-right" = "uniform",
    "do-nothing" = "identity"),

  # the reward helper R_() expects: action, start.state, end.state,
  # observation, value (observation stays NA for an MDP)
  reward = rbind(
    R_("open-left",  "tiger-left",  v = -100),
    R_("open-left",  "tiger-right", v =   10),
    R_("open-right", "tiger-left",  v =   10),
    R_("open-right", "tiger-right", v = -100),
    R_("do-nothing",                v =    0)
  )
)
STiger
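
# Inspect how the "uniform" and "identity" shorthands were expanded:
# transition_matrix() and reward_matrix() are the pomdp package's
# accessors for the stored model. "uniform" should expand to equal
# transition probabilities and "identity" to a diagonal matrix (a quick
# check, assuming the accessors print the matrices as documented).
transition_matrix(STiger)
reward_matrix(STiger)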
sol <- solve_MDP(STiger)
sol
policy(sol)
plot_value_function(sol)
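
# A rough sanity check of the solved policy by simulation. This sketch
# assumes simulate_MDP() from the pomdp package with the signature
# simulate_MDP(model, n, horizon) and an avg_reward field in its
# result; adjust if your package version differs.
sim <- simulate_MDP(sol, n = 100, horizon = 100)
sim$avg_reward  # should be close to the value reported by the solver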
# convert the MDP into a POMDP and solve
STiger_POMDP <- make_partially_observable(STiger)
sol2 <- solve_POMDP(STiger_POMDP)
sol2
policy(sol2)
plot_value_function(sol2, ylim = c(80, 120))
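
# The POMDP solution can also be visualized as a policy graph using
# plot_policy_graph() from the pomdp package. Because the converted
# problem remains fully observable, the graph is expected to map each
# observed state directly to opening the opposite door.
plot_policy_graph(sol2)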