pomdp (version 1.2.4)

regret: Calculate the Regret of a Policy

Description

Calculates the regret of a policy relative to a benchmark policy.

Usage

regret(policy, benchmark, start = NULL)

Value

The regret, i.e., the difference in expected long-term reward between the benchmark policy and the supplied policy.

Arguments

policy

a solved POMDP containing the policy to calculate the regret for.

benchmark

a solved POMDP with the (optimal) policy. Regret is calculated relative to this policy.

start

the start (belief) state used for the calculation. If NULL, then the start (belief) state of the benchmark is used.

Author

Michael Hahsler

Details

Regret is defined as \(V^{\pi^*}(s_0) - V^{\pi}(s_0)\) with \(V^\pi\) representing the expected long-term state value (represented by the value function) given the policy \(\pi\) and the start state \(s_0\). For POMDPs the start state is the start belief \(b_0\).

Note that regret is usually calculated relative to the optimal policy \(\pi^*\) as the benchmark. Since the optimal policy may not be known, the regret relative to the best known policy can be used instead.
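
The calculation can be sketched directly from the two value functions. The following is a minimal illustration only, assuming that the package's reward() accessor returns the expected long-term reward of a solved POMDP for its start belief:

# sketch: regret as the difference of the two expected long-term rewards
# (assumes reward() evaluates a solved POMDP at its start belief)
data(Tiger)
opt  <- solve_POMDP(Tiger)
appr <- solve_POMDP(Tiger, method = "enum", horizon = 10)
reward(opt) - reward(appr)   # should roughly match regret(appr, benchmark = opt)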

See Also

Other POMDP: MDP2POMDP, POMDP(), accessors, actions(), add_policy(), plot_belief_space(), projection(), reachable_and_absorbing, sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(), add_policy(), gridworld, reachable_and_absorbing, simulate_MDP(), solve_MDP(), transition_graph(), value_function()

Examples

data(Tiger)

sol_optimal <- solve_POMDP(Tiger)
sol_optimal

# perform exact value iteration for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick

regret(sol_quick, benchmark = sol_optimal)
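
# The start belief can also be given explicitly. This sketch assumes
# that a numeric belief vector over the Tiger problem's two states is
# accepted by the start argument (here a uniform belief).
regret(sol_quick, benchmark = sol_optimal, start = c(0.5, 0.5))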