reinforcelearn (version 0.2.1)

MdpEnvironment: MDP Environment

Description

Markov Decision Process environment.

Arguments

transitions

[array (n.states x n.states x n.actions)] State transition array.

rewards

[matrix (n.states x n.actions)] Reward matrix.

initial.state

[integer] Optional starting state. If a vector is given, a starting state will be randomly sampled from it whenever reset is called. Note that states are numbered starting with 0. If initial.state = NULL, all non-terminal states are possible starting states.

...

[any] Arguments passed on to makeEnvironment.
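For illustration, a sketch of how initial.state fixes the starting state, using the same two-state MDP as in the Examples section below (the choice of state 0 is only an example; in this MDP state 1 is absorbing and therefore treated as terminal):

```r
library(reinforcelearn)

# Two states, two actions (both numbered from 0).
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)

# Always start episodes in state 0; a vector such as c(0L, 1L)
# would instead sample a starting state on every reset.
env = makeEnvironment("mdp", transitions = P, rewards = R,
  initial.state = 0L)
env$reset()
```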

Usage

makeEnvironment("MDP", transitions, rewards, initial.state, ...)

Methods

  • $step(action) Takes an action in the environment. Returns a list with the new state, reward and done flag.

  • $reset() Resets the done flag of the environment and returns an initial state. Useful when starting a new episode.

  • $visualize() Visualizes the environment (if a visualization function is available).
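As a sketch of how these methods combine into an episode loop, using the two-state MDP from the Examples section; this assumes the environment exposes a done field and that the absorbing state 1 is detected as terminal:

```r
library(reinforcelearn)

# Same two-state MDP as in the Examples section; state 1 is absorbing.
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
env = makeEnvironment("mdp", transitions = P, rewards = R)

# Run one episode with a uniformly random policy
# (actions are numbered from 0).
env$reset()
while (!env$done) {
  res = env$step(sample(0:1, 1L))  # list with state, reward, done
}
```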

Examples

# Create a Markov Decision Process with two states and two actions.
library(reinforcelearn)
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
env = makeEnvironment("mdp", transitions = P, rewards = R)
env$reset()
env$step(1L)
