Windy Gridworld problem for reinforcement learning. Actions include going left, right, up and down. In each column the wind pushes you up a specific number of steps (applied on the next action). If an action would take you off the grid, you remain in the previous state. Each step yields a reward of -1 until you reach a terminal state.
... [any] Arguments passed on to makeEnvironment.
makeEnvironment("windy.gridworld", ...)
$step(action)
Take action in environment.
Returns a list with state, reward, done.
$reset()
Resets the done flag of the environment and returns an initial state.
Useful when starting a new episode.
$visualize()
Visualizes the environment (if there is a visualization function).
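The methods above are enough to drive a full episode. The following is a minimal sketch, not package code: run_episode is a hypothetical helper, and it assumes only what is documented here, namely that $reset() returns an initial state and $step(action) returns a list with state, reward and done. The max_steps cap is an added safeguard, not a feature of the environment.

```r
# Hypothetical helper (not part of the package): run one episode against any
# object that offers $reset() and $step(action) as documented above.
run_episode <- function(env, policy, max_steps = 1000L) {
  state <- env$reset()          # start a new episode
  total_reward <- 0
  steps <- 0L
  while (steps < max_steps) {
    res <- env$step(policy(state))
    steps <- steps + 1L
    total_reward <- total_reward + res$reward
    state <- res$state
    if (isTRUE(res$done)) break # terminal state reached
  }
  list(steps = steps, return = total_reward, final_state = state)
}

# Possible usage with this environment. The integer action encoding 0 to 3
# is an assumption; it is not stated on this page.
# env <- makeEnvironment("windy.gridworld")
# run_episode(env, function(state) sample(0:3, 1))
```

Because every step is rewarded with -1, the return of an episode that terminates is simply the negative of its length.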
This is the gridworld (goal state denoted G, start state denoted S). The last row specifies the upward wind in each column.
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
S | . | . | . | . | . | . | G | . | . |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . | . | . |
0 | 0 | 0 | 1 | 1 | 1 | 2 | 2 | 1 | 0 |
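The transition rule from the description can be sketched in base R using the wind values from the last row of the grid. This is an illustration under stated assumptions, not the package's implementation: windy_step is a hypothetical helper, positions are c(row, col) with row 1 at the top, the grid size and the positions of S and G are read off the drawing above, and an off-grid target leaves the agent in place, as the description says (whether the package clips to the border instead is not confirmed here).

```r
# Upward wind per column, copied from the last row of the grid above.
wind <- c(0, 0, 0, 1, 1, 1, 2, 2, 1, 0)
n_rows <- 10           # assumption: number of rows as drawn above
n_cols <- length(wind)
start <- c(4, 1)       # S: row 4, column 1 in the drawing
goal <- c(4, 8)        # G: row 4, column 8 in the drawing

# Hypothetical helper: one transition from pos = c(row, col).
windy_step <- function(pos, action) {
  move <- switch(action,
    left  = c(0, -1),
    right = c(0, 1),
    up    = c(-1, 0),
    down  = c(1, 0),
    stop("unknown action: ", action)
  )
  # The wind of the column the agent starts in pushes it up (towards row 1).
  target <- pos + move - c(wind[pos[2]], 0)
  off_grid <- target[1] < 1 || target[1] > n_rows ||
    target[2] < 1 || target[2] > n_cols
  if (off_grid) target <- pos  # remain in the previous state
  list(state = target, reward = -1, done = all(target == goal))
}

windy_step(start, "right")$state     # 4 2 (no wind in column 1)
windy_step(c(4, 4), "right")$state   # 3 5 (wind of 1 in column 4)
windy_step(start, "left")$state      # 4 1 (off the grid, so no move)
```

Using the wind of the starting column matches the remark that the push applies to the next action taken from that column.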
Sutton and Barto (Book draft 2017): Reinforcement Learning: An Introduction, Example 6.5
# NOT RUN {
env = makeEnvironment("windy.gridworld")
# }