Release Naive random agent with planning · cog-isa/htm-rl · GitHub

Naive random agent with planning

@pkuderov released this 24 Jul 13:24
· 1042 commits to master since this release

Naive agent based on the HTM framework described in the report. Key features:

  • memorizes all transitions (r, s, a) -> (r', s', a') with a single Temporal Memory (single- or multi-column)
    • each (r, s, a) triplet is encoded into a single (s, a, r) SDR
    • each part of the SDR is encoded with a naive integer encoder without overlaps; the parts are then concatenated
  • can infer a policy [a1, a2, .. aT] leading to the rewarding state if that state is within radius N over the memorized transitions
    • N, the planning horizon, is a hyperparameter
  • takes a random action if the planner fails to make a plan
    • with planning horizon = 0 it degenerates to a random agent
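
The encoding described in the first bullet can be sketched as follows. This is only an illustration under stated assumptions, not the code from the repository: the function names `encode_int` and `encode_sar`, the `bucket_size` parameter, and the representation of an SDR as a list of active bit indices are all hypothetical. Each integer value gets its own non-overlapping bucket of bits, and the state, action, and reward parts are laid out one after another.

```python
from typing import List, Tuple


def encode_int(value: int, n_values: int, bucket_size: int) -> List[int]:
    """Active bit indices for `value`: each value owns its own bucket of bits,
    so encodings of different values never overlap (hypothetical helper)."""
    if not 0 <= value < n_values:
        raise ValueError(f"value {value} is outside [0, {n_values})")
    start = value * bucket_size
    return list(range(start, start + bucket_size))


def encode_sar(
    state: int, action: int, reward: int,
    n_states: int, n_actions: int, n_rewards: int,
    bucket_size: int = 4,
) -> Tuple[List[int], int]:
    """Concatenate the three encodings in (s, a, r) order.

    Returns the active bit indices and the total SDR size.
    """
    active: List[int] = []
    offset = 0  # where the current part starts inside the concatenated SDR
    parts = ((state, n_states), (action, n_actions), (reward, n_rewards))
    for value, n_values in parts:
        active.extend(offset + i for i in encode_int(value, n_values, bucket_size))
        offset += n_values * bucket_size
    return active, offset


# Example: 5 states, 3 actions, 2 reward values, 2 bits per bucket.
sdr, size = encode_sar(2, 1, 0, n_states=5, n_actions=3, n_rewards=2, bucket_size=2)
print(sdr, size)  # [4, 5, 12, 13, 16, 17] 20
```

With this layout the SDR size is `bucket_size * (n_states + n_actions + n_rewards)`, and two triplets share active bits only in the parts where their values coincide.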

The agent was tested on three gridworld MDPs (multi_way_v0-2) with different planning horizons and compared with a random agent and a simple DQN.

Key results:

  • learns (= progresses) faster than DQN
  • even planning horizon 1 is better than random
    • with a fixed planning horizon N, the advantage diminishes as environment complexity grows
  • if planning horizon N is enough to plan to the reward from the initial state, it works perfectly after a very small number of training episodes (approximately equal to the distance to the goal state)
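
The role of the planning horizon can be illustrated with a minimal sketch of horizon-limited planning with a random fallback. It rests on loud assumptions: a plain dictionary stands in for the Temporal Memory, the planner is a breadth-first search rather than the HTM-based planner of the release, and the names `plan` and `act` are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Tuple

# Assumed stand-in for the memorized transitions:
# (state, action) -> (next_state, reward)
Transitions = Dict[Tuple[int, int], Tuple[int, int]]


def plan(memory: Transitions, start: int, horizon: int) -> Optional[List[int]]:
    """Breadth-first search over memorized transitions.

    Returns the actions leading to the first rewarding transition found
    within `horizon` steps, or None if no such plan exists.
    """
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if len(actions) >= horizon:
            continue  # horizon exhausted on this branch
        for (s, a), (s_next, reward) in memory.items():
            if s != state:
                continue
            if reward > 0:
                return actions + [a]
            if s_next not in visited:
                visited.add(s_next)
                frontier.append((s_next, actions + [a]))
    return None


def act(memory: Transitions, state: int, horizon: int,
        n_actions: int, rng: random.Random) -> int:
    """Follow the plan if there is one, otherwise act randomly."""
    policy = plan(memory, state, horizon)
    if policy is None:
        return rng.randrange(n_actions)
    return policy[0]


# Corridor 0 -> 1 -> 2 -> 3: action 0 moves right, action 1 stays in place,
# and entering state 3 gives reward 1.
memory: Transitions = {
    (0, 0): (1, 0), (0, 1): (0, 0),
    (1, 0): (2, 0),
    (2, 0): (3, 1),
}
print(plan(memory, 0, horizon=3))  # [0, 0, 0]
print(plan(memory, 0, horizon=2))  # None
```

In this toy corridor the goal is three steps away, so a horizon of 3 yields a full plan from the initial state, while a horizon of 2 or less leaves the agent acting randomly until it gets closer to the reward. With horizon 0 `plan` always returns None, so `act` is a purely random agent.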