
Releases: cog-isa/htm-rl

Hierarchical Intrinsically Motivated Agent Planning Behavior with Dreaming in Grid Environments

02 Nov 13:31

This release marks a new milestone in our efforts to create biologically plausible models of intelligent agents capable of performing a wide range of tasks.

In this release, we present a model of an autonomous agent called HIMA (Hierarchical Intrinsically Motivated Agent). Its modular structure is divided into blocks that consist of unified and reusable sub-blocks. We provide HIMA with a novel hierarchical memory model. Its Spatial Pooler and Temporal Memory sub-blocks are based on the corresponding objects from the Hierarchical Temporal Memory model. However, we extend Temporal Memory with support for external modulation via feedback connections and with a higher-order sequence learning algorithm. The latter enables us to construct a hierarchy that can work with state and action abstractions.

We also propose a Basal Ganglia model and empowerment as two further sub-blocks, which are responsible for learning the action selection strategy while being driven by the modulated motivation signal. Additionally, we supply our agent with the ability to learn in imagination, which we call dreaming. The sparse distributed representation (SDR) of states and actions is another distinguishing feature of our model. Our contribution therefore also includes investigating the representation of abstract context-dependent actions that denote behavioral programs, as well as the basal ganglia's ability to learn a strategy for choosing between partially overlapping actions. Finally, we validate HIMA's ability to aggregate and reuse experience in order to solve RL tasks with changing goals.
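
To make the block structure more concrete, below is a minimal structural sketch in Python. It is not the actual htm-rl API: all class names and interfaces here (MemoryBlock, ActionSelectionBlock, compute, select, the feedback argument) are illustrative assumptions about how memory and action-selection blocks might be composed from reusable sub-blocks.

```python
# Illustrative sketch only: names and interfaces below are assumptions,
# not the actual htm-rl code.

class MemoryBlock:
    """Memory block: a Spatial Pooler encodes observations into SDRs and a
    Temporal Memory learns higher-order sequences over them, optionally
    modulated by feedback connections from a higher level of the hierarchy."""
    def __init__(self, spatial_pooler, temporal_memory):
        self.sp = spatial_pooler
        self.tm = temporal_memory

    def compute(self, observation, feedback=None):
        state_sdr = self.sp.compute(observation)
        return self.tm.compute(state_sdr, feedback=feedback)


class ActionSelectionBlock:
    """Action-selection block: a Basal Ganglia sub-block learns the action
    selection strategy, driven by a motivation signal modulated by empowerment."""
    def __init__(self, basal_ganglia, empowerment):
        self.bg = basal_ganglia
        self.empowerment = empowerment

    def select(self, context):
        motivation = self.empowerment.compute(context)
        return self.bg.select(context, motivation)


class HimaLevel:
    """One level of the agent; higher levels can modulate it via `feedback`."""
    def __init__(self, memory: MemoryBlock, action_selection: ActionSelectionBlock):
        self.memory = memory
        self.action_selection = action_selection

    def act(self, observation, feedback=None):
        context = self.memory.compute(observation, feedback=feedback)
        return self.action_selection.select(context)
```

Because the sub-blocks are unified and reusable, the same components can appear at different levels of the hierarchy.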

Naive random agent with planning

24 Jul 13:24

A naive agent based on the HTM framework, described in the report. Key features:

  • memorizes all transitions (r, s, a) -> (r', s', a') with a single [single- or multi-column] Temporal Memory
    • (r, s, a) triplets are encoded into a single (s, a, r) SDR
    • each part of the SDR is encoded with a naive integer encoder without overlaps, then the parts are concatenated (see the sketch after this list)
  • can infer a policy [a1, a2, ..., aT] to the rewarding state if it is within radius N of memorized transitions
    • the planning horizon is a hyperparameter
  • makes a random action if the planner fails to build a plan
    • with planning horizon = 0 it degrades to a random agent
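
As a rough illustration of the encoding described above, here is a minimal sketch assuming a naive integer encoder that gives each value its own non-overlapping group of active bits; the names and parameters (NaiveIntEncoder, n_active, the 25/4/2 sizes) are illustrative, not the repository's actual encoder API.

```python
import numpy as np


class NaiveIntEncoder:
    """Encodes an integer from [0, n_values) as a fixed group of `n_active`
    bits; different values never share bits (no overlaps)."""
    def __init__(self, n_values: int, n_active: int = 5):
        self.n_values = n_values
        self.n_active = n_active
        self.output_size = n_values * n_active

    def encode(self, value: int) -> np.ndarray:
        sdr = np.zeros(self.output_size, dtype=bool)
        start = value * self.n_active
        sdr[start:start + self.n_active] = True
        return sdr


def encode_sar(state, action, reward, s_enc, a_enc, r_enc) -> np.ndarray:
    # Each part is encoded independently, then the parts are concatenated
    # into one (s, a, r) SDR that is fed to the Temporal Memory.
    return np.concatenate([s_enc.encode(state), a_enc.encode(action), r_enc.encode(reward)])


# Example: a tiny grid world with 25 states, 4 actions and a binary reward.
s_enc, a_enc, r_enc = NaiveIntEncoder(25), NaiveIntEncoder(4), NaiveIntEncoder(2)
sdr = encode_sar(state=7, action=2, reward=0, s_enc=s_enc, a_enc=a_enc, r_enc=r_enc)
print(int(sdr.sum()), sdr.size)  # 15 active bits out of 155
```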

The agent was tested on three gridworld MDPs (multi_way_v0-2) with different planning horizons and compared against a random agent and a simple DQN.

Key results:

  • learns (=progresses) faster than DQN
  • even a planning horizon of 1 is better than random
    • with a fixed planning horizon N, the advantage diminishes as environment complexity grows
  • if the planning horizon N is enough to plan to the reward from the initial state, the agent performs nearly perfectly after a very small number of training episodes (roughly equal to the distance to the goal state)