
Releases: cog-isa/htm-rl

Hierarchical Intrinsically Motivated Agent Planning Behavior with Dreaming in Grid Environments

02 Nov 13:31

This release marks a new milestone in our efforts to create biologically plausible models of intelligent agents capable of performing a wide range of tasks.

In this release, we present a model of an autonomous agent called HIMA (Hierarchical Intrinsically Motivated Agent). Its modular structure is divided into blocks that consist of unified and reusable sub-blocks. We provide HIMA with a novel hierarchical memory model. Its Spatial Pooler and Temporal Memory sub-blocks are based on the corresponding objects from the Hierarchical Temporal Memory model. However, we extend Temporal Memory with support for external modulation via feedback connections and with a higher-order sequence learning algorithm. The latter enables us to construct a hierarchy that can work with state and action abstractions.

We also propose a Basal Ganglia model and empowerment as two further sub-blocks, which are responsible for learning the action selection strategy while being driven by the modulated motivation signal. Additionally, we supply our agent with the ability to learn in imagination, which we call dreaming. The sparse distributed representation (SDR) of states and actions is another distinguishing feature of our model. Our contribution therefore also includes investigating the representation of abstract context-dependent actions that denote behavioral programs, as well as the basal ganglia's ability to learn a strategy for choosing between partially overlapping actions. Finally, we validate HIMA's ability to aggregate and reuse experience in order to solve RL tasks with changing goals.
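
To make the block structure more concrete, below is a minimal structural sketch in Python. It is not the actual htm-rl API: all class names and interfaces here (MemoryBlock, ActionSelectionBlock, compute, select, the feedback argument) are illustrative assumptions about how memory and action-selection blocks might be composed from reusable sub-blocks.

```python
# Illustrative sketch only: names and interfaces below are assumptions,
# not the actual htm-rl code.

class MemoryBlock:
    """Memory block: a Spatial Pooler encodes observations into SDRs and a
    Temporal Memory learns higher-order sequences over them, optionally
    modulated by feedback connections from a higher level of the hierarchy."""
    def __init__(self, spatial_pooler, temporal_memory):
        self.sp = spatial_pooler
        self.tm = temporal_memory

    def compute(self, observation, feedback=None):
        state_sdr = self.sp.compute(observation)
        return self.tm.compute(state_sdr, feedback=feedback)


class ActionSelectionBlock:
    """Action-selection block: a Basal Ganglia sub-block learns the action
    selection strategy, driven by a motivation signal modulated by empowerment."""
    def __init__(self, basal_ganglia, empowerment):
        self.bg = basal_ganglia
        self.empowerment = empowerment

    def select(self, context):
        motivation = self.empowerment.compute(context)
        return self.bg.select(context, motivation)


class HimaLevel:
    """One level of the agent; higher levels can modulate it via `feedback`."""
    def __init__(self, memory: MemoryBlock, action_selection: ActionSelectionBlock):
        self.memory = memory
        self.action_selection = action_selection

    def act(self, observation, feedback=None):
        context = self.memory.compute(observation, feedback=feedback)
        return self.action_selection.select(context)
```

Because the sub-blocks are unified and reusable, the same components can appear at different levels of the hierarchy.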

Naive random agent with planning

24 Jul 13:24

A naive agent based on the HTM framework, described in the report. Key features:

  • memorizes all transitions (r, s, a) -> (r', s', a') with a single [single- or multi-column] Temporal Memory
    • (r, s, a) triplets are encoded into a single (s, a, r) SDR
    • each part of the SDR is encoded with a naive integer encoder without overlaps, then the parts are concatenated (see the sketch after this list)
  • can infer a policy [a1, a2, ..., aT] to the rewarding state if it is within radius N of memorized transitions
    • the planning horizon is a hyperparameter
  • makes a random action if the planner fails to build a plan
    • with planning horizon = 0 it degrades to a random agent
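
As a rough illustration of the encoding described above, here is a minimal sketch assuming a naive integer encoder that gives each value its own non-overlapping group of active bits; the names and parameters (NaiveIntEncoder, n_active, the 25/4/2 sizes) are illustrative, not the repository's actual encoder API.

```python
import numpy as np


class NaiveIntEncoder:
    """Encodes an integer from [0, n_values) as a fixed group of `n_active`
    bits; different values never share bits (no overlaps)."""
    def __init__(self, n_values: int, n_active: int = 5):
        self.n_values = n_values
        self.n_active = n_active
        self.output_size = n_values * n_active

    def encode(self, value: int) -> np.ndarray:
        sdr = np.zeros(self.output_size, dtype=bool)
        start = value * self.n_active
        sdr[start:start + self.n_active] = True
        return sdr


def encode_sar(state, action, reward, s_enc, a_enc, r_enc) -> np.ndarray:
    # Each part is encoded independently, then the parts are concatenated
    # into one (s, a, r) SDR that is fed to the Temporal Memory.
    return np.concatenate([s_enc.encode(state), a_enc.encode(action), r_enc.encode(reward)])


# Example: a tiny grid world with 25 states, 4 actions and a binary reward.
s_enc, a_enc, r_enc = NaiveIntEncoder(25), NaiveIntEncoder(4), NaiveIntEncoder(2)
sdr = encode_sar(state=7, action=2, reward=0, s_enc=s_enc, a_enc=a_enc, r_enc=r_enc)
print(int(sdr.sum()), sdr.size)  # 15 active bits out of 155
```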

The agent was tested on three gridworld MDPs (multi_way_v0-2) with different planning horizons and compared against a random agent and a simple DQN.

Key results:

  • learns (=progresses) faster than DQN
  • even a planning horizon of 1 is better than random
    • with a fixed planning horizon N, the advantage diminishes as environment complexity grows
  • if the planning horizon N is enough to plan to the reward from the initial state, the agent performs nearly perfectly after a very small number of training episodes (roughly equal to the distance to the goal state)