
Petrelli/DQN-for-Lunar-Lander

 
 


Deep Q-Network for OpenAI's Lunar Lander (PyTorch)

Result

Lunar Lander after ~2 hours of training

Lunar Lander Environment (source)

STATE : [position_x, position_y, vel_x, vel_y, angle, angular_v, left_leg_on_ground, right_leg_on_ground]

ACTION : [do nothing, fire left engine, fire main engine, fire right engine] - Discrete(4)
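
As a small illustration of the state and action layout above, the sketch below gives the eight state entries and four action indices readable names. The class and function names (`Action`, `LanderState`, `both_legs_down`) are hypothetical helpers introduced here, not part of the repository or of the environment's API.

```python
from enum import IntEnum
from typing import NamedTuple


class Action(IntEnum):
    """Discrete(4) action indices, in the order listed above."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


class LanderState(NamedTuple):
    """Named view over the 8-element state vector (hypothetical helper)."""
    position_x: float
    position_y: float
    vel_x: float
    vel_y: float
    angle: float
    angular_v: float
    left_leg_on_ground: float   # 1.0 if the leg touches the ground, else 0.0
    right_leg_on_ground: float  # same flag for the right leg


def both_legs_down(raw_state) -> bool:
    """Return True when both leg-contact flags are set."""
    s = LanderState(*raw_state)
    return s.left_leg_on_ground > 0.5 and s.right_leg_on_ground > 0.5
```

Wrapping the raw vector this way avoids bare indices such as `state[6]` when inspecting episodes.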

REWARD :

  • Moving from the top of the screen to the landing pad at (0,0) and reaching zero speed : about +100 to +140
  • If the lander moves away from the landing pad, it loses that reward again
  • Episode ends with the lander crashing : -100
  • Episode ends with the lander coming to rest : +100
  • Each leg ground contact : +10
  • Firing main engine : -0.3 per frame
  • Solved : total episode score of +200

DQN was implemented with the following tricks:

  • Fixed Q-target : separate local & target networks
  • Experience Replay : a buffer of (state, action, reward, next_state, done) tuples to sample from
  • Double DQN : the local network chooses the action that maximizes the action-value function, and the target network evaluates that action
  • ε-greedy Policy : choosing a non-greedy (random) action with probability ε (starts at 1 and decays toward 0 after each episode)
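
The tricks above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's PyTorch code: Q-values are passed in as plain lists, and the names `ReplayBuffer`, `epsilon_greedy`, and `double_dqn_target` are hypothetical.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest tuples are dropped once it is full."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored tuples.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward, done, q_local_next, q_target_next, gamma=0.99):
    """Local network selects the next action; target network evaluates it."""
    if done:
        return reward  # no bootstrap term after a terminal transition
    best_action = max(range(len(q_local_next)), key=lambda a: q_local_next[a])
    return reward + gamma * q_target_next[best_action]
```

For example, `double_dqn_target(1.0, False, [0.2, 0.8], [5.0, 3.0], gamma=0.5)` selects action 1 with the local values and evaluates it with the target values, giving 1.0 + 0.5 * 3.0 = 2.5. A plain (non-double) DQN target would instead take the maximum of the target values, 5.0.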

Loss function for DQN
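
The original figure is not reproduced here. As a hedged reconstruction, the usual squared TD-error loss combined with a Double DQN target can be written as below. The notation is assumed rather than taken from the repository: θ are the local network weights, θ⁻ the target network weights, D the replay buffer, and d the done flag.

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s',d)\sim\mathcal{D}}
  \Big[\big(y - Q(s,a;\theta)\big)^2\Big],
\qquad
y = r + \gamma\,(1-d)\,
  Q\big(s',\ \arg\max_{a'} Q(s',a';\theta);\ \theta^{-}\big)
```

The target y is treated as a constant when differentiating with respect to θ, which is the point of keeping a separate target network.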

Plot of Scores (= total reward for each episode)

Hyperparameters

  • n_episodes : 4000
  • model architecture : 2 fully connected layers (h=32)
  • replay buffer capacity : 100,000 tuples
  • batch size : 64
  • discount rate, γ : 0.99
  • soft update factor, τ (for target network params) : 0.001 (1e-3)
  • learning rate : 0.0005
  • weight update frequency : every 4 steps within an episode
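
A minimal sketch of how these values might be grouped and used, assuming plain Python lists in place of network parameters. The names `Hyperparameters`, `soft_update`, and `should_learn` are hypothetical, and the rule of waiting for a full batch before learning is an assumption rather than something the list above states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    n_episodes: int = 4000
    hidden_size: int = 32           # h=32 units per fully connected layer
    buffer_capacity: int = 100_000  # replay buffer size in tuples
    batch_size: int = 64
    gamma: float = 0.99             # discount rate
    tau: float = 1e-3               # soft update factor
    learning_rate: float = 5e-4
    update_every: int = 4           # learn every 4 steps


def soft_update(local_params, target_params, tau):
    """Return new target params: tau * local + (1 - tau) * target."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]


def should_learn(step, buffer_len, hp):
    """Learn every `update_every` steps once a full batch is available (assumed rule)."""
    return step % hp.update_every == 0 and buffer_len >= hp.batch_size
```

With τ = 0.001 the target network moves only 0.1% of the way toward the local network on each update, so the Q-targets change slowly.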

The final model checkpoint that produced the simulation above is in the models/ folder.

