Lunar Lander Environment (source)
STATE : [position_x, position_y, vel_x, vel_y, angle, angular_v, left_leg_on_ground, right_leg_on_ground]
ACTION : [do nothing, fire left engine, fire main engine, fire right engine]
- Discrete(4)
REWARD :
- moving from top of the screen to landing pad at (0,0) @ zero speed : +100..140
- If the lander moves away from the landing pad, it loses that reward again
- Episode finish w. lander crashing : -100
- Episode finish w. lander coming to rest : +100
- Each leg ground contact : +10
- Firing main engine : -0.3/frame
- Solved : +200 (a minimal rollout sketch follows below)
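For reference, a minimal rollout against this environment with a random policy (assuming the Gymnasium `LunarLander-v2` id; newer Gymnasium releases register it as `LunarLander-v3`):

```python
import gymnasium as gym  # requires the Box2D extra: pip install "gymnasium[box2d]"

env = gym.make("LunarLander-v2")
print(env.observation_space)  # 8-dimensional Box -> the state vector above
print(env.action_space)       # Discrete(4)

state, info = env.reset(seed=0)
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # random action, for illustration only
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    total_reward += reward
env.close()
print(f"episode return: {total_reward:.1f}")  # compare with the +200 'solved' threshold above
```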
DQN TECHNIQUES :
- Fixed Q-target : separate local & target networks; the slowly-updated target network provides stable TD targets
- Experience Replay : storing (state, action, reward, next_state, done) tuples in a buffer and sampling random minibatches from it
- Double DQN : the local network selects the action that maximizes the action-value function for the next state, and the target network evaluates that action (see the sketch after this list)
- ε-greedy Policy : choosing a random action with probability ε (ε starts at 1 and decays toward 0 over the episodes)
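A rough sketch of how these pieces combine in one learning step, assuming a PyTorch implementation with networks named `qnetwork_local` / `qnetwork_target` (the names and tensor shapes are illustrative, not necessarily what this repo uses):

```python
import random
import torch
import torch.nn.functional as F

def act(qnetwork_local, state, eps):
    """ε-greedy action selection: random action with probability eps, else greedy."""
    if random.random() < eps:
        return random.randrange(4)  # Discrete(4) action space
    with torch.no_grad():
        q_values = qnetwork_local(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())

def learn(qnetwork_local, qnetwork_target, optimizer, batch, gamma=0.99):
    """One Double-DQN update against the fixed target network."""
    # Tensors are assumed to have a leading batch dimension; actions are int64
    # of shape (B, 1), rewards/dones are float of shape (B, 1).
    states, actions, rewards, next_states, dones = batch

    # Double DQN: the local network *selects* the best next action...
    next_actions = qnetwork_local(next_states).argmax(dim=1, keepdim=True)
    # ...and the fixed target network *evaluates* it.
    q_next = qnetwork_target(next_states).gather(1, next_actions).detach()

    # TD target: r + γ · Q_target(s', argmax_a Q_local(s', a)), zeroed at terminal states.
    q_targets = rewards + gamma * q_next * (1 - dones)

    # Current estimates for the actions that were actually taken.
    q_expected = qnetwork_local(states).gather(1, actions)

    loss = F.mse_loss(q_expected, q_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Without the Double-DQN split, the target would instead use `qnetwork_target(next_states).max(dim=1, keepdim=True)[0]`, which tends to overestimate action values.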
HYPERPARAMETERS :
- n_episodes : 4000
- model architecture : 2 fully connected layers (h=32)
- replay buffer capacity : 100,000 tuples
- batch size : 64
- discount rate, γ : 0.99
- soft update factor, τ (for target network params) : 0.001
- learning rate : 0.0005
- update weights every 4 environment steps (a configuration sketch follows below)
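The hyperparameters above might be wired up roughly as follows. The constants mirror the list; the network, replay buffer, and soft update are a sketch assuming PyTorch, interpreting "2 fully connected layers (h=32)" as two hidden layers of width 32 (class and function names are illustrative):

```python
import random
from collections import deque, namedtuple

import torch.nn as nn

N_EPISODES   = 4000
BUFFER_SIZE  = 100_000   # replay buffer capacity (tuples)
BATCH_SIZE   = 64
GAMMA        = 0.99      # discount rate
TAU          = 1e-3      # soft update factor for the target network params
LR           = 5e-4      # learning rate
UPDATE_EVERY = 4         # learn every 4 environment steps

class QNetwork(nn.Module):
    """Maps an 8-dim state to 4 action values through two hidden layers of width 32."""
    def __init__(self, state_size=8, action_size=4, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size),
        )

    def forward(self, x):
        return self.net(x)

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer of experience tuples, sampled uniformly at random."""
    def __init__(self, capacity=BUFFER_SIZE):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        return random.sample(self.memory, k=batch_size)

    def __len__(self):
        return len(self.memory)

def soft_update(local_net, target_net, tau=TAU):
    """θ_target ← τ·θ_local + (1 − τ)·θ_target, applied after each learning step."""
    for t_param, l_param in zip(target_net.parameters(), local_net.parameters()):
        t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)
```

Inside the agent's step, a counter taken modulo UPDATE_EVERY typically gates the learning call, which fires only once the buffer holds at least BATCH_SIZE tuples.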
The final model checkpoint that produced the simulation above is in the models/ folder.