Fix PER bug. Add loadable memory. N-step TD. #276
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is basically #214 with most of the extra features cut out, because the PER bug needs to get fixed, whether or not the rest ever merges. DQfD, NoisyNets, and the csv logging callback are removed.
Fixes bug discussed in Prioritized Experience Replay #212 where observation indices rotate while their priorities stay static.
Adds loadable memory. The
PartitionedRingBuffer
is similar to the old implementation ofRingBuffer
but the data stays locked to the idx it was originally assigned to. That's what fixed the PER issue, and also makes it pretty easy to load demonstration data and never delete it. I originally used it in DQfD but I think it has more applications than that. A feature like this was requested in [feature-request] Add the possibility to load and store memory. #262.Adds n-step temporal difference error to dqn (only for prioritized memories). This basically involves a change to how the experiences are retrieved from memory. I'd appreciate someone double checking my math on that.