Tags: lunzizoo/tianshou
Fix SAC loss explode (thu-ml#333)
  * change SAC action_bound_method to "clip" (tanh is hardcoded in forward)
  * docstring update
  * modelbase -> modelbased
v0.3.2 (thu-ml#292) Throw a warning in ListReplayBuffer. This version update is needed because of thu-ml#289: the previous release, v0.3.1, does not work well under torch<=1.6.0 in a CUDA environment.
Add offline trainer and discrete BCQ algorithm (thu-ml#263) The results need to be tuned once the `done` issue is fixed. Co-authored-by: n+e <trinkle23897@gmail.com>
specify the meaning of logits in documentation (thu-ml#238)
change API of train_fn and test_fn (thu-ml#229)
  train_fn(epoch) -> train_fn(epoch, num_env_step)
  test_fn(epoch) -> test_fn(epoch, num_env_step)
add PSRL policy (thu-ml#202) Add PSRL policy in tianshou/policy/modelbase/psrl.py. Co-authored-by: n+e <trinkle23897@cmu.edu>
fix critical bugs in MAPolicy and docs update (thu-ml#207)
  - fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy)
  - polish examples/box2d/bipedal_hardcore_sac.py
  - several docs update
  - format setup.py and bump version to 0.2.7
code refactor for venv (thu-ml#179)
  - Refactor code to remove duplicate code
  - Enable async simulation for all vector envs
  - Remove `collector.close` and rename `VectorEnv` to `DummyVectorEnv`
  The abstraction of vector env changed. Prior to this PR, each vector env was almost independent. After this PR, each env is wrapped into a worker, and vector envs differ by their worker type. In fact, users can just use `BaseVectorEnv` with different workers; I keep `SubprocVectorEnv` and `ShmemVectorEnv` for backward compatibility.
  Co-authored-by: n+e <463003665@qq.com>
  Co-authored-by: magicly <magicly007@gmail.com>
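The worker-based abstraction described in this commit can be pictured with a toy sketch: each environment is wrapped in a worker, and the vector env only talks to workers, so changing the worker type changes how environments are run. This is a minimal illustration under assumptions, not the library's implementation; `ToyEnv`, `ToyWorker`, and `ToyVectorEnv` are hypothetical names invented here.

```python
class ToyEnv:
    """Hypothetical environment whose observation is a step counter."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # Returns (obs, reward, done, info); the episode ends after 3 steps.
        return self.t, float(action), self.t >= 3, {}


class ToyWorker:
    """Hypothetical in-process worker that owns exactly one environment."""

    def __init__(self, env_fn):
        self.env = env_fn()

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class ToyVectorEnv:
    """Hypothetical vector env: one worker per env, all access goes through workers."""

    def __init__(self, env_fns, worker_fn):
        self.workers = [worker_fn(fn) for fn in env_fns]

    def __len__(self):
        return len(self.workers)

    def reset(self):
        return [w.reset() for w in self.workers]

    def step(self, actions):
        results = [w.step(a) for w, a in zip(self.workers, actions)]
        obs, rew, done, info = zip(*results)
        return list(obs), list(rew), list(done), list(info)


# Swapping `ToyWorker` for another worker class would change the execution
# strategy without touching ToyVectorEnv itself.
venv = ToyVectorEnv([ToyEnv for _ in range(2)], ToyWorker)
```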
docs fix and v0.2.5 (thu-ml#156)
  * pre
  * update docs
  * update docs
  * $ in bash
  * size -> hidden_layer_size
  * doctest
  * doctest again
  * filter a warning
  * fix bug
  * fix examples
  * test fail
  * test succ