The following results were obtained without any hyperparameter tuning, so there is room for improvement.
The evolution of the distribution for the [0, 0, 0, 0] state is the following:
- Implicit quantile networks (IQN), extend to continuous actions: https://arxiv.org/pdf/1806.06923.pdf
- QUOTA: https://arxiv.org/pdf/1811.02073.pdf
- Quantile regression: C51, QR-DQN
- Distributed Distributional Deterministic Policy Gradients: https://openreview.net/pdf?id=SyZipzbCb