Description
Motivation
Hoping that someone could build out an example that uses an RL algorithm (e.g. PPO) with LSTMs in a multi-agent environment. This would likely, and preferably, require building out a new MultiAgentLSTM module, which could follow the pattern of MultiAgentMLP (https://pytorch.org/rl/main/reference/generated/torchrl.modules.MultiAgentMLP.html); however, that pattern is not as straightforward to apply to LSTMs or other recurrent architectures.
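For reference, MultiAgentMLP is used roughly like this today (the shapes and hyperparameters below are illustrative):

```python
import torch
from torchrl.modules import MultiAgentMLP

# Per-agent feedforward network: inputs are (*batch, n_agents, n_agent_inputs).
mlp = MultiAgentMLP(
    n_agent_inputs=6,
    n_agent_outputs=2,
    n_agents=3,
    centralised=False,  # each agent only sees its own observation
    share_params=True,  # a single set of weights shared across agents
    depth=2,
    num_cells=64,
)
obs = torch.randn(8, 3, 6)  # (batch, n_agents, n_agent_inputs)
out = mlp(obs)              # (batch, n_agents, n_agent_outputs) == (8, 3, 2)
```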
Solution
Build a MultiAgentLSTM module that mirrors the construction and use of the MultiAgentMLP module -- for the most part a drop-in replacement.
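A minimal sketch of what such a module could look like (hypothetical, not existing torchrl API; a real implementation would also need TensorDict/TensorDictModule integration, is_init handling, and a centralised option):

```python
import torch
from torch import nn

class MultiAgentLSTM(nn.Module):
    """Hypothetical sketch mirroring the MultiAgentMLP constructor.

    Expects inputs of shape (batch, time, n_agents, n_agent_inputs).
    """

    def __init__(self, n_agent_inputs, n_agent_outputs, n_agents,
                 share_params=True, num_layers=1):
        super().__init__()
        self.n_agents = n_agents
        self.share_params = share_params
        n_models = 1 if share_params else n_agents
        self.lstms = nn.ModuleList([
            nn.LSTM(n_agent_inputs, n_agent_outputs,
                    num_layers=num_layers, batch_first=True)
            for _ in range(n_models)
        ])

    def forward(self, x, hidden=None):
        # x: (batch, time, n_agents, n_agent_inputs)
        outs, new_hidden = [], []
        for i in range(self.n_agents):
            lstm = self.lstms[0 if self.share_params else i]
            h_i = None if hidden is None else hidden[i]
            # Slice out agent i's sequence: (batch, time, n_agent_inputs)
            out_i, h_i = lstm(x[..., i, :].contiguous(), h_i)
            outs.append(out_i)
            new_hidden.append(h_i)
        # Restack to (batch, time, n_agents, n_agent_outputs)
        return torch.stack(outs, dim=-2), new_hidden
```

In an actual training loop the recurrent states would additionally need to be reset wherever is_init is True, which is exactly where the InitTracker incompatibility described under Additional context comes in.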
Alternatives
An example that uses plain LSTM blocks (or the LSTMModule) instead of a MultiAgentLSTM, but still in a multi-agent setting; see the sketch below.
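The wiring for this alternative might look something like the following (the VMAS environment and key names are illustrative assumptions; as noted under Additional context, this currently breaks on the is_init key):

```python
from torchrl.envs import InitTracker, TransformedEnv
from torchrl.envs.libs.vmas import VmasEnv
from torchrl.modules import LSTMModule

# Illustrative multi-agent env; observations live under ("agents", "observation").
env = TransformedEnv(VmasEnv(scenario="balance", num_envs=4), InitTracker())
lstm = LSTMModule(
    input_size=env.observation_spec["agents", "observation"].shape[-1],
    hidden_size=64,
    in_key=("agents", "observation"),
    out_key=("agents", "features"),
)
# Pain point: InitTracker writes a single flat "is_init" entry at the root,
# while resetting per-agent recurrent states would need it under "agents".
```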
Additional context
Note that this may require changes to LSTMModule (https://pytorch.org/rl/main/reference/generated/torchrl.modules.LSTMModule.html) if it is to be used as a component of MultiAgentLSTM, since the environment InitTracker transform (https://pytorch.org/rl/main/reference/generated/torchrl.envs.transforms.InitTracker.html) is currently incompatible with LSTMModule for multi-agent environments/agents. One option would be to change both LSTMModule and InitTracker to accept/place the "is_init" TensorDict key in different locations in a multi-agent environment, i.e. have per-agent or per-group is_init keys instead of the single global is_init key per timestep that LSTMModule currently expects. A sketch of that layout follows.
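As a concrete illustration of the proposed layout (hypothetical; the key names and shapes below are assumptions), a per-group is_init would sit next to the group's observations rather than at the root:

```python
import torch
from tensordict import TensorDict

batch, n_agents = 4, 3
td = TensorDict(
    {
        "agents": TensorDict(
            {
                "observation": torch.randn(batch, n_agents, 6),
                # Proposed: one reset flag per agent per env.
                "is_init": torch.zeros(batch, n_agents, 1, dtype=torch.bool),
            },
            batch_size=[batch, n_agents],
        ),
        # Today, InitTracker instead writes a single root-level key:
        "is_init": torch.zeros(batch, 1, dtype=torch.bool),
    },
    batch_size=[batch],
)
# LSTMModule would then reset the recurrent state of agent j in env i
# wherever td["agents", "is_init"][i, j] is True.
```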
Checklist
- I have checked that there is no similar issue in the repo (required)