Distributed training (multi-node) of a Transformer model
machine-learning tutorial deep-learning pytorch data-parallelism model-parallelism distributed-training gradient-accumulation distributed-data-parallel collective-communication
Updated Apr 10, 2024 - Python