🚀 The feature, motivation and pitch
Environment (Mac M2):
Python 3.10
torch 2.1.0.dev20230717
torchaudio 2.1.0.dev20230717
torchvision 0.15.2a0
I want to use the Adam optimizer to train my model, but I got this error:
NotImplementedError: The operator 'aten::lerp.Scalar_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS
When I set PYTORCH_ENABLE_MPS_FALLBACK=1, training is still quite a bit slower than I would expect.
I'm testing a tiny ViT model on the MNIST dataset. The details are as follows:
M2 chip, CPU, Adam: about 2.4 minutes per epoch.
M2 chip, GPU, Adam (with PYTORCH_ENABLE_MPS_FALLBACK=1): about 2.0 minutes per epoch.
M2 chip, GPU, SGD: about 30 seconds per epoch.
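For context, here is a minimal sketch of how the error is triggered and how the fallback is enabled. The model and tensor sizes are illustrative placeholders, not my actual ViT setup; the key points are that the environment variable must be set before `torch` is imported, and that it is Adam's `optimizer.step()` that hits `aten::lerp.Scalar_out` on MPS:

```python
import os

# Assumption: the fallback flag must be set before importing torch.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

# Use MPS when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Placeholder model standing in for the tiny ViT.
model = torch.nn.Linear(4, 2).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 4, device=device)
loss = model(x).sum()
loss.backward()
opt.step()  # on MPS, Adam's internal lerp is what hits aten::lerp.Scalar_out
```

With SGD the `optimizer.step()` avoids the lerp op entirely, which matches the much faster GPU timing above.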
Alternatives
No response
Additional context
No response
cc @vincentqb @jbschlosser @albanD @janeyx99 @crcrpar @kulinseth @malfet @DenisVieriu97 @razarmehr @abhudev @ezyang @gchanan @zou3519