[inductor][cpu]functorch_dp_cifar10 and opacus_cifar10 performance regression in 2025-05-24 nightly release · Issue #154598 · pytorch/pytorch · GitHub
Open
@zxd1997066

Description

🐛 Describe the bug

AMP static shape CPP wrapper

| suite | name | thread | batch_size_new | speed_up_new | inductor_new | eager_new | compilation_latency_new | batch_size_old | speed_up_old | inductor_old | eager_old | compilation_latency_old | Ratio Speedup(New/old) | Eager Ratio(old/new) | Inductor Ratio(old/new) | Compilation_latency_Ratio(old/new) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchbench | functorch_dp_cifar10 | multiple | 64 | 0.868102 | 0.009271905 | 0.008048959274310001 | 10.719749 | 64 | 1.160976 | 0.006782472 | 0.007874287212672 | 11.024352 | 0.75 | 0.98 | 0.73 | 1.03 |
| torchbench | opacus_cifar10 | multiple | 64 | 0.838903 | 0.010030994 | 0.008415030959582 | 11.244563 | 64 | 1.18228 | 0.006808455000000001 | 0.0080495001774 | 11.659031 | 0.71 | 0.96 | 0.68 | 1.04 |
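
The derived columns in the table can be cross-checked from the raw measurements. The following is a minimal Python sketch, assuming that `speed_up` is `eager / inductor` and that the ratio columns are rounded to two decimals; both assumptions are inferred from the numbers and are not stated in the issue. The names `ROWS`, `derived_ratios`, and `RESULTS` are hypothetical helpers for this illustration.

```python
# Raw measurements copied from the table above.
# "new" is the target nightly, "old" is the reference run.
ROWS = {
    "functorch_dp_cifar10": {
        "inductor_new": 0.009271905,
        "eager_new": 0.008048959274310001,
        "compilation_latency_new": 10.719749,
        "inductor_old": 0.006782472,
        "eager_old": 0.007874287212672,
        "compilation_latency_old": 11.024352,
    },
    "opacus_cifar10": {
        "inductor_new": 0.010030994,
        "eager_new": 0.008415030959582,
        "compilation_latency_new": 11.244563,
        "inductor_old": 0.006808455000000001,
        "eager_old": 0.0080495001774,
        "compilation_latency_old": 11.659031,
    },
}


def derived_ratios(row):
    """Recompute the derived columns (formulas are assumptions inferred from the data)."""
    speed_up_new = row["eager_new"] / row["inductor_new"]
    speed_up_old = row["eager_old"] / row["inductor_old"]
    return {
        "speed_up_new": speed_up_new,
        "speed_up_old": speed_up_old,
        "speedup_ratio_new_over_old": round(speed_up_new / speed_up_old, 2),
        "eager_ratio_old_over_new": round(row["eager_old"] / row["eager_new"], 2),
        "inductor_ratio_old_over_new": round(
            row["inductor_old"] / row["inductor_new"], 2
        ),
        "compilation_latency_ratio_old_over_new": round(
            row["compilation_latency_old"] / row["compilation_latency_new"], 2
        ),
    }


RESULTS = {name: derived_ratios(row) for name, row in ROWS.items()}

if __name__ == "__main__":
    for name, result in RESULTS.items():
        print(name, result)
```

Under these assumptions the eager ratio stays close to 1 for both models while the Inductor ratio drops, which suggests that the slowdown sits in the compiled path and not in the eager baseline.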

the bad commit: 768cb73

/workspace/pytorch# bash inductor_single_run.sh multiple inference performance torchbench functorch_dp_cifar10 amp first static cpp
Testing with cpp wrapper.
Testing with inductor.
multi-threads testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  functorch_dp_cifar10
skipping cudagraphs due to cpp wrapper enabled
running benchmark: 100%|█████████████████████████████████████████████████████████████████████████████| 50/50 [00:02<00:00, 24.73it/s]
1.139x
WARNING:common:Trying to call the empty_gpu_cache for device: cpu, which is not in list [cuda, xpu]
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,functorch_dp_cifar10,64,1.138629,18.863196,38.237371,0.893517,77.563494,86.806938,71,1,0,0,0,0,1

the last good commit: 3c0cbf4

/workspace/pytorch# bash inductor_single_run.sh multiple inference performance torchbench functorch_dp_cifar10 amp first static cpp
Testing with cpp wrapper.
Testing with inductor.
multi-threads testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  functorch_dp_cifar10
skipping cudagraphs due to cpp wrapper enabled
running benchmark: 100%|█████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 27.99it/s]
1.431x
WARNING:common:Trying to call the empty_gpu_cache for device: cpu, which is not in list [cuda, xpu]
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,functorch_dp_cifar10,64,1.430911,14.841640,38.689340,0.888515,76.825395,86.464922,71,1,0,0,0,0,1
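
The two CSV result rows above (bad commit 768cb73 and last good commit 3c0cbf4) can be compared directly. This is a minimal sketch using only the Python standard library; the helper `parse_row` and the variable names are hypothetical, and only the columns needed for the comparison are interpreted.

```python
import csv
import io

# Header and result rows copied verbatim from the two benchmark logs above.
HEADER = (
    "dev,name,batch_size,speedup,abs_latency,compilation_latency,"
    "compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,"
    "unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,"
    "autograd_compiles,cudagraph_skips"
)
BAD_ROW = (
    "cpu,functorch_dp_cifar10,64,1.138629,18.863196,38.237371,0.893517,"
    "77.563494,86.806938,71,1,0,0,0,0,1"
)
GOOD_ROW = (
    "cpu,functorch_dp_cifar10,64,1.430911,14.841640,38.689340,0.888515,"
    "76.825395,86.464922,71,1,0,0,0,0,1"
)


def parse_row(header, line):
    """Parse one benchmark CSV line into a dict keyed by column name."""
    return next(csv.DictReader(io.StringIO(header + "\n" + line)))


bad = parse_row(HEADER, BAD_ROW)
good = parse_row(HEADER, GOOD_ROW)

# Relative change between the bad and the last good commit.
speedup_ratio = float(bad["speedup"]) / float(good["speedup"])
latency_ratio = float(bad["abs_latency"]) / float(good["abs_latency"])

# The graph-capture statistics are identical in both runs.
CAPTURE_KEYS = ("calls_captured", "unique_graphs", "graph_breaks")
same_capture = all(bad[key] == good[key] for key in CAPTURE_KEYS)

if __name__ == "__main__":
    print(f"speedup (bad/good): {speedup_ratio:.3f}")
    print(f"abs_latency (bad/good): {latency_ratio:.3f}")
    print(f"identical capture stats: {same_capture}")
```

Because `calls_captured`, `unique_graphs`, and `graph_breaks` match in both runs, the difference is unlikely to come from a change in graph capture; it more plausibly comes from the generated code or its runtime, although the logs alone do not confirm this.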

Versions

SW info

| name | target_branch | target_commit | refer_branch | refer_commit |
| --- | --- | --- | --- | --- |
| torchbench | main | 373ffb19 | main | 373ffb19 |
| torch | main | 53ecb81 | main | 8568dbc |
| torchvision | main | 0.19.0a0+d23a6e1 | main | 0.19.0a0+d23a6e1 |
| torchtext | main | 0.16.0a0+b0ebddc | main | 0.16.0a0+b0ebddc |
| torchaudio | main | 2.6.0a0+1a8f621 | main | 2.6.0a0+ea5de17 |
| torchdata | main | 0.7.1a0+0790338 | main | 0.7.1a0+0790338 |
| dynamo_benchmarks | main | nightly | main | nightly |

Repro:
inductor_single_run.sh
bash inductor_single_run.sh multiple inference performance torchbench functorch_dp_cifar10 amp first static cpp
Suspected guilty commit: 768cb73
torchbench-functorch_dp_cifar10-inference-amp-static-cpp-multiple-performance-drop_guilty_commit.log

cc @chauhang @penguinwu @chuanqi129

Metadata

Labels

oncall: cpu inductor (CPU Inductor issues for Intel team to triage)
oncall: pt2
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
