Don't support legacy Python #2

bartvm · 2016-08-16T13:04:07Z

There is really no reason to support Python 2. Python 3 has been out for 8 years now. There are plenty of good articles written about this. Maintaining a dual codebase is a going to be a major pain and it prevents you from using a whole bunch of new Python 3 features (six only gets you so far).

The text was updated successfully, but these errors were encountered:

apaszke · 2016-09-09T01:18:47Z

This looks like a reason to me:

Windows merging

…pytorch#146372) Summary: Pull Request resolved: pytorch#146372 Pull Request resolved: pytorch#144120 # Summary ### Sticky points Cuda-graph rng handling has changed / deviated from original implementation. We will be left with a dangling 'offset' val and confusing naming due to BC ## Dependencies - Flash PR: Dao-AILab/flash-attention#1419 ### Other Points - The BC linter is complaining about losing generate.py and its functions which is not real BC surface cc albanD imported-using-ghimport Test Plan: Imported from OSS Building in dev `buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a //caffe2:ATen-cu --show-full-output ` I and Nming the .so I do see that the flash symbols are correctly named: ``` 0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const ``` Reviewed By: vkuzo Differential Revision: D68502879 Pulled By: drisspg

…#146372) Summary: Pull Request resolved: #146372 Pull Request resolved: #144120 # Summary ### Sticky points Cuda-graph rng handling has changed / deviated from original implementation. We will be left with a dangling 'offset' val and confusing naming due to BC ## Dependencies - Flash PR: Dao-AILab/flash-attention#1419 ### Other Points - The BC linter is complaining about losing generate.py and its functions which is not real BC surface cc albanD imported-using-ghimport Test Plan: Imported from OSS Building in dev `buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a //caffe2:ATen-cu --show-full-output ` I and Nming the .so I do see that the flash symbols are correctly named: ``` 0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const 0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const 0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const 0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const 0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const ``` Reviewed By: vkuzo Differential Revision: D68502879 Pulled By: drisspg

…pytorch#146372) Summary: Pull Request resolved: pytorch#146372 Pull Request resolved: pytorch#144120 # Summary ### Sticky points Cuda-graph rng handling has changed / deviated from original implementation. We will be left with a dangling 'offset' val and confusing naming due to BC ## Dependencies - Flash PR: Dao-AILab/flash-attention#1419 ### Other Points - The BC linter is complaining about losing generate.py and its functions which is not real BC surface cc albanD imported-using-ghimport Test Plan: Imported from OSS Building in dev `buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a //caffe2:ATen-cu --show-full-output ` I and Nming the .so I do see that the flash symbols are correctly named: ``` 0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const ``` Reviewed By: vkuzo Differential Revision: D68502879 Pulled By: drisspg

…#146372) Summary: Pull Request resolved: #146372 Pull Request resolved: #144120 # Summary ### Sticky points Cuda-graph rng handling has changed / deviated from original implementation. We will be left with a dangling 'offset' val and confusing naming due to BC ## Dependencies - Flash PR: Dao-AILab/flash-attention#1419 ### Other Points - The BC linter is complaining about losing generate.py and its functions which is not real BC surface cc albanD imported-using-ghimport Test Plan: Imported from OSS Building in dev `buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a //caffe2:ATen-cu --show-full-output ` I and Nming the .so I do see that the flash symbols are correctly named: ``` 0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const 0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const 0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const 0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const 0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const ``` Reviewed By: vkuzo Differential Revision: D68502879 Pulled By: drisspg

…pytorch#146372) Summary: Pull Request resolved: pytorch#146372 Pull Request resolved: pytorch#144120 # Summary ### Sticky points Cuda-graph rng handling has changed / deviated from original implementation. We will be left with a dangling 'offset' val and confusing naming due to BC ## Dependencies - Flash PR: Dao-AILab/flash-attention#1419 ### Other Points - The BC linter is complaining about losing generate.py and its functions which is not real BC surface cc albanD imported-using-ghimport Test Plan: Imported from OSS Building in dev `buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a //caffe2:ATen-cu --show-full-output ` I and Nming the .so I do see that the flash symbols are correctly named: ``` 0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const ``` Reviewed By: vkuzo Differential Revision: D68502879 Pulled By: drisspg

…#144120) (#146372) Summary: # Summary ### Sticky points Cuda-graph rng handling has changed / deviated from original implementation. We will be left with a dangling 'offset' val and confusing naming due to BC ## Dependencies - Flash PR: Dao-AILab/flash-attention#1419 ### Other Points - The BC linter is complaining about losing generate.py and its functions which is not real BC surface cc albanD imported-using-ghimport Test Plan: Imported from OSS Building in dev `buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a //caffe2:ATen-cu --show-full-output ` I and Nming the .so I do see that the flash symbols are correctly named: ``` 0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const 0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const 0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const 0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const 0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const ``` Reviewed By: vkuzo Differential Revision: D68502879 Pulled By: drisspg Pull Request resolved: #146372 Approved by: https://github.com/jbschlosser

…pytorch#144120) (pytorch#146372) Summary: # Summary ### Sticky points Cuda-graph rng handling has changed / deviated from original implementation. We will be left with a dangling 'offset' val and confusing naming due to BC ## Dependencies - Flash PR: Dao-AILab/flash-attention#1419 ### Other Points - The BC linter is complaining about losing generate.py and its functions which is not real BC surface cc albanD imported-using-ghimport Test Plan: Imported from OSS Building in dev `buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a //caffe2:ATen-cu --show-full-output ` I and Nming the .so I do see that the flash symbols are correctly named: ``` 0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#2}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const 0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#6}::operator()() const 0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#1}::operator()() const::{lambda()pytorch#7}::operator()() const ``` Reviewed By: vkuzo Differential Revision: D68502879 Pulled By: drisspg Pull Request resolved: pytorch#146372 Approved by: https://github.com/jbschlosser

…pproach" Attempt #2 at #148736 using a traceback approach rather than a global variable approach. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]

…with traceback approach" Attempt #2 at #148736 using a traceback approach rather than a global variable approach. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]

…pproach" Attempt #2 at #148736 using a traceback approach rather than a global variable approach. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]

Summary: fix another combo kernel logging error: File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 2036, in _init self.create_combo_kernel_nodes(num_ck_nodes=None) File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 3068, in create_combo_kernel_nodes log.debug("ComboKernels: Generating with num_ck_nodes = %d...", num_ck_nodes) Message: 'ComboKernels: Generating with num_ck_nodes = %d...' Arguments: (None,) Test Plan: Verified in test_combo_kernel.py the logging error went away. Differential Revision: D71655949

Summary: fix another combo kernel logging error: ``` File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 2036, in _init self.create_combo_kernel_nodes(num_ck_nodes=None) File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 3068, in create_combo_kernel_nodes log.debug("ComboKernels: Generating with num_ck_nodes = %d...", num_ck_nodes) Message: 'ComboKernels: Generating with num_ck_nodes = %d...' Arguments: (None,) ``` Test Plan: Verified in test_combo_kernel.py the logging error went away. Differential Revision: D71655949

Summary: fix another combo kernel logging error: File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 2036, in _init self.create_combo_kernel_nodes(num_ck_nodes=None) File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 3068, in create_combo_kernel_nodes log.debug("ComboKernels: Generating with num_ck_nodes = %d...", num_ck_nodes) Message: 'ComboKernels: Generating with num_ck_nodes = %d...' Arguments: (None,) Test Plan: Verified in test_combo_kernel.py the logging error went away. Differential Revision: D71655949 Pull Request resolved: #149772 Approved by: https://github.com/ColinPeppler, https://github.com/Skylion007

ghstack-source-id: 1ad0b72 Pull Request resolved: #148555

fix lint and syntax errors for Navi31 CI

ghstack-source-id: 73dac9b Pull Request resolved: #148555

ghstack-source-id: f64d8e5 Pull Request resolved: #148555

Update readme

* wmma_op + unit test * add arch limitation to wmma test * change arch limitation * Refactor + Add all type unit test(int4 compile failed) * Add f32_16x16x16_bf16 unit test * tempsave * tempsave * tempsave * runtime bug, cannot find symbol * workaround for incorrect HIP warpSize return value * debugging * tempsave * Correctness OK, waiting for optimization * Tidy up + format * temp save * temp save, reproduce the v_bfi_b32 issue * add inline asm for wmmaop test * tidy up * clean some debug purpose code * discard some codes * clang format * clang format * compiler issue fixed + increase tile size * navi3x_multipleD+example * temp save * workable * batchedgemm[OK], groupconv[debug] * groupconv: Sanity check[OK], Performance[Bad] * navi3x_groupconv_need_optimization * create necessary files * save progress * Add Inter-Row thread transfer * save progress * save debugging progress * sanity check pass * fix a host tensor bug and clean up flash-attn code * format * cancel unnecessary change * cancel unnecessary change * cancel unnecessary change * temp save, add asm backend flag to amd_wmma * Mat-A LDS Bypass sanity pass * temp save * gemm sanity fix * Porting new blockwise gemm to flash attention * Example branch provide to compiler team * tempsave * Fix a bug * batched gemm ported * conv A-skip lds ported * Skip B-Lds real gemm * Skip B Lds Gemm + MulD * batched gemm, conv, skip b lds * format * Attn, skip b lds * Change GridwiseOp nam * fix a typo caused bug * Skip A_Lds sanity pass, Skip B_Lds scratch occured * Bug found, intra-row permute off caused * bug found * a fix * disable buffer load due to incorrect 3rd dword * update fmha config, no scratch generated * update 3rd dword * fmha config update * FMHA, add support to gfx1101/gfx1102 * Merge origin dev (pytorch#2) * [Navi3x] Fix Gridwise_multiple_d operation (pytorch#649) * Add CMake Option "USE_OPT_NAVI3X" * fix bug * standardize docs (pytorch#655) * Separate bibtex requirement from rocm-docs-core (pytorch#656) * separate bibtex requirement from rocm-docs-core * point requirements to source rocm-docs-core repo * Add CMake Option "USE_OPT_NAVI3X" (pytorch#647) * Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script * Conv + quantization + tanh (pytorch#645) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> * Add a denorm test fix (pytorch#603) * Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> * simplify karg in device/grid of split-k op (pytorch#644) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout * fix 3rd dword of buffer source descriptor (pytorch#659) * add fp64 instances (pytorch#658) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> * Issue pytorch#666: Revert "simplify karg in device/grid of split-k op (pytorch#644)" (pytorch#665) This reverts commit bb5530a. * Groupnorm + swish external api (pytorch#668) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Ractor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp * add a marco to turn on/off denorm fix (off by default) (pytorch#673) * add a marco to turn off denorm fix by default * expose the marco --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com> * fixed quant example (pytorch#672) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> * Add dependabot config and pin rocm-docs-core (pytorch#663) * [gtest] suppress unsafe buffer warn (pytorch#670) ref: ROCm/MIOpen#1912 * Add memory index guard in wmma device ops (pytorch#667) * Add more macros to turn on/off denorm fix (pytorch#678) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> * Fix a typo (pytorch#676) * Add (pytorch#677) * Allow using ROCm release candidate compilers. (pytorch#679) * enable use of rocm5.5 release candidate 4 * upgrade to ROCM5.5 RC5 * try fix the PUB_KEY error, remove the cmake-data package * upgrade to latest cmake version * use private dockerhub repo for rocm5.5 rc5 * add missing bracket * add vector load check * solve conflicts --------- Co-authored-by: Sam Wu <sjwu@ualberta.ca> Co-authored-by: Sam Wu <sam.wu2@amd.com> Co-authored-by: rocking5566 <ChunYu.Lai@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com> Co-authored-by: Jun Liu <Liu.Jun@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> * Disable SkipLDS & Align AIT api (pytorch#3) * fix layernorm, reduction Ops (pytorch#4) * [Navi3x] Fix Gridwise_multiple_d operation (pytorch#649) * Add CMake Option "USE_OPT_NAVI3X" * fix bug * standardize docs (pytorch#655) * Separate bibtex requirement from rocm-docs-core (pytorch#656) * separate bibtex requirement from rocm-docs-core * point requirements to source rocm-docs-core repo * Add CMake Option "USE_OPT_NAVI3X" (pytorch#647) * Add CMake Option "USE_OPT_NAVI3X" * remove navi3x opt compile option from cmake script * Conv + quantization + tanh (pytorch#645) * Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> * Add a denorm test fix (pytorch#603) * Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> * simplify karg in device/grid of split-k op (pytorch#644) * simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout * fix 3rd dword of buffer source descriptor (pytorch#659) * add fp64 instances (pytorch#658) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> * Issue pytorch#666: Revert "simplify karg in device/grid of split-k op (pytorch#644)" (pytorch#665) This reverts commit bb5530a. * Groupnorm + swish external api (pytorch#668) * Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Ractor instance generation, split into multiple cpp file * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp * add a marco to turn on/off denorm fix (off by default) (pytorch#673) * add a marco to turn off denorm fix by default * expose the marco --------- Co-authored-by: root <root@ctr-ubbsmc15.amd.com> * fixed quant example (pytorch#672) Co-authored-by: root <root@ctr-ubbsmc15.amd.com> * Add dependabot config and pin rocm-docs-core (pytorch#663) * [gtest] suppress unsafe buffer warn (pytorch#670) ref: ROCm/MIOpen#1912 * Add memory index guard in wmma device ops (pytorch#667) * Add more macros to turn on/off denorm fix (pytorch#678) Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> * Fix a typo (pytorch#676) * Add (pytorch#677) * Allow using ROCm release candidate compilers. (pytorch#679) * enable use of rocm5.5 release candidate 4 * upgrade to ROCM5.5 RC5 * try fix the PUB_KEY error, remove the cmake-data package * upgrade to latest cmake version * use private dockerhub repo for rocm5.5 rc5 * add missing bracket * Disable SkipLDS & Align AIT api * Update dependabot config (pytorch#682) Co-authored-by: samjwu <samjwu@users.noreply.github.com> * update attn api * solve type_convert bug + enable --------- Co-authored-by: Sam Wu <sjwu@ualberta.ca> Co-authored-by: Sam Wu <sam.wu2@amd.com> Co-authored-by: rocking5566 <ChunYu.Lai@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com> Co-authored-by: Jun Liu <Liu.Jun@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: samjwu <samjwu@users.noreply.github.com> Co-authored-by: haocwang <Haocong.WANG@amd.com> * fix typo * Fix attention with causal mask * multiple fix, try ait compile * Add A/B not use LDS pipeline * Clang format, Add gfx1101, gfx1102 support of FMHA example * cancel change of format script * 1. Enable 2-stage global Prefetch ( May cause VGPR spilling) 2. Enable FP16 accumulator blockwise_gemm * clang-format * 1. change blockwise gemm loopover direction from kmn to mnk ( ~1% improvement) 2. change kernel timing mode to 50 warmup + 50 timed repeat * Update low level abstration of blockwise gemm wmma * (2/5) bilinear gemm pass, perf bug: skip a lds has lower performance than skip b lds * (3/5) batched gemm pass, perf bug: skip a lds has lower performance than skip b lds * (4/5) grouped conv pass * (5/5) attention pass, todo: debug lds perf bug * AIT Attention API refactor (pytorch#8) * sanity pass * sanity pass 2 * confirm significant performance regression. * turn on all instances * turn off instance format * Fix bug & tunning & format * DML meta, self_attn+cross_attn * sanity pass * remove useless flag * update tile and problem size used in AIT attention * bug fix in grouped conv supporting check * deprecate inline asm wmma * Bug fix: double lds skip * clang-format * Fix errors in 1. example, fmha 2. gridwise pipeline 3. deviceop, fmha, change some containers from vector to array * part2 of previous commit * clang format * API fix of gridwisegemmpipeline * separate array base and vector base attention tensor transformation * fix gemm * clang format * add gemm fp16 instances * Temp save * fpAintB kernel compile pass * Sanity pass. * Temp save * debug code enabled * Fp16AInt8B_GEMM sanity * MQA implementation * GQA-4 example * tempsave * Compile pass * New implementation of fp16Aint8B Gemm, Acheieve similar math throughput with native fp16 Gemm * format * Todo: fix gemm_bilinear_wmma instances compilation bug * Solve a bug when K1=16 * remove unnecessary changes * Remove tensor layout limitation to LDS usage in tesnor contraction * update self-attention and cross-attention * fix a typo of name * Add arch limiter for fp8 gemm * enable fp8 gemm_xdl for all gfx9 targets * temporarily disable gemm_xdl_fp16_fp8 on MI100/200 * fix the cmake logic for gemm_xdl_fp16_fp8 * re-enable the gemm_xdl_fp16_fp8 on MI100/200 --------- Co-authored-by: aska-0096 <haocwang@amd.com> Co-authored-by: Sam Wu <sjwu@ualberta.ca> Co-authored-by: Sam Wu <sam.wu2@amd.com> Co-authored-by: rocking5566 <ChunYu.Lai@amd.com> Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com> Co-authored-by: Jun Liu <Liu.Jun@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: samjwu <samjwu@users.noreply.github.com> Co-authored-by: haocwang <Haocong.WANG@amd.com> Co-authored-by: illsilin <Illia.Silin@amd.com>

Summary: fix another combo kernel logging error: File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 2036, in _init self.create_combo_kernel_nodes(num_ck_nodes=None) File "/home/guorachel/local/fbsource/buck-out/v2/gen/fbcode/4bcbfa3ef39dbd6f/caffe2/test/inductor/__combo_kernels__/combo_kernels#link-tree/torch/_inductor/scheduler.py", line 3068, in create_combo_kernel_nodes log.debug("ComboKernels: Generating with num_ck_nodes = %d...", num_ck_nodes) Message: 'ComboKernels: Generating with num_ck_nodes = %d...' Arguments: (None,) Test Plan: Verified in test_combo_kernel.py the logging error went away. Differential Revision: D71655949 Pull Request resolved: pytorch#149772 Approved by: https://github.com/ColinPeppler, https://github.com/Skylion007

soumith closed this as completed Sep 10, 2016

rockcat mentioned this issue Jan 24, 2017

segfault on AMD Athlon CPUs #535

Closed

colesbury referenced this issue in colesbury/pytorch Apr 28, 2017

Lapack functions implementation #2 + fixes after review

f946552

apaszke pushed a commit that referenced this issue Apr 28, 2017

Lapack functions implementation #2 + fixes after review

ed0c11a

apaszke pushed a commit that referenced this issue May 1, 2017

Lapack functions implementation #2 + fixes after review

98d8e0b

Jiaming-Liu pushed a commit to Jiaming-Liu/pytorch that referenced this issue May 18, 2017

Lapack functions implementation pytorch#2 + fixes after review

497bf49

egborbe mentioned this issue May 25, 2017

memory leak(cpu, not gpu) in convolution layer #1272

Closed

ashmeet4293 mentioned this issue Jun 22, 2017

Segmentation fault (core dumped) on Ubuntu 14.04.5 #926

Closed

howard0su mentioned this issue Jul 16, 2017

Dropout segfaults with -OO #1848

Closed

Louis-Tian mentioned this issue Jul 30, 2017

Fork start method is susceptible to deadlocks #2245

Closed

tfriedel pushed a commit to tfriedel/pytorch that referenced this issue Aug 9, 2017

Merge pull request pytorch#2 from peterjc123/windows-new

0881771

Windows merging

tfriedel pushed a commit to tfriedel/pytorch that referenced this issue Aug 9, 2017

changed tests pytorch#2

3896268

yogi81 referenced this issue Aug 16, 2017

libshm needs static libstdc++ on binary build

1d9b10d

jiasenlu mentioned this issue Sep 6, 2017

segfault in python multithreaded setting #1868

Closed

soumith pushed a commit that referenced this issue Oct 5, 2017

Update README to link to NCCL2 #2

d66fb63

M-Eng mentioned this issue Oct 12, 2017

Segfault when using data parallel on PPC64 #3089

Closed

deepCMR mentioned this issue Oct 13, 2017

Multi-GPU K80s #1637

Closed

lightChaserX mentioned this issue Oct 14, 2017

CUDA 9 + CUDNN 7 - seg fault #3081

Closed

dmarnerides mentioned this issue Oct 16, 2017

conv2d Segmentation Fault #3141

Closed

amulyahwr mentioned this issue Dec 2, 2017

Segmentation fault, pytorch version '0.2.0_3' #3985

Closed

gesape mentioned this issue Dec 28, 2017

Segfault on "import torch" #4111

Closed

wineoffree mentioned this issue Apr 5, 2018

Segmentation fault (core dumped) after iterated for thousands of time #6319

Closed

wangyirui mentioned this issue Apr 9, 2018

Segmenta 8000 tion fault (core dumped) #6449

Closed

praveenkumarchandaliya mentioned this issue Apr 12, 2018

x = torch.cat([z,l,k],dim=1) #6545

Closed

mruberry mentioned this issue Apr 25, 2018

Changes incorrect "overlappingIndices" call to correct "maybeOverlappingIndices" #6953

Merged

dimilar mentioned this issue Apr 29, 2018

[caffe2] EigenTranspose problem in math_cpu.cc #6847

Closed

tinglvv mentioned this issue Mar 5, 2025

[CI][CUDA] Move away from cuda12.4, Add cuda12.6 eager CI tests #148602

Closed

cz2h mentioned this issue Mar 10, 2025

Add meta function for out variants of ones,zeros,empty #147350

Closed

blaine-rister mentioned this issue Mar 14, 2025

[Inductor] Permuted memory access pattern for tiled pointwise kernels #148718

Closed

williamwen42 mentioned this issue Mar 18, 2025

[dynamo] recursive-only dont_skip_tracing with traceback approach #149439

Open

IvanKobzarev added a commit that referenced this issue Mar 26, 2025

[custom_ops][perf] Move expensive pytree traversals of tensors to C++ #2

F438

5800927

ghstack-source-id: 1ad0b72 Pull Request resolved: #148555

pytorch-bot bot pushed a commit that referenced this issue Mar 28, 2025

Merge pull request #2 from ROCm/dnikolaev/patch-13-2

c8c67ae

fix lint and syntax errors for Navi31 CI

IvanKobzarev added a commit that referenced this issue Mar 31, 2025

[custom_ops][perf] Move expensive pytree traversals of tensors to C++ #2

c2ba1bc

ghstack-source-id: 73dac9b Pull Request resolved: #148555

IvanKobzarev added a commit that referenced this issue Apr 1, 2025

[custom_ops][perf] Move expensive pytree traversals of tensors to C++ #2

b816809

ghstack-source-id: f64d8e5 Pull Request resolved: #148555

akashveramd pushed a commit to akashveramd/pytorch that referenced this issue Apr 9, 2025

Merge pull request pytorch#2 from asroy/master

bd27ed6

Update readme

cassanof mentioned this issue Apr 9, 2025

[DCP] Allow for rank-specific tensors with duplicate keys #146566

Open

fadara01 mentioned this issue Apr 30, 2025

Implement MKLGenerator #151218

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't support legacy Python #2

Don't support legacy Python #2

Don't support legacy Python #2

Don't support legacy Python #2

Comments