install codegen header to torch/include #1405
Conversation
The branch was force-pushed from ccff7d5 to 8620737, then from 8620737 to c3e1df3.
@guangyey: this does not seem to work for me, I still don't get headers installed to
The branch was force-pushed from d6aa4e3 to 242ad4e.
Commit extends the existing CUDA test to cover the XPU SyclExtension case for the same feature, `py_limited_api`. NOTE: THE CHANGE CANNOT BE MERGED AS IS. The change requires an update of the commit pin for torch-xpu-ops. Requires: intel/torch-xpu-ops#1405. Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@guangyey: I see this resolved now after the last changes.
cmake/Codegen.cmake (Outdated)
@@ -1,89 +1,95 @@
-if(Codegen_GPU_cmake_included)
+if(Codegen_XPU_cmake_included)
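For context, this hunk renames the file's CMake include guard. A minimal sketch of the idiom as it would read after the change (the surrounding lines are assumed, not copied from the PR):

```cmake
# Include guard: bail out if this file has already been processed once.
# The variable is renamed from Codegen_GPU_cmake_included to
# Codegen_XPU_cmake_included to reflect the XPU-specific scope.
if(Codegen_XPU_cmake_included)
  return()
endif()
set(Codegen_XPU_cmake_included true)
```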
That's quite a change. I worry we might see more issues with torch code being updated and the two sides getting out of sync:
- torch-xpu-ops' Codegen.cmake and related scripts vs. the torch-side versions of the same
- the ops code in the native/xpu folder, which you've modified for some include files

Any chance we can start bringing pieces of this code into the torch codebase itself? For example, could we stop having Codegen.cmake here on the torch-xpu-ops side?
Note: don't take the above as a request to do that in this PR; I am just trying to start a discussion.
Thanks, @dvrogozh, for your verification. I agree that these two components may get out of sync. However, upstreaming this code to stock PyTorch could be challenging since torch-xpu-ops is out of tree. We can track out-of-sync issues through our nightly builds and CI unit tests; for now, we may need to address and maintain them ourselves.
The branch was force-pushed from 242ad4e to 0b53f33.
@guangyey, overall the PR looks good to me. However, I would prefer to separate the code refinement from the header-file installation. Does that make sense to you?
@guangyey, by the way, are the CI failures related to this PR?
I take your point. I will open a separate PR to install the header files; this PR will focus on code refinement.
I guess they are unrelated, but I am not sure. I will check.
# Motivation
Following the comments [here](#1405 (comment)), this PR intends to refine the codegen-related code and remove redundant code.
The branch was force-pushed from 3970602 to 0a8ed2e.
@EikanWang, I have separated this PR and rebased it onto the latest main branch. May I know if I have addressed your comments?
The failures are unrelated to this PR. @EikanWang, any comments?
These failures are unrelated to this PR. Failed cases in op_ut with skip:
=========================================================================
test_transformers_xpu.py::TestTransformersXPU::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_xpu
test_transformers_xpu.py::TestTransformersXPU::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_xpu
test_linalg_xpu.py::TestLinalgXPU::test_gemm_bias_offline_tunableop_xpu_bfloat16
test_meta_xpu.py::TestMetaXPU::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_bfloat16
test_meta_xpu.py::TestMetaXPU::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float16
test_meta_xpu.py::TestMetaXPU::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float32
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_scaled_dot_product_attention_xpu_float32
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_bfloat16
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float16
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float32
Motivation
This PR addresses a code generation issue related to XPU. Currently, there are two separate codegen paths for XPU: one in stock PyTorch and one in torch-xpu-ops. The corresponding build directories are:
- build/aten/src/ATen (for stock PyTorch)
- build/xpu/ATen (for torch-xpu-ops)

However, in the torch-xpu-ops codegen we mistakenly omitted installing the XPU op headers from build/xpu/ATen/ops to build/aten/src/ATen/ops. This PR fixes that issue and also removes some unnecessary code for better maintainability.

Solution
We copy the headers produced by the torch-xpu-ops codegen into the stock PyTorch build tree, so they are installed to torch/include together with the other generated headers.
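A minimal sketch of what such a copy step could look like in CMake (illustrative only: the variable names and the configure-time mechanism are assumptions, not the PR's actual implementation; the directories are the ones named above):

```cmake
# Assumed locations, taken from the PR description:
#   XPU codegen output:           ${CMAKE_BINARY_DIR}/xpu/ATen/ops
#   stock PyTorch generated hdrs: ${CMAKE_BINARY_DIR}/aten/src/ATen/ops
set(XPU_OPS_GEN_DIR "${CMAKE_BINARY_DIR}/xpu/ATen/ops")
set(ATEN_OPS_GEN_DIR "${CMAKE_BINARY_DIR}/aten/src/ATen/ops")

# Copy each generated XPU op header next to the stock generated headers so
# the existing install rules pick them up for torch/include.
file(GLOB xpu_op_headers "${XPU_OPS_GEN_DIR}/*.h")
if(xpu_op_headers)
  file(COPY ${xpu_op_headers} DESTINATION "${ATEN_OPS_GEN_DIR}")
endif()
```

In a real build the copy would more plausibly run at build time, after the codegen step produces the headers, e.g. as a custom command that depends on the codegen target.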
Additional Context
Fixes pytorch/pytorch#145902.