install codegen header to torch/include #1405
Conversation
The branch was force-pushed from ccff7d5 to 8620737, then from 8620737 to c3e1df3.
@guangyey: this does not seem to work for me, I still don't get headers installed to
The branch was force-pushed from d6aa4e3 to 242ad4e.
Commit extends the existing CUDA test to cover the XPU SyclExtension case for the same feature, `py_limited_api`. NOTE: THE CHANGE CANNOT BE MERGED AS IS. The change requires an update of the commit pin for torch-xpu-ops. Requires: intel/torch-xpu-ops#1405. Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@guangyey: I see this resolved now after the last changes.
cmake/Codegen.cmake (Outdated)
@@ -1,89 +1,95 @@
-if(Codegen_GPU_cmake_included)
+if(Codegen_XPU_cmake_included)
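For context, this hunk renames the file's CMake include guard. A minimal sketch of the idiom as it would read after the change (the surrounding lines are assumed, not copied from the PR):

```cmake
# Include guard: bail out if this file has already been processed once.
# The variable is renamed from Codegen_GPU_cmake_included to
# Codegen_XPU_cmake_included to reflect the XPU-specific scope.
if(Codegen_XPU_cmake_included)
  return()
endif()
set(Codegen_XPU_cmake_included true)
```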
That's quite a change. I worry we might see more issues with torch code being updated and the two sides getting out of sync:
- torch-xpu-ops' Codegen.cmake and related scripts vs. the torch-side versions of the same
- the ops code in the native/xpu folder, which you've modified for some include files

Any chance we can start bringing pieces of this code into the torch codebase itself? For example, could we stop having Codegen.cmake here on the torch-xpu-ops side?
Note: don't take the above as a request to do that in this PR; I am just trying to start a discussion.
Thanks, @dvrogozh, for your verification. I agree that these two components may get out of sync. However, upstreaming this code to stock PyTorch could be challenging since torch-xpu-ops is out of tree. We can track out-of-sync issues through our nightly builds and CI unit tests; for now, we may need to address and maintain them ourselves.
The branch was force-pushed from 242ad4e to 0b53f33.
@guangyey, overall the PR looks good to me. However, I would prefer to separate the code refinement from the header-file installation. Does that make sense to you?
@guangyey, by the way, are the CI failures related to this PR?
I take your point. I will open a separate PR to install the header files; this PR will focus on code refinement.
I guess they are unrelated, but I am not sure. I will check.
# Motivation
Following the comments [here](#1405 (comment)), this PR intends to refine the codegen-related code and remove redundant code.
The branch was force-pushed from 3970602 to 0a8ed2e.
@EikanWang, I have separated this PR and rebased it onto the latest main branch. May I know if I have addressed your comments?
The failures are unrelated to this PR. @EikanWang, any comments?
These failures are unrelated to this PR. Failed cases in op_ut with skip:
=========================================================================
test_transformers_xpu.py::TestTransformersXPU::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_xpu
test_transformers_xpu.py::TestTransformersXPU::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_xpu
test_transformers_xpu.py::TestTransformersXPU::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_xpu
test_linalg_xpu.py::TestLinalgXPU::test_gemm_bias_offline_tunableop_xpu_bfloat16
test_meta_xpu.py::TestMetaXPU::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_bfloat16
test_meta_xpu.py::TestMetaXPU::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float16
test_meta_xpu.py::TestMetaXPU::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float32
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_scaled_dot_product_attention_xpu_float32
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_bfloat16
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float16
test_meta_xpu.py::TestMetaXPU::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_xpu_float32
Motivation
This PR addresses a code generation issue related to XPU. Currently, there are two separate codegen paths for XPU: one in stock PyTorch and one in torch-xpu-ops. The corresponding build directories are:
- build/aten/src/ATen (for stock PyTorch)
- build/xpu/ATen (for torch-xpu-ops)

However, in the torch-xpu-ops codegen we mistakenly omitted installing the XPU op headers from build/xpu/ATen/ops to build/aten/src/ATen/ops. This PR fixes that issue and also removes some unnecessary code for better maintainability.

Solution
We copy the headers produced by the torch-xpu-ops codegen into the stock PyTorch build tree, so they are installed to torch/include together with the other generated headers.
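A minimal sketch of what such a copy step could look like in CMake (illustrative only: the variable names and the configure-time mechanism are assumptions, not the PR's actual implementation; the directories are the ones named above):

```cmake
# Assumed locations, taken from the PR description:
#   XPU codegen output:           ${CMAKE_BINARY_DIR}/xpu/ATen/ops
#   stock PyTorch generated hdrs: ${CMAKE_BINARY_DIR}/aten/src/ATen/ops
set(XPU_OPS_GEN_DIR "${CMAKE_BINARY_DIR}/xpu/ATen/ops")
set(ATEN_OPS_GEN_DIR "${CMAKE_BINARY_DIR}/aten/src/ATen/ops")

# Copy each generated XPU op header next to the stock generated headers so
# the existing install rules pick them up for torch/include.
file(GLOB xpu_op_headers "${XPU_OPS_GEN_DIR}/*.h")
if(xpu_op_headers)
  file(COPY ${xpu_op_headers} DESTINATION "${ATEN_OPS_GEN_DIR}")
endif()
```

In a real build the copy would more plausibly run at build time, after the codegen step produces the headers, e.g. as a custom command that depends on the codegen target.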
Additional Context
Fixes pytorch/pytorch#145902.