[quant][core][gpu][feature] Implemented quantized cuda gelu #77212
Conversation
Summary: Support for quantized CUDA gelu has been provided by using `dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this is not equivalent to doing int8 gelu, so we have opted for this approach for now. It might be possible to write a variant of the int8 gelu that's equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which can be a topic for future work. Test function `test_qgelu` was amended to test gelu for quantized CUDA backends.

Test Plan:
```
python test/test_quantization.py -k test_qgelu
```
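For illustration, here is a minimal Python sketch of the new path. It is not taken from this PR's test suite: the tensor shape, scale, and zero point are arbitrary, and it assumes a CUDA build in which `quantize_per_tensor` accepts CUDA tensors (the support this stack is adding).

```
import torch

# Arbitrary example input on the GPU (sketch only; values and qparams are made up).
x = torch.randn(4, 8, device="cuda")
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

# With this PR, gelu on a quantized CUDA tensor goes through
# dequantize -> fp32 CUDA gelu kernel -> quantize_per_tensor.
qy = torch.nn.functional.gelu(qx)

# Reference: run the fp32 kernel explicitly and requantize with the input's qparams.
y_ref = torch.quantize_per_tensor(
    torch.nn.functional.gelu(qx.dequantize()),
    qx.q_scale(), qx.q_zero_point(), torch.quint8,
)
print(torch.equal(qy.int_repr(), y_ref.int_repr()))
```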
❌ 2 New Failures
As of commit a8d5169 (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jerryzh168 wondering if we should move

yeah I think maybe we can move it to cuda folder
```
auto x_fp32 = at::dequantize(qx);
auto result_fp32 = at::gelu(x_fp32);
return at::quantize_per_tensor(result_fp32, qx.q_scale(), qx.q_zero_point(), qx.scalar_type());
```
if each one of them supports L11-13 we can remove L11-13 right? should we add a TODO to do this?
they do, but I think I've seen several other functions that use this pruning check for early termination
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged)
Summary:
Pull Request resolved: #77212

Support for quantized CUDA gelu has been provided by using `dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this is not equivalent to doing int8 gelu, so we have opted for this approach for now. It might be possible to write a variant of the int8 gelu that's equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which can be a topic for future work. Test function `test_qgelu` was amended to test gelu for quantized CUDA backends.

Test Plan:
```
python test/test_quantization.py -k test_qgelu
```

Reviewed By: jerryzh168
Differential Revision: D36302475
Pulled By: dzdang
fbshipit-source-id: 11342fb290031d62ba5e620cbe572fe2cc8ed701
Reverting this PR internally as it broke bazel builds, see https://hud.pytorch.org/pytorch/pytorch/commit/b892b85b881c7b3b2b6bde529c4d174e348ba9fb

@pytorchbot revert this

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
This reverts commit b892b85. Reverted #77212 on behalf of https://github.com/facebook-github-bot
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
…36302475) (#77212)

Summary:
Pull Request resolved: #77212

Support for quantized CUDA gelu has been provided by using `dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this is not equivalent to doing int8 gelu, so we have opted for this approach for now. It might be possible to write a variant of the int8 gelu that's equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which can be a topic for future work. Test function `test_qgelu` was amended to test gelu for quantized CUDA backends.

Test Plan:
```
python test/test_quantization.py -k test_qgelu
```

Reviewed By: cpuhrsch
Differential Revision: D36392774
Pulled By: dzdang
fbshipit-source-id: 1accdefb042ee4930451ef016c527c5cd3e13168
Can't merge closed PR #77212

@pytorchbot merge

Can't merge closed PR #77212
@pytorchbot merge
Merge failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/2384646215
Stack from ghstack (oldest at bottom):
Summary:
Support for quantized CUDA gelu has been provided by using `dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this is not equivalent to doing int8 gelu, so we have opted for this approach for now. It might be possible to write a variant of the int8 gelu that's equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which can be a topic for future work.

Test function `test_qgelu` was amended to test gelu for quantized CUDA backends.

Test Plan: `python test/test_quantization.py -k test_qgelu`

Differential Revision: D36392774