
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-training Quantization of ViTs

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

We present CLAMP-ViT, a data-free post-training quantization method for vision transformers (ViTs). We identify the limitations of recent techniques, notably their inability to leverage meaningful inter-patch relationships, leading to the generation of simplistic and semantically vague data, impacting quantization accuracy. CLAMP-ViT employs a two-stage approach, cyclically adapting between data generation and model quantization. Specifically, we incorporate a patch-level contrastive learning scheme to generate richer, semantically meaningful data. Furthermore, we leverage contrastive learning in layer-wise evolutionary search for fixed- and mixed-precision quantization to identify optimal quantization parameters while mitigating the effects of a non-smooth loss landscape. Extensive evaluations across various vision tasks demonstrate the superiority of CLAMP-ViT, with performance improvements of up to 3% in top-1 accuracy for classification, 0.6 mAP for object detection, and 1.5 mIoU for segmentation at a similar or better compression ratio than existing alternatives. Code is available at https://github.com/georgia-tech-synergy-lab/CLAMP-ViT.git.
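To make the patch-level contrastive scheme mentioned above concrete, the sketch below shows one common way such an objective is written: an InfoNCE-style loss over ViT patch embeddings, where neighbouring patches act as positives and distant patches as negatives. This is a hypothetical illustration, not the authors' released implementation; the function name, its arguments, and the choice of positives/negatives are assumptions.

```python
import torch
import torch.nn.functional as F

def patch_contrastive_loss(patch_emb: torch.Tensor,
                           pos_idx: torch.Tensor,
                           neg_idx: torch.Tensor,
                           tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive loss over ViT patch embeddings.

    patch_emb: (N, D) patch embeddings from an intermediate ViT layer.
    pos_idx:   (N, P) indices of positive patches (e.g. spatial neighbours).
    neg_idx:   (N, K) indices of negative patches (e.g. distant patches).
    """
    z = F.normalize(patch_emb, dim=-1)                         # compare in cosine-similarity space
    pos_sim = torch.einsum('nd,npd->np', z, z[pos_idx]) / tau  # (N, P)
    neg_sim = torch.einsum('nd,nkd->nk', z, z[neg_idx]) / tau  # (N, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1)              # (N, P + K)
    # each positive is contrasted against all candidates of its anchor patch
    log_prob = pos_sim - torch.logsumexp(logits, dim=1, keepdim=True)
    return -log_prob.mean()
```

In a data-free pipeline, a loss of this form would typically be backpropagated through the frozen full-precision ViT to optimize the synthetic images themselves, encouraging semantically coherent inter-patch structure.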


Notes

  1. Patch (subset of an image): a group of neighboring pixels in an image.

  2. Weights/activations quantized to the same precision for all layers.

  3. Weights/activations quantized to different precisions for different layers.

  4. \((\max(\theta_i) - \min(\theta_i))/(2^b - 1)\), where \(\theta_i\) is the weight tensor; a minimal sketch of this quantization step follows after these notes.
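For concreteness, the scale in note 4 is the step size of standard asymmetric min-max uniform quantization. The sketch below is illustrative only and not taken from the paper's code; the function name, the round-to-nearest choice, and the per-tensor granularity are assumptions.

```python
import torch

def uniform_quantize(theta: torch.Tensor, b: int = 8) -> torch.Tensor:
    """Simulated min-max uniform quantization of a weight tensor.

    Uses the scale from note 4: s = (max(theta) - min(theta)) / (2^b - 1).
    """
    t_min, t_max = theta.min(), theta.max()
    scale = ((t_max - t_min) / (2 ** b - 1)).clamp_min(1e-8)           # guard against constant tensors
    codes = torch.round((theta - t_min) / scale).clamp(0, 2 ** b - 1)  # integer codes in [0, 2^b - 1]
    return codes * scale + t_min                                        # de-quantize for simulated inference
```

A mixed-precision configuration, as searched for in the paper's evolutionary stage, would simply assign a different bit-width b per layer; that search loop is not shown here.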


Acknowledgements

This work was supported in part by CoCoSys, one of the seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

Author information


Corresponding author

Correspondence to Akshat Ramachandran.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 374 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ramachandran, A., Kundu, S., Krishna, T. (2025). CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-training Quantization of ViTs. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15125. Springer, Cham. https://doi.org/10.1007/978-3-031-72855-6_18


  • DOI: https://doi.org/10.1007/978-3-031-72855-6_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72854-9

  • Online ISBN: 978-3-031-72855-6

  • eBook Packages: Computer Science, Computer Science (R0)
