
Optimizing codebook training through control chart analysis

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Frequent codebook resets, often used to improve codebook utilization in the Vector Quantized Variational Autoencoder (VQ-VAE), can significantly alter the codebook distribution and consequently diminish training efficiency. In this work, we introduce a novel codebook learning approach called the Exponentially Weighted Moving Average Control VQ-VAE (ECVQ-VAE). This method treats the nearest-neighbor distance of the codebook during training as a monitoring sample and constructs a control line from it. Our quantizer restricts updates to codebook vectors according to whether the monitored drift exceeds the control line, while simultaneously adjusting the overall usage distribution of the codebook by promoting competition among codewords. This yields an optimization process that sustains full codebook usage while reducing training demands. We demonstrate that our approach outperforms existing methods in lightweight scenarios, and we extensively validate the generalizability of our quantizer across various datasets, tasks, and architectures (VQ-VAE, VQ-GAN).
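To make the control-line mechanism concrete, the sketch below applies a standard EWMA control chart to a synthetic stream that stands in for the per-batch mean nearest-neighbor distance between encoder outputs and their assigned codewords. This is a minimal illustration of the monitoring step, not the paper's implementation: the in-control baseline (mu0, sigma0), the chart parameters (lam, L), and the drift scenario are all assumptions made for the example.

```python
import numpy as np


class EWMAControlChart:
    """EWMA control chart over a scalar monitoring statistic.

    Tracks z_t = lam * x_t + (1 - lam) * z_{t-1} and flags drift when z_t
    leaves mu0 +/- L * sigma0 * sqrt(lam / (2 - lam) * (1 - (1 - lam)**(2t))).
    """

    def __init__(self, mu0: float, sigma0: float, lam: float = 0.2, L: float = 3.0):
        self.mu0, self.sigma0 = mu0, sigma0
        self.lam, self.L = lam, L
        self.z = mu0  # EWMA statistic, initialized at the in-control mean
        self.t = 0    # number of monitoring samples seen so far

    def in_control(self, x: float) -> bool:
        """Fold one monitoring sample into the chart; True if still in control."""
        self.t += 1
        self.z = self.lam * x + (1.0 - self.lam) * self.z
        half_width = self.L * self.sigma0 * np.sqrt(
            self.lam / (2.0 - self.lam) * (1.0 - (1.0 - self.lam) ** (2 * self.t))
        )
        return abs(self.z - self.mu0) <= half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic monitoring stream: in control for 200 steps, then a slow
    # upward drift of the kind a gradual codebook-distribution shift
    # might produce.
    samples = np.concatenate([
        rng.normal(1.0, 0.05, 200),
        rng.normal(1.0, 0.05, 100) + np.linspace(0.0, 0.5, 100),
    ])
    chart = EWMAControlChart(mu0=1.0, sigma0=0.05)
    for step, d in enumerate(samples):
        if not chart.in_control(float(d)):
            # In a VQ training loop, this is the point at which updates to
            # the affected codebook vectors would be restricted until the
            # statistic returns within the control limits.
            print(f"drift detected at step {step}; gate codebook updates")
            break
```

In an actual training loop, the monitoring sample would be recomputed each batch from the quantizer's nearest-neighbor distances, so the gate decides per step whether codebook vectors are allowed to move.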


Data availability

All data included in this study are available from the corresponding author upon request.


Author information


Corresponding author

Correspondence to Qingxuan Shi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, K., Shi, Q., Li, X. et al. Optimizing codebook training through control chart analysis. Multimedia Systems 31, 2 (2025). https://doi.org/10.1007/s00530-024-01555-x


