Abstract
Frequent codebook resets, commonly used to raise codebook utilization in the Vector Quantized Variational Autoencoder (VQ-VAE), can significantly alter the codebook distribution and thereby reduce training efficiency. In this work, we introduce a novel codebook learning approach, the Exponentially Weighted Moving Average Control VQ-VAE (ECVQ-VAE). The method treats the nearest-neighbor distances of the codebook observed during training as monitoring samples and constructs a control line from them. Our quantizer restricts the update of a codebook vector according to whether its monitored drift exceeds the control line, while simultaneously adjusting the overall usage distribution of the codebook by promoting competition among code vectors. This yields an optimization that sustains full codebook usage while reducing training cost. We show that our approach outperforms existing methods in lightweight scenarios and extensively validate the generalizability of our quantizer across various datasets, tasks, and architectures (VQ-VAE, VQ-GAN).
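To make the monitoring idea concrete, the sketch below implements a standard EWMA control chart over a stream of scalar monitoring samples (such as nearest-neighbor distances), flagging steps whose EWMA statistic exceeds the upper control limit. This is a generic illustration of the control-line concept, not the paper's implementation: the smoothing factor `lam`, the limit width `width`, and the choice to estimate location/scale from the monitored stream itself are illustrative assumptions.

```python
import numpy as np

def ewma_control(samples, lam=0.2, width=3.0):
    """EWMA control chart over monitoring samples (illustrative sketch).

    Returns the EWMA statistic per step and a boolean flag marking steps
    where the statistic crosses the upper control limit. `lam` and `width`
    are hypothetical choices; a real deployment would estimate the
    in-control mean/std from a baseline phase rather than the full stream.
    """
    samples = np.asarray(samples, dtype=float)
    mu, sigma = samples.mean(), samples.std()  # rough in-control estimates
    z = mu  # EWMA statistic initialized at the estimated process mean
    stats, flags = [], []
    for t, x in enumerate(samples, start=1):
        z = lam * x + (1.0 - lam) * z
        # time-varying EWMA variance: sigma^2 * lam/(2-lam) * (1 - (1-lam)^(2t))
        var = sigma**2 * lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        ucl = mu + width * np.sqrt(var)  # upper control line
        stats.append(z)
        flags.append(z > ucl)  # drift beyond the control line
    return np.array(stats), np.array(flags)
```

In the spirit of the abstract, a quantizer could consult such flags to decide whether a code vector's update should be restricted (its monitored drift is out of control) or allowed to proceed.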
Data availability
All data included in this study are available upon request by contacting the corresponding author.
About this article
Cite this article
Wang, K., Shi, Q., Li, X. et al. Optimizing codebook training through control chart analysis. Multimedia Systems 31, 2 (2025). https://doi.org/10.1007/s00530-024-01555-x