Abstract
We present a framework for learning disentangled representations in CapsNet via an information bottleneck constraint that distills information into a compact form and encourages the network to learn interpretable capsules. In our β-CapsNet framework, the hyperparameter β trades off disentanglement against the other task objectives, and variational inference converts the information bottleneck constraint into a KL divergence term that is approximated as a constraint on the mean of the capsule. For supervised learning, a class-independent mask vector is used to capture the types of variation synthetically, irrespective of the image class, and we carry out extensive quantitative and qualitative experiments, tuning the parameter β to characterize the relationship between disentanglement, reconstruction, and classification performance. Furthermore, we propose an unsupervised β-CapsNet and a corresponding dynamic routing algorithm for learning disentangled capsules without labels. Extensive empirical evaluations show that β-CapsNet achieves state-of-the-art disentanglement performance compared with CapsNet and various baselines on several complex datasets, in both supervised and unsupervised settings.
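The objective described above amounts to a task loss plus a β-weighted KL penalty on the capsule mean. Below is a minimal, hypothetical PyTorch sketch of such an objective; the function name, the use of cross-entropy in place of CapsNet's margin loss, and the reconstruction weight are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def beta_capsnet_loss(class_logits, labels, recon, images, capsule_mean, beta=1.0):
    """Hypothetical sketch of a beta-weighted information-bottleneck objective.

    The task terms (classification + reconstruction) follow the usual CapsNet
    recipe, with cross-entropy standing in for the margin loss. The KL term is
    approximated as a squared penalty on the capsule mean, i.e. the KL
    divergence between N(mu, I) and a standard normal prior N(0, I), up to a
    constant. Names and weights are illustrative, not the authors' exact choice.
    """
    task_loss = F.cross_entropy(class_logits, labels)   # classification term
    recon_loss = F.mse_loss(recon, images)              # reconstruction term
    # KL( N(mu, I) || N(0, I) ) = 0.5 * ||mu||^2 per sample (constant dropped)
    kl_term = 0.5 * capsule_mean.pow(2).sum(dim=-1).mean()
    return task_loss + 0.0005 * recon_loss + beta * kl_term
```

The small reconstruction weight mirrors the scaling used in the original CapsNet paper; in this framework the interesting knob is β, which controls how strongly information flow through the capsule is bottlenecked and hence how disentangled the learned representation becomes.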
Data availability
No new data were created during the study.
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled ‘β-CapsNet: Learning Disentangled Representation for CapsNet by Information Bottleneck.’
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, Mf., Liu, Jw. β-CapsNet: learning disentangled representation for CapsNet by information bottleneck. Neural Comput & Applic 35, 2503–2525 (2023). https://doi.org/10.1007/s00521-022-07729-w
Received:
Accepted:
Published:
Issue Date: