Abstract
Artificial neural network models now achieve remarkable results in many disciplines. Functions that map a model's representation to a probability distribution are an integral component of deep learning solutions. Although softmax is the commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs: it always assigns positive probability to every position. In this paper, we propose r-softmax, a modification of softmax that outputs a sparse probability distribution with a controllable sparsity rate. In contrast to existing sparse probability mapping functions, we provide an intuitive mechanism for controlling the output sparsity level. On several multi-label datasets, we show that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax. We also apply r-softmax to the self-attention module of a pre-trained transformer language model and demonstrate improved performance when fine-tuning the model on various natural language processing tasks.
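To make the idea of a controllable sparsity rate concrete, below is a minimal sketch of a sparse probability mapping: given a target rate r, it zeroes out roughly a fraction r of the smallest logits and renormalizes a softmax over the survivors. This is an illustration under our own assumptions, not the authors' exact r-softmax definition, which is given in the full paper and released code (see the footnote below).

```python
import numpy as np

def r_softmax_sketch(logits: np.ndarray, r: float) -> np.ndarray:
    """Sketch of a sparsity-controllable softmax (illustrative only).

    Zeroes out roughly a fraction ``r`` of the smallest logits along the
    last axis and renormalizes a softmax over the remaining entries.
    NOT the authors' exact r-softmax formulation.
    """
    assert 0.0 <= r < 1.0, "sparsity rate r must be in [0, 1)"
    n = logits.shape[-1]
    k = n - int(np.floor(r * n))  # entries kept positive; r=0 keeps all
    # Threshold at the k-th largest logit (ties may keep a few extra entries).
    t = np.sort(logits, axis=-1)[..., n - k : n - k + 1]
    mask = logits >= t
    # Standard max-shift for numerical stability, then masked renormalization.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted) * mask
    return exp / exp.sum(axis=-1, keepdims=True)
```

For example, `r_softmax_sketch(np.array([2.0, 1.0, 0.1, -1.0]), r=0.5)` places positive mass only on the two largest logits, roughly [0.73, 0.27, 0.0, 0.0], while `r=0.0` recovers the ordinary softmax.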
Notes
1. Code for r-softmax is available at https://github.com/gmum/rsoftmax.
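Since the abstract also applies the mapping inside self-attention, the sketch below shows how such a swap could look in scaled dot-product attention, reusing `r_softmax_sketch` from above. Both the helper and this wiring are our illustrative assumptions, not the API of the gmum/rsoftmax repository.

```python
import numpy as np

def sparse_attention_sketch(Q: np.ndarray, K: np.ndarray, V: np.ndarray,
                            r: float = 0.3) -> np.ndarray:
    """Scaled dot-product attention with a sparse probability map.

    Illustrative sketch only: replaces the usual softmax over attention
    scores with the sparsity-controllable mapping sketched earlier
    (r_softmax_sketch); not the authors' released implementation.
    """
    d_k = Q.shape[-1]
    # Attention scores: (..., L_q, L_k)
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    # Each query attends to roughly (1 - r) * L_k keys.
    weights = r_softmax_sketch(scores, r)
    return weights @ V
```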
Acknowledgements
The work of Klaudia Bałazy and Łukasz Struski was supported by the National Science Centre (Poland), Grant No. 2020/39/D/ST6/01332. The research of Jacek Tabor was carried out within the research project "Bio-inspired artificial neural network" (Grant No. POIR.04.04.00-00-14DE/18-00) within the Team-Net program of the Foundation for Polish Science, co-financed by the European Union under the European Regional Development Fund. The work of Marek Śmieja was supported by the National Science Centre (Poland), Grant No. 2022/45/B/ST6/01117. Klaudia Bałazy is affiliated with the Doctoral School of Exact and Natural Sciences at the Jagiellonian University.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bałazy, K., Struski, Ł., Śmieja, M., Tabor, J. (2023). r-softmax: Generalized Softmax with Controllable Sparsity Rate. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds.) Computational Science – ICCS 2023. Lecture Notes in Computer Science, vol. 14074. Springer, Cham. https://doi.org/10.1007/978-3-031-36021-3_11