Abstract
Artificial neural network models now achieve remarkable results in many disciplines. Functions that map a model's representation to a probability distribution are an integral component of deep learning solutions. Although softmax is the commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs: it always assigns positive probability to every position. In this paper, we propose r-softmax, a modification of softmax that outputs a sparse probability distribution with a controllable sparsity rate. In contrast to existing sparse probability mapping functions, we provide an intuitive mechanism for controlling the output sparsity level. On several multi-label datasets, we show that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax. We also apply r-softmax to the self-attention module of a pre-trained transformer language model and demonstrate improved performance when fine-tuning the model on various natural language processing tasks.
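To make the idea of a controllable sparsity rate concrete, below is a minimal sketch of a sparse probability mapping: given a target rate r, it zeroes out roughly a fraction r of the smallest logits and renormalizes a softmax over the survivors. This is an illustration under our own assumptions, not the authors' exact r-softmax definition, which is given in the full paper and released code (see the footnote below).

```python
import numpy as np

def r_softmax_sketch(logits: np.ndarray, r: float) -> np.ndarray:
    """Sketch of a sparsity-controllable softmax (illustrative only).

    Zeroes out roughly a fraction ``r`` of the smallest logits along the
    last axis and renormalizes a softmax over the remaining entries.
    NOT the authors' exact r-softmax formulation.
    """
    assert 0.0 <= r < 1.0, "sparsity rate r must be in [0, 1)"
    n = logits.shape[-1]
    k = n - int(np.floor(r * n))  # entries kept positive; r=0 keeps all
    # Threshold at the k-th largest logit (ties may keep a few extra entries).
    t = np.sort(logits, axis=-1)[..., n - k : n - k + 1]
    mask = logits >= t
    # Standard max-shift for numerical stability, then masked renormalization.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted) * mask
    return exp / exp.sum(axis=-1, keepdims=True)
```

For example, `r_softmax_sketch(np.array([2.0, 1.0, 0.1, -1.0]), r=0.5)` places positive mass only on the two largest logits, roughly [0.73, 0.27, 0.0, 0.0], while `r=0.0` recovers the ordinary softmax.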
Notes
1. Code for r-softmax is available at https://github.com/gmum/rsoftmax.
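Since the abstract also applies the mapping inside self-attention, the sketch below shows how such a swap could look in scaled dot-product attention, reusing `r_softmax_sketch` from above. Both the helper and this wiring are our illustrative assumptions, not the API of the gmum/rsoftmax repository.

```python
import numpy as np

def sparse_attention_sketch(Q: np.ndarray, K: np.ndarray, V: np.ndarray,
                            r: float = 0.3) -> np.ndarray:
    """Scaled dot-product attention with a sparse probability map.

    Illustrative sketch only: replaces the usual softmax over attention
    scores with the sparsity-controllable mapping sketched earlier
    (r_softmax_sketch); not the authors' released implementation.
    """
    d_k = Q.shape[-1]
    # Attention scores: (..., L_q, L_k)
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    # Each query attends to roughly (1 - r) * L_k keys.
    weights = r_softmax_sketch(scores, r)
    return weights @ V
```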
Acknowledgements
The work of Klaudia Bałazy and Łukasz Struski was supported by the National Science Centre (Poland), Grant No. 2020/39/D/ST6/01332. The research of Jacek Tabor was carried out within the research project "Bio-inspired artificial neural network" (Grant No. POIR.04.04.00-00-14DE/18-00) within the Team-Net program of the Foundation for Polish Science, co-financed by the European Union under the European Regional Development Fund. The work of Marek Śmieja was supported by the National Science Centre (Poland), Grant No. 2022/45/B/ST6/01117. Klaudia Bałazy is affiliated with the Doctoral School of Exact and Natural Sciences at the Jagiellonian University.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bałazy, K., Struski, Ł., Śmieja, M., Tabor, J. (2023). r-softmax: Generalized Softmax with Controllable Sparsity Rate. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds.) Computational Science – ICCS 2023. Lecture Notes in Computer Science, vol. 14074. Springer, Cham. https://doi.org/10.1007/978-3-031-36021-3_11