r-softmax: Generalized Softmax with Controllable Sparsity Rate

  • Conference paper
  • In: Computational Science – ICCS 2023 (ICCS 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14074)


Abstract

Nowadays, artificial neural network models achieve remarkable results in many disciplines. Functions that map the representation produced by the model to a probability distribution are an inseparable part of deep learning solutions. Although softmax is the commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs: it always assigns positive probability to every position. In this paper, we propose r-softmax, a modification of softmax that outputs a sparse probability distribution with a controllable sparsity rate. In contrast to existing sparse probability mapping functions, we provide an intuitive mechanism for controlling the output sparsity level. We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax. We also apply r-softmax to the self-attention module of a pre-trained transformer language model and demonstrate that it improves performance when the model is fine-tuned on different natural language processing tasks.
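
To make the abstract's central mechanism concrete, a probability mapping whose fraction of zeroed outputs is steered directly by a rate parameter r, here is a minimal sketch in PyTorch. It assumes a simple quantile-threshold-and-renormalize scheme and a hypothetical function name r_softmax_sketch; it is not the authors' exact formulation of r-softmax, which is given in the paper and in the repository linked in the Notes below.

    import torch

    def r_softmax_sketch(logits: torch.Tensor, r: float, dim: int = -1) -> torch.Tensor:
        """Illustrative sparse probability mapping (not the paper's definition).

        Logits at or below their r-quantile are zeroed, so roughly a fraction r
        of the positions receive zero probability; the remaining mass is
        renormalized to sum to one.
        """
        if not 0.0 <= r < 1.0:
            raise ValueError("sparsity rate r must lie in [0, 1)")
        # Threshold at the r-quantile of the logits along `dim`.
        t = torch.quantile(logits, r, dim=dim, keepdim=True)
        # Keep only the part of each logit above the threshold; the rest become 0.
        shifted = torch.clamp(logits - t, min=0.0)
        # Renormalize; clamp_min guards against an all-zero row when all logits tie.
        return shifted / shifted.sum(dim=dim, keepdim=True).clamp_min(1e-12)

    x = torch.tensor([2.0, 1.0, 0.5, -1.0])
    print(r_softmax_sketch(x, r=0.5))  # tensor([0.8333, 0.1667, 0.0000, 0.0000])

In the self-attention experiment mentioned above, such a mapping would replace the row-wise softmax over attention scores, zeroing the weights of the least relevant tokens; the paper details the exact integration.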


Notes

  1. Code with r-softmax is available at https://github.com/gmum/rsoftmax.


Acknowledgements

The work of Klaudia Bałazy and Łukasz Struski was supported by the National Centre of Science (Poland) Grant No. 2020/39/D/ST6/01332. The research of Jacek Tabor was carried out within the research project “Bio-inspired artificial neural network” (grant no. POIR.04.04.00-00-14DE/18-00) within the Team-Net program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund. The work of Marek Śmieja was supported by the National Centre of Science (Poland) Grant No. 2022/45/B/ST6/01117. Klaudia Bałazy is affiliated with the Doctoral School of Exact and Natural Sciences at the Jagiellonian University.

Author information

Correspondence to Klaudia Bałazy.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bałazy, K., Struski, Ł., Śmieja, M., Tabor, J. (2023). r-softmax: Generalized Softmax with Controllable Sparsity Rate. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 14074. Springer, Cham. https://doi.org/10.1007/978-3-031-36021-3_11

  • DOI: https://doi.org/10.1007/978-3-031-36021-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36020-6

  • Online ISBN: 978-3-031-36021-3

  • eBook Packages: Computer Science, Computer Science (R0)
