DOI: 10.5555/3495724.3495947
Research article · Free access

Optimal approximation–smoothness tradeoffs for soft-max functions

Published: 06 December 2020

Abstract

A soft-max function has two main efficiency measures: (1) approximation, which quantifies how well it approximates the maximum function, and (2) smoothness, which quantifies how sensitive it is to changes in its input. Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness. This leads to novel soft-max functions, each of which is optimal for a different application. The most commonly used soft-max function, called the exponential mechanism, has an optimal tradeoff between approximation, measured in terms of expected additive approximation, and smoothness, measured with respect to Rényi divergence. We introduce a soft-max function, called piecewise linear soft-max, with an optimal tradeoff between approximation, measured in terms of worst-case additive approximation, and smoothness, measured with respect to the ℓq-norm. The worst-case approximation guarantee of the piecewise linear mechanism enforces sparsity in the output of our soft-max function, a property that is known to be important in Machine Learning applications [14, 12] and is not satisfied by the exponential mechanism. Moreover, the ℓq-smoothness is suitable for applications in Mechanism Design and Game Theory, where the piecewise linear mechanism outperforms the exponential mechanism. Finally, we investigate another soft-max function, called the power mechanism, with an optimal tradeoff between expected multiplicative approximation and smoothness with respect to the Rényi divergence, which provides improved theoretical and practical results in differentially private submodular optimization.
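As a concrete illustration of the contrast the abstract draws between the exponential mechanism and sparse alternatives, the sketch below implements the classical temperature-scaled softmax (the exponential mechanism) and the sparsemax of Martins and Astudillo [14] in NumPy. This is illustrative only: the function names and the temperature parameter t are our own choices, and the piecewise linear soft-max introduced by the paper is defined in its body, not reproduced here.

    import numpy as np

    def softmax(z, t=1.0):
        # Exponential mechanism: smaller t puts more mass on the maximum
        # (better approximation) but makes the output more sensitive to
        # perturbations of z (less smooth). Output is strictly positive.
        z = np.asarray(z, dtype=float) / t
        e = np.exp(z - z.max())          # shift for numerical stability
        return e / e.sum()

    def sparsemax(z):
        # Sparsemax [14]: Euclidean projection of z onto the probability
        # simplex. Unlike softmax, it can output exact zeros (sparsity).
        z = np.asarray(z, dtype=float)
        u = np.sort(z)[::-1]             # scores sorted in decreasing order
        css = np.cumsum(u)
        k = np.arange(1, len(z) + 1)
        in_support = 1.0 + k * u > css   # coordinates kept in the support
        k_max = k[in_support][-1]
        tau = (css[k_max - 1] - 1.0) / k_max
        return np.maximum(z - tau, 0.0)

    z = np.array([2.0, 1.0, 0.1])
    print(softmax(z))     # dense:  [0.659 0.242 0.099] (approximately)
    print(sparsemax(z))   # sparse: [1. 0. 0.] -- all mass on the maximum

The example highlights the sparsity property the abstract attributes to worst-case approximation guarantees: sparsemax assigns exactly zero probability to low-scoring entries, while softmax never does.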

Supplementary Material

Supplemental material: 3495724.3495947_supp.pdf

References

[1]
Ludwig Boltzmann. Studien über das Gleichgewicht der lebendigen Kraft. Wissenschaftliche Abhandlungen, 1:49–96, 1868.
[2]
John S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Françoise Fogelman Soulié and Jeanny Hérault, editors, Neurocomputing, pages 227–236, Berlin, Heidelberg, 1990. Springer Berlin Heidelberg.
[3]
John S. Bridle. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 211–217, 1990.
[4]
Alexandre de Brébisson and Pascal Vincent. An exploration of softmax alternatives belonging to the spherical loss family. arXiv preprint arXiv:1511.05042, 2015.
[5]
Konstantinos Drakakis and Barak A. Pearlmutter. On the calculation of the ℓ2 → ℓ1 induced matrix norm. International Journal of Algebra, 3(5):231–240, 2009.
[6]
Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
[7]
J.W. Gibbs. Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundations of Thermodynamics. C. Scribner's sons, 1902.
[8]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[9]
Julien M. Hendrickx and Alex Olshevsky. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications, 31(5):2802–2812, 2010.
[10]
Zhiyi Huang and Sampath Kannan. The exponential mechanism for social welfare: Private, truthful, and nearly optimal. In Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, FOCS '12, pages 140–149, USA, 2012. IEEE Computer Society.
[11]
Michael I. Jordan and Robert A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, March 1994.
[12]
Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, and Harish G. Ramaswamy. On controllable sparse alternatives to softmax. In Advances in Neural Information Processing Systems, pages 6422–6432, 2018.
[13]
R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. John Wiley & Sons, 1959.
[14]
Andre Martins and Ramon Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. In International Conference on Machine Learning, pages 1614–1623, 2016.
[15]
Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Foundations of Computer Science, 2007. FOCS'07. 48th Annual IEEE Symposium on, pages 94–103. IEEE, 2007.
[16]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[17]
Marko Mitrovic, Mark Bun, Andreas Krause, and Amin Karbasi. Differentially private submodular maximization: Data summarization in disguise. In ICML, 2017.
[18]
Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In AISTATS, volume 5, pages 246–252. Citeseer, 2005.
[19]
Jiří Rohn. Computing the norm ‖A‖∞,1 is NP-hard. Linear and Multilinear Algebra, 47(3):195–204, 2000.
[20]
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA, 2018.
[21]
Pascal Vincent, Alexandre De Brébisson, and Xavier Bouthillier. Efficient exact gradient update for training deep networks with very large sparse targets. In Advances in Neural Information Processing Systems, pages 1108–1116, 2015.


Published In

NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems
December 2020
22651 pages
ISBN:9781713829546

Publisher

Curran Associates Inc.

Red Hook, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited


