DOI: 10.5555/3495724.3495947
Research article · Free access

Optimal approximation–smoothness tradeoffs for soft-max functions

Published: 06 December 2020

Abstract

A soft-max function has two main efficiency measures: (1) approximation, which quantifies how well it approximates the maximum function, and (2) smoothness, which quantifies how sensitive it is to changes in its input. Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness. This leads to novel soft-max functions, each of which is optimal for a different application. The most commonly used soft-max function, called the exponential mechanism, has an optimal tradeoff between approximation, measured in terms of expected additive approximation, and smoothness, measured with respect to Rényi divergence. We introduce a soft-max function, called piecewise linear soft-max, with an optimal tradeoff between approximation, measured in terms of worst-case additive approximation, and smoothness, measured with respect to the ℓq-norm. The worst-case approximation guarantee of the piecewise linear mechanism enforces sparsity in the output of our soft-max function, a property that is known to be important in Machine Learning applications [14, 12] and is not satisfied by the exponential mechanism. Moreover, the ℓq-smoothness is suitable for applications in Mechanism Design and Game Theory, where the piecewise linear mechanism outperforms the exponential mechanism. Finally, we investigate another soft-max function, called the power mechanism, with an optimal tradeoff between expected multiplicative approximation and smoothness with respect to the Rényi divergence, which provides improved theoretical and practical results in differentially private submodular optimization.
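As a concrete illustration of the contrast the abstract draws between the exponential mechanism and sparse alternatives, the sketch below implements the classical temperature-scaled softmax (the exponential mechanism) and the sparsemax of Martins and Astudillo [14] in NumPy. This is illustrative only: the function names and the temperature parameter t are our own choices, and the piecewise linear soft-max introduced by the paper is defined in its body, not reproduced here.

    import numpy as np

    def softmax(z, t=1.0):
        # Exponential mechanism: smaller t puts more mass on the maximum
        # (better approximation) but makes the output more sensitive to
        # perturbations of z (less smooth). Output is strictly positive.
        z = np.asarray(z, dtype=float) / t
        e = np.exp(z - z.max())          # shift for numerical stability
        return e / e.sum()

    def sparsemax(z):
        # Sparsemax [14]: Euclidean projection of z onto the probability
        # simplex. Unlike softmax, it can output exact zeros (sparsity).
        z = np.asarray(z, dtype=float)
        u = np.sort(z)[::-1]             # scores sorted in decreasing order
        css = np.cumsum(u)
        k = np.arange(1, len(z) + 1)
        in_support = 1.0 + k * u > css   # coordinates kept in the support
        k_max = k[in_support][-1]
        tau = (css[k_max - 1] - 1.0) / k_max
        return np.maximum(z - tau, 0.0)

    z = np.array([2.0, 1.0, 0.1])
    print(softmax(z))     # dense:  [0.659 0.242 0.099] (approximately)
    print(sparsemax(z))   # sparse: [1. 0. 0.] -- all mass on the maximum

The example highlights the sparsity property the abstract attributes to worst-case approximation guarantees: sparsemax assigns exactly zero probability to low-scoring entries, while softmax never does.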

Supplementary Material

Supplemental material: 3495724.3495947_supp.pdf

References

[1]
Ludwig Boltzmann. Studien über das Gleichgewicht der lebendigen Kraft. Wissenschaftliche Abhandlungen, 1:49–96, 1868.
[2]
John S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Françoise Fogelman Soulié and Jeanny Hérault, editors, Neurocomputing, pages 227–236, Berlin, Heidelberg, 1990. Springer Berlin Heidelberg.
[3]
John S. Bridle. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 211–217, 1990.
[4]
Alexandre de Brébisson and Pascal Vincent. An exploration of softmax alternatives belonging to the spherical loss family. arXiv preprint arXiv:1511.05042, 2015.
[5]
Konstantinos Drakakis and Barak A. Pearlmutter. On the calculation of the ℓ2 → ℓ1 induced matrix norm. International Journal of Algebra, 3(5):231–240, 2009.
[6]
Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
[7]
J.W. Gibbs. Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundations of Thermodynamics. C. Scribner's sons, 1902.
[8]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[9]
Julien M. Hendrickx and Alex Olshevsky. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications, 31(5):2802–2812, 2010.
[10]
Zhiyi Huang and Sampath Kannan. The exponential mechanism for social welfare: Private, truthful, and nearly optimal. In Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, FOCS '12, pages 140–149, USA, 2012. IEEE Computer Society.
[11]
Michael I. Jordan and Robert A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, March 1994.
[12]
Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, and Harish G. Ramaswamy. On controllable sparse alternatives to softmax. In Advances in Neural Information Processing Systems, pages 6422–6432, 2018.
[13]
R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. John Wiley & Sons, 1959.
[14]
Andre Martins and Ramon Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. In International Conference on Machine Learning, pages 1614–1623, 2016.
[15]
Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Foundations of Computer Science, 2007. FOCS'07. 48th Annual IEEE Symposium on, pages 94–103. IEEE, 2007.
[16]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[17]
Marko Mitrovic, Mark Bun, Andreas Krause, and Amin Karbasi. Differentially private submodular maximization: Data summarization in disguise. In ICML, 2017.
[18]
Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In AISTATS, volume 5, pages 246–252. Citeseer, 2005.
[19]
Jiří Rohn. Computing the norm ‖A‖∞,1 is NP-hard. Linear and Multilinear Algebra, 47(3):195–204, 2000.
[20]
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA, 2018.
[21]
Pascal Vincent, Alexandre De Brébisson, and Xavier Bouthillier. Efficient exact gradient update for training deep networks with very large sparse targets. In Advances in Neural Information Processing Systems, pages 1108–1116, 2015.


Published In

NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems
December 2020
22651 pages
ISBN:9781713829546

Publisher

Curran Associates Inc.

Red Hook, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited


