[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

Published: 22 February 2021 Publication History

Abstract

Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although there exist extensive algorithms for submodular optimization, many of them incur large computational overheads and hence are not suitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds compared to the state-of-the-art approximation algorithms with efficient implementation for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications for revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.

References

[1]
Georgios Amanatidis, Federico Fusco, Philip Lazos, Stefano Leonardi, and Rebecca Reiffenhäuser. 2020. Fast Adaptive Non-Monotone Submodular Maximization Subject to a Knapsack Constraint. In Neural Information Processing Systems (NeurIPS), arXiv: 2007.05014.
[2]
Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. 2014. Streaming submodular maximization: Massive data summarization on the fly. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 671--680.
[3]
Ashwinkumar Badanidiyuru and Jan Vondrák. 2014. Fast algorithms for maximizing submodular functions. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 1497--1514.
[4]
Eric Balkanski, Adam Breuer, and Yaron Singer. 2018. Non-monotone submodular maximization in exponentially fewer iterations. In Neural Information Processing Systems (NeurIPS). 2353--2364.
[5]
Mohammadhossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, and Afshin Rostamizadeh. 2019. Categorical feature compression via submodular optimization. In International Conference on Machine Learning (ICML). 515--523.
[6]
Niv Buchbinder and Moran Feldman. 2018. Deterministic algorithms for submodular maximization problems. ACM Transactions on Algorithms 14, 3 (2018), 1--20.
[7]
Niv Buchbinder and Moran Feldman. 2019. Constrained submodular maximization via a nonsymmetric technique. Mathematics of Operations Research 44, 3 (2019), 988--1005.
[8]
Niv Buchbinder, Moran Feldman, and Mohit Garg. 2019. Deterministic (1/2+ '')-approximation for submodular maximization over a matroid. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 241--254.
[9]
Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. 2014. Submodular maximization with cardinality constraints. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 1433--1452.
[10]
Niv Buchbinder, Moran Feldman, Joseph Seffi, and Roy Schwartz. 2015. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM J. Comput. 44, 5 (2015), 1384--1402.
[11]
Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. 2015. Streaming algorithms for submodular function maximization. In International Colloquium on Automata, Languages, and Programming (ICALP). 318--330.
[12]
Chandra Chekuri, TS Jayram, and Jan Vondrák. 2015. On multiplicative weight updates for concave and submodular function maximization. In Innovations in Theoretical Computer Science (ITCS). 201--210.
[13]
Liang-Chieh Chen, Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, and Jon Shlens. 2018. Searching for efficient multi-scale architectures for dense image prediction. In Advances in neural information processing systems (NeurIPS). 8699--8710.
[14]
Wei Chen, Yajun Wang, and Siyu Yang. 2009. Efficient influence maximization in social networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 199--208.
[15]
Alina Ene and Huy L Nguyen. 2016. Constrained submodular maximization: Beyond 1/e. In IEEE Annual Symposium on Foundations of Computer Science (FOCS). 248--257.
[16]
Alina Ene and Huy L. Nguyen. 2019. A nearly-linear time algorithm for submodular maximization with a knapsack constraint. In International Colloquium on Automata, Languages, and Programming (ICALP). 53:1--53:12.
[17]
Matthew Fahrbach, Vahab Mirrokni, and Morteza Zadimoghaddam. 2019. Non-monotone submodular maximization with nearly optimal adaptivity and query complexity. In International Conference on Machine Learning (ICML). 1833-- 1842.
[18]
Uriel Feige, Vahab S Mirrokni, and Jan Vondrák. 2011. Maximizing non-monotone submodular functions. SIAM J. Comput. 40, 4 (2011), 1133--1153.
[19]
Moran Feldman, Christopher Harshaw, and Amin Karbasi. 2017. Greed is good: Near-optimal submodular maximization via greedy optimization. In Conference on Learning Theory (COLT). 758--784.
[20]
Moran Feldman, Joseph Naor, and Roy Schwartz. 2011. A unified continuous greedy algorithm for submodular maximization. In IEEE Annual Symposium on Foundations of Computer Science (FOCS). 570--579.
[21]
Shayan Oveis Gharan and Jan Vondrák. 2011. Submodular maximization by simulated annealing. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 1098--1116. Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 1, Article 5. Publication date: March 2021. 5:22 Kai Han et al.
[22]
Ryan Gomes and Andreas Krause. 2010. Budgeted nonparametric learning from data streams. In International Conference on Machine Learning (ICML). 391--398.
[23]
Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. 2010. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In International Workshop on Internet and Network Economics (WINE). 246--257.
[24]
Ran Haba, Ehsan Kazemi, Moran Feldman, and Amin Karbasi. 2020. Streaming Submodular Maximization under a ??-Set System Constraint. In International Conference on Machine Learning (ICML).
[25]
Shengyuan Hu, Tao Yu, Chuan Guo, Wei-Lun Chao, and Kilian Q Weinberger. 2019. A new defense against adversarial images: Turning a weakness into a strength. In Advances in neural information processing systems (NeurIPS). 1635--1646.
[26]
Chien-Chung Huang and Naonori Kakimura. 2018. Multi-Pass Streaming Algorithms for Monotone Submodular Function Maximization. preprint, arXiv:1802.06212 (2018).
[27]
Chien-Chung Huang and Naonori Kakimura. 2019. Improved streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. In Workshop on Algorithms and Data Structures (WADS). 438--451.
[28]
Chien-Chung Huang, Naonori Kakimura, and Yuichi Yoshida. 2020. Streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. Algorithmica 82, 4 (2020), 1006--1032.
[29]
Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, and Amin Karbasi. 2019. Submodular streaming in all Its glory: Tight approximation, minimum memory and low adaptive complexity. In International Conference on Machine Learning (ICML). 3311--3320.
[30]
Samir Khuller, Anna Moss, and Joseph Seffi Naor. 1999. The budgeted maximum coverage problem. Inform. Process. Lett. 70, 1 (1999), 39--45.
[31]
Andreas Krause and Daniel Golovin. 2014. Tractability: Practical Approaches to Hard Problems. Cambridge University Press. 71--104 pages.
[32]
Alex Krizhevsky. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Reports, University of Toronto (2009).
[33]
Alan Kuhnle. 2019. Interlaced greedy algorithm for maximization of submodular functions in nearly linear time. In Neural Information Processing Systems (NeurIPS). 2371--2381.
[34]
Ariel Kulik, Hadas Shachnai, and Tami Tamir. 2013. Approximations for monotone and nonmonotone submodular maximization with knapsack constraints. Mathematics of Operations Research 38, 4 (2013), 729--739.
[35]
Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. 2009. Non-monotone submodular maximization under matroid and knapsack constraints. In ACM Symposium on Theory of Computing (STOC). 323--332.
[36]
Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. 2007. Cost-effective outbreak detection in networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 420--429.
[37]
Jure Leskovec and Andrej Krevl. 2014. SNAP datasets: Stanford large network dataset collection, URL: https://snap.stanford.edu.
[38]
Wenxin Li and Ness Shroff. 2018. Nearly linear time algorithms and lower bound for submodular maximization. preprint, arXiv:1804.08178 (2018).
[39]
Yishi Lin,Wei Chen, and John CS Lui. 2017. Boosting information spread: An algorithmic approach. In IEEE International Conference on Data Engineering (ICDE). 883--894.
[40]
Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. 2016. Fast Constrained Submodular Maximization: Personalized Data Summarization. In International Conference on Machine Learning (ICML). 1358--1367.
[41]
Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause. 2018. Streaming non-monotone submodular maximization: Personalized video summarization on the fly. In AAAI Conference on Artificial Intelligence (AAAI). 1379--1386.
[42]
Gamal Sallam and Bo Ji. 2019. Joint placement and allocation of virtual network functions with budget and capacity constraints. In IEEE Conference on Computer Communications (INFOCOM). 523--531.
[43]
Gamal Sallam, Zizhan Zheng, and Bo Ji. 2019. Placement and allocation of virtual network functions: Multi-dimensional case. In IEEE International Conference on Network Protocols (ICNP). 1--11.
[44]
Adish Singla, Sebastian Tschiatschek, and Andreas Krause. 2016. Noisy submodular maximization via adaptive sampling with applications to crowdsourced image collection summarization. In AAAI Conference on Artificial Intelligence (AAAI). 2037--2043.
[45]
Ruben Sipos, Adith Swaminathan, Pannaga Shivaswamy, and Thorsten Joachims. 2012. Temporal corpus summarization using submodular word coverage. In International Conference on Information and Knowledge Management (CIKM). 754--763.
[46]
Maxim Sviridenko. 2004. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters 32, 1 (2004), 41--43.
[47]
Laurence A Wolsey. 1982. Maximising real-valued submodular functions: Primal and dual heuristics for location problems. Mathematics of Operations Research 7, 3 (1982), 410--425. Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 1, Article 5. Publication date: March 2021. Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint 5:23
[48]
Grigory Yaroslavtsev, Samson Zhou, and Dmitrii Avdiukhin. 2020. "Bring your own greedy" + max: Near-optimal 1/2- approximations for submodular knapsack. In International Conference on Artificial Intelligence and Statistics (AISTATS). 3263--3274.
[49]
Junzhou Zhao, Shuo Shang, Pinghui Wang, John CS Lui, and Xiangliang Zhang. 2019. Submodular optimization over streams with inhomogeneous decays. In AAAI Conference on Artificial Intelligence (AAAI), Vol. 33. 5861--5868.

Cited By

View all
  • (2024)Enhanced deterministic approximation algorithm for non-monotone submodular maximization under knapsack constraint with linear query complexityJournal of Combinatorial Optimization10.1007/s10878-024-01232-949:1Online publication date: 22-Nov-2024
  • (2023)Linear query approximation algorithms for non-monotone submodular maximization under knapsack constraintProceedings of the Thirty-Second International Joint Conference on Artificial Intelligence10.24963/ijcai.2023/459(4127-4135)Online publication date: 19-Aug-2023
  • (2023)Practical parallel algorithms for submodular maximization subject to a knapsack constraint with nearly optimal adaptivityProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i6.25885(7261-7269)Online publication date: 7-Feb-2023
  • Show More Cited By

Index Terms

  1. Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems
    Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 5, Issue 1
    POMACS
    March 2021
    252 pages
    EISSN:2476-1249
    DOI:10.1145/3452093
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 February 2021
    Published in POMACS Volume 5, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data summarization
    2. machine learning
    3. submodular function maximization

    Qualifiers

    • Research-article

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)69
    • Downloads (Last 6 weeks)15
    Reflects downloads up to 31 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Enhanced deterministic approximation algorithm for non-monotone submodular maximization under knapsack constraint with linear query complexityJournal of Combinatorial Optimization10.1007/s10878-024-01232-949:1Online publication date: 22-Nov-2024
    • (2023)Linear query approximation algorithms for non-monotone submodular maximization under knapsack constraintProceedings of the Thirty-Second International Joint Conference on Artificial Intelligence10.24963/ijcai.2023/459(4127-4135)Online publication date: 19-Aug-2023
    • (2023)Practical parallel algorithms for submodular maximization subject to a knapsack constraint with nearly optimal adaptivityProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i6.25885(7261-7269)Online publication date: 7-Feb-2023
    • (2023)Social Media Influencers and Consumer Behaviour of Women in Saudi Arabia: A Literature ReviewProceedings of the 4th African Human Computer Interaction Conference10.1145/3628096.3629058(208-214)Online publication date: 27-Nov-2023
    • (2023)Streaming Algorithms for Constrained Submodular MaximizationACM SIGMETRICS Performance Evaluation Review10.1145/3606376.359357351:1(65-66)Online publication date: 27-Jun-2023
    • (2023)Streaming Algorithms for Constrained Submodular MaximizationAbstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems10.1145/3578338.3593573(65-66)Online publication date: 19-Jun-2023
    • (2023)Randomized Pricing with Deferred Acceptance for Revenue Maximization with Submodular ObjectivesProceedings of the ACM Web Conference 202310.1145/3543507.3583477(3530-3540)Online publication date: 30-Apr-2023
    • (2023)Deletion-Robust Submodular Maximization under a Knapsack Constraint2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00072(482-489)Online publication date: 17-Dec-2023
    • (2022)BABOONSProceedings of the VLDB Endowment10.14778/3551793.355184615:11(2980-2993)Online publication date: 1-Jul-2022
    • (2022)Streaming Algorithms for Constrained Submodular MaximizationProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35706156:3(1-32)Online publication date: 8-Dec-2022
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media