[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Multi-Pass Streaming Algorithms for Monotone Submodular Function Maximization

  • Published:
Theory of Computing Systems Aims and scope Submit manuscript

Abstract

We consider maximizing a monotone submodular function under a cardinality constraint or a knapsack constraint in the streaming setting. In particular, the elements arrive sequentially and at any point of time, the algorithm has access to only a small fraction of the data stored in primary memory. We propose the following streaming algorithms taking O(ε− 1) passes: (1) a (1 − e− 1ε)-approximation algorithm for the cardinality-constrained problem, (2) a (0.5 − ε)-approximation algorithm for the knapsack-constrained problem. Both of our algorithms run deterministically in O(n) time, using O(K) space, where n is the size of the ground set and K is the size of the knapsack. Here the term O hides a polynomial of \(\log K\) and ε− 1. Our streaming algorithms can also be used as fast approximation algorithms. In particular, for the cardinality-constrained problem, our algorithm takes \(O(n\varepsilon ^{-1} \log (\varepsilon ^{-1}\log K) )\) time, improving on the algorithm of Badanidiyuru and Vondrák that takes \(O(n \varepsilon ^{-1} \log (\varepsilon ^{-1} K) )\) time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Similar content being viewed by others

Notes

  1. Independently of our work, Norouzi-Fard et al. [43] proposed another algorithm with the same performance as the second one in Theorem 1

  2. Recently, it is shown in [44] that it suffices to enumerate all the sets of at most two items.

  3. This theorem is essentially a rephrasing of Theorem 5.

References

  1. Ahn, K.J., Guha, S.: Linear programming in the semi-streaming model with application to the maximum matching problem. Inform. Comput. 222, 59–79 (2013). 38th International Colloquium on Automata Languages and Programming (ICALP 2011)

    Article  MathSciNet  Google Scholar 

  2. Alaluf, N., Ene, A., Feldman, M., Nguyen, H.L., Suh, A.: Optimal streaming algorithms for submodular maximization with cardinality constraints. In: Czumaj, A., Dawar, A., Merelli, E. (eds.) 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.ICALP.2020.6, vol. 168, pp 6:1–6:19 (2020)

  3. Alon, N., Gamzu, I., Tennenholtz, M.: Optimizing budget allocation among channels and influencers. In: Proceedings of the 21st international conference on world wide web (WWW), pp 381–388 (2012)

  4. Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 671–680 (2014)

  5. Badanidiyuru, A., Vondrák, J.: Fast algorithms for maximizing submodular functions. In: Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 1497–1514 (2013)

  6. Balkanski, E., Rubinstein, A., Singer, Y.: An exponential speedup in parallel running time for submodular maximization without loss in approximation. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms SODA, pp 283–302 (2019)

  7. Balkanski, E., Singer, Y.: The adaptive complexity of maximizing a submodular function. In: Proceedings of the 50th Annual ACM Symposium on Theory of Computing, STOC 2018, pp 1138–1151. ACM, New York (2018)

  8. Barbosa, R.D.P., Ene, A., Nguyễn, H.L., Ward, J.: A new framework for distributed submodular maximization. In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp 645–654 (2016)

  9. Bateni, M., Esfandiari, H., Mirrokni, V.: Almost optimal streaming algorithms for coverage problems. In: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pp 13–23 (2017)

  10. Calinescu, G., Chekuri, C., Pál, M., Vondrák, J.: Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 40(6), 1740–1766 (2011)

    Article  MathSciNet  Google Scholar 

  11. Chakrabarti, A., Kale, S.: Submodular maximization meets streaming: matchings, matroids, and more. Math. Program. 154(1-2), 225–247 (2015)

    Article  MathSciNet  Google Scholar 

  12. Chan, T.H.H., Huang, Z., Jiang, S.H.C., Kang, N., Tang, Z.G.: Online submodular maximization with free disposal: Randomization beats for partition matroids online. In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 1204–1223 (2017)

  13. Chan, T.H.H., Jiang, S.H.C., Tang, Z.G., Wu, X.: Online Submodular Maximization Problem with Vector Packing Constraint. In: Annual European Symposium on Algorithms (ESA), pp 24:1–24:14 (2017)

  14. Chekuri, C., Gupta, S., Quanrud, K.: Streaming algorithms for submodular function maximization. In: Proceedings of the 42nd International Colloquium on Automata, Languages, and Programming (ICALP), vol. 9134, pp 318–330 (2015)

  15. Chekuri, C., Quanrud, K.: Submodular function maximization in parallel via the multilinear relaxation. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 303–322 (2019)

  16. Chekuri, C., Vondrák, J., Zenklusen, R.: Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM J. Comput. 43(6), 1831–1879 (2014)

    Article  MathSciNet  Google Scholar 

  17. Ene, A., Nguyễn, H.L.: A nearly-linear time algorithm for submodular maximization with a knapsack constraint. In: Baier, C., Chatzigiannakis, I., Flocchini, P., Leonardi, S. (eds.) 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), LIPIcs, vol. 132, pp 53:1–53:12 (2019)

  18. Ene, A., Nguyễn, H.L.: Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In: Proceedings of the 30th annual ACM-SIAM symposium on discrete algorithms (SODA), pp 274–282 (2019)

  19. Feldman, M., Karbasi, A., Kazemi, E.: Do less, get more: streaming submodular maximization with subsampling. In: Annual Conference on Neural Information Processing Systems (NeurIPS), pp 730–740 (2018)

  20. Feldman, M., Norouzi-Fard, A., Svensson, O., Zenklusen, R.: The one-way communication complexity of submodular maximization with applications to streaming and robustness. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp 1363–1374 (2020)

  21. Filmus, Y., Ward, J.: A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. SIAM J. Comput. 43(2), 514–542 (2014)

    Article  MathSciNet  Google Scholar 

  22. Fisher, M.L., Nemhauser, G.L., Wolsey, L.A.: An analysis of approximations for maximizing submodular set functions i. Math. Program., pp. 265–294 (1978)

  23. Fisher, M.L., Nemhauser, G.L., Wolsey, L.A.: An analysis of approximations for maximizing submodular set functions ii. Math Prog Study 8, 73–87 (1978)

    Article  MathSciNet  Google Scholar 

  24. Haba, R., Kazemi, E., Feldman, M., Karbasi, A.: Streaming submodular maximization under a k-set system constraint. In: H.D. III, Singh, A (eds.) Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v119/haba20a.html, vol. 119, pp 3939–3949 (2020)

  25. Huang, C., Kakimura, N.: Improved streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. Algorithmica 83 (3), 879–902 (2021). https://doi.org/10.1007/s00453-020-00786-4

    Article  MathSciNet  Google Scholar 

  26. Huang, C.C., Kakimura, N., Mauras, S., Yoshida, Y.: Approximability of Monotone Submodular Function Maximization under Cardinality and Matroid Constraints in the Streaming Model, SIAM Journal on Discrete Mathematics, to appear (2021)

  27. Huang, C.C., Kakimura, N., Yoshida, Y.: Streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. Algorithmica 82(4), 1006–1032 (2020)

    Article  MathSciNet  Google Scholar 

  28. Kale, S., Tirodkar, S.: Maximum matching in two, three, and a few more passes over graph streams. In: The 20th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX2017) (2017)

  29. Kazemi, E., Mitrovic, M., Zadimoghaddam, M., Lattanzi, S., Karbasi, A.: Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In: Proceedings of the 36th International Conference on Machine Learning (ICML), pp 3311–3320 (2019)

  30. Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 137–146 (2003)

  31. Krause, A., Singh, A.P., Guestrin, C.: Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. J. Mach. Learn. Res. 9, 235–284 (2008)

    MATH  Google Scholar 

  32. Kulik, A., Shachnai, H., Tamir, T.: Maximizing submodular set functions subject to multiple linear constraints. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 545–554 (2013)

  33. Kumar, R., Moseley, B., Vassilvitskii, S., Vattani, A.: Fast greedy algorithms in mapreduce and streaming. ACM Trans. Parallel Comput. 2(3), 14:1–14:22 (2015)

    Article  Google Scholar 

  34. Lee, J.: Maximum entropy sampling, encyclopedia of environmetrics, vol. 3, pp 1229–1234. Wiley, New York (2006)

    Google Scholar 

  35. Lee, J., Sviridenko, M., Vondrák, J.: Submodular maximization over multiple matroids via generalized exchange properties. Math. Oper. Res. 35(4), 795–806 (2010)

    Article  MathSciNet  Google Scholar 

  36. Lin, H., Bilmes, J.: Multi-document summarization via budgeted maximization of submodular functions. In: Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language technologies (NAACL-HLT), pp 912–920 (2010)

  37. Lin, H., Bilmes, J.: A class of submodular functions for document summarization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pp 510–520 (2011)

  38. McGregor, A.: Finding graph matchings in data streams. In: Proceedings of the 8th International Workshop on Approximation, Randomization and Combinatorial Optimization Problems, APPROX05, pp 170–181 (2005)

  39. McGregor, A.: Graph stream algorithms: a survey. SIGMOD Rec. 43(1), 9–20 (2014)

    Article  Google Scholar 

  40. McGregor, A., Vu, H.T.: Better streaming algorithms for the maximum coverage problem. In: International Conference on Database Theory (ICDT) (2017)

  41. Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrák, J., Krause, A.: Lazier than lazy greedy. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI2015, pp 1812–1818. AAAI Press (2015)

  42. Mirzasoleiman, B., Jegelka, S., Krause, A.: Streaming non-monotone submodular maximization: personalized video summarization on the fly. In: Proc. Conference on Artificial Intelligence (AAAI) (2018)

  43. Norouzi-Fard, A., Tarnawski, J., Mitrovic, S., Zandieh, A., Mousavifar, A., Svensson, O.: Beyond 1/2-approximation for submodular maximization on massive data streams. In: Proceedings of the 35th International Conference on Machine Learning (ICML), pp 3826–3835 (2018)

  44. Nutov, Z., Shoham, E.: Practical budgeted submodular maximization (2020)

  45. da Ponte Barbosa, R., Ene, A., Nguyễn, H.L., Ward, J.: The power of randomization: Distributed submodular maximization on massive datasets. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, JMLR workshop and conference proceedings. JMLR.org. http://proceedings.mlr.press/v37/barbosa15.html, vol. 37, pp 1236–1244 (2015)

  46. Soma, T., Kakimura, N., Inaba, K., Kawarabayashi, K.: Optimal budget allocation: Theoretical guarantee and efficient algorithm. In: Proceedings of the 31st International Conference on Machine Learning (ICML), pp 351–359 (2014)

  47. Sviridenko, M.: A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett. 32(1), 41–43 (2004)

    Article  MathSciNet  Google Scholar 

  48. Wolsey, L.: Maximising real-valued submodular functions: primal and dual heuristics for location problems. Math. Oper. Res. (1982)

  49. Yaroslavtsev, G., Zhou, S., Avdiukhin, D.: “Bring Your Own Greedy”+Max: near-optimal 1/2-approximations for submodular knapsack. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp 3263–3274 (2020)

  50. Yoshida, Y.: Maximizing a monotone submodular function with a bounded curvature under a knapsack constraint. SIAM J. Discrete Math. 33(3), 1452–1471 (2019)

    Article  MathSciNet  Google Scholar 

  51. Yu, Q., Xu, E.L., Cui, S.: Streaming algorithms for news and scientific literature recommendation: Submodular maximization with a d-knapsack constraint. IEEE Global Conf Signal Inf Process (2016)

Download references

Acknowledgements

The authors thank the referees for their valuable comments on this manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Naonori Kakimura.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The first author was supported by French National Research Agency (ANR), with grant ANR-19-CE48-0016 and ANR-18-CE40-0025-01.The second author was supported by JSPS KAKENHI Grant Numbers JP17K00028, JP18H05291, 20K21834, and 20H05795.

Appendix A: Proof of Lemma 7

Appendix A: Proof of Lemma 7

We discuss how to obtain a (0.5 − O(ε))-approximation when c1 + c2 is almost 1.

Lemma 22

Suppose that \(f(o_{1}+o_{2})\geq v^{\prime }\). We can find a set S using two passes and O(ε− 1K) space such that |S| = 2 and \(f(S)\geq \left (\frac {2}{3}-\varepsilon \right )v^{\prime }\).

We begin by reviewing the algorithmFootnote 3 in [27].

Theorem 8

Let \(E_{\mathrm {R}} \subseteq E\) be a subset of the ground set (and we call ER red items). Let \(X\subseteq E\) such that vf(X) ≤ (1 + ε)v. Assume that there exists xXER such that \(\underline {\tau } v \leq f(x) \leq \overline {\tau }v\). Then we can find a set \(Y \subseteq E_{\mathrm {R}}\) of red items, in one pass and O(n) time, with \(|Y| = O(\log _{1+\varepsilon } \frac {\overline {\tau }}{ \underline {\tau }})\) such that some item e in Y satisfies f(Xx + e) ≥ (2/3 − O(ε))v.

Proof of Lemma 22

For each \(t=1,2,\dots , K/2\), define Et = {eEtc(e) ≤ Kt} as the red items. The critical thing to observe is that, if tc(o2), we see o1Et.

The above observation suggests the following implementation. In the first pass, for each set Et, apply Theorem 8 to collect a set \(X_{t} \subseteq E_{t}\) (apparently we can set \(\overline {\tau }= 2/3\) and \(\underline {\tau }=1/3\)). Since \(|X_{t}|=O(\log _{1+\varepsilon }2)=O(\varepsilon ^{-1})\), it takes O(ε− 1K) space and O(n) time in total. Then it follows from Theorem 8 that, for each t with tc(o2), there exists eXt such that \(f(o_{2}+e^{\ast })\geq (2/3- O(\varepsilon )) v^{\prime }\) and c(e) ≤ Kc(o2). In the second pass, for each item e in E, check whether there exists \(e^{\prime }\) in Xc(e) such that \(c(e+e^{\prime })\leq K\) and \(f(e + e^{\prime }) \geq (2/3-O(\varepsilon ))v^{\prime }\). It follows that there exists at least one pair of e and \(e^{\prime }\) satisfying the condition. The second pass also takes O(ε− 1K) space as we keep Xt’s. Since |Xt| = O(ε− 1), the second phase takes O(ε− 1n) time. □

Suppose that vf(OPT) ≤ (1 + ε)v. If f(o1 + o2) ≥ 0.75v, then we are done using Lemma 22. So assume otherwise, meaning that f(OPT − o1o2) ≥ 0.25v. Notice that we can also assume that f(OPT − o1) ≥ 0.5v. Now consider two possibilities.

Claim

If \(c_{1} \geq 1- \sqrt {\varepsilon }\), then we can find a set S in O(ε− 1) passes and O(K) space such that c(S) ≤ K and f(S) ≥ (0.5 − O(ε))v.

Proof

Since \(c_{1} \geq 1- \sqrt {\varepsilon }\), we have \(c(\text {OPT} - o_{1}) \leq \sqrt {\varepsilon } K\). Consider the problem (1) to approximate OPT − o1. Then the largest item in OPT − o1 is c(o2) which is at most \(\sqrt {\varepsilon }K\). By Corollary 1, Simple\((\mathcal {I};0.5v, \sqrt {\varepsilon }K)\) can obtain a set S satisfying that

$$ f(S)\geq 0.5 \left( 1-e^{-\frac{1 - \sqrt{\varepsilon} }{\sqrt{\varepsilon}}} - O(\varepsilon)\right)v\geq (0.5 - O(\varepsilon))v, $$

where the last inequality follows because \(e^{-\frac {1 - \sqrt {\varepsilon } }{\sqrt {\varepsilon }}} \leq \varepsilon \) when ε ≤ 1. □

Claim

If \(c_{1} < 1- \sqrt {\varepsilon }\), then we can find a set S in O(ε− 1) passes and O(K) space such that c(S) ≤ K and f(S) ≥ (0.5 − O(ε))v.

Proof

Consider the problem:

$$ \text{maximize\ \ }f(S) \quad \text{subject to \ } c(S)\leq \sqrt{\varepsilon} K,\quad S\subseteq E,\\ $$

to approximate OPT − o1o1. Let \(\mathcal {I}^{\prime }\) be the corresponding instance. Since f(OPT − o1o2) ≥ 0.25v and c(OPT − o1o2) ≤ εK, Corollary 1 implies that Simple\((\mathcal {I}^{\prime };\) 0.25v,εK) can obtain a set Y satisfying that

$$ f(Y)\geq 0.25 \left( 1-e^{-\frac{\sqrt{\varepsilon} - \varepsilon}{\varepsilon}} - O(\varepsilon)\right)v\geq (0.25 - O(\varepsilon))v, $$

since the largest item in OPT − o1o2 has size at most ε. After taking the set Y, we still have space for packing either o1 or o2, since \(c(Y)\leq \sqrt {\varepsilon } K <K-c(o_{1})\).

Define g := f(⋅∣Y ). If some element e satisfies c(Y ) + c(e) ≤ K and f(Y + e) ≥ 0.5v, then we are done. Thus we may assume that no such element exists, implying that f(Y + o) < 0.5v for = 1, 2. Hence it holds that

$$ g(\text{OPT} -o_{1})\geq g(\text{OPT})-g(o_{1})\geq \left( f(\text{OPT})-f(Y) \right) - \left( f(Y+o_{1})-f(Y)\right)\geq 0.5 v, $$

This implies that

$$ \begin{array}{@{}rcl@{}} g(\text{OPT}-o_{1}-o_{2}) & \geq& g(\text{OPT} - o_{1}) - g(o_{2}) \\ & \geq& 0.5 v - (f(Y+o_{1})-f(Y)) \geq f(Y)\geq (0.25 - O(\varepsilon))v. \end{array} $$

Consider the problem:

$$ \text{maximize\ \ }g(S) \quad \text{subject to \ } c(S)\leq K - c(Y),\quad S\subseteq E,\\ $$

to approximate OPT − o1o1. Denote by \(\mathcal {I}^{\prime \prime }\) the corresponding instance. Since \(K-c(Y)\geq (1-\sqrt {\varepsilon })K\) and g(OPT − o1o2) ≥ (0.25 − O(ε))v, Corollary 1 implies that Simple\((\mathcal {I}^{\prime \prime }; (0.25 - O(\varepsilon ))v, \varepsilon K)\) can obtain a set S satisfying that

$$ f(S)\geq (0.25- O(\varepsilon)) \left( 1-e^{-\frac{1-\sqrt{\varepsilon} - \varepsilon}{\varepsilon}} - O(\varepsilon)\right)v\geq (0.25 - O(\varepsilon))v. $$

Therefore, YS satisfies that c(YS) ≤ K and

$$ f(Y\cup S) = f(Y) + g(S)\geq (0.5 - O(\varepsilon))v. $$

For a given v, the above can be done in O(ε− 1K) space using O(ε− 1) passes. The total running time is O(nε− 1). This completes the proof of Lemma 7.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Huang, CC., Kakimura, N. Multi-Pass Streaming Algorithms for Monotone Submodular Function Maximization. Theory Comput Syst 66, 354–394 (2022). https://doi.org/10.1007/s00224-021-10065-6

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00224-021-10065-6

Keywords

Navigation