[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Social content matching in MapReduce

Published: 01 April 2011 Publication History

Abstract

Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content.
We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that Stack-MR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.

References

[1]
E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne. Finding high-quality content in social media. In WSDM, pp. 183--194, 2008.
[2]
R. Baraglia, G. De Francisci Morales, and C. Lucchese. Document similarity self-join with mapreduce. In ICDM, pp. 731--736, 2010.
[3]
D. Charles, M. Chickering, N. R. Devanur, K. Jain, and M. Sanghi. Fast algorithms for finding matchings in lopsided bipartite graphs with applications to display ads. In EC, pp. 121--128, 2010.
[4]
F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In WWW, pp. 231--240, 2010.
[5]
P. Christiano, J. A. Kelner, A. Madry, D. A. Spielman, and S.-H. Teng. Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs. CoRR, arXiv:1010.2921v2, 2010.
[6]
C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In NIPS, pp. 281--288, 2006.
[7]
J. Cohen. Graph twiddling in a MapReduce world. Computing in Science and Engineering, 11(4):29--41, 2009.
[8]
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. CACM, 51(1):107--113, 2008.
[9]
T. Fischer, A. Goldberg, D. Haglin, and S. Plotkin. Approximating matchings in parallel. IPL, 46(3):115--118, 1993.
[10]
H. Gabow. An efficient reduction technique for degree-constrained subgraph and bidirected network flow problems. In STOC, pp. 448--456, 1983.
[11]
N. Garg, T. Kavitha, A. Kumar, K. Mehlhorn, and J. Mestre. Assigning papers to referees. Algorithmica, 58(1):119--136, 2010.
[12]
O. Garrido, S. Jarominek, A. Lingas, and W. Rytter. A simple randomized parallel algorithm for maximal f-matchings. IPL, 57(2):83--87, 1996.
[13]
A. V. Goldberg and S. Rao. Beyond the flow decomposition barrier. JACM, 45(5):783--797, 1998.
[14]
T. Jebara and V. Shchogolev. B-matching for spectral clustering. In ECML, pp. 679--686, 2006.
[15]
T. Jebara, J. Wang, and S. Chang. Graph construction and b-matching for semi-supervised learning. In ICML, pp. 441--448, 2009.
[16]
U. Kang, C. Tsourakakis, and C. Faloutsos. PEGASUS: A peta-scale graph mining system implementation and observations. In ICDM, pp. 229--238, 2009.
[17]
H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In SODA, pp. 938--948, 2010.
[18]
C. Koufogiannakis and N. Young. Distributed fractional packing and maximum weighted b-matching via tail-recursive duality. In DISC, pp. 221--238, 2009.
[19]
J. Lin and C. Dyer. Data-intensive text processing with MapReduce. Morgan & Claypool Publishers, 2010.
[20]
Z. Lotker, B. Patt-Shamir, and S. Pettie. Improved distributed approximate matching. In SPAA, pp. 129--136, 2008.
[21]
A. Panconesi and M. Sozio. Fast primal-dual distributed algorithms for scheduling and matching problems. Distributed Computing, 22(4):269--283, 2010.
[22]
M. Penn and M. Tennenholtz. Constrained multi-object auctions and b-matching. IPL, 75(1--2):29--34, 2000.
[23]
V. Vazirani. Approximation algorithms. Springer-Verlag, 2001.
[24]
M. Wattenhofer and R. Wattenhofer. Distributed weighted matching. In DISC, pp. 335--348, 2004.

Cited By

View all
  • (2016)Designing scalable b-Matching algorithms on distributed memory multiprocessors by approximationProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.5555/3014904.3014993(1-11)Online publication date: 13-Nov-2016
  • (2015)Scalable Facility Location for Massive Graphs on Pregel-like SystemsProceedings of the 24th ACM International on Conference on Information and Knowledge Management10.1145/2806416.2806508(273-282)Online publication date: 17-Oct-2015
  • (2014)Show me the moneyProceedings of the VLDB Endowment10.14778/2733085.27330867:14(1785-1796)Online publication date: 1-Oct-2014
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 4, Issue 7
April 2011
61 pages

Publisher

VLDB Endowment

Publication History

Published: 01 April 2011
Published in PVLDB Volume 4, Issue 7

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2016)Designing scalable b-Matching algorithms on distributed memory multiprocessors by approximationProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.5555/3014904.3014993(1-11)Online publication date: 13-Nov-2016
  • (2015)Scalable Facility Location for Massive Graphs on Pregel-like SystemsProceedings of the 24th ACM International on Conference on Information and Knowledge Management10.1145/2806416.2806508(273-282)Online publication date: 17-Oct-2015
  • (2014)Show me the moneyProceedings of the VLDB Endowment10.14778/2733085.27330867:14(1785-1796)Online publication date: 1-Oct-2014
  • (2014)Correlation clustering in MapReduceProceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining10.1145/2623330.2623743(641-650)Online publication date: 24-Aug-2014
  • (2013)A distributed algorithm for large-scale generalized matchingProceedings of the VLDB Endowment10.14778/2536360.25363626:9(613-624)Online publication date: 1-Jul-2013
  • (2013)The family of mapreduce and large-scale data processing systemsACM Computing Surveys10.1145/2522968.252297946:1(1-44)Online publication date: 11-Jul-2013
  • (2013)Minimal MapReduce algorithmsProceedings of the 2013 ACM SIGMOD International Conference on Management of Data10.1145/2463676.2463719(529-540)Online publication date: 22-Jun-2013
  • (2013)Computing n-gram statistics in MapReduceProceedings of the 16th International Conference on Extending Database Technology10.1145/2452376.2452389(101-112)Online publication date: 18-Mar-2013
  • (2012)Densest subgraph in streaming and MapReduceProceedings of the VLDB Endowment10.14778/2140436.21404425:5(454-465)Online publication date: 1-Jan-2012

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media