[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2287036.2287041acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
research-article

Accelerate large-scale iterative computation through asynchronous accumulative updates

Published: 18 June 2012 Publication History

Abstract

Myriad of data mining algorithms in scientific computing require parsing data sets iteratively. These iterative algorithms have to be implemented in a distributed environment to scale to massive data sets. To accelerate iterative computations in a large-scale distributed environment, we identify a broad class of iterative computations that can accumulate iterative update results. Specifically, different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, accumulative iterative update accumulates the intermediate iterative update results. We prove that an accumulative update will yield the same result as its corresponding traditional iterative update. Furthermore, accumulative iterative computation can be performed asynchronously and converges much faster. We present a general computation model to describe asynchronous accumulative iterative computation. Based on the computation model, we design and implement a distributed framework, Maiter. We evaluate Maiter on Amazon EC2 Cloud with 100 EC2 instances. Our results show that Maiter achieves as much as 60x speedup over Hadoop for implementing iterative algorithms.

References

[1]
Amazon ec2. http://aws.amazon.com/ec2/.
[2]
Hadoop. http://hadoop.apache.org/.
[3]
Maiter project. http://code.google.com/p/maiter/.
[4]
Open mpi. http://www.open-mpi.org/.
[5]
Stanford dataset. http://snap.stanford.edu/data/.
[6]
S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for youtube: taking random walks through the view graph. In WWW '08, pages 895--904, 2008.
[7]
G. M. Baudet. Asynchronous iterative methods for multiprocessors. J. ACM, 25:226--244, April 1978.
[8]
D. P. Bertsekas. Distributed asynchronous computation of fixed points. Math. Programming, 27:107--120, 1983.
[9]
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30:107--117, April 1998.
[10]
Y. Bu, B. Howe, M. Balazinska, and D. M. Ernst. Haloop: Efficient iterative data processing on large clusters. In VLDB '10, 2010.
[11]
D. Chazan and W. Miranker. Chaotic relaxation. Linear Algebra and its Applications, 2(2):199 -- 222, 1969.
[12]
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI'04, pages 10--10, 2004.
[13]
J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In MAPREDUCE '10, pages 810--818, 2010.
[14]
A. Frommer and D. B. Szyld. On asynchronous iterations. J. Comput. Appl. Math., 123:201--216, November 2000.
[15]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07, pages 59--72, 2007.
[16]
K. Kambatla, N. Rapolu, S. Jagannathan, and A. Grama. Asynchronous algorithms in mapreduce. In CLUSTER' 10, pages 245 --254, 2010.
[17]
U. Kang, C. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In ICDM '09, pages 229 --238, 2009.
[18]
L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18:39--43, 1953.
[19]
J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46:604--632, September 1999.
[20]
D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol., 58:1019--1031, May 2007.
[21]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. In UAI '10, 2010.
[22]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD '10, pages 135--146, 2010.
[23]
R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI'10, pages 1--14, 2010.
[24]
H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu. Scalable proximity estimation and link prediction in online social networks. In IMC '09, pages 322--335, 2009.
[25]
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In HotCloud'10, 2010.
[26]
Y. Zhang, Q. Gao, L. Gao, and C. Wang. imapreduce: A distributed computing framework for iterative computation. In DataCloud '11, pages 1112--1121, 2011.
[27]
Y. Zhang, Q. Gao, L. Gao, and C. Wang. Priter: A distributed framework for prioritized iterative computations. In SOCC '11, 2011.
[28]
Y. Zhang, Q. Gao, L. Gao, and C. Wang. Maiter: A message-passing distributed framework for accumulative iterative computation, http://rio.ecs.umass.edu/~yzhang/maiter-full.pdf. Technical report, 2012.

Cited By

View all
  • (2021)DiterGraph: Toward I/O-Efficient Incremental Computation over Large Graphs with Billion Edges2021 7th International Conference on Big Data Computing and Communications (BigCom)10.1109/BigCom53800.2021.00006(30-37)Online publication date: Aug-2021
  • (2020)SubwayProceedings of the Fifteenth European Conference on Computer Systems10.1145/3342195.3387537(1-16)Online publication date: 15-Apr-2020
  • (2019)SPFC: An Effective Optimization for Vertex-Centric Graph Processing SystemsIEEE Transactions on Sustainable Computing10.1109/TSUSC.2017.27803204:1(118-131)Online publication date: 1-Jan-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ScienceCloud '12: Proceedings of the 3rd workshop on Scientific Cloud Computing
June 2012
80 pages
ISBN:9781450313407
DOI:10.1145/2287036
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. asynchronous accumulative update
  2. iterative computation
  3. maiter

Qualifiers

  • Research-article

Conference

HPDC'12
Sponsor:

Acceptance Rates

Overall Acceptance Rate 44 of 151 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2021)DiterGraph: Toward I/O-Efficient Incremental Computation over Large Graphs with Billion Edges2021 7th International Conference on Big Data Computing and Communications (BigCom)10.1109/BigCom53800.2021.00006(30-37)Online publication date: Aug-2021
  • (2020)SubwayProceedings of the Fifteenth European Conference on Computer Systems10.1145/3342195.3387537(1-16)Online publication date: 15-Apr-2020
  • (2019)SPFC: An Effective Optimization for Vertex-Centric Graph Processing SystemsIEEE Transactions on Sustainable Computing10.1109/TSUSC.2017.27803204:1(118-131)Online publication date: 1-Jan-2019
  • (2018)Scalable Graph Processing FrameworksACM Computing Surveys10.1145/319952351:3(1-53)Online publication date: 12-Jun-2018
  • (2017)iiHadoop: an asynchronous distributed framework for incremental iterative computationsJournal of Big Data10.1186/s40537-017-0086-34:1Online publication date: 24-Jul-2017
  • (2017)PMSProceedings of the 2017 ACM on Conference on Information and Knowledge Management10.1145/3132847.3133087(1999-2002)Online publication date: 6-Nov-2017
  • (2017)CoRALACM SIGARCH Computer Architecture News10.1145/3093337.303774745:1(223-236)Online publication date: 4-Apr-2017
  • (2017)CoRALACM SIGPLAN Notices10.1145/3093336.303774752:4(223-236)Online publication date: 4-Apr-2017
  • (2017)CoRALACM SIGOPS Operating Systems Review10.1145/3093315.303774751:2(223-236)Online publication date: 4-Apr-2017
  • (2017)CoRALProceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3037697.3037747(223-236)Online publication date: 4-Apr-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media