DOI: 10.1145/2723372.2742793

REEF: Retainable Evaluator Execution Framework

Published: 27 May 2015

Abstract

Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and re-implement common mechanisms (e.g., caching, bulk-data transfers). This paper presents REEF, a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource re-use for data caching, and state management abstractions that greatly ease the development of elastic data processing work-flows on cloud platforms that support a Resource Manager service. REEF is being used to develop several commercial offerings such as the Azure Stream Analytics service. Furthermore, we demonstrate REEF development of a distributed shell application, a machine learning algorithm, and a port of the CORFU [4] system. REEF is also currently an Apache Incubator project that has attracted contributors from several institutions (http://reef.incubator.apache.org).
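
The split between a control plane and retained resources can be illustrated with a toy model. The sketch below is not REEF's API: the class names `ResourceManager`, `Evaluator`, and `Driver`, the `cache` dictionary, and the `sum_partition` task are all hypothetical stand-ins chosen for illustration. It only shows the general idea that a container leased once can run several tasks in sequence, so data cached by an earlier task is still there for a later one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Evaluator:
    """A leased container that outlives any single task (hypothetical name)."""
    evaluator_id: int
    # State retained across tasks for as long as the lease is held.
    cache: Dict[str, object] = field(default_factory=dict)

    def run(self, task: Callable[[Dict[str, object]], object]) -> object:
        # Data plane: the task executes here and may read or fill the cache.
        return task(self.cache)


class ResourceManager:
    """Stand-in for a YARN-like service that hands out containers."""

    def __init__(self) -> None:
        self.leases_granted = 0

    def lease(self) -> Evaluator:
        self.leases_granted += 1
        return Evaluator(evaluator_id=self.leases_granted)


class Driver:
    """Control plane: leases evaluators once, then schedules many tasks."""

    def __init__(self, rm: ResourceManager, num_evaluators: int) -> None:
        self.evaluators: List[Evaluator] = [
            rm.lease() for _ in range(num_evaluators)
        ]

    def run_round(self, task: Callable[[Dict[str, object]], object]) -> List[object]:
        # Submit the same task to every retained evaluator.
        return [ev.run(task) for ev in self.evaluators]


loads: List[int] = []  # one entry per simulated expensive data load


def sum_partition(cache: Dict[str, object]) -> int:
    # Load the partition only if an earlier task has not cached it already.
    if "partition" not in cache:
        loads.append(1)
        cache["partition"] = list(range(1000))
    return sum(cache["partition"])


rm = ResourceManager()
driver = Driver(rm, num_evaluators=2)
first = driver.run_round(sum_partition)   # loads data on each evaluator
second = driver.run_round(sum_partition)  # reuses the cached data
```

In this toy model the second round triggers no new leases and no new loads, which is the kind of resource re-use for data caching that the abstract describes; fault handling, elasticity, and real cluster communication are left out entirely.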

References

[1]
A. Agarwal, O. Chapelle, M. Dudík, and J. Langford. A reliable effective terascale linear learning system. CoRR, abs/1110.4198, 2011.
[2]
A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. In WSDM '12, 2012.
[3]
P. Alvaro, N. Conway, J. Hellerstein, and W. R. Marczak. Consistency analysis in bloom: a calm and collected approach. In CIDR, pages 249--260, 2011.
[4]
M. Balakrishnan, D. Malkhi, J. D. Davis, V. Prabhakaran, M. Wei, and T. Wobber. Corfu: A distributed shared log. ACM Transactions on Computer Systems (TOCS), 31(4):10, 2013.
[5]
D. Battré, S. Ewen, F. Hueske, O. Kao, V. Markl, and D. Warneke. Nephele/PACTs: A programming model and execution framework for web-scale analytical processing. In SOCC, 2010.
[6]
A. Beutel, M. Weimer, V. Narayanan, and Y. Z. Tom Minka. Elastic distributed bayesian collaborative filtering. In NIPS workshop on Distributed Machine Learning and Matrix Computations, 2014.
[7]
V. Borkar, Y. Bu, M. J. Carey, J. Rosen, N. Polyzotis, T. Condie, M. Weimer, and R. Ramakrishnan. Declarative systems for large-scale machine learning. TCDE, 35(2), 2012.
[8]
V. Borkar, M. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE, 2011.
[9]
O. Bousquet and L. Bottou. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, pages 161--168, 2007.
[10]
C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In Advances in Neural Information Processing Systems, 2006.
[11]
J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51, 2008.
[12]
J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In HPDC, 2010.
[13]
Google. Guice. https://github.com/google/guice.
[14]
W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, and M. Snir. MPI - The Complete Reference: Volume 2, The MPI-2 Extensions. MIT Press, Cambridge, MA, USA, 1998.
[15]
B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, pages 22--22. USENIX Association, 2011.
[16]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In Eurosys, 2007.
[17]
S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan. An analysis of traces from a production mapreduce cluster. In Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, CCGRID '10, pages 94--103, Washington, DC, USA, 2010. IEEE Computer Society.
[18]
M. Kearns. Efficient noise-tolerant learning from statistical queries. J. ACM, 45(6):983--1006, 1998.
[19]
E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The click modular router. ACM Transactions on Computer Systems (TOCS), 18(3):263--297, 2000.
[20]
J. Kreps, N. Narkhede, and J. Rao. Kafka: A distributed messaging system for log processing. In NetDB, 2011.
[21]
A. Kumar, N. Karampatziakis, P. Mineiro, M. Weimer, and V. Narayanan. Distributed and scalable pca in the cloud. In BigLearn NIPS Workshop, 2013.
[22]
M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the parameter server. In Proc. OSDI, pages 583--598, 2014.
[23]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A New Parallel Framework for Machine Learning. In Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010.
[24]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of data, SIGMOD '10, pages 135--146, New York, NY, USA, 2010. ACM.
[25]
N. Marz. Storm: Distributed and fault-tolerant realtime computation. http://storm.apache.org.
[26]
E. Meijer. Your mouse is a database. Commun. ACM, 55(5):66--73, 2012.
[27]
D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: A timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 439--455, New York, NY, USA, 2013. ACM.
[28]
S. Narayanamurthy, M. Weimer, D. Mahajan, T. Condie, S. Sellamanickam, and S. S. Keerthi. Towards resource-elastic machine learning. In BigLearn NIPS Workshop, 2013.
[29]
L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In ICDMW, 2010.
[30]
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In Proceedings of the ACM SIGMOD international conference on Management of data, SIGMOD '08, pages 1099--1110, New York, NY, USA, 2008. ACM.
[31]
A. Rabkin. Using program analysis to reduce misconfiguration in open source systems software. Ph.D. Dissertation, UC Berkeley, 2012.
[32]
D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. Technical report, DTIC Document, 1985.
[33]
B. Saha, H. Shah, S. Seth, G. Vijayaraghavan, A. Murthy, and C. Curino. Apache tez: A unifying framework for modeling and building data processing applications. In SIGMOD 2015, 2015.
[34]
M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In EuroSys, pages 351--364, 2013.
[35]
M. Shapiro and N. M. Preguiça. Designing a commutative replicated data type. CoRR, abs/0710.1784, 2007.
[36]
M. Stonebraker and U. Cetintemel. One size fits all: An idea whose time has come and gone. In Proceedings of the 21st International Conference on Data Engineering, ICDE '05, pages 2--11, Washington, DC, USA, 2005. IEEE Computer Society.
[37]
The Apache Software Foundation. Apache Accumulo. http://accumulo.apache.org/.
[38]
The Apache Software Foundation. Apache Giraph. http://giraph.apache.org/.
[39]
The Apache Software Foundation. Apache Hadoop. http://hadoop.apache.org.
[40]
The Apache Software Foundation. Apache Mahout. http://mahout.apache.org.
[41]
The Apache Software Foundation. Apache Slider. http://slider.incubator.apache.org/.
[42]
The Apache Software Foundation. Apache Twill. http://twill.incubator.apache.org/.
[43]
The Netty project. Netty. http://netty.io.
[44]
A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive -- a warehousing solution over a map-reduce framework. In PVLDB, 2009.
[45]
L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990.
[46]
V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O'Malley, S. Radia, B. Reed, and E. Baldeschwieler. Apache hadoop yarn: Yet another resource negotiator. In SOCC, 2013.
[47]
M. Weimer, S. Rao, and M. Zinkevich. A convenient framework for efficient parallel multipass algorithms. In LCCC, 2010.
[48]
M. Welsh. What I wish systems researchers would work on. http://matt-welsh.blogspot.com/2013/05/what-i-wish-systems-researchers-would.html.
[49]
M. Welsh, D. Culler, and E. Brewer. Seda: an architecture for well-conditioned, scalable internet services. In SIGOPS, volume 35, pages 230--243. ACM, 2001.
[50]
J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, pages 2061--2064, New York, NY, USA, 2009. ACM.
[51]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.
[52]
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In HotCloud, 2010.
[53]
J. Zhou, N. Bruno, M.-C. Wu, P.-A. Larson, R. Chaiken, and D. Shakib. Scope: Parallel databases meet mapreduce. VLDB Journal, 21(5), 2012.

      Published In

      SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
      May 2015
      2110 pages
      ISBN:9781450327589
      DOI:10.1145/2723372
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. big data
      2. databases
      3. distributed systems
      4. hadoop
      5. high performance computing
      6. machine learning

      Qualifiers

      • Research-article

      Funding Sources

      • NIBIB
      • NSF

      Conference

      SIGMOD/PODS'15
      SIGMOD/PODS'15: International Conference on Management of Data
      May 31 - June 4, 2015
      Melbourne, Victoria, Australia

      Acceptance Rates

      SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 13
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 26 Dec 2024

      Citations

      Cited By

      • (2021) Apache Nemo: A Framework for Optimizing Distributed Data Processing. ACM Transactions on Computer Systems 38:3-4 (1-31). DOI: 10.1145/3468144. Online publication date: 15-Oct-2021
      • (2021) Elasecutor: Elastic Executor Scheduling in Data Analytics Systems. IEEE/ACM Transactions on Networking 29:2 (681-694). DOI: 10.1109/TNET.2021.3050927. Online publication date: 15-Apr-2021
      • (2020) Scalable Deep Learning on Distributed Infrastructures. ACM Computing Surveys 53:1 (1-37). DOI: 10.1145/3363554. Online publication date: 6-Feb-2020
      • (2019) Apache nemo. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (177-190). DOI: 10.5555/3358807.3358824. Online publication date: 10-Jul-2019
      • (2019) BLAS-on-flash. Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation (469-483). DOI: 10.5555/3323234.3323273. Online publication date: 26-Feb-2019
      • (2019) Hydra. Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation (177-191). DOI: 10.5555/3323234.3323250. Online publication date: 26-Feb-2019
      • (2019) FfDL. Proceedings of the 20th International Middleware Conference (82-95). DOI: 10.1145/3361525.3361538. Online publication date: 9-Dec-2019
      • (2019) Automating System Configuration of Distributed Machine Learning. 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS) (2057-2067). DOI: 10.1109/ICDCS.2019.00203. Online publication date: Jul-2019
      • (2018) Elasecutor. Proceedings of the ACM Symposium on Cloud Computing (107-120). DOI: 10.1145/3267809.3267818. Online publication date: 11-Oct-2018
      • (2017) Apache REEF. ACM Transactions on Computer Systems 35:2 (1-31). DOI: 10.1145/3132037. Online publication date: 10-Oct-2017
