DOI: 10.1145/1807128.1807148
Research article

Nephele/PACTs: a programming model and execution framework for web-scale analytical processing

Published: 10 June 2010

Abstract

We present a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. The PACT programming model is a generalization of the well-known map/reduce programming model. It extends map/reduce with further second-order functions, as well as with Output Contracts that give guarantees about the behavior of a function. We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and handles communication, synchronization, and fault tolerance. Our definition of PACTs allows several types of optimizations to be applied to the data flow during this transformation.
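The abstract names map and reduce plus "further second-order functions" without listing them. The following sketch gives sequential reference semantics for five second-order functions in Python. It assumes the three extra functions are Cross, Match, and CoGroup, names taken from the wider PACT literature rather than from this abstract.

Each function decides which records form one independent unit for the user-supplied first-order function (`udf`). Because units do not depend on each other, an engine such as Nephele could process them in parallel. The names `pact_map`, `pact_match`, and so on are hypothetical; this is not the system's actual API.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Record = Tuple[Hashable, object]   # a (key, value) pair, as in map/reduce
UDF = Callable[..., Iterable]      # user code: returns zero or more outputs


def group_by_key(records: Iterable[Record]) -> Dict[Hashable, List[object]]:
    """Collect the values of each key, keeping first-seen key order."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def pact_map(udf: UDF, records: List[Record]) -> list:
    # Unit = one record.
    return [out for key, value in records for out in udf(key, value)]


def pact_reduce(udf: UDF, records: List[Record]) -> list:
    # Unit = one key together with all of its values.
    return [out for key, values in group_by_key(records).items()
            for out in udf(key, values)]


def pact_cross(udf: UDF, left: List[Record], right: List[Record]) -> list:
    # Unit = one pair from the Cartesian product of the two inputs.
    return [out for lrec, rrec in product(left, right) for out in udf(lrec, rrec)]


def pact_match(udf: UDF, left: List[Record], right: List[Record]) -> list:
    # Unit = one pair of records with equal keys (an equi-join).
    right_groups = group_by_key(right)
    return [out for key, lv in left for rv in right_groups.get(key, [])
            for out in udf(key, lv, rv)]


def pact_cogroup(udf: UDF, left: List[Record], right: List[Record]) -> list:
    # Unit = one key with all of its left values and all of its right values.
    lg, rg = group_by_key(left), group_by_key(right)
    keys = dict.fromkeys([*lg, *rg])  # union of keys, first-seen order
    return [out for key in keys
            for out in udf(key, lg.get(key, []), rg.get(key, []))]


# Word count expressed as a Map followed by a Reduce.
lines = [(0, "a b a"), (1, "b c")]
words = pact_map(lambda _k, line: [(w, 1) for w in line.split()], lines)
counts = pact_reduce(lambda w, ones: [(w, sum(ones))], words)
# counts == [("a", 2), ("b", 2), ("c", 1)]
```

Map and Reduce take one input, while Cross, Match, and CoGroup take two. A join, for example, needs only a single Match whose `udf` combines the two matching values. In plain map/reduce, the same join has to be emulated by tagging and grouping records.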
The system as a whole is designed to be as generic as (and compatible with) map/reduce systems, while overcoming several of their major weaknesses:

1) The functions map and reduce alone are not sufficient to express many data processing tasks both naturally and efficiently.
2) Map/reduce ties a program to a single fixed execution strategy, which is robust but highly suboptimal for many tasks.
3) Map/reduce makes no assumptions about the behavior of the functions and hence offers only very limited optimization opportunities.

With a set of examples and experiments, we illustrate how our system naturally represents and efficiently executes several tasks that do not fit the map/reduce model well.
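Weaknesses 2) and 3) are easiest to see for an equi-join, which would be a Match in PACT terms. The following Python sketch chooses between two textbook parallel join strategies. The first repartitions both inputs by key; the second broadcasts the smaller input to every node.

The sketch uses a crude, hypothetical cost model that roughly counts how many records are shipped. It also skips the shuffle for an input that is already partitioned by key. That flag loosely stands in for the kind of guarantee an Output Contract can provide. All names are hypothetical, and this illustrates the idea under those assumptions rather than reproducing the paper's optimizer.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Record = Tuple[Hashable, object]
Parts = List[List[Record]]  # one record list per (simulated) node


def hash_partition(records: List[Record], n: int) -> Parts:
    """Send each record to node hash(key) % n, so equal keys meet on one node."""
    parts: Parts = [[] for _ in range(n)]
    for key, value in records:
        parts[hash(key) % n].append((key, value))
    return parts


def flatten(parts: Parts) -> List[Record]:
    return [rec for part in parts for rec in part]


def local_join(left: List[Record], right: List[Record]) -> List[Record]:
    """Sequential hash equi-join of two local record lists."""
    index: Dict[Hashable, List[object]] = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]


def choose_strategy(left_size: int, right_size: int, n: int,
                    left_partitioned: bool = False,
                    right_partitioned: bool = False) -> str:
    """Hypothetical cost model: roughly how many records are shipped."""
    # Repartition ships every input that is not already partitioned by key.
    repartition = ((0 if left_partitioned else left_size)
                   + (0 if right_partitioned else right_size))
    # Broadcast ships the smaller input to all n nodes; the larger stays put.
    broadcast = min(left_size, right_size) * n
    return "broadcast" if broadcast < repartition else "repartition"


def parallel_match(left: Parts, right: Parts, strategy: str,
                   left_partitioned: bool = False,
                   right_partitioned: bool = False) -> List[Record]:
    """Run local joins on len(left) simulated nodes under one strategy.

    A *_partitioned flag is only safe if that input was produced by
    hash_partition with the same n (our stand-in for an output guarantee).
    """
    n = len(left)
    if strategy == "repartition":
        lp = left if left_partitioned else hash_partition(flatten(left), n)
        rp = right if right_partitioned else hash_partition(flatten(right), n)
        return [out for i in range(n) for out in local_join(lp[i], rp[i])]
    # Broadcast: every node sees the whole smaller input.
    left_all, right_all = flatten(left), flatten(right)
    if len(right_all) <= len(left_all):
        return [out for part in left for out in local_join(part, right_all)]
    return [out for part in right for out in local_join(left_all, part)]


left = [(i % 5, f"l{i}") for i in range(20)]
right = [(k, f"r{k}") for k in range(3)]
strategy = choose_strategy(len(left), len(right), 4)
# strategy == "broadcast", because 3 * 4 = 12 < 20 + 3
```

Both strategies produce the same set of joined records, only in a different order. A fixed strategy, such as always shuffling by key, is robust but can ship far more data than necessary. Knowing how a function behaves with respect to keys lets a planner avoid some shuffles entirely.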

References

[1] Hadoop. URL: http://hadoop.apache.org.
[2] TPC-H. URL: http://www.tpc.org/tpch/.
[3] K. Beyer, V. Ercegovac, J. Rao, and E. Shekita. Jaql: A JSON Query Language. URL: http://jaql.org.
[4] R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. PVLDB, 1(2):1265--1276, 2008.
[5] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, pages 137--150, 2004.
[6] J. Delmerico, N. Byrnes, A. Bruno, M. Jones, S. Gallo, and V. Chaudhary. Comparing the Performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations. In International Conference on High Performance Computing, 2009.
[7] D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. GAMMA - A High Performance Dataflow Database Machine. In W. W. Chu, G. Gardarin, S. Ohsuga, and Y. Kambayashi, editors, VLDB, pages 228--237. Morgan Kaufmann, 1986.
[8] E. Friedman, P. M. Pawlowski, and J. Cieslewicz. SQL/MapReduce: A Practical Approach to Self-describing, Polymorphic, and Parallelizable User-defined Functions. PVLDB, 2(2):1402--1413, 2009.
[9] S. Fushimi, M. Kitsuregawa, and H. Tanaka. An Overview of The System Software of A Parallel Relational Database Machine GRACE. In W. W. Chu, G. Gardarin, S. Ohsuga, and Y. Kambayashi, editors, VLDB, pages 209--219. Morgan Kaufmann, 1986.
[10] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In P. Ferreira, T. R. Gross, and L. Veiga, editors, EuroSys, pages 59--72. ACM, 2007.
[11] C. Olston, B. Reed, A. Silberstein, and U. Srivastava. Automatic Optimization of Parallel Dataflow Programs. In R. Isaacs and Y. Zhou, editors, USENIX Annual Technical Conference, pages 267--273. USENIX Association, 2008.
[12] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. In J. T.-L. Wang, editor, SIGMOD Conference, pages 1099--1110. ACM, 2008.
[13] A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A Comparison of Approaches to Large-Scale Data Analysis. In U. Cetintemel, S. B. Zdonik, D. Kossmann, and N. Tatbul, editors, SIGMOD Conference, pages 165--178. ACM, 2009.
[14] D. A. Schneider and D. J. DeWitt. A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. In J. Clifford, B. G. Lindsay, and D. Maier, editors, SIGMOD Conference, pages 110--121. ACM Press, 1989.
[15] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access Path Selection in a Relational Database Management System. In P. A. Bernstein, editor, SIGMOD Conference, pages 23--34. ACM, 1979.
[16] J. W. Stamos and H. C. Young. A Symmetric Fragment and Replicate Algorithm for Distributed Joins. IEEE Trans. Parallel Distrib. Syst., 4(12):1345--1354, 1993.
[17] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - A Warehousing Solution Over a Map-Reduce Framework. PVLDB, 2(2):1626--1629, 2009.
[18] D. Warneke and O. Kao. Nephele: Efficient Parallel Data Processing in the Cloud. In I. Raicu, I. T. Foster, and Y. Zhao, editors, SC-MTAGS. ACM, 2009.
[19] C. Yang, C. Yen, C. Tan, and S. Madden. Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database. In ICDE, 2009.
[20] H. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. In C. Y. Chan, B. C. Ooi, and A. Zhou, editors, SIGMOD Conference, pages 1029--1040. ACM, 2007.
[21] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. In R. Draves and R. van Renesse, editors, OSDI, pages 1--14. USENIX Association, 2008.

Cited By

  • (2024) FUDJ: Flexible User-Defined Distributed Joins. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4194-4207. DOI: 10.1109/ICDE60146.2024.00320. Online publication date: 13 May 2024.
  • (2023) Symphony of Sensors: The Harmonious Art of Massive Data Generation by Devices. 2023 26th International Symposium on Wireless Personal Multimedia Communications (WPMC), pp. 1-5. DOI: 10.1109/WPMC59531.2023.10338947. Online publication date: 19 November 2023.
  • (2022) Hybrid Clustering Technique to Cluster Big Data in the Hadoop Ecosystem. Handbook of Research on Technologies and Systems for E-Collaboration During Global Crises, pp. 218-233. DOI: 10.4018/978-1-7998-9640-1.ch015. Online publication date: 2022.


    Published In

    SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing
    June 2010
    264 pages
    ISBN:9781450300360
    DOI:10.1145/1807128

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. cloud computing
    2. map reduce
    3. web-scale data

    Conference

    SOCC '10

    Acceptance Rates

    Overall Acceptance Rate 169 of 722 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 9
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 14 Jan 2025.
