DOI: 10.1145/2463676.2465258
Demonstration

CARTILAGE: adding flexibility to the Hadoop skeleton

Published: 22 June 2013

Abstract

Modern enterprises have to deal with a variety of analytical queries over very large datasets. In this respect, Hadoop has gained much popularity because it scales to thousands of nodes and terabytes of data. However, Hadoop suffers from poor performance, especially I/O performance. Several works have proposed alternative data storage for Hadoop in order to improve query performance. However, many of these works end up making deep changes to Hadoop or HDFS. As a result, they are (i) difficult for many users to adopt, and (ii) not compatible with future Hadoop releases. In this paper, we present CARTILAGE, a comprehensive data storage framework built on top of HDFS. CARTILAGE gives users full control over their data storage, including data partitioning, data replication, data layouts, and data placement. Furthermore, CARTILAGE can be layered on top of an existing HDFS installation, which means that Hadoop, as well as other query engines, can readily make use of it. We describe several use cases of CARTILAGE and propose to demonstrate its flexibility and efficiency through a set of novel scenarios.
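The abstract names four storage decisions that CARTILAGE puts in the user's hands: partitioning, replication, layout, and placement. The Python sketch below is a toy, in-memory illustration of what a per-dataset plan covering those four choices might look like. `StoragePlan`, `apply_plan`, modulo partitioning on an integer key, and round-robin replica placement are all hypothetical choices made for illustration. They are not taken from the paper, they are not CARTILAGE's API, and nothing here talks to HDFS.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical illustration only: not CARTILAGE's real API and not HDFS code.
Record = Tuple[Any, ...]


@dataclass
class StoragePlan:
    num_partitions: int   # partitioning: number of partitions
    partition_key: int    # index of the (integer) field used to partition
    replication: int      # replication: copies kept per partition
    layout: str           # layout: "row" or "column"
    nodes: List[str]      # placement: candidate storage nodes


def partition(records: List[Record], plan: StoragePlan) -> List[List[Record]]:
    """Assign each record to partition (key % num_partitions); assumes integer keys."""
    parts: List[List[Record]] = [[] for _ in range(plan.num_partitions)]
    for rec in records:
        parts[rec[plan.partition_key] % plan.num_partitions].append(rec)
    return parts


def to_layout(rows: List[Record], layout: str) -> List[Any]:
    """Keep rows as-is (row layout) or transpose them into per-field lists (column layout)."""
    if layout == "row":
        return list(rows)
    if layout == "column":
        return [list(col) for col in zip(*rows)]
    raise ValueError(f"unknown layout: {layout}")


def apply_plan(records: List[Record], plan: StoragePlan) -> Dict[Tuple[int, int], Tuple[str, List[Any]]]:
    """Map (partition, replica) to (node, data) according to the plan."""
    if plan.replication > len(plan.nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    stored: Dict[Tuple[int, int], Tuple[str, List[Any]]] = {}
    for p, rows in enumerate(partition(records, plan)):
        data = to_layout(rows, plan.layout)
        for r in range(plan.replication):
            # Replica r of partition p goes to node (p + r) mod N, so the
            # replicas of one partition never share a node.
            node = plan.nodes[(p + r) % len(plan.nodes)]
            stored[(p, r)] = (node, data)
    return stored


if __name__ == "__main__":
    records = [(k, f"v{k}") for k in range(6)]
    plan = StoragePlan(num_partitions=2, partition_key=0, replication=2,
                       layout="column", nodes=["dn1", "dn2", "dn3"])
    stored = apply_plan(records, plan)
    print(stored[(0, 0)])  # ('dn1', [[0, 2, 4], ['v0', 'v2', 'v4']])
```

Keeping all four choices in one user-supplied object, instead of hard-coding them inside the file system, mirrors the separation the abstract argues for. A real system would write each partition's blocks to storage nodes rather than return a dictionary.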




    Information

    Published In

    SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
    June 2013
    1322 pages
    ISBN:9781450320375
    DOI:10.1145/2463676


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. ease of use
    2. flexible storage
    3. hdfs
    4. portability

    Qualifiers

    • Demonstration

    Conference

    SIGMOD/PODS'13

    Acceptance Rates

    SIGMOD '13 paper acceptance rate: 76 of 372 submissions (20%)
    Overall acceptance rate: 785 of 4,003 submissions (20%)


    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 01 Jan 2025

    Cited By

    • (2020) A distributed data exchange engine for polystores. it - Information Technology 62(3-4): 145-156. DOI: 10.1515/itit-2019-0037. Online publication date: 4-Mar-2020.
    • (2019) Muses: Distributed Data Migration System for Polystores. 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 1602-1605. DOI: 10.1109/ICDE.2019.00152. Online publication date: Apr-2019.
    • (2016) Kangaroo. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 397-406. DOI: 10.1145/2835776.2835841. Online publication date: 8-Feb-2016.
    • (2015) BigDansing. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1215-1230. DOI: 10.1145/2723372.2747646. Online publication date: 27-May-2015.
