DOI: 10.1145/1871929.1871933
research-article

Adaptive query execution for data management in the cloud

Published: 30 October 2010

Abstract

A major component of many cloud services is query processing on data stored in the underlying cloud cluster. The traditional techniques for query processing on a cluster are those offered by parallel DBMS. These techniques, however, cannot guarantee high performance in the cloud: parallel DBMS lack adequate fault-tolerance mechanisms to deal with non-negligible software and hardware failures. MapReduce, on the other hand, allows query processing solutions that are fault tolerant, but it imposes substantial overheads. In this paper, we propose an adaptive software architecture that can effortlessly switch between MapReduce and parallel DBMS in order to process queries efficiently regardless of their response times. Switching between the two architectures is performed transparently, based on an intuitive cost model that computes the expected execution time in the presence of failures. The experimental results show that the adaptive architecture achieves the lowest possible query execution time in various scenarios.
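The abstract does not give the cost model itself, so the following is only an illustrative sketch of how such a switch could be reasoned about, not the model used in the paper. It assumes failures arrive as a Poisson process, that a parallel DBMS restarts the whole query after any failure, and that a MapReduce job pays a constant overhead factor but only repeats the stage that failed. All names and parameters (`overhead_factor`, `num_stages`, `choose_engine`) are hypothetical.

```python
import math


def expected_time_with_restart(work_seconds: float, failure_rate: float) -> float:
    """Expected completion time when any failure restarts the work from scratch.

    Assumption: failures follow a Poisson process with `failure_rate`
    failures per second, and the restart itself costs no extra time.
    Under that assumption the expectation is (exp(rate * work) - 1) / rate.
    """
    if failure_rate == 0:
        return work_seconds
    return math.expm1(failure_rate * work_seconds) / failure_rate


def expected_dbms_time(query_seconds: float, failure_rate: float) -> float:
    """Parallel DBMS (assumed): a failure restarts the entire query."""
    return expected_time_with_restart(query_seconds, failure_rate)


def expected_mapreduce_time(
    query_seconds: float,
    failure_rate: float,
    overhead_factor: float = 3.0,  # hypothetical slowdown relative to the DBMS
    num_stages: int = 100,         # hypothetical number of restartable stages
) -> float:
    """MapReduce (assumed): slower overall, but only a failed stage is redone."""
    stage_seconds = overhead_factor * query_seconds / num_stages
    return num_stages * expected_time_with_restart(stage_seconds, failure_rate)


def choose_engine(query_seconds: float, failure_rate: float) -> str:
    """Pick the engine with the lower expected execution time."""
    dbms = expected_dbms_time(query_seconds, failure_rate)
    mapreduce = expected_mapreduce_time(query_seconds, failure_rate)
    return "parallel_dbms" if dbms <= mapreduce else "mapreduce"


if __name__ == "__main__":
    # A one-minute query and a ten-hour query at the same failure rate.
    for seconds in (60.0, 36000.0):
        print(seconds, choose_engine(seconds, failure_rate=1e-4))
```

Under these assumptions, short queries favor the parallel DBMS because a restart is cheap, while long-running queries favor MapReduce because the cost of repeating the whole query grows exponentially with its duration.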

    Information & Contributors

    Information

    Published In

    CloudDB '10: Proceedings of the second international workshop on Cloud data management
    October 2010
    72 pages
    ISBN: 9781450303804
    DOI: 10.1145/1871929
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cloud dbms
    2. cost model
    3. mapreduce
    4. parallel dbms

    Qualifiers

    • Research-article

    Conference

    CIKM '10

    Acceptance Rates

    Overall Acceptance Rate 12 of 17 submissions, 71%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 13 Dec 2024

    Citations

    Cited By

    • (2023) Secure query processing and optimization in cloud environment: a review. Information Security Journal: A Global Perspective 33:2 (172-191). DOI: 10.1080/19393555.2023.2270976. Online publication date: 20-Dec-2023
    • (2019) OLAP parallel query processing in clouds with C‐ParGRES. Concurrency and Computation: Practice and Experience 32:7. DOI: 10.1002/cpe.5590. Online publication date: 19-Dec-2019
    • (2016) From relations to multi-dimensional maps. Proceedings of the 26th Annual International Conference on Computer Science and Software Engineering (156-165). DOI: 10.5555/3049877.3049893. Online publication date: 31-Oct-2016
    • (2016) Storing and Querying DICOM Data with HYTORMO. Proceedings of the Second International Workshop on Data Management and Analytics for Medicine and Healthcare - Volume 10186 (43-61). DOI: 10.1007/978-3-319-57741-8_4. Online publication date: 9-Sep-2016
    • (2015) From Relations to Multi-dimensional Maps. Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing (81-89). DOI: 10.1109/CLOUD.2015.21. Online publication date: 27-Jun-2015
    • (2015) Practical algorithms for execution engine selection in data flows. Future Generation Computer Systems 45:C (133-148). DOI: 10.1016/j.future.2014.11.011. Online publication date: 1-Apr-2015
    • (2014) Integrated Obj_FedRep: Evaluation of Surrogate Object based Mobile Cloud System for Federation, Replica and Data Management. Arabian Journal for Science and Engineering 39:6 (4577-4592). DOI: 10.1007/s13369-014-1001-2. Online publication date: 29-Mar-2014
    • (2013) Biomedical Research Data Cloud Services with Duckling Collaboration LiBrary (CLB). Proceedings of the 2013 IEEE 9th International Conference on e-Science (221-227). DOI: 10.1109/eScience.2013.17. Online publication date: 22-Oct-2013
    • (2013) Non-Intrusive Elastic Query Processing in the Cloud. Journal of Computer Science and Technology 28:6 (932-947). DOI: 10.1007/s11390-013-1389-2. Online publication date: 8-Nov-2013
    • (2012) Data Management in the Mobile Cloud Using Surrogate Object. International Journal of Future Computer and Communication (187-192). DOI: 10.7763/IJFCC.2012.V1.49. Online publication date: 2012
