research-article

Adaptive query execution for data management in the cloud

Authors:

Adrian Daniel Popescu,

Debabrata Dash,

Verena Kantere,

Anastasia Ailamaki

CloudDB '10: Proceedings of the second international workshop on Cloud data management

Pages 17 - 24

https://doi.org/10.1145/1871929.1871933

Published: 30 October 2010

Abstract

A major component of many cloud services is query processing on data stored in the underlying cloud cluster. The traditional techniques for query processing on a cluster are those offered by parallel DBMSs. These techniques, however, cannot guarantee high performance in the cloud; parallel DBMSs lack adequate fault-tolerance mechanisms to deal with non-negligible software and hardware failures. MapReduce, on the other hand, allows query processing solutions that are fault tolerant, but imposes substantial overheads. In this paper, we propose an adaptive software architecture that can effortlessly switch between MapReduce and a parallel DBMS in order to process queries efficiently regardless of their response times. Switching between the two architectures is performed transparently, based on an intuitive cost model that computes the expected execution time in the presence of failures. The experimental results show that the adaptive architecture achieves the lowest possible query execution time for various scenarios.
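
The abstract does not spell out the cost model, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes failures arrive as a Poisson process with a fixed rate, that a parallel DBMS restarts the whole query after a failure, and that MapReduce reruns only the failed task while paying a constant overhead factor. The function names and the `overhead` and `tasks` parameters are hypothetical.

```python
import math


def expected_time_with_restart(work_s: float, failure_rate: float) -> float:
    """Expected time to finish `work_s` seconds of work when any failure
    (Poisson, `failure_rate` per second) forces a restart from scratch."""
    if failure_rate == 0:
        return work_s
    # Standard result for restart-on-failure: (e^(rate * work) - 1) / rate.
    return math.expm1(failure_rate * work_s) / failure_rate


def expected_dbms_time(base_s: float, failure_rate: float) -> float:
    # Assumption: the whole query is the unit of recovery.
    return expected_time_with_restart(base_s, failure_rate)


def expected_mapreduce_time(base_s: float, failure_rate: float,
                            overhead: float = 3.0, tasks: int = 100) -> float:
    # Assumption: total work is inflated by `overhead`, split into equal
    # tasks, and only a failed task is rerun (tasks modeled sequentially).
    task_s = base_s * overhead / tasks
    return tasks * expected_time_with_restart(task_s, failure_rate)


def choose_engine(base_s: float, failure_rate: float,
                  overhead: float = 3.0, tasks: int = 100):
    """Return the engine with the lower expected time and that time."""
    dbms = expected_dbms_time(base_s, failure_rate)
    mr = expected_mapreduce_time(base_s, failure_rate, overhead, tasks)
    return ("parallel DBMS", dbms) if dbms <= mr else ("MapReduce", mr)


if __name__ == "__main__":
    rate = 1 / 3600  # hypothetical: one failure per hour across the cluster
    for seconds in (60, 36000):
        engine, cost = choose_engine(seconds, rate)
        print(f"{seconds:>6} s query -> {engine} (expected {cost:.0f} s)")
```

Under these assumptions, a short query favors the parallel DBMS because the MapReduce overhead dominates, while a long query favors MapReduce because restarting the whole query becomes very expensive once a failure is likely during its run.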

Index Terms

Adaptive query execution for data management in the cloud
1. Information systems
  1. Data management systems
    1. Database management system engines

Published In

CloudDB '10: Proceedings of the second international workshop on Cloud data management

October 2010

72 pages

ISBN:9781450303804

DOI:10.1145/1871929

General Chairs: Xiaofeng Meng (Renmin University of China, China), Ying Chen (IBM China Research Lab, China)

Program Chairs: Jianliang Xu (Hong Kong Baptist University, Hong Kong), Jiaheng Lu (Renmin University of China, China)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM '10: International Conference on Information and Knowledge Management

October 30, 2010

Toronto, ON, Canada

Acceptance Rates

Overall Acceptance Rate 12 of 17 submissions, 71%

Bibliometrics

Total Citations: 11
Total Downloads: 501
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 1

Reflects downloads up to 13 Dec 2024

Cited By

VL D, PA J, Mathew K P (2023). Secure query processing and optimization in cloud environment: a review. Information Security Journal: A Global Perspective 33:2, pp. 172-191. Online publication date: 20-Dec-2023.
https://doi.org/10.1080/19393555.2023.2270976

W. M. Ribeiro M, A. B. Lima A, de Oliveira D (2019). OLAP parallel query processing in clouds with C‐ParGRES. Concurrency and Computation: Practice and Experience 32:7. Online publication date: 19-Dec-2019.
https://doi.org/10.1002/cpe.5590

Serrano D, Stroulia E, Mindel M, Müller H, Onut V (2016). From relations to multi-dimensional maps. Proceedings of the 26th Annual International Conference on Computer Science and Software Engineering, pp. 156-165. Online publication date: 31-Oct-2016.
https://dl.acm.org/doi/10.5555/3049877.3049893

Nguyen-Cong D, D'Orazio L, Tran N, Hacid M (2016). Storing and Querying DICOM Data with HYTORMO. Proceedings of the Second International Workshop on Data Management and Analytics for Medicine and Healthcare - Volume 10186, pp. 43-61. Online publication date: 9-Sep-2016.
https://dl.acm.org/doi/10.1007/978-3-319-57741-8_4

Serrano D, Han D, Stroulia E (2015). From Relations to Multi-dimensional Maps. Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing, pp. 81-89. Online publication date: 27-Jun-2015.
https://dl.acm.org/doi/10.1109/CLOUD.2015.21

Kougka G, Gounaris A, Tsichlas K (2015). Practical algorithms for execution engine selection in data flows. Future Generation Computer Systems 45:C, pp. 133-148. Online publication date: 1-Apr-2015.
https://dl.acm.org/doi/10.1016/j.future.2014.11.011

Ravimaran S, Maluk Mohamed M (2014). Integrated Obj_FedRep: Evaluation of Surrogate Object based Mobile Cloud System for Federation, Replica and Data Management. Arabian Journal for Science and Engineering 39:6, pp. 4577-4592. Online publication date: 29-Mar-2014.
https://doi.org/10.1007/s13369-014-1001-2

Dong K, Li J, Nan K, Li W (2013). Biomedical Research Data Cloud Services with Duckling Collaboration LiBrary (CLB). Proceedings of the 2013 IEEE 9th International Conference on e-Science, pp. 221-227. Online publication date: 22-Oct-2013.
https://dl.acm.org/doi/10.1109/eScience.2013.17

Coelho da Silva T, Nascimento M, de Macêdo J, Sousa F, Machado J (2013). Non-Intrusive Elastic Query Processing in the Cloud. Journal of Computer Science and Technology 28:6, pp. 932-947. Online publication date: 8-Nov-2013.
https://doi.org/10.1007/s11390-013-1389-2

Shanmugam R, M. A. M (2012). Data Management in the Mobile Cloud Using Surrogate Object. International Journal of Future Computer and Communication, pp. 187-192. Online publication date: 2012.
https://doi.org/10.7763/IJFCC.2012.V1.49
