Redoop infrastructure for recurring big data queries

Authors: Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, Mohamed Y. Eltabakh

Proceedings of the VLDB Endowment, Volume 7, Issue 13

Pages 1589 - 1592

https://doi.org/10.14778/2733004.2733037

Published: 01 August 2014

Abstract

This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. Recurring queries, which are executed repeatedly over long periods of time on evolving high-volume data, have become a bedrock component of most large-scale data analytics applications. Redoop is a comprehensive extension to Hadoop that pushes the support and optimization of recurring queries into Hadoop's core functionality. While remaining backward compatible with regular MapReduce jobs, Redoop achieves an order of magnitude better performance than Hadoop for recurring workloads. Redoop employs innovative window-aware optimization techniques for such workloads, including adaptive window-aware data partitioning, cache-aware task scheduling, and inter-window caching mechanisms. We will demonstrate Redoop's capabilities on a compute cluster against real-life workloads, including click-stream and sensor data analysis.
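
To make the idea of inter-window caching concrete, the following is a minimal single-process sketch, not Redoop's actual implementation or API. It assumes the input is split into fixed-size "panes" and that a recurring query aggregates a sliding window of consecutive panes; the names `PaneCache` and `run_window` are hypothetical and chosen only for illustration. Each recurrence reuses the cached partial aggregates of panes it shares with the previous window and processes only the newly arrived panes.

```python
from collections import Counter


class PaneCache:
    """Hypothetical cache of per-pane partial aggregates (illustration only)."""

    def __init__(self):
        self._partials = {}
        self.computed = 0  # number of panes aggregated from raw records

    def partial(self, pane_id, records):
        # Aggregate a pane only the first time it is seen; reuse it afterwards.
        if pane_id not in self._partials:
            self._partials[pane_id] = Counter(records)
            self.computed += 1
        return self._partials[pane_id]

    def evict_before(self, pane_id):
        # Drop partials for panes that no future window can cover.
        for old in [p for p in self._partials if p < pane_id]:
            del self._partials[old]


def run_window(cache, panes, start, size, slide=1):
    """Count keys over panes[start : start + size] by merging pane partials."""
    total = Counter()
    for pane_id in range(start, start + size):
        total += cache.partial(pane_id, panes[pane_id])
    # The next recurrence starts at start + slide, so older panes are expired.
    cache.evict_before(start + slide)
    return total


# Four panes of click events; the window spans 3 panes and slides by 1 pane.
panes = [["a", "b"], ["a"], ["c", "a"], ["b"]]
cache = PaneCache()
w0 = run_window(cache, panes, start=0, size=3)  # aggregates panes 0, 1, 2
w1 = run_window(cache, panes, start=1, size=3)  # reuses panes 1, 2; adds pane 3
```

In this sketch the second recurrence aggregates only one new pane instead of three, which is the kind of saving that overlapping windows make possible. A distributed system would additionally have to decide where cached partials live and how tasks are scheduled near them, which this example does not attempt to model.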

Cited By

Bakni N, Assayad I (2021). Survey on improving the performance of MapReduce in Hadoop. Proceedings of the 4th International Conference on Networking, Information Systems & Security, pages 1-5. Online publication date: 1-Apr-2021.
https://dl.acm.org/doi/10.1145/3454127.3456617
Siddiqui T, Jindal A, Qiao S, Patel H, Le W, Maier D, Pottinger R, Doan A, Tan W, Alawini A, Ngo H (2020). Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 99-113. Online publication date: 11-Jun-2020.
https://dl.acm.org/doi/10.1145/3318464.3380584

Published In

Proceedings of the VLDB Endowment Volume 7, Issue 13

August 2014

466 pages

ISSN:2150-8097

Editors: H. V. Jagadish (University of Michigan), Aoying Zhou (East China Normal University, China)

Publisher

VLDB Endowment

Qualifiers

Research-article
