
Redoop infrastructure for recurring big data queries

Published: 01 August 2014

Abstract

This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. Recurring queries, executed repeatedly over long periods of time on evolving high-volume data, have become a bedrock component of most large-scale data analytics applications. Redoop is a comprehensive extension to Hadoop that pushes the support and optimization of recurring queries into Hadoop's core functionality. While remaining backward compatible with regular MapReduce jobs, Redoop achieves an order of magnitude better performance than Hadoop for recurring workloads. Redoop employs innovative window-aware optimization techniques for such workloads, including adaptive window-aware data partitioning, cache-aware task scheduling, and inter-window caching mechanisms. We will demonstrate Redoop's capabilities on a compute cluster against real-life workloads, including click-stream and sensor data analysis.
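To make the idea of inter-window caching concrete, the following is a minimal single-process sketch of how a recurring count query over overlapping windows might reuse earlier work. It is not Redoop's implementation or API: the names `PaneCache` and `run_window`, the division of input into fixed "panes", and the eviction policy are all assumptions made for illustration. The only point it shows is that when consecutive windows overlap, partial aggregates for the shared portion can be cached once and reused rather than recomputed.

```python
from collections import Counter


class PaneCache:
    """Hypothetical cache of per-pane partial counts shared across windows."""

    def __init__(self):
        self._partials = {}
        self.computed = 0  # panes actually aggregated (cache misses)

    def partial(self, pane_id, records):
        # Aggregate a pane only the first time it is requested.
        if pane_id not in self._partials:
            self._partials[pane_id] = Counter(records)
            self.computed += 1
        return self._partials[pane_id]

    def evict_before(self, pane_id):
        # Drop panes that no later window can reference.
        for old in [p for p in self._partials if p < pane_id]:
            del self._partials[old]


def run_window(cache, panes, start, size, slide):
    """Evaluate one recurrence: count keys over panes [start, start + size)."""
    total = Counter()
    for pane_id in range(start, start + size):
        total += cache.partial(pane_id, panes[pane_id])
    # The next window starts at start + slide, so older panes are dead.
    cache.evict_before(start + slide)
    return total


# Assumed toy input: four panes of click keys, window of 3 panes, slide of 1.
panes = {0: ["a", "b"], 1: ["a"], 2: ["c", "a"], 3: ["b"]}
cache = PaneCache()
first = run_window(cache, panes, start=0, size=3, slide=1)
second = run_window(cache, panes, start=1, size=3, slide=1)
print(first, second, cache.computed)
```

In this toy run the second window aggregates only the one new pane; the two panes it shares with the first window come from the cache. A distributed system would additionally need to decide where cached partials live and how tasks are placed near them, which is outside the scope of this sketch.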




Published In

Proceedings of the VLDB Endowment, Volume 7, Issue 13
August 2014
466 pages
ISSN: 2150-8097

Publisher

VLDB Endowment


Qualifiers

  • Research-article
