DOI: 10.1145/2168836.2168842
research-article

Energy efficiency for large-scale MapReduce workloads with significant interactive analysis

Published: 10 April 2012

Abstract

MapReduce workloads have evolved to include increasing amounts of time-sensitive, interactive data analysis; we refer to such workloads as MapReduce with Interactive Analysis (MIA). Such workloads run on large clusters, whose size and cost make energy efficiency a critical concern. Prior work on MapReduce energy efficiency has not yet considered this workload class. Increasing hardware utilization helps improve efficiency, but is challenging to achieve for MIA workloads. These concerns lead us to develop BEEMR (Berkeley Energy Efficient MapReduce), an energy-efficient MapReduce workload manager motivated by empirical analysis of real-life MIA traces at Facebook. The key insight is that although MIA clusters host huge data volumes, the interactive jobs operate on a small fraction of the data, and thus can be served by a small pool of dedicated machines; the less time-sensitive jobs can run on the rest of the cluster in a batch fashion. BEEMR achieves 40-50% energy savings under tight design constraints, and represents a first step towards improving energy efficiency for an increasingly important class of datacenter workloads.
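The split described in the abstract, a small dedicated pool for interactive jobs and batch execution for everything else, can be illustrated with a minimal sketch. This is not BEEMR's implementation: the abstract does not state a classification rule, so the `Job` fields, the `assign_zone` and `partition` helpers, and the 10 GiB cutoff below are all hypothetical, chosen only to show the idea of routing jobs by time sensitivity and input size.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the abstract gives no classification rule or threshold.
INTERACTIVE_INPUT_LIMIT_BYTES = 10 * 1024**3


@dataclass(frozen=True)
class Job:
    name: str
    input_bytes: int
    time_sensitive: bool


def assign_zone(job: Job, limit: int = INTERACTIVE_INPUT_LIMIT_BYTES) -> str:
    """Return "interactive" for small, time-sensitive jobs, else "batch"."""
    if job.time_sensitive and job.input_bytes <= limit:
        return "interactive"
    return "batch"


def partition(jobs):
    """Group job names by the zone they would be routed to."""
    zones = {"interactive": [], "batch": []}
    for job in jobs:
        zones[assign_zone(job)].append(job.name)
    return zones
```

Under these assumptions, a time-sensitive job reading a few gigabytes would land in the interactive pool, while a large or non-urgent job would wait for the batch portion of the cluster. How the real system draws that line, and how it powers machines up and down, is described in the paper itself rather than in this abstract.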

Published In

EuroSys '12: Proceedings of the 7th ACM European Conference on Computer Systems
April 2012
394 pages
ISBN: 9781450312233
DOI: 10.1145/2168836

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MapReduce
  2. energy efficiency

Qualifiers

  • Research-article

Conference

EuroSys '12
EuroSys '12: Seventh EuroSys Conference 2012
April 10 - 13, 2012
Bern, Switzerland

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 1
Reflects downloads up to 25 Feb 2025.

Cited By

  • (2024) Cloud-Based Analysis of Large-Scale Hyperspectral Imagery for Oil Spill Detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17 (2461-2474). DOI: 10.1109/JSTARS.2023.3344022. Online publication date: 2024
  • (2023) Treehouse: A Case For Carbon-Aware Datacenter Software. ACM SIGEnergy Energy Informatics Review, 3:3 (64-70). DOI: 10.1145/3630614.3630626. Online publication date: 25-Oct-2023
  • (2023) Energy Saving Techniques for Cloud Data Centres: An Empirical Research Analysis. Machine Learning, Image Processing, Network Security and Data Sciences (763-779). DOI: 10.1007/978-981-19-5868-7_57. Online publication date: 1-Jan-2023
  • (2022) A Cost-Optimized Data Parallel Task Scheduling in Multi-Core Resources Under Deadline and Budget Constraints. International Journal of Cloud Applications and Computing, 12:2 (1-16). DOI: 10.4018/IJCAC.305857. Online publication date: 26-Jul-2022
  • (2022) A Cost-Optimized Data Parallel Task Scheduling with Deadline Constraints in Cloud. Electronics, 11:13 (2022). DOI: 10.3390/electronics11132022. Online publication date: 28-Jun-2022
  • (2022) Collaborative Management of Correlated Incast Transfer. Data Center Networking (161-184). DOI: 10.1007/978-981-16-9368-7_7. Online publication date: 24-Feb-2022
  • (2021) Energy-Efficient Resources Management in Container-Based Clouds. International Journal of Scientific Research in Computer Science, Engineering and Information Technology (361-365). DOI: 10.32628/CSEIT217278. Online publication date: 15-Apr-2021
  • (2021) Energy and SLA-driven MapReduce Job Scheduling Framework for Cloud-based Cyber-Physical Systems. ACM Transactions on Internet Technology, 21:2 (1-24). DOI: 10.1145/3409772. Online publication date: 3-May-2021
  • (2021) An Energy-Efficient Scheduling Algorithm for Shared Facility Supercomputer Centers. Lobachevskii Journal of Mathematics, 42:11 (2554-2561). DOI: 10.1134/S1995080221110147. Online publication date: 1-Nov-2021
  • (2021) The Energy Efficiency Evaluating Method Determining Energy Consumption of the Parallel Program According to Its Profile. Lobachevskii Journal of Mathematics, 41:12 (2542-2551). DOI: 10.1134/S1995080220120161. Online publication date: 4-Feb-2021