research-article

Heterogeneous chip multiprocessor architectures for big data applications

Author:

Houman Homayoun

CF '16: Proceedings of the ACM International Conference on Computing Frontiers

Pages 400 - 405

https://doi.org/10.1145/2903150.2908078

Published: 16 May 2016

Abstract

Emerging big data analytics applications require a significant amount of server computational power. The costs of building and running a computing server to process big data, and the capacity to which we can scale it, are driven in large part by those computational resources. However, big data applications share many characteristics that are fundamentally different from traditional desktop, parallel, and scale-out applications. Big data analytics applications rely heavily on specific deep machine learning and data mining algorithms, and run a complex and deep software stack with various components (e.g., Hadoop, Spark, MPI, HBase, Impala, MySQL, Hive, Shark, Apache, and MongoDB) that are bound together with a runtime software system and interact significantly with I/O and the OS, exhibiting high computational intensity, memory intensity, I/O intensity, and control intensity. Current server designs, based on commodity homogeneous processors, will not be the most efficient in terms of performance/watt for this emerging class of applications. In other domains, heterogeneous architectures have emerged as a promising solution to enhance energy efficiency by allowing each application to run on a core that matches its resource needs more closely than a one-size-fits-all core. A heterogeneous architecture integrates cores with various micro-architectures and accelerators to provide more opportunity for efficient workload mapping. In this work, through methodical investigation of power and performance measurements and comprehensive system-level characterization, we demonstrate that a heterogeneous architecture combining high-performance big cores and low-power little cores is required to process big data analytics applications efficiently, particularly in the presence of accelerators and near real-time performance constraints.

Cited By

Park J (2019). Efficient Pipelined Broadcast with Monitoring Processing Node Status on a Multi-Core Processor. Mathematics 7:12 (1159). Online publication date: 1-Dec-2019.
https://doi.org/10.3390/math7121159
Vougioukas I, Sandberg A, Diestelhorst S, Al-Hashimi B, Merrett G (2017). Nucleus. ACM Transactions on Embedded Computing Systems 16:5s (1-16). Online publication date: 27-Sep-2017.
https://dl.acm.org/doi/10.1145/3126544
Sayadi H, Homayoun H (2017). Scheduling multithreaded applications onto heterogeneous composite cores architecture. 2017 Eighth International Green and Sustainable Computing Conference (IGSC) (1-8). Online publication date: Oct-2017.
https://doi.org/10.1109/IGCC.2017.8323570

Index Terms

Heterogeneous chip multiprocessor architectures for big data applications
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators

Information & Contributors

Information

Published In

CF '16: Proceedings of the ACM International Conference on Computing Frontiers

May 2016

487 pages

ISBN:9781450341288

DOI:10.1145/2903150

General Chairs:
Gianluca Palermo (Politecnico di Milano, IT), John Feo (Pacific Northwest National Laboratory and Northwest Institute for Advanced Computing)

Program Chairs:
Antonino Tumeo (Pacific Northwest National Laboratory, USA), Hubertus Franke (New York University and IBM Research, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Micron Foundation: Micron Technology Foundation, Inc.
ACM: Association for Computing Machinery
Politecnico di Milano: Politecnico di Milano
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IBM: IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation

Conference

CF'16

CF'16: Computing Frontiers Conference

May 16 - 19, 2016

Como, Italy

Acceptance Rates

CF '16 Paper Acceptance Rate: 30 of 94 submissions, 32%

Overall Acceptance Rate: 273 of 785 submissions, 35%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 5
Total Downloads: 520
Downloads (Last 12 months): 103
Downloads (Last 6 weeks): 15

Reflects downloads up to 19 Dec 2024

Citations

Cited By

Sayadi H, Patel N, Sasan A, Homayoun H (2017). Machine Learning-Based Approaches for Energy-Efficiency Prediction and Scheduling in Composite Cores Architectures. 2017 IEEE International Conference on Computer Design (ICCD) (129-136). Online publication date: Nov-2017.
https://doi.org/10.1109/ICCD.2017.28
Neshatpour K, Sasan A, Homayoun H (2016). Big data analytics on heterogeneous accelerator architectures. Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (1-3). Online publication date: 1-Oct-2016.
https://dl.acm.org/doi/10.1145/2968456.2976765
