DOI: 10.1145/2903150.2908078

Heterogeneous chip multiprocessor architectures for big data applications

Published: 16 May 2016

Abstract

Emerging big data analytics applications require a significant amount of server computational power. The cost of building and running a server to process big data, and the capacity to which it can scale, are driven in large part by those computational resources. However, big data applications share many characteristics that are fundamentally different from traditional desktop, parallel, and scale-out applications. Big data analytics applications rely heavily on specific deep machine learning and data mining algorithms and run a complex, deep software stack with various components (e.g., Hadoop, Spark, MPI, HBase, Impala, MySQL, Hive, Shark, Apache, and MongoDB) that are bound together by a runtime software system and interact significantly with I/O and the OS, exhibiting high computational, memory, I/O, and control intensity. Current server designs, based on commodity homogeneous processors, will not be the most efficient in terms of performance per watt for this emerging class of applications. In other domains, heterogeneous architectures have emerged as a promising way to improve energy efficiency by allowing each application to run on a core that matches its resource needs more closely than a one-size-fits-all core. A heterogeneous architecture integrates cores with various microarchitectures and accelerators to provide more opportunities for efficient workload mapping. In this work, through methodical investigation of power and performance measurements and comprehensive system-level characterization, we demonstrate that a heterogeneous architecture combining high-performance big cores and low-power little cores is required for efficient processing of big data analytics applications, particularly in the presence of accelerators and near-real-time performance constraints.
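The idea of mapping each workload to the core type that best matches it can be illustrated with a minimal sketch. It assumes performance per watt is taken as (1 / runtime) / average power, and that a near-real-time constraint is modeled as a simple runtime deadline. The `Measurement` class, the `pick_core` function, and all numbers are hypothetical and are not taken from the paper's measurements or method.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    """One hypothetical run of a workload on one core type."""
    core: str         # illustrative label, e.g. "big" or "little"
    runtime_s: float  # execution time in seconds
    power_w: float    # average power draw in watts

    @property
    def perf_per_watt(self) -> float:
        # Assumption: performance is the inverse of runtime,
        # so performance per watt is 1 / (runtime * power).
        return 1.0 / (self.runtime_s * self.power_w)


def pick_core(runs: Iterable[Measurement],
              deadline_s: Optional[float] = None) -> Measurement:
    """Return the run with the best performance per watt that meets the deadline."""
    feasible = [r for r in runs
                if deadline_s is None or r.runtime_s <= deadline_s]
    if not feasible:
        raise ValueError("no core type meets the deadline")
    return max(feasible, key=lambda r: r.perf_per_watt)


# Made-up numbers for illustration only.
runs = [
    Measurement("big", runtime_s=10.0, power_w=80.0),
    Measurement("little", runtime_s=25.0, power_w=10.0),
]

print(pick_core(runs).core)                   # little
print(pick_core(runs, deadline_s=15.0).core)  # big
```

With these made-up numbers the little core is the more efficient choice when there is no deadline, while a 15-second deadline forces the workload onto the big core. This is one way to see why a mix of both core types can be preferable to either alone.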



Published In

CF '16: Proceedings of the ACM International Conference on Computing Frontiers
May 2016
487 pages
ISBN: 9781450341288
DOI: 10.1145/2903150
  • General Chairs: Gianluca Palermo, John Feo
  • Program Chairs: Antonino Tumeo, Hubertus Franke
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. accelerator
  2. application characterization
  3. big data
  4. heterogeneous architectures
  5. performance
  6. power

Qualifiers

  • Research-article

Conference

CF'16
Sponsor:
CF'16: Computing Frontiers Conference
May 16 - 19, 2016
Como, Italy

Acceptance Rates

CF '16 Paper Acceptance Rate 30 of 94 submissions, 32%;
Overall Acceptance Rate 273 of 785 submissions, 35%


Cited By

  • (2019) Efficient Pipelined Broadcast with Monitoring Processing Node Status on a Multi-Core Processor. Mathematics 7:12 (1159). DOI: 10.3390/math7121159. Online publication date: 1-Dec-2019
  • (2017) Nucleus. ACM Transactions on Embedded Computing Systems 16:5s (1-16). DOI: 10.1145/3126544. Online publication date: 27-Sep-2017
  • (2017) Scheduling multithreaded applications onto heterogeneous composite cores architecture. 2017 Eighth International Green and Sustainable Computing Conference (IGSC) (1-8). DOI: 10.1109/IGCC.2017.8323570. Online publication date: Oct-2017
  • (2017) Machine Learning-Based Approaches for Energy-Efficiency Prediction and Scheduling in Composite Cores Architectures. 2017 IEEE International Conference on Computer Design (ICCD) (129-136). DOI: 10.1109/ICCD.2017.28. Online publication date: Nov-2017
  • (2016) Big data analytics on heterogeneous accelerator architectures. Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (1-3). DOI: 10.1145/2968456.2976765. Online publication date: 1-Oct-2016
