research-article

Node architecture implications for in-memory data analytics on scale-in clusters

Authors:

Ahsan Javed Awan,

Vladimir Vlassov,

Eduard Ayguade

BDCAT '16: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies

Pages 237 - 246

https://doi.org/10.1145/3006299.3006319

Published: 06 December 2016

Abstract

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual-socket server, we have found that (i) simultaneous multi-threading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor.
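Findings (ii) and (v) both concern how executors are sized and placed on a dual-socket machine. The sketch below is only an illustration of that idea, not the authors' setup: it splits each NUMA node's cores and memory evenly among executors and builds a `numactl` prefix that binds CPUs and memory to one node. The machine dimensions (2 nodes, 12 cores and 32 GB per node) and the names `ExecutorPlan`, `plan_executors`, and `numactl_prefix` are hypothetical. `--cpunodebind` and `--membind` are existing `numactl` options, and `spark.executor.cores` and `spark.executor.memory` are existing Spark properties; how an executor process is actually launched under `numactl` depends on the deployment and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorPlan:
    numa_node: int   # NUMA node the executor would be pinned to
    cores: int       # candidate value for spark.executor.cores
    memory_gb: int   # candidate value for spark.executor.memory, in GB


def plan_executors(numa_nodes, cores_per_node, memory_gb_per_node,
                   executors_per_node=1):
    """Split each NUMA node's cores and memory evenly among its executors."""
    if executors_per_node < 1 or cores_per_node % executors_per_node != 0:
        raise ValueError("cores_per_node must divide evenly among executors")
    cores = cores_per_node // executors_per_node
    memory = memory_gb_per_node // executors_per_node
    return [ExecutorPlan(node, cores, memory)
            for node in range(numa_nodes)
            for _ in range(executors_per_node)]


def numactl_prefix(plan):
    """Command prefix binding a process's CPUs and memory to one NUMA node."""
    return ["numactl",
            f"--cpunodebind={plan.numa_node}",
            f"--membind={plan.numa_node}"]


# Hypothetical dual-socket machine: 2 NUMA nodes, 12 cores and 32 GB each.
plans = plan_executors(numa_nodes=2, cores_per_node=12, memory_gb_per_node=32)
for p in plans:
    print(" ".join(numactl_prefix(p)),
          f"spark.executor.cores={p.cores}",
          f"spark.executor.memory={p.memory_gb}g")
```

Raising `executors_per_node` yields more, smaller executors per socket, which is the kind of configuration finding (v) compares against a single large executor.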




Published In

BDCAT '16: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies

December 2016

373 pages

ISBN:9781450346177

DOI:10.1145/3006299

Program Chairs: Ashiq Anjum (University of Derby, UK), Xinghui Zhao (Washington State University, Vancouver, WA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Education, Audiovisual and Culture Executive Agency (EACEA) of the European Commission

Conference

UCC '16

Sponsor:

SIGHPC

UCC '16: 9th International Conference on Utility and Cloud Computing

December 6 - 9, 2016

Shanghai, China

Acceptance Rates

Overall Acceptance Rate 27 of 93 submissions, 29%



Cited By

Giannoula C, Vijaykumar N, Papadopoulou N, Karakostas V, Fernandez I, Gomez-Luna J, Orosa L, Koziris N, Goumas G, Mutlu O. (2021). SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 263-276. Online publication date: Feb-2021. https://doi.org/10.1109/HPCA51647.2021.00031

Expósito R, Veiga J, Touriño J. (2020). Enabling Hardware Affinity in JVM-Based Applications: A Case Study for Big Data. Computational Science – ICCS 2020, pp. 31-44. Online publication date: 15-Jun-2020. https://doi.org/10.1007/978-3-030-50371-0_3

Ivanov T, Taaffe J, Wolter K, Knottenbelt W, van Hoorn A, Nambiar M, Koziolek H. (2018). Exploratory Analysis of Spark Structured Streaming. Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, pp. 141-146. Online publication date: 2-Apr-2018. https://dl.acm.org/doi/10.1145/3185768.3186360

Baig S, Amaral M, Polo J, Carrera D. (2018). Performance Characterization of Spark Workloads on Shared NUMA Systems. 2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), pp. 41-48. Online publication date: Mar-2018. https://doi.org/10.1109/BigDataService.2018.00015

Awan A, Ohara M, Ayguade E, Ishizaki K, Brorsson M, Vlassov V, Jacob B. (2017). Identifying the potential of near data processing for apache spark. Proceedings of the International Symposium on Memory Systems, pp. 60-67. Online publication date: 2-Oct-2017. https://dl.acm.org/doi/10.1145/3132402.3132427

Bae J, Jang H, Jin W, Heo J, Jang J, Hwang J, Cho S, Lee J. (2017). Jointly optimizing task granularity and concurrency for in-memory mapreduce frameworks. 2017 IEEE International Conference on Big Data (Big Data), pp. 130-140. Online publication date: Dec-2017. https://doi.org/10.1109/BigData.2017.8257921
