research-article
DOI: 10.1145/3006299.3006319

Node architecture implications for in-memory data analytics on scale-in clusters

Published: 06 December 2016

Abstract

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to stay at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of the host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual-socket server, we have found that (i) simultaneous multi-threading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor.
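Findings (ii) and (v) concern how executors are sized and placed rather than the hardware itself. As a rough illustration only, and not the authors' experimental setup, the sketch below splits one node's cores and memory evenly across several smaller executors and builds a `numactl` prefix that pins a process to one NUMA node. The helper names and the 24-core, 64 GB node are hypothetical. The property names follow the Spark configuration reference, and whether `spark.executor.instances` is honoured depends on the cluster manager in use.

```python
def split_executors(total_cores, total_memory_gb, num_executors):
    """Divide a node's cores and memory evenly among num_executors executors.

    Returns Spark property names mapped to string values. Any remainder
    from the integer division is simply left unallocated.
    """
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    if total_cores < num_executors or total_memory_gb < num_executors:
        raise ValueError("not enough cores or memory for that many executors")
    return {
        "spark.executor.instances": str(num_executors),
        "spark.executor.cores": str(total_cores // num_executors),
        "spark.executor.memory": f"{total_memory_gb // num_executors}g",
    }


def numa_bind_prefix(node):
    """Command prefix that binds both CPU and memory to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"]


if __name__ == "__main__":
    # Hypothetical node: 24 hardware threads and 64 GB of RAM.
    for count in (1, 4):
        print(count, split_executors(24, 64, count))
    print(" ".join(numa_bind_prefix(0)))
```

Comparing the one-executor and four-executor layouts on the same workload is the kind of experiment finding (v) summarizes; the speedup reported in the abstract is the authors' measurement and is not something this sketch reproduces.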

Information & Contributors

Information

Published In

BDCAT '16: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
December 2016
373 pages
ISBN: 9781450346177
DOI: 10.1145/3006299
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NUMA
  2. SMT
  3. spark

Qualifiers

  • Research-article

Funding Sources

  • Education, Audiovisual and Culture Executive Agency (EACEA) of the European Commission

Conference

UCC '16
Acceptance Rates

Overall Acceptance Rate 27 of 93 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 1

Reflects downloads up to 10 Dec 2024.

Cited By

  • (2021) SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 263-276. DOI: 10.1109/HPCA51647.2021.00031. Online publication date: Feb-2021.
  • (2020) Enabling Hardware Affinity in JVM-Based Applications: A Case Study for Big Data. Computational Science – ICCS 2020, pp. 31-44. DOI: 10.1007/978-3-030-50371-0_3. Online publication date: 15-Jun-2020.
  • (2018) Exploratory Analysis of Spark Structured Streaming. Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, pp. 141-146. DOI: 10.1145/3185768.3186360. Online publication date: 2-Apr-2018.
  • (2018) Performance Characterization of Spark Workloads on Shared NUMA Systems. 2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), pp. 41-48. DOI: 10.1109/BigDataService.2018.00015. Online publication date: Mar-2018.
  • (2017) Identifying the potential of near data processing for apache spark. Proceedings of the International Symposium on Memory Systems, pp. 60-67. DOI: 10.1145/3132402.3132427. Online publication date: 2-Oct-2017.
  • (2017) Jointly optimizing task granularity and concurrency for in-memory mapreduce frameworks. 2017 IEEE International Conference on Big Data (Big Data), pp. 130-140. DOI: 10.1109/BigData.2017.8257921. Online publication date: Dec-2017.
