short-paper

On Exploring the Optimum Configuration of Apache Spark Framework in Heterogeneous Clusters

Authors:

Ioannis Ballas,

Vassilios Tsakanikas,

Evaggelos Pefanis,

Vassilios Tampakas

PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics

Pages 250 - 253

https://doi.org/10.1145/3503823.3503870

Published: 22 February 2022

Abstract

Over the past decade, both industry and academia have adopted the Big Data paradigm to explore the value of data. As the volume of collected data grows, the computational infrastructure must scale its capacity to process that data. This work proposes a model for estimating the optimal configuration parameters of a heterogeneous Spark cluster and validates it against two different use cases. The experiments show that the proposed model successfully estimates the optimal Spark configuration parameters for both memory-intensive and CPU-intensive applications.
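
To make "configuration parameters" concrete, here is a minimal sketch of the kind of sizing decision a heterogeneous cluster forces. It is not the model proposed in the paper, which this page does not describe. The `Node` class, the `uniform_executor_conf` function, and the reservation defaults (one core and 1 GB per node for the operating system) are hypothetical; only the property names `spark.executor.cores`, `spark.executor.memory`, and `spark.executor.instances` are real Spark settings. The sketch picks a single executor size that fits every node and counts how many executors each node can host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one worker machine."""
    name: str
    cores: int
    memory_gb: int


def uniform_executor_conf(nodes, cores_per_executor=2,
                          reserved_cores=1, reserved_gb=1):
    """Naive heuristic (an assumption, not the paper's model).

    Chooses one executor size that fits on every node and returns
    the matching Spark properties. Memory overhead is ignored.
    """
    # Resources left on each node after reserving some for the OS.
    usable = [(n.cores - reserved_cores, n.memory_gb - reserved_gb)
              for n in nodes]
    # Number of executors each node can host, limited by cores.
    per_node = [max(c, 0) // cores_per_executor for c, _ in usable]
    if sum(per_node) == 0:
        raise ValueError("no node can host an executor of this size")
    # Memory per executor is bounded by the tightest hosting node.
    mem_per_exec = min(m // k
                       for (_, m), k in zip(usable, per_node) if k > 0)
    return {
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{mem_per_exec}g",
        "spark.executor.instances": str(sum(per_node)),
    }


# Example cluster: one large node and one small node.
cluster = [Node("big", cores=8, memory_gb=32),
           Node("small", cores=4, memory_gb=8)]
conf = uniform_executor_conf(cluster)
```

A uniform executor size leaves memory idle on the larger node, which is one reason heterogeneous clusters are harder to configure than homogeneous ones.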

Information

Published In

PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics

November 2021

499 pages

ISBN:9781450395557

DOI:10.1145/3503823

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH - CREATE - INNOVATE

Conference

PCI 2021

PCI 2021: 25th Pan-Hellenic Conference on Informatics

November 26 - 28, 2021

Volos, Greece

Acceptance Rates

Overall Acceptance Rate 190 of 390 submissions, 49%
