
research-article

Big Data Solutions Proposed for Cluster Computing Systems Challenges: A survey

Authors:

Fatima Es-Sabery, Abdellatif Hair

NISS '20: Proceedings of the 3rd International Conference on Networking, Information Systems & Security

Article No.: 7, Pages 1 - 7

https://doi.org/10.1145/3386723.3387826

Published: 18 May 2020

Abstract

The CCS (Cluster Computing System) has emerged to solve the problems of standard technology. Its objective is to improve on the performance and power efficiency of a single processor for storing and mining large data sets, using parallel programming to read and process massive data sets on multiple disks and CPUs. What makes these systems more performant than standard technology is the physical organization of the computing nodes in the cluster. However, this kind of cluster does not entirely solve the problem, because it brings its own challenges: node failures, computation, network bottlenecks, and distributed programming. All of these problems arise when we mine and store massive volumes of data using cluster computing. To address these challenges, Google invented a Big Data data-processing framework called MapReduce to manage large-scale data processing across large clusters of commodity servers. This paper outlines how a CCS operates and presents its challenges in the era of Big Data. Moreover, it introduces the most popular Big Data solutions proposed to overcome the CCS challenges and shows how Big Data technologies solve CCS issues. Overall, the main goal of this work is to provide a better understanding of the challenges of CCS and to identify the essential Big Data solutions in this increasingly important area.
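To make the MapReduce model named in the abstract concrete, here is a minimal single-machine sketch of the classic word-count job. It is an illustration under stated assumptions, not the authors' method and not Hadoop's API: threads stand in for cluster nodes, and every name below (`map_phase`, `shuffle`, `reduce_phase`, `run_with_retry`, `word_count`, `SimulatedNodeFailure`, `fails_once_per_split`) is hypothetical. It shows the map, shuffle, and reduce phases, plus a simplified form of the task re-execution MapReduce uses to tolerate a failed worker.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]
Mapper = Callable[[str], List[Pair]]


class SimulatedNodeFailure(Exception):
    """Hypothetical stand-in for a worker node crashing while it runs a task."""


def map_phase(split: str) -> List[Pair]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Shuffle: group the intermediate values by key between the two phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Pair:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_with_retry(task: Mapper, split: str, max_attempts: int = 3) -> List[Pair]:
    """Re-execute a map task whose worker failed, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(split)
        except SimulatedNodeFailure as err:
            last_error = err  # treat the worker as lost and schedule the task again
    raise RuntimeError(f"map task failed {max_attempts} times") from last_error


def word_count(splits: List[str], mapper: Mapper = map_phase, workers: int = 4) -> Dict[str, int]:
    """Run map tasks in parallel, shuffle their output, then run reduce tasks in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda s: run_with_retry(mapper, s), splits))
    grouped = shuffle(pair for part in mapped for pair in part)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda kv: reduce_phase(*kv), grouped.items()))


def fails_once_per_split() -> Mapper:
    """Hypothetical fault injector: each split's first attempt raises SimulatedNodeFailure."""
    attempted = set()

    def mapper(split: str) -> List[Pair]:
        if split not in attempted:
            attempted.add(split)
            raise SimulatedNodeFailure(split)
        return map_phase(split)

    return mapper
```

For example, `word_count(["the cat", "the dog"], mapper=fails_once_per_split())` returns a dict equal to `{"the": 2, "cat": 1, "dog": 1}`: each split fails once and is re-executed, so the result matches a failure-free run. In a real cluster the retry would be scheduled on a different node, and the shuffle would move intermediate data across the network, which is where the network-bottleneck challenge appears.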

Index Terms

Big Data Solutions Proposed for Cluster Computing Systems Challenges: A survey
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation types and techniques
      1. Massively parallel and high-performance simulations
2. Information systems
  1. Data management systems
    1. Database management system engines

Information

Published In

ACM Other conferences

NISS '20: Proceedings of the 3rd International Conference on Networking, Information Systems & Security

March 2020

528 pages

ISBN:9781450376341

DOI:10.1145/3386723

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 May 2020

Qualifiers

Research-article
Research
Refereed limited

Conference

NISS2020

NISS2020: The 3rd International Conference on Networking, Information Systems & Security

March 31 - April 2, 2020

Marrakech, Morocco

Bibliometrics & Citations

Article Metrics

Total Citations: 10
Total Downloads: 174
Downloads (last 12 months): 6
Downloads (last 6 weeks): 1

Reflects downloads up to 05 Mar 2025

Cited By

F. Es-sabery, I. Es-sabery, J. Qadir, B. Sainz-de-Abajo, and B. Garcia-Zapirain (2024). A hybrid Hadoop-based sentiment analysis classifier for tweets associated with COVID-19 utilizing two machine learning algorithms: CNN, and fuzzy C4.5. Journal of Big Data, 11:1. Online publication date: 18-Dec-2024.
https://doi.org/10.1186/s40537-024-01014-4

T. Phan and P. Do (2023). A novel framework to enhance the performance of training distributed deep neural networks. Intelligent Data Analysis, 27:3, 753-768. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.3233/IDA-226710

F. Es-Sabery, I. Es-Sabery, A. Hair, B. Sainz-De-Abajo, and B. Garcia-Zapirain (2022). Emotion Processing by Applying a Fuzzy-Based Vader Lexicon and a Parallel Deep Belief Network Over Massive Data. IEEE Access, 10, 87870-87899. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3200389

F. Es-sabery, K. Es-sabery, B. El Akraoui, and A. Hair (2022). Optimization Focused on Parallel Fuzzy Deep Belief Neural Network for Opinion Mining. Business Intelligence, 3-28. Online publication date: 13-May-2022.
https://doi.org/10.1007/978-3-031-06458-6_1

S. Syed, A. Rasul, T. Javed, M. Rizwan, A. Singh, K. Dev, and M. Magarini (2021). Performance Analysis of Cloud Computing for Distributed Data Center using Cloud-Sim. 2021 IEEE International Conference on Communications Workshops (ICC Workshops), 1-6. Online publication date: Jun-2021.
https://doi.org/10.1109/ICCWorkshops50388.2021.9473876

F. Es-Sabery, K. Es-Sabery, J. Qadir, B. Sainz-De-Abajo, A. Hair, B. Garcia-Zapirain, and I. De La Torre-Diez (2021). A MapReduce Opinion Mining for COVID-19-Related Tweets Classification Using Enhanced ID3 Decision Tree Classifier. IEEE Access, 9, 58706-58739. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3073215

F. Es-Sabery, A. Hair, J. Qadir, B. Sainz-De-Abajo, B. Garcia-Zapirain, and I. Torre-Diez (2021). Sentence-Level Classification Using Parallel Fuzzy Deep Learning Classifier. IEEE Access, 9, 17943-17985. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3053917

F. Es-sabery, K. Es-sabery, H. Garmani, and A. Hair (2021). Sentiment Analysis of Covid19 Tweets Using A MapReduce Fuzzified Hybrid Classifier Based On C4.5 Decision Tree and Convolutional Neural Network. E3S Web of Conferences, 297, 01052. Online publication date: 22-Sep-2021.
https://doi.org/10.1051/e3sconf/202129701052

F. Es-sabery, K. Es-sabery, and A. Hair (2021). A MapReduce Improved ID3 Decision Tree for Classifying Twitter Data. Business Intelligence, 160-182. Online publication date: 16-May-2021.
https://doi.org/10.1007/978-3-030-76508-8_13

F. Es-sabery and A. Hair (2020). A MapReduce C4.5 Decision Tree Algorithm Based on Fuzzy Rule-Based System. Fuzzy Information and Engineering, 11:4, 446-473. Online publication date: 23-Jun-2020.
https://doi.org/10.1080/16168658.2020.1756099
