DOI: 10.1145/3386723.3387826
research-article

Big Data Solutions Proposed for Cluster Computing Systems Challenges: A survey

Published: 18 May 2020

Abstract

Cluster Computing Systems (CCS) emerged to address the limitations of standard technology. Their objective is to improve on the performance/power efficiency of a single processor for storing and mining large data sets by using parallel programming to read and process massive data sets on multiple disks and CPUs. What makes these systems more performant than standard technology is the physical organization of the computing nodes in the cluster. However, such clusters do not entirely solve the problem, because they bring their own challenges: node failures, computation, network bottlenecks, and distributed programming. All of these problems arise when massive volumes of data are mined and stored using cluster computing. To address these challenges, Google introduced MapReduce, a Big Data processing framework for managing large-scale data processing across large clusters of commodity servers. This paper outlines how CCS operate and presents their challenges in the era of Big Data. It then introduces the most popular Big Data solutions proposed to overcome these challenges and shows how Big Data technologies solve CCS issues. The overall goal of this work is to provide a better understanding of CCS challenges and to identify the essential Big Data solutions in this increasingly important area.
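The abstract names MapReduce as the framework that hides much of the distributed-programming burden. As a rough illustration of the programming model only (not the authors' code, and not the Google or Hadoop API), the sketch below runs the classic word-count example described by Dean and Ghemawat [12] on a single machine. It assumes that a thread pool can stand in for the cluster's worker nodes and that each input split is a plain string; the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical. It deliberately omits the distributed file system, the network shuffle, and the re-execution of tasks after node failures that a real framework provides.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_words(split: str) -> List[Tuple[str, int]]:
    """Map task (hypothetical): emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every intermediate value by its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce task (hypothetical): sum the partial counts for one word."""
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Assumption: threads stand in for cluster nodes; one map task per split.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, splits))  # preserves split order
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data on clusters", "clusters of commodity servers", "big clusters"]
    print(word_count(splits))
    # {'big': 2, 'data': 1, 'on': 1, 'clusters': 3, 'of': 1, 'commodity': 1, 'servers': 1}
```

Because each map task reads only its own split and each reduce call needs only the values grouped under one key, a framework is free to spread both phases across many nodes; the programmer supplies just the two functions.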

References

[1]
K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010, pp. 1-10.
[2]
A. S. Tanenbaum and M. Van Steen, Distributed Systems: Principles and Paradigms. Prentice-Hall, 2007.
[3]
M. Alian, D. Kim, and N. Sung Kim, "pd-gem5: Simulation Infrastructure for Parallel/Distributed Computer Systems," IEEE Computer Architecture Letters, vol. 15, no. 1, pp. 41-44, Jan. 2016.
[4]
L.-Y. Ho, J.-J. Wu, and P. Liu, "Optimal Algorithms for Cross-Rack Communication Optimization in MapReduce Framework," IEEE, Jul. 2011, pp. 420-427.
[5]
A. Patti, A. Shah, S. Gaikwad, A. A. Mishra, S. S. Kohli, and S. Dhage, "Fault Tolerance in Cluster Computing System," IEEE, Oct. 2011, pp. 408-412.
[6]
R. Buyya, High Performance Cluster Computing: Architectures and Systems, Volume I, Prentice Hall, Upper Saddle River, NJ, USA, vol. 1, p. 999, 1999.
[7]
L. A. Barroso, J. Dean, and U. Hölzle, "Web search for a planet: The Google cluster architecture," IEEE Micro, vol. 23, no. 2, pp. 22-28, 2003.
[8]
S. R and S. K. R, "Data Mining with Big Data," in 2017 11th International Conference on Intelligent Systems and Control (ISCO), Jan. 2017, pp. 246-250.
[9]
"What is a Network Bottleneck?" Techopedia, available at https://www.techopedia.com/definition/24819/networkbottleneck.
[10]
L. Zhengyou and C. Tao, "A Distributed Parallel Algorithm for Web Page Inverted Indexes Construction on the Cluster Computing Systems," vol. 2, May 2009, pp. 33-36.
[11]
"5 Google Projects That Changed Big Data Forever," MapR, available at https://mapr.com/blog/5-google-projects-changed-big-data-forever/
[12]
J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, pp. 107-113, Jan. 2008.
[13]
J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny, "Explicit Control in the Batch-Aware Distributed File System," in NSDI, vol. 4, 2004, pp. 365-378.
[14]
M. Vaidya and S. Deshpande, "Comparative analysis of various distributed file systems performance evaluation using map reduce implementation," in 2016 International Conference on Recent Advances and Innovations in Engineering (ICRAIE), Dec. 2016, pp. 1-6.
[15]
S. Wu, G. Chen, K. Chen, F. Li, and L. Shou, "HM: A Column-Oriented MapReduce System on Hybrid Storage," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 12, pp. 3304-3317, Dec. 2015.
[16]
"Google File System," available at https://en.wikipedia.org/w/index.php?Title=GoogleFileSystemoldid=769106631.
[17]
S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29-43, Oct. 2003.
[18]
M. R, Tejus, C. R. K, and B. S, "A Big Data MapReduce Hadoop distribution architecture for processing input splits to solve the small data problem," in 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Jul. 2016, pp. 480-487.
[19]
"Apache Hadoop," available at https://en.wikipedia.org/w/index.php?title=ApacheHadoopoldid=802593380.
[20]
W. Fan and A. Bifet, "Mining big data: Current status, and forecast to the future," SIGKDD Explor. Newsl., vol. 14, no. 2, pp. 1-5, Apr. 2013.
[21]
J. Kim, T. K. A. Kumar, K. M. George, and N. Park, "Performance evaluation and tuning for MapReduce computing in Hadoop distributed file system," in 2015 IEEE 13th International Conference on Industrial Informatics (INDIN), Jul. 2015, pp. 62-68.
[22]
A. Kumar, R. Shankar, A. Choudhary, and L. S. Thakur, "A big data MapReduce framework for fault diagnosis in cloud-based manufacturing," International Journal of Production Research, vol. 54, no. 23, pp. 7060-7073, Dec. 2016.
[23]
J. Dean, "Experiences with MapReduce, an abstraction for large-scale computation," in PACT, vol. 6, 2006, p. 1.
[24]
Z. Xiao and Y. Xiao, "Accountable MapReduce in cloud computing," in Proc. IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS '11), 2011, pp. 1082-1087.
[25]
H. Asaadi, D. Khaldi, and B. Chapman, "A Comparative Survey of the HPC and Big Data Paradigms: Analysis and Experiments," in 2016 IEEE International Conference on Cluster Computing (CLUSTER), 2016, pp. 423-432.
[26]
A. Chiesa, E. Tromer, and M. Virza, "Cluster Computing in Zero Knowledge," Berlin, Heidelberg, 2015, pp. 371-403.
[27]
A. Lamba, S. Singh, N. Dutta, and S. S. R. Muni, "Uses of cluster computing techniques to perform BigData analytics for smart grid automation system," International Journal for Technological Research in Engineering, vol. 1, Mar. 2014.
[28]
I. S. Bajwa, A. A. Chaudhri, and M. A. Naeem, "Processing Large Data Sets using a Cluster Computing Framework," Australian Journal of Basic and Applied Sciences, vol. 6, pp. 1614-1618, 2011.
[29]
C. K. K. Reddy, K. E. B. Chandrudu, P. R. Anisha, and G. V. S. Raju, "High Performance Computing Cluster System and its Future Aspects in Processing Big Data," in 2015 International Conference on Computational Intelligence and Communication Networks (CICN), Jabalpur, 2015, pp. 881-885.
[30]
H. Singh and G. Singh, "A Survey Paper on Task Scheduling Methods in Cluster Computing Environment for High Performance," in 2015 Fifth International Conference on Advanced Computing & Communication Technologies, 2015.



Published In

ACM Other conferences
NISS '20: Proceedings of the 3rd International Conference on Networking, Information Systems & Security
March 2020
528 pages
ISBN: 9781450376341
DOI: 10.1145/3386723
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Big Data
  2. CCS
  3. Challenges
  4. Distributed File System
  5. MapReduce

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

NISS2020


Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 1
Reflects downloads up to 17 Jan 2025.


Cited By

  • (2024) A hybrid Hadoop-based sentiment analysis classifier for tweets associated with COVID-19 utilizing two machine learning algorithms: CNN, and fuzzy C4.5. Journal of Big Data 11:1. DOI: 10.1186/s40537-024-01014-4. Online publication date: 18-Dec-2024.
  • (2023) A novel framework to enhance the performance of training distributed deep neural networks. Intelligent Data Analysis 27:3 (753-768). DOI: 10.3233/IDA-226710. Online publication date: 1-Jan-2023.
  • (2022) Emotion Processing by Applying a Fuzzy-Based Vader Lexicon and a Parallel Deep Belief Network Over Massive Data. IEEE Access 10 (87870-87899). DOI: 10.1109/ACCESS.2022.3200389. Online publication date: 2022.
  • (2022) Optimization Focused on Parallel Fuzzy Deep Belief Neural Network for Opinion Mining. Business Intelligence (3-28). DOI: 10.1007/978-3-031-06458-6_1. Online publication date: 13-May-2022.
  • (2021) Performance Analysis of Cloud Computing for Distributed Data Center using Cloud-Sim. 2021 IEEE International Conference on Communications Workshops (ICC Workshops) (1-6). DOI: 10.1109/ICCWorkshops50388.2021.9473876. Online publication date: Jun-2021.
  • (2021) A MapReduce Opinion Mining for COVID-19-Related Tweets Classification Using Enhanced ID3 Decision Tree Classifier. IEEE Access 9 (58706-58739). DOI: 10.1109/ACCESS.2021.3073215. Online publication date: 2021.
  • (2021) Sentence-Level Classification Using Parallel Fuzzy Deep Learning Classifier. IEEE Access 9 (17943-17985). DOI: 10.1109/ACCESS.2021.3053917. Online publication date: 2021.
  • (2021) Sentiment Analysis of Covid19 Tweets Using A MapReduce Fuzzified Hybrid Classifier Based On C4.5 Decision Tree and Convolutional Neural Network. E3S Web of Conferences 297 (01052). DOI: 10.1051/e3sconf/202129701052. Online publication date: 22-Sep-2021.
  • (2021) A MapReduce Improved ID3 Decision Tree for Classifying Twitter Data. Business Intelligence (160-182). DOI: 10.1007/978-3-030-76508-8_13. Online publication date: 16-May-2021.
  • (2020) A MapReduce C4.5 Decision Tree Algorithm Based on Fuzzy Rule-Based System. Fuzzy Information and Engineering 11:4 (446-473). DOI: 10.1080/16168658.2020.1756099. Online publication date: 23-Jun-2020.
