DOI: 10.1109/CSNT.2014.124
Article

A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop

Published: 07 April 2014

Abstract

Big Data refers to volumes of data too large to be managed by traditional data management systems. Hadoop is a technological answer to Big Data: the Hadoop Distributed File System (HDFS) and the MapReduce programming model are used to store and retrieve it. Files of terabyte size can be stored on HDFS and analyzed with MapReduce. This paper introduces Hadoop HDFS and MapReduce for storing a large number of files and retrieving information from them. We present experimental work in which increasing numbers of files are supplied as input to a Hadoop system and its performance is analyzed. We study the number of bytes written and read by the system and by MapReduce, and we examine how the map and reduce tasks behave as the number of input files, and the bytes written and read by these tasks, grow.
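To make the kind of experiment described in the abstract concrete, the sketch below shows a minimal Hadoop MapReduce job written against the standard org.apache.hadoop.mapreduce Java API. It is not the authors' actual job; the class names, counter interpretation, and input/output paths are illustrative assumptions. The job emits, for each input file, the total number of bytes contributed by that file's records, which mirrors the per-task read/write behavior the paper measures.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative job: for every input file, sum the bytes of its lines,
// producing one (file name, total bytes) record per file.
public class PerFileByteCount {

  public static class ByteMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final Text fileName = new Text();
    private final LongWritable bytes = new LongWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Identify the file this split belongs to; with the default input
      // format every small file becomes at least one split, hence one map task.
      FileSplit split = (FileSplit) context.getInputSplit();
      fileName.set(split.getPath().getName());
      bytes.set(value.getLength());
      context.write(fileName, bytes);
    }
  }

  public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable total = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      total.set(sum);
      context.write(key, total);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "per-file byte count");
    job.setJarByClass(PerFileByteCount.class);
    job.setMapperClass(ByteMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS directory holding many files
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

After such a job runs, Hadoop's built-in file system counters (for example HDFS_BYTES_READ and FILE_BYTES_WRITTEN) report the bytes moved by the framework and by the map and reduce tasks, which are the quantities the abstract refers to. Because the default FileInputFormat creates at least one split per file, the number of map tasks, and with it the per-task overhead, grows with the number of input files.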

Cited By

  • (2016) Efficient Batch Processing of Related Big Data Tasks using Persistent MapReduce Technique. Proceedings of the Third International Symposium on Computer Vision and the Internet, 106-109. https://doi.org/10.1145/2983402.2983431. Online publication date: 21-Sep-2016.



Information

Published In

CSNT '14: Proceedings of the 2014 Fourth International Conference on Communication Systems and Network Technologies
April 2014
1199 pages
ISBN: 9781479930708

Publisher

IEEE Computer Society

United States

Publication History

Published: 07 April 2014

Author Tags

  1. Data Node
  2. HDFS
  3. Hadoop
  4. Job Tracker
  5. MapReduce
  6. Name Node
  7. Secondary Name Node
  8. Task Tracker
  9. Teragen
  10. Terasort
  11. Teravalidate

Qualifiers

  • Article

