Abstract
Sentiment classification plays a significant role in everyday life, in political activities, in commodity production, and in commercial activities. Classifying emotions accurately and in a timely manner is a challenging task. In this research, we propose a new model for big data sentiment classification in a parallel network environment. The model uses the Fuzzy C-Means (FCM) method for English sentiment classification with Hadoop Map (M)/Reduce (R) running on Cloudera, the parallel network environment used in this work, and it can classify the sentiments of millions of English documents. Our English training data set contains 60,000 English sentences (30,000 positive and 30,000 negative). We tested the model on a testing data set of 25,000 English reviews (12,500 positive and 12,500 negative) and achieved an accuracy of 60.2%.
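The abstract does not spell out how FCM is divided between the Map and Reduce phases. One common decomposition, sketched below in plain Python as an assumption rather than the authors' implementation, has each map call compute a document's fuzzy memberships u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) and emit per-cluster partial sums of u_ij^m * x_i and u_ij^m; the reduce step adds those partials and divides them to obtain the new centroids c_j = sum_i u_ij^m * x_i / sum_i u_ij^m. The fuzzifier m = 2, Euclidean distance, two clusters, the toy 2-D feature vectors, and the cluster-to-label mapping are all hypothetical choices made for illustration.

```python
# Minimal Fuzzy C-Means (FCM) sketch with each iteration split into a
# map step and a reduce step. Hypothetical assumptions (not from the paper):
# documents are already numeric feature vectors, m = 2, Euclidean distance.
import math

Vector = list[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(x: Vector, centroids: list[Vector], m: float) -> list[float]:
    """Fuzzy membership of x in each cluster; the values sum to 1."""
    d = [distance(x, c) for c in centroids]
    for j, dj in enumerate(d):
        if dj == 0.0:  # x sits exactly on a centroid: full membership there
            return [1.0 if k == j else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[j] / d[k]) ** p for k in range(len(d)))
            for j in range(len(d))]


def map_step(x: Vector, centroids: list[Vector], m: float):
    """Emit (cluster index, (u^m * x, u^m)) partial sums for one document."""
    u = memberships(x, centroids, m)
    return [(j, ([(uj ** m) * xi for xi in x], uj ** m)) for j, uj in enumerate(u)]


def reduce_step(pairs, n_clusters: int, dim: int) -> list[Vector]:
    """Add the partial sums per cluster and divide to get new centroids."""
    num = [[0.0] * dim for _ in range(n_clusters)]
    den = [0.0] * n_clusters
    for j, (vec, w) in pairs:
        for k, v in enumerate(vec):
            num[j][k] += v
        den[j] += w
    return [[v / den[j] for v in num[j]] for j in range(n_clusters)]


def fcm(points: list[Vector], centroids: list[Vector], m: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> list[Vector]:
    """Repeat map + reduce until no centroid moves more than tol."""
    for _ in range(max_iter):
        pairs = [pair for x in points for pair in map_step(x, centroids, m)]
        new = reduce_step(pairs, len(centroids), len(points[0]))
        shift = max(distance(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


# Toy usage with hypothetical 2-D features (e.g. counts of positive and
# negative terms per document) and an assumed cluster-to-label mapping.
points = [[5.0, 1.0], [4.0, 0.0], [6.0, 1.0], [0.0, 5.0], [1.0, 4.0], [1.0, 6.0]]
centroids = fcm(points, centroids=[[3.0, 2.0], [2.0, 3.0]])
labels = ["positive", "negative"]


def classify(x: Vector) -> str:
    """Assign the label of the cluster with the highest membership."""
    u = memberships(x, centroids, 2.0)
    return labels[max(range(len(u)), key=u.__getitem__)]


print(classify([5.0, 0.0]), classify([0.0, 5.0]))
```

In an actual Hadoop job, `map_step` would correspond to a mapper applied to each input split and `reduce_step` to a reducer keyed by cluster index, with a driver resubmitting the job until the centroids stop moving; the sketch runs everything in one process only to stay self-contained.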
Cite this article
Phu, V.N., Dat, N.D., Ngoc Tran, V.T. et al. Fuzzy C-means for english sentiment classification in a distributed system. Appl Intell 46, 717–738 (2017). https://doi.org/10.1007/s10489-016-0858-z
DOI: https://doi.org/10.1007/s10489-016-0858-z