Abstract
While there has been recent progress in Arabic Sentiment Analysis, most resources in this area are of limited size, domain specific, or not publicly available. In this paper, we address this problem by generating large multi-domain datasets for Sentiment Analysis in Arabic. The datasets were scraped from different reviewing websites and consist of a total of 33K annotated reviews for movies, hotels, restaurants and products. Moreover, we build multi-domain lexicons from the generated datasets. Different experiments have been carried out to validate the usefulness of the datasets and the generated lexicons for the task of sentiment classification. From the experimental results, we highlight some useful insights addressing the best-performing classifiers and feature representation methods, the effect of introducing lexicon-based features, and the factors affecting the accuracy of sentiment classification in general. All the datasets, experiment code and results have been made publicly available for scientific purposes.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
ElSahar, H., El-Beltagy, S.R. (2015). Building Large Arabic Multi-domain Resources for Sentiment Analysis. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_2
DOI: https://doi.org/10.1007/978-3-319-18117-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science (R0)