Abstract
Sentiment analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining, and text mining for the English language. With the proliferation of social media, the volume of data in regional languages is growing alongside English. Telugu is one such regional language with abundant data on social media, but labeled training sets are hard to find because human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We build a hybrid approach that combines different query selection strategy frameworks to obtain more accurately labeled training instances from limited labeled data. Using a set of classifiers such as SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieve promising results with a minimal error rate.
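To make the idea of a query selection strategy concrete, the following is a minimal sketch of pool-based active learning with a single strategy, smallest-margin uncertainty sampling. It is not the hybrid approach of the paper: the toy nearest-centroid classifier stands in for SVM, XGBoost, or GBT, the synthetic 2-D points stand in for Telugu sentence vectors, and every name in it (`active_learn`, `margin`, `oracle`, `budget`) is a hypothetical choice made for illustration.

```python
import random

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(labeled):
    """Toy nearest-centroid classifier (assumption): one centroid per label."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Label of the closest centroid."""
    return min(model, key=lambda y: sq_dist(model[y], x))

def margin(model, x):
    """Gap between the two smallest centroid distances; small means uncertain."""
    d = sorted(sq_dist(c, x) for c in model.values())
    return d[1] - d[0]

def active_learn(seed, pool, oracle, budget):
    """Pool-based loop: query the least certain instance, label it, refit."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(budget, len(pool))):
        model = fit(labeled)
        query = min(pool, key=lambda x: margin(model, x))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # one annotation request
    return fit(labeled), labeled

# Synthetic two-class data standing in for sentence vectors (assumption).
rng = random.Random(0)

def sample(center, n):
    return [[rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)]
            for _ in range(n)]

neg, pos = sample((-2.0, -2.0), 50), sample((2.0, 2.0), 50)
truth = {tuple(x): "neg" for x in neg}
truth.update({tuple(x): "pos" for x in pos})

def oracle(x):
    """Stands in for a human annotator by looking up the true label."""
    return truth[tuple(x)]

seed = [(neg[0], "neg"), (pos[0], "pos")]  # small initial labeled set
pool = neg[1:] + pos[1:]                   # unlabeled pool
model, labeled = active_learn(seed, pool, oracle, budget=10)
```

Here only 10 of the 98 pooled instances are sent to the annotator, each chosen because the current model is least sure about it. A hybrid scheme of the kind the abstract describes would choose among several such strategies rather than fixing one.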
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mukku, S.S., Oota, S.R., Mamidi, R. (2017). Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_26
DOI: https://doi.org/10.1007/978-3-319-64283-3_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer Science (R0)