Abstract
The rise of social media has created new opportunities to study human emotions through self-reported information such as text, emojis/emoticons, and geo-locations. Research has shown that hybrid models that integrate lexicons and machine learning methods can improve the accuracy of sentiment prediction. We propose the Normalized Difference Sentiment Index (NDSI) to identify frequently occurring words that are predictive of positive or negative sentiment. We further propose e-senti, a new hybrid model that combines three attributes (lexicons, a new NDSI word rank list, and tweet features) in a random forest classifier. Our contribution to sentiment analysis methodology is a model that is easy to implement, efficient, and accurate. Comparing four widely used lexicons, we find the AFINN lexicon the most effective and efficient for our model. We test e-senti on the Sentiment140 dataset and on tweets from Los Angeles County, California. The model reaches a maximum accuracy of 86.1% on Sentiment140 and 74.6% on our Los Angeles County data, outperforming most existing methods. In future work, we will link the geo-tagged sentiment data to land use data to reveal how emotions and the built environment are connected.
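The abstract does not give the NDSI formula, so the following is only a minimal sketch under stated assumptions. It assumes NDSI takes the usual normalized-difference form (a - b)/(a + b) familiar from indices such as NDVI, applied to the fraction of positive versus negative tweets containing a word, so scores fall in [-1, 1]. The function names `ndsi_scores`, `ndsi_rank_list`, and `tweet_features`, the `min_count` and `top_k` thresholds, the toy lexicon, and the specific tweet features (exclamation marks, a few emoticons) are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter


def ndsi_scores(tweets, min_count=5):
    """Score words by an ASSUMED normalized difference of class frequencies.

    tweets: iterable of (tokens, label) pairs; label 1 = positive, 0 = negative.
    Assumed form (not stated in the source): NDSI(w) = (p - n) / (p + n), where
    p and n are the fractions of positive and negative tweets that contain w.
    """
    pos_docs, neg_docs = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in tweets:
        unique = set(tokens)  # count each word at most once per tweet
        if label == 1:
            pos_docs.update(unique)
            n_pos += 1
        else:
            neg_docs.update(unique)
            n_neg += 1

    scores = {}
    for word in pos_docs.keys() | neg_docs.keys():
        if pos_docs[word] + neg_docs[word] < min_count:
            continue  # keep only frequently occurring words
        # Normalizing by class size keeps an unbalanced corpus from biasing scores.
        p = pos_docs[word] / n_pos if n_pos else 0.0
        n = neg_docs[word] / n_neg if n_neg else 0.0
        scores[word] = (p - n) / (p + n)  # in [-1, 1]; p + n > 0 for any seen word
    return scores


def ndsi_rank_list(scores, top_k=10):
    """Return the top_k most positive and top_k most negative words."""
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    positive = [w for w, s in ranked if s > 0][:top_k]
    negative = [w for w, s in reversed(ranked) if s < 0][:top_k]
    return positive, negative


def tweet_features(tokens, raw_text, lexicon, positive, negative):
    """Hypothetical feature vector combining the three attribute groups."""
    pos_set, neg_set = set(positive), set(negative)
    lexicon_score = sum(lexicon.get(t, 0) for t in tokens)  # AFINN-style valence sum
    pos_hits = sum(t in pos_set for t in tokens)            # NDSI rank-list hits
    neg_hits = sum(t in neg_set for t in tokens)
    exclamations = raw_text.count("!")                       # simple tweet features
    has_emoticon = int(any(e in raw_text for e in (":)", ":(", ":D")))
    return [lexicon_score, pos_hits, neg_hits, exclamations, has_emoticon]
```

Under this assumption, the resulting vectors could then be passed with their labels to a random forest such as scikit-learn's `sklearn.ensemble.RandomForestClassifier`; how e-senti actually weights or encodes each attribute group is not specified in the abstract.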
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Huang, A., Ebert, D., Rider, P. (2017). You Are What You Tweet: A New Hybrid Model for Sentiment Analysis. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_29
DOI: https://doi.org/10.1007/978-3-319-62416-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer Science, Computer Science (R0)