Abstract
Web 2.0 has produced a huge amount of user-generated social media data that contains rich information about people’s opinions and ideas on various products, services, and ongoing social and political events. Many companies have started to examine and leverage this new type of data to understand their customers and to develop better business strategies and services. As a nation with rapid economic growth in recent years, China has become visible and has started to play an important role in global business and the global economy. With the large number of Chinese Internet users, a considerable number of opinions about Chinese business and markets have also been expressed on social media sites. It is therefore of interest to explore and understand this user-generated content in Chinese. In this study, we develop an integrated framework that applies sentiment analysis techniques to analyze user sentiments on Chinese social media sites. Based on the framework, we conduct experiments on two popular Chinese Web forums, both related to business and marketing. By using Elastic Net together with a rich body of feature representations, we achieve the highest F-measures of 84.4% and 86.7% for the two data sets, respectively. We also demonstrate the interpretability of Elastic Net by discussing the top-ranked features associated with positive or negative sentiments.
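As a rough illustration of the quantities named in the abstract, the following minimal sketch computes an elastic net penalty, an F-measure, and a ranking of features by learned weight. It is not the study's implementation: the glmnet-style parametrization of the penalty (`lam`, `alpha`), the function names, and the toy weights are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def elastic_net_penalty(weights: Sequence[float], lam: float, alpha: float) -> float:
    """Elastic net penalty: lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).

    Assumed glmnet-style parametrization: alpha = 1 gives the lasso (L1)
    penalty and alpha = 0 gives the ridge (L2) penalty.
    """
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Balanced F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_features(weights: Dict[str, float], k: int) -> Tuple[List[str], List[str]]:
    """Return the k most positive and k most negative features by weight.

    Features with a weight of exactly zero are skipped, since the L1 part
    of the penalty is what drives uninformative weights to zero.
    """
    positive = sorted(
        ((f, w) for f, w in weights.items() if w > 0), key=lambda fw: -fw[1]
    )[:k]
    negative = sorted(
        ((f, w) for f, w in weights.items() if w < 0), key=lambda fw: fw[1]
    )[:k]
    return [f for f, _ in positive], [f for f, _ in negative]


# Hypothetical weights for three features; not taken from the study.
toy_weights = {"好": 1.2, "差": -0.9, "的": 0.0}
print(elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=0.5))
print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
print(top_features(toy_weights, k=1))
```

Because the L1 term can set many weights to exactly zero, the features that keep nonzero weights can be read directly as sentiment indicators, which is the sense of interpretability the abstract refers to.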
Acknowledgment
This work is supported by the NSF Computer and Network Systems (CNS) Program, “(CRI: CRD) Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences,” (CNS-0709338).
Cite this article
Fan, L., Zhang, Y., Dang, Y. et al. Analyzing sentiments in Web 2.0 social media data in Chinese: experiments on business and marketing related Chinese Web forums. Inf Technol Manag 14, 231–242 (2013). https://doi.org/10.1007/s10799-013-0160-2
DOI: https://doi.org/10.1007/s10799-013-0160-2