research-article

A language-model-based approach for subjectivity detection

Published: 01 June 2017

Abstract

The rapid growth of opinionated text on the Web increases the demand for efficient methods for detecting subjective texts. In this paper, we propose a subjectivity detection method that uses a language-model-based structure to assign each document a subjectivity score that is not affected by the document's topical relevance. To compensate for the limited content of short documents, we further propose an expansion method that improves the estimation of their language models. Because the scarcity of linguistic resources makes subjectivity detection difficult in resource-lean languages such as Persian, the method is presented in two versions: a semi-supervised version for resource-lean languages and a supervised version. Experimental evaluations on five datasets in two languages, English and Persian, demonstrate that the method performs well in distinguishing subjective documents from objective ones in both languages.
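The abstract does not give the scoring formula, so the following is only a rough illustration of what a language-model-based subjectivity score with document expansion can look like. It assumes a generic formulation that is not taken from the paper: unigram subjective and objective class models with Dirichlet smoothing against an add-one collection model, a score equal to the expected log-likelihood ratio of a document's words under the two class models, and a hypothetical expansion that interpolates a short document's model with the average model of some related documents. All function names (`mle`, `background_model`, `smoothed_model`, `document_model`, `subjectivity_score`), the parameters `mu` and `alpha`, and the toy data are illustrative assumptions, and this simple ratio does not by itself guarantee the topic independence the paper claims.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Model = Dict[str, float]          # explicit word -> probability table
ProbFn = Callable[[str], float]   # smoothed model queried word by word


def mle(tokens: List[str]) -> Model:
    """Maximum-likelihood unigram model of one token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a model from an empty document")
    return {w: c / total for w, c in counts.items()}


def background_model(texts: Sequence[List[str]]) -> ProbFn:
    """Add-one smoothed collection model; one extra slot reserves mass for unseen words."""
    counts = Counter(w for t in texts for w in t)
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)


def smoothed_model(texts: Sequence[List[str]], background: ProbFn, mu: float = 100.0) -> ProbFn:
    """Dirichlet-smoothed unigram model of one class (assumed: subjective or objective seed texts)."""
    counts = Counter(w for t in texts for w in t)
    total = sum(counts.values())
    return lambda w: (counts[w] + mu * background(w)) / (total + mu)


def document_model(doc: List[str], neighbors: Sequence[List[str]] = (), alpha: float = 0.3) -> Model:
    """Document model, optionally expanded with related documents.

    Hypothetical expansion (not the paper's method):
    p'(w|D) = (1 - alpha) * p(w|D) + alpha * mean over neighbors of p(w|N).
    """
    base = mle(doc)
    if not neighbors:
        return base
    model = {w: (1 - alpha) * p for w, p in base.items()}
    for nb in neighbors:
        for w, p in mle(nb).items():
            model[w] = model.get(w, 0.0) + alpha * p / len(neighbors)
    return model


def subjectivity_score(doc: Model, p_subj: ProbFn, p_obj: ProbFn) -> float:
    """Expected log-likelihood ratio sum_w p(w|D) * log(p(w|S) / p(w|O)); > 0 leans subjective."""
    return sum(p * math.log(p_subj(w) / p_obj(w)) for w, p in doc.items())


if __name__ == "__main__":
    # Toy seed sets (illustrative only).
    subj = [["i", "love", "this", "great", "phone"], ["terrible", "awful", "service"]]
    obj = [["the", "phone", "has", "a", "battery"], ["the", "service", "opens", "at", "nine"]]
    bg = background_model(subj + obj)
    p_s, p_o = smoothed_model(subj, bg), smoothed_model(obj, bg)
    for doc in (["great", "phone"], ["the", "battery"]):
        print(doc, round(subjectivity_score(document_model(doc), p_s, p_o), 4))
```

Scoring with a log-ratio means that words about equally likely under both class models contribute almost nothing, which is one simple way to dampen shared topical vocabulary; whether the paper relies on this device is not stated in the abstract.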


Cited By

  • (2021) Online news media website ranking using user-generated content. Journal of Information Science 47:3, pp. 340-358. DOI: 10.1177/0165551519894928. Online publication date: 1-Jun-2021
  • (2020) Mutual information and sensitivity analysis for feature selection in customer targeting. Journal of Information Science 45:1, pp. 53-67. DOI: 10.1177/0165551518770967. Online publication date: 18-Jun-2020
  • (2020) Humor Detection in Product Question Answering Systems. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 519-528. DOI: 10.1145/3397271.3401077. Online publication date: 25-Jul-2020
  • (2019) A survey of sentiment analysis in social media. Knowledge and Information Systems 60:2, pp. 617-663. DOI: 10.1007/s10115-018-1236-4. Online publication date: 1-Aug-2019


Information

Published In

Journal of Information Science, Volume 43, Issue 3
June 2017
141 pages

Publisher

Sage Publications, Inc.

United States


Author Tags

  1. language model
  2. opinion mining
  3. opinion retrieval
  4. sentiment lexicon
  5. subjectivity detection

