DOI: 10.5555/2107653.2107659

Developing robust models for favourability analysis

Published: 24 June 2011

Abstract

Locating documents carrying positive or negative favourability is an important application within media analysis. This paper presents empirical results on the challenges facing a machine-learning approach to this kind of opinion mining. These challenges include the often considerable imbalance in the distribution of positive and negative samples, changes in the documents over time, and the need for effective training and quantification procedures for reporting results. We begin with three datasets generated by a media-analysis company, in which documents are classified in two ways: detecting the presence of favourability, and assessing negative vs. positive favourability. We then evaluate a machine-learning approach to automating the classification process. We explore the effect of using five different types of features, the robustness of the models when tested on data taken from a later time period, and the effect of balancing the input data by undersampling. We find that the optimum choice of classifier, feature set and training strategy varies with the task and dataset.
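
The abstract mentions balancing the input data by undersampling but does not describe the procedure here. As a rough illustration only, the sketch below shows one common variant, random undersampling, in which samples from the larger classes are discarded at random until every class matches the size of the smallest one. The function name `undersample`, the `seed` parameter and the toy data are assumptions made for this example and are not taken from the paper.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes until every class
    has as many samples as the smallest class.

    Hypothetical helper for illustration; not the paper's procedure.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # The smallest class sets the target size for all classes.
    target = min(len(group) for group in by_label.values())

    # Draw `target` samples without replacement from each class.
    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: six negative documents and two positive ones.
docs = [f"doc{i}" for i in range(8)]
tags = ["neg"] * 6 + ["pos"] * 2
balanced = undersample(docs, tags)
```

Random undersampling discards training data from the majority class, so whether it helps is an empirical question for each task and dataset, which is consistent with the mixed findings the abstract reports.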

Cited By

  • (2019) Role of Discourse Information in Urdu Sentiment Classification. ACM Transactions on Asian and Low-Resource Language Information Processing, 18:4 (1-37). DOI: 10.1145/3300050. Online publication date: 21-May-2019


Published In

WASSA '11: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis
June 2011
207 pages
ISBN: 9781937284060

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article
