Developing robust models for favourability analysis

Authors:

Paul Hender

WASSA '11: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis

Pages 44 - 52

Published: 24 June 2011

Abstract

Locating documents carrying positive or negative favourability is an important application within media analysis. This paper presents some empirical results on the challenges facing a machine-learning approach to this kind of opinion mining. Some of the challenges include: the often considerable imbalance in the distribution of positive and negative samples; changes in the documents over time; and effective training and quantification procedures for reporting results. This paper begins with three datasets generated by a media-analysis company, classifying documents in two ways: detecting the presence of favourability, and assessing negative vs. positive favourability. We then evaluate a machine-learning approach to automate the classification process. We explore the effect of using five different types of features, the robustness of the models when tested on data taken from a later time period, and the effect of balancing the input data by undersampling. We find varying choices for the optimum classifier, feature set and training strategy depending on the task and dataset.
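
The abstract mentions balancing the input data by undersampling but does not describe the procedure. As an illustration only, one common variant is random undersampling, where every class is reduced to the size of the rarest class. The sketch below assumes that variant; the function name `undersample`, the toy documents and the labels are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop examples so each class matches the rarest class in size.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Toy data: 8 positive documents and 2 negative ones (made-up values).
docs = [f"doc{i}" for i in range(10)]
labels = ["positive"] * 8 + ["negative"] * 2
balanced = undersample(docs, labels)
```

In a setting like the one the abstract describes, balancing of this kind would normally be applied to the training split only, so that the test data keeps its original class distribution.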


Cited By

D. Awais and D. Shoaib. 2019. Role of Discourse Information in Urdu Sentiment Classification. ACM Transactions on Asian and Low-Resource Language Information Processing, 18(4):1--37. Online publication date: 21 May 2019.
https://dl.acm.org/doi/10.1145/3300050

Index Terms

Developing robust models for favourability analysis
1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing


Published In

WASSA '11: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis

June 2011

207 pages

ISBN: 9781937284060

Program Chairs: Alexandra Balahur, Ester Boldrini, Andrés Montoyo, and Patricio Martínez-Barco (all University of Alicante, Spain)

Publisher: Association for Computational Linguistics, United States
