research-article

Semi-supervised correction of biased comment ratings

Authors:

Abhinav Mishra,

Rajeev Rastogi

WWW '12: Proceedings of the 21st international conference on World Wide Web

Pages 181 - 190

https://doi.org/10.1145/2187836.2187862

Published: 16 April 2012

Abstract

In many instances, offensive comments on the internet attract a disproportionate number of positive ratings from highly biased users. This results in an undesirable scenario where these offensive comments are the top rated ones. In this paper, we develop semi-supervised learning techniques to correct the bias in user ratings of comments. Our scheme uses a small number of comment labels in conjunction with user rating information to iteratively compute user bias and unbiased ratings for unlabeled comments. We show that the running time of each iteration is linear in the number of ratings, and the system converges to a unique fixed point. To select the comments to label, we devise an active learning algorithm based on empirical risk minimization. Our active learning method incrementally updates the risk for neighboring comments each time a comment is labeled, and thus can easily scale to large comment datasets. On real-life comments from Yahoo! News, our semi-supervised and active learning algorithms achieve higher accuracy than simple baselines, with few labeled examples.
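
The iterative idea in the abstract can be illustrated with a toy sketch. This is not the authors' update rule, which the abstract does not spell out; it is a simplified stand-in under stated assumptions: ratings are +1 or -1, a user's bias is the mean gap between that user's ratings and the current comment scores, and an unlabeled comment's score is the mean of its bias-adjusted ratings, clipped to [-1, 1]. The function name `correct_ratings`, the data layout, and the example users and comments are all hypothetical.

```python
from collections import defaultdict


def correct_ratings(ratings, labels, iterations=50):
    """Toy bias correction (illustrative assumption, not the paper's method).

    ratings: list of (user, comment, rating) tuples, rating in {+1, -1}.
    labels:  dict mapping a few labeled comments to a known score in [-1, 1].
    Returns (scores, bias) dictionaries.
    """
    # Initialise every rated comment with its raw mean rating.
    totals, counts = defaultdict(float), defaultdict(int)
    for _, comment, rating in ratings:
        totals[comment] += rating
        counts[comment] += 1
    scores = {c: totals[c] / counts[c] for c in counts}
    scores.update(labels)  # labeled comments stay fixed at their labels

    bias = {}
    for _ in range(iterations):
        # Pass 1 over the ratings: bias = mean of (rating - current score).
        gap_sum, gap_cnt = defaultdict(float), defaultdict(int)
        for user, comment, rating in ratings:
            gap_sum[user] += rating - scores[comment]
            gap_cnt[user] += 1
        bias = {u: gap_sum[u] / gap_cnt[u] for u in gap_cnt}

        # Pass 2 over the ratings: score = mean of bias-adjusted ratings.
        adj_sum = defaultdict(float)
        for user, comment, rating in ratings:
            adj_sum[comment] += rating - bias[user]
        for comment in counts:
            if comment not in labels:
                mean_adj = adj_sum[comment] / counts[comment]
                scores[comment] = max(-1.0, min(1.0, mean_adj))
    return scores, bias


# Three biased users up-vote a labeled-bad comment and an unlabeled one;
# one fair user agrees with the labels and down-votes the unlabeled one.
ratings = [
    ("b1", "c_bad", 1), ("b1", "c_x", 1),
    ("b2", "c_bad", 1), ("b2", "c_x", 1),
    ("b3", "c_bad", 1), ("b3", "c_x", 1),
    ("f1", "c_good", 1), ("f1", "c_bad", -1), ("f1", "c_x", -1),
]
labels = {"c_good": 1.0, "c_bad": -1.0}
scores, bias = correct_ratings(ratings, labels)
```

In this toy data the raw mean rating of `c_x` is 0.5, yet after iteration its score is strongly negative, because the three users who up-voted it also up-voted a comment labeled as bad and so acquire a large positive bias. Each iteration makes two passes over the rating list, so its cost is linear in the number of ratings, in the spirit of the property the abstract states. The convergence and uniqueness guarantee is the paper's claim about its own scheme and is not established for this sketch.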

Index Terms

Semi-supervised correction of biased comment ratings
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

WWW '12: Proceedings of the 21st international conference on World Wide Web

April 2012

1078 pages

ISBN:9781450312295

DOI:10.1145/2187836

General Chairs: Alain Mille (Université de Lyon, France), Fabien Gandon (INRIA, France), Jacques Misselis (HP, France)

Program Chairs: Michael Rabinovich (Case Western Reserve University, USA), Steffen Staab (University of Koblenz-Landau, Germany)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Univ. de Lyon: Universite de Lyon

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WWW 2012

Sponsor:

Univ. de Lyon

WWW 2012: 21st World Wide Web Conference 2012

April 16 - 20, 2012

Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

Balayn A, Yang J, Szlavik Z, Bozzon A (2021). Automatic Identification of Harmful, Aggressive, Abusive, and Offensive Language on the Web: A Survey of Technical Biases Informed by Psychology Literature. ACM Transactions on Social Computing 4:3 (1-56). Online publication date: 25-Oct-2021.
https://dl.acm.org/doi/10.1145/3479158

Xie S, Hu Q, Zhang J, Yu P (2015). An effective and economic bi-level approach to ranking and rating spam detection. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (1-10). Online publication date: Oct-2015.
https://doi.org/10.1109/DSAA.2015.7344794

Siersdorfer S, Chelaru S, Pedro J, Altingovde I, Nejdl W (2014). Analyzing and Mining Comments and Comment Ratings on the Social Web. ACM Transactions on the Web 8:3 (1-39). Online publication date: 8-Jul-2014.
https://dl.acm.org/doi/10.1145/2628441

Kantchelian A, Ma J, Huang L, Afroz S, Joseph A, Tygar J, Yu T, Venkatakrishan V, Kapadia A (2012). Robust detection of comment spam using entropy rate. Proceedings of the 5th ACM workshop on Security and artificial intelligence (59-70). Online publication date: 19-Oct-2012.
https://dl.acm.org/doi/10.1145/2381896.2381907
