DOI: 10.1145/2187836.2187862
research-article

Semi-supervised correction of biased comment ratings

Published: 16 April 2012

Abstract

In many instances, offensive comments on the internet attract a disproportionate number of positive ratings from highly biased users. This results in an undesirable scenario where these offensive comments are the top rated ones. In this paper, we develop semi-supervised learning techniques to correct the bias in user ratings of comments. Our scheme uses a small number of comment labels in conjunction with user rating information to iteratively compute user bias and unbiased ratings for unlabeled comments. We show that the running time of each iteration is linear in the number of ratings, and the system converges to a unique fixed point. To select the comments to label, we devise an active learning algorithm based on empirical risk minimization. Our active learning method incrementally updates the risk for neighboring comments each time a comment is labeled, and thus can easily scale to large comment datasets. On real-life comments from Yahoo! News, our semi-supervised and active learning algorithms achieve higher accuracy than simple baselines, with few labeled examples.
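
To make the iterative idea in the abstract concrete, here is a minimal sketch of an alternating bias/score update. It is not the paper's algorithm: the abstract does not give the update rules, so the averaging formulas, the function name `correct_ratings`, and the toy data below are all assumptions chosen for illustration. Each iteration makes two passes over the ratings, so its cost is linear in the number of ratings.

```python
from collections import defaultdict

def correct_ratings(ratings, labels, max_iters=1000, tol=1e-9):
    """Toy alternating update (an assumed scheme, not the paper's).

    ratings: list of (user, comment, rating) triples.
    labels:  {comment: trusted score} for the few labeled comments.
    Returns (scores, bias) as plain dicts.
    """
    scores = defaultdict(float)   # unlabeled comments start at 0.0
    scores.update(labels)         # labeled comments stay fixed
    bias = {}
    for _ in range(max_iters):
        # Pass 1: a user's bias is the mean gap between their ratings
        # and the current comment scores.
        gap_sum, gap_cnt = defaultdict(float), defaultdict(int)
        for user, comment, r in ratings:
            gap_sum[user] += r - scores[comment]
            gap_cnt[user] += 1
        bias = {u: gap_sum[u] / gap_cnt[u] for u in gap_sum}

        # Pass 2: an unlabeled comment's score is the mean of its
        # ratings after subtracting each rater's bias.
        adj_sum, adj_cnt = defaultdict(float), defaultdict(int)
        for user, comment, r in ratings:
            if comment not in labels:
                adj_sum[comment] += r - bias[user]
                adj_cnt[comment] += 1

        delta = 0.0
        for comment in adj_sum:
            new_score = adj_sum[comment] / adj_cnt[comment]
            delta = max(delta, abs(new_score - scores[comment]))
            scores[comment] = new_score
        if delta < tol:           # stop once scores barely change
            break
    return dict(scores), bias

# Hypothetical data: user "a" upvotes a comment labeled offensive (c1),
# user "b" upvotes a comment labeled good (c2); both rate unlabeled c3.
ratings = [("a", "c1", +1), ("a", "c3", +1),
           ("b", "c2", +1), ("b", "c3", -1)]
labels = {"c1": -1.0, "c2": +1.0}
scores, bias = correct_ratings(ratings, labels)
print(round(scores["c3"], 3), round(bias["a"], 3), round(bias["b"], 3))
```

In this toy example the raw average rating of c3 is 0, but because user "a" is inferred to be strongly biased, the corrected score for c3 moves close to -1. Whether such a scheme converges in general depends on the update rules; the uniqueness result stated in the abstract applies to the paper's own formulation, not to this sketch.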




    Information & Contributors

    Information

    Published In

    ACM Other conferences
    WWW '12: Proceedings of the 21st international conference on World Wide Web
    April 2012
    1078 pages
    ISBN:9781450312295
    DOI:10.1145/2187836
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • Univ. de Lyon: Universite de Lyon

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. active learning
    2. bias
    3. iterative technique
    4. semi-supervised learning

    Qualifiers

    • Research-article

    Conference

    WWW 2012
    Sponsor:
    • Univ. de Lyon
    WWW 2012: 21st World Wide Web Conference 2012
    April 16 - 20, 2012
    Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 8
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 23 Dec 2024

    Citations

    Cited By

    • (2021) Automatic Identification of Harmful, Aggressive, Abusive, and Offensive Language on the Web: A Survey of Technical Biases Informed by Psychology Literature. ACM Transactions on Social Computing 4:3, pages 1-56. DOI: 10.1145/3479158. Online publication date: 25-Oct-2021
    • (2015) An effective and economic bi-level approach to ranking and rating spam detection. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 1-10. DOI: 10.1109/DSAA.2015.7344794. Online publication date: Oct-2015
    • (2014) Analyzing and Mining Comments and Comment Ratings on the Social Web. ACM Transactions on the Web 8:3, pages 1-39. DOI: 10.1145/2628441. Online publication date: 8-Jul-2014
    • (2012) Robust detection of comment spam using entropy rate. Proceedings of the 5th ACM workshop on Security and artificial intelligence, pages 59-70. DOI: 10.1145/2381896.2381907. Online publication date: 19-Oct-2012
