DOI: 10.1145/1651587.1651596
research-article

Automatic seed set expansion for trust propagation based anti-spamming algorithms

Published: 02 November 2009

Abstract

Seed sets are critically important for trust propagation based anti-spamming algorithms such as TrustRank. Conventional approaches construct the seed set by manual evaluation, which keeps the seed set small, since building a very large seed set by hand is costly and may even be infeasible. A small seed set can have a detrimental effect on the final ranking results, so it is desirable to expand an initial seed set automatically into a much larger one. In this paper, we propose the first automatic seed set expansion algorithm (ASE), which expands a small seed set by selecting reputable seeds that are found, and guaranteed to be reputable, through a joint recommendation link structure. Experimental results on the WEBSPAM-2007 dataset show that, with the same manual evaluation effort, ASE automatically obtains a large number of reputable seeds with high precision, and thus significantly improves the performance of the baseline algorithm in terms of both reputable site promotion and spam site demotion.
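To make the setting concrete, the following is a minimal sketch of seed-biased trust propagation in the style of TrustRank, followed by a toy seed expansion step. It is not the ASE algorithm from the paper: the abstract does not specify how the joint recommendation link structure is defined, so the rule used here (a site becomes a seed when at least `min_recommenders` distinct seeds link to it) is an assumption made purely for illustration. The function names, the damping factor, and the example graph are likewise hypothetical.

```python
from collections import defaultdict


def trust_propagation(out_links, seeds, damping=0.85, iterations=50):
    """Biased PageRank: teleport mass is given only to seed nodes."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_mass)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed_mass[n] for n in nodes}
        for src in nodes:
            targets = out_links.get(src, ())
            if not targets:
                continue  # dangling node: its trust is dropped in this sketch
            share = damping * trust[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        trust = nxt
    return trust


def expand_seeds(out_links, seeds, min_recommenders=2):
    """Assumed rule (not from the paper): add every non-seed node that is
    linked from at least `min_recommenders` distinct seeds."""
    recommenders = defaultdict(set)
    for seed in seeds:
        for dst in out_links.get(seed, ()):
            if dst not in seeds:
                recommenders[dst].add(seed)
    jointly_recommended = {
        node for node, recs in recommenders.items() if len(recs) >= min_recommenders
    }
    return set(seeds) | jointly_recommended


# Hypothetical toy web graph: site -> sites it links to.
graph = {
    "a": ["c", "d"],
    "b": ["c"],
    "c": ["e"],
    "d": [],
    "spam": ["spam2"],
    "spam2": ["spam"],
}
seeds = {"a", "b"}                      # small, manually evaluated seed set
expanded = expand_seeds(graph, seeds)   # "c" is linked from both seeds
scores = trust_propagation(graph, expanded)
```

In this toy graph the two spam sites link only to each other and are unreachable from any seed, so they receive no trust, while `e` gains trust through the newly added seed `c`. A stricter `min_recommenders` trades the number of added seeds against the risk of admitting a non-reputable site.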


    Published In

    WIDM '09: Proceedings of the eleventh international workshop on Web information and data management
    November 2009
    104 pages
    ISBN: 9781605588087
    DOI: 10.1145/1651587
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. link analysis
    2. seed expansion
    3. seed selection
    4. spam
    5. trustrank

    Qualifiers

    • Research-article

    Conference

    CIKM '09
    Cited By

    • (2018) Distrust seed set propagation algorithm to detect web spam. Journal of Intelligent Information Systems 49:2 (213-235). DOI: 10.1007/s10844-016-0439-y. Online publication date: 28-Dec-2018
    • (2017) Privacy-preserving trust management for unwanted traffic control. Future Generation Computer Systems 72:C (305-318). DOI: 10.1016/j.future.2016.06.036. Online publication date: 1-Jul-2017
    • (2016) Unwanted Traffic Detection and Control Based on Trust Management. Information Fusion for Cyber-Security Analytics (77-109). DOI: 10.1007/978-3-319-44257-0_4. Online publication date: 22-Oct-2016
    • (2015) PSNController. ACM Transactions on Multimedia Computing, Communications, and Applications 12:1s (1-23). DOI: 10.1145/2808206. Online publication date: 21-Oct-2015
    • (2015) TruSMS. Future Generation Computer Systems 49:C (77-93). DOI: 10.1016/j.future.2014.06.010. Online publication date: 1-Aug-2015
    • (2014) Trust Management for Unwanted Traffic Control. Trust Management in Mobile Environments (94-129). DOI: 10.4018/978-1-4666-4765-7.ch005. Online publication date: 2014
    • (2014) Propagating Both Trust and Distrust with Target Differentiation for Combating Link-Based Web Spam. ACM Transactions on the Web 8:3 (1-33). DOI: 10.1145/2628440. Online publication date: 8-Jul-2014
    • (2014) Query recommendation in the information domain of children. Journal of the Association for Information Science and Technology 65:7 (1368-1384). DOI: 10.1002/asi.23055. Online publication date: 1-Jul-2014
    • (2013) Unwanted Content Control via Trust Management in Pervasive Social Networking. Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (202-209). DOI: 10.1109/TrustCom.2013.29. Online publication date: 16-Jul-2013
    • (2013) Implementation of an SMS Spam Control System Based on Trust Management. Proceedings of the 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing (887-894). DOI: 10.1109/GreenCom-iThings-CPSCom.2013.155. Online publication date: 20-Aug-2013