DOI: 10.1145/1244408.1244424
Article

Web spam detection via commercial intent analysis

Published: 08 May 2007

Abstract

We propose a number of features for Web spam filtering based on the occurrence of keywords that either have high advertisement value or are heavily spammed. Our features include popular words from search engine query logs as well as high-cost or high-volume words according to Google AdWords. We also demonstrate the spam filtering power of three further signals: the Online Commercial Intention (OCI) value that the Microsoft adCenter Labs demonstration assigns to a URL, the Yahoo! Mindset classification of Web pages as commercial or non-commercial, and metrics based on the occurrence of Google ads on the page. We run our tests on the WEBSPAM-UK2006 dataset, recently compiled by Castillo et al. as a standard means of measuring the performance of Web spam detection algorithms. Our features improve the classification accuracy of the publicly available WEBSPAM-UK2006 features by 3%.
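The keyword-occurrence features described in the abstract can be illustrated with a minimal sketch. The word lists below are hypothetical placeholders standing in for the query-log and Google AdWords lists the paper uses, and recognizing a Google ad by searching the markup for the AdSense host name `pagead2.googlesyndication.com` is an assumption made for illustration, not necessarily how the authors counted ads. The OCI score and the Yahoo! Mindset label come from external services and are not modeled here; the function name `commercial_features` is likewise hypothetical.

```python
import re
from collections import Counter

# Hypothetical placeholder lists. The paper derives its lists from search
# engine query logs (popular words) and Google AdWords (high cost/volume words).
POPULAR_QUERY_WORDS = {"free", "download", "mp3", "games", "cheap"}
HIGH_COST_WORDS = {"mortgage", "insurance", "loans", "casino", "cheap"}

# Assumption: an embedded Google ad is recognized by the AdSense script host.
AD_MARKER = "pagead2.googlesyndication.com"

TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z]+")


def commercial_features(html: str) -> dict:
    """Keyword-occurrence and ad-count features for one page (illustrative)."""
    lowered = html.lower()
    # Strip markup so tag and attribute names are not counted as words.
    tokens = TOKEN_RE.findall(TAG_RE.sub(" ", lowered))
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty pages
    popular = sum(c for w, c in counts.items() if w in POPULAR_QUERY_WORDS)
    costly = sum(c for w, c in counts.items() if w in HIGH_COST_WORDS)
    return {
        "frac_popular_query_words": popular / total,
        "frac_high_cost_words": costly / total,
        "distinct_high_cost_words": len(HIGH_COST_WORDS.intersection(counts)),
        # Ads are searched in the raw markup, where the script URL appears.
        "google_ad_blocks": lowered.count(AD_MARKER),
    }


if __name__ == "__main__":
    page = (
        "<html><body>Cheap loans and cheap insurance"
        '<script src="https://pagead2.googlesyndication.com/pagead/show_ads.js">'
        "</script></body></html>"
    )
    # prints {'frac_popular_query_words': 0.4, 'frac_high_cost_words': 0.8,
    #         'distinct_high_cost_words': 3, 'google_ad_blocks': 1}
    print(commercial_features(page))
```

In a full pipeline, values like these would presumably be appended to the existing WEBSPAM-UK2006 feature vector of each host before training a classifier; the sketch only covers per-page feature extraction.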

References

[1] A. A. Benczúr, K. Csalogány, E. Friedman, D. Fogaras, T. Sarlós, M. Uher, and E. Windhager. Searching a small national domain: Preliminary report. In Proc. WWW, 2003.
[2] A. A. Benczúr, K. Csalogány, and T. Sarlós. Link-based similarity search to fight web spam. In Proc. AIRWeb, 2006.
[3] A. A. Benczúr, K. Csalogány, T. Sarlós, and M. Uher. SpamRank: Fully automatic link spam detection. In Proc. AIRWeb, 2005.
[4] A. Z. Broder. A taxonomy of web search. SIGIR Forum, 36(2):3–10, 2002.
[5] C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini, and S. Vigna. A reference collection for web spam. SIGIR Forum, 40(2), 2006.
[6] C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know your neighbors: Web spam detection using the web topology. DELIS Technical Report TR-0458, 2006.
[7] K. Chellapilla and D. M. Chickering. Improving cloaking detection using search query popularity and monetizability. In Proc. AIRWeb, pages 17–24, 2006.
[8] H. K. Dai, L. Zhao, Z. Nie, J.-R. Wen, L. Wang, and Y. Li. Detecting online commercial intention (OCI). In Proc. WWW, pages 829–837, 2006.
[9] I. Drost and T. Scheffer. Thwarting the nigritude ultramarine: Learning to identify link spam. In Proc. ECML, volume 3720 of LNAI, pages 233–243, 2005.
[10] N. Eiron, K. S. McCurley, and J. A. Tomlin. Ranking the web frontier. In Proc. WWW, pages 309–318, 2004.
[11] R. Fagin, R. Kumar, K. S. McCurley, J. Novak, D. Sivakumar, J. A. Tomlin, and D. P. Williamson. Searching the workplace web. In Proc. WWW, 2003.
[12] D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In Proc. WebDB, 2004.
[13] Z. Gyöngyi and H. Garcia-Molina. Spam: It's not just for inboxes anymore. IEEE Computer Magazine, 38(10):28–34, 2005.
[14] Z. Gyöngyi and H. Garcia-Molina. Web spam taxonomy. In Proc. AIRWeb, 2005.
[15] Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with TrustRank. In Proc. VLDB, pages 576–587, 2004.
[16] M. R. Henzinger, R. Motwani, and C. Silverstein. Challenges in web search engines. SIGIR Forum, 36(2):11–22, 2002.
[17] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly. Detecting spam web pages through content analysis. In Proc. WWW, pages 83–92, 2006.
[18] Y.-M. Wang, M. Ma, Y. Niu, and H. Chen. Spam double-funnel: Connecting web spammers with advertisers. In Proc. WWW, 2007.
[19] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second edition. Morgan Kaufmann, 2005.
[20] B. Wu, V. Goel, and B. D. Davison. Propagating trust and distrust to demote web spam. In Workshop on Models of Trust for the Web, 2006.

Information

Published In

ACM Other conferences
AIRWeb '07: Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
May 2007
98 pages
ISBN:9781595937322
DOI:10.1145/1244408
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. commercial intent
  2. query monetizability
  3. query popularity

Conference

AIRWeb'07

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Jan 2025

Cited By

  • (2022) A Study on Types, Classification and Mechanism for Optimization in Search Engine. International Journal of Advanced Research in Science, Communication and Technology, pp. 278–284. DOI: 10.48175/IJARSCT-3260. Online publication date: 23-Apr-2022.
  • (2022) Towards Forecasting Internet Financial Frauds based on Advertising. 2022 8th International Conference on Big Data and Information Analytics (BigDIA), pp. 5–11. DOI: 10.1109/BigDIA56350.2022.9874049. Online publication date: 24-Aug-2022.
  • (2021) An Improved Framework for Content- and Link-Based Web-Spam Detection: A Combined Approach. Complexity, vol. 2021, pp. 1–18. DOI: 10.1155/2021/6625739. Online publication date: 15-Nov-2021.
  • (2020) Taxonomy of Link Based web Spammers using Mining Optimized PageRank Algorithm for e-Governance. 2020 International Conference on Intelligent Engineering and Management (ICIEM), pp. 155–159. DOI: 10.1109/ICIEM48762.2020.9160317. Online publication date: Jun-2020.
  • (2020) The application of statistical methods in the development of Cyrillic-Latin converter for Tatar language. 2020 13th International Conference on Developments in eSystems Engineering (DeSE), pp. 293–298. DOI: 10.1109/DeSE51703.2020.9450742. Online publication date: 14-Dec-2020.
  • (2019) Bi-lingual Intent Classification of Twitter Posts: A Roadmap. Proceedings of 6th International Conference in Software Engineering for Defence Applications, pp. 1–9. DOI: 10.1007/978-3-030-14687-0_1. Online publication date: 19-Mar-2019.
  • (2017) Opinioned Post Detection in Sina Weibo. IEEE Access, vol. 5, pp. 7263–7271. DOI: 10.1109/ACCESS.2017.2679227. Online publication date: 2017.
  • (2016) Detecting spam web pages using content and link-based techniques. Sadhana, vol. 41, no. 2, pp. 193–202. DOI: 10.1007/s12046-015-0460-9. Online publication date: 10-Mar-2016.
  • (2013) Towards linking buyers and sellers. Proceedings of the 22nd International Conference on World Wide Web, pp. 629–632. DOI: 10.1145/2487788.2488009. Online publication date: 13-May-2013.
  • (2013) Web Spam Detection: New Approach with Hidden Markov Models. Information Retrieval Technology, pp. 239–250. DOI: 10.1007/978-3-642-45068-6_21. Online publication date: 2013.
