More Web Proxy on the site http://driver.im/

Article

Adversarial classification

Authors:

Pedro Domingos,

Deepak VermaAuthors Info & Claims

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 99 - 108

https://doi.org/10.1145/1014052.1014066

Published: 22 August 2004 Publication History

Abstract

Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detection, intrusion detection, fraud detection, surveillance and counter-terrorism, this is far from the case: the data is actively manipulated by an adversary seeking to make the classifier produce false negatives. In these domains, the performance of a classifier can degrade rapidly after it is deployed, as the adversary learns to defeat it. Currently the only solution to this is repeated, manual, ad hoc reconstruction of the classifier. In this paper we develop a formal framework and algorithms for this problem. We view classification as a game between the classifier and the adversary, and produce a classifier that is optimal given the adversary's optimal strategy. Experiments in a spam detection domain show that this approach can greatly outperform a classifier learned in the standard way, and (within the parameters of the problem) automatically adapt the classifier to the adversary's evolving manipulations.

References

[1]

P. Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 155--164, San Diego, CA, 1999. ACM Press.

Digital Library

[2]

P. Domingos and M. Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103--130, 1997.

Digital Library

[3]

R. B. Doorenbos, O. Etzioni, and D. S. Weld. A scalable comparison-shopping agent for the World-Wide Web. In Proceedings of the First International Conference on Autonomous Agents, pages 39--48, Marina del Rey, CA, 1997. ACM Press.

Digital Library

[4]

T. Fawcett. "In vivo" spam filtering: A challenge problem for KDD. SIGKDD Explorations, 5(2):140--148, 2003.

Digital Library

[5]

T. Fawcett and F. Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, 1(3):291--316, 1997.

Digital Library

[6]

D. Fudenberg and D. Levine. The Theory of Learning in Games. MIT Press, Cambridge, MA, 1999.

[7]

D. Fudenberg and J. Tirole. Game Theory. MIT Press, Cambridge, MA, 1991.

[8]

M. R. Garey and D. S. Johnson. Computers and Intractability. Freeman, New York, NY, 1979.

Digital Library

[9]

L. Guernsey. Retailers rise in Google rankings as rivals cry foul. New York Times, November 20, 2003.

[10]

G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 97--106, San Francisco, CA, 2001. ACM Press.

Digital Library

[11]

D. Jensen, M. Rattigan, and H. Blau. Information awareness: A prospective technical assessment. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 378--387, Washington, DC, 2003. ACM Press.

Digital Library

[12]

M. Kearns. Computational game theory. Tutorial, Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA, 2002. http://www.cis.upenn.edu/~mkearns/nips02tutorial/.

[13]

R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273--324, 1997.

Digital Library

[14]

B. Krebs. Online piracy spurs high-tech arms race. Washington Post, June 26, 2003.

[15]

M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157--163, New Brunswick, NJ, 1994. Morgan Kaufmann.

[16]

B. Lloyd. Been gazumped by Google? Trying to make sense of the "Florida" update. Search Engine Guide, November 25, 2003.

[17]

M. V. Mahoney and P. K. Chan. Learning nonstationary models of normal network traffic for detecting novel attacks. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 376--385, Edmonton, Canada, 2002. ACM Press.

Digital Library

[18]

A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, 1998. AAAI Press.

[19]

F. Nielsen. Email data, 2003. http://www.imm.dtu.dk/~rem/datasets/.

[20]

F. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42:203--231, 2001.

Digital Library

[21]

J. Rennie. Ifile spam classifier, 2003. http://www.nongnu.org/ifile/.

[22]

P. Robertson and J. M. Brady. Adaptive image analysis for aerial surveillance. IEEE Intelligent Systems, 14(3):30--36, 1999.

Digital Library

[23]

M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, 1998. AAAI Press.

[24]

G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C.D. Spyropoulos, and P. Stamatopoulos. A memory-based approach to anti-spam filtering for mailing lists. In Information Retrieval, volume 6, pages 49--73. Kluwer, 2003.

Digital Library

[25]

T. Senator. Ongoing management and application of discovered knowledge in a large regulatory organization: A case study of the use and impact of NASD regulation's advanced detection system (ADS). In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 44--53, Boston, MA, 2000. ACM Press.

Digital Library

[26]

J. M. Smith. Evolution and the Theory of Games. Cambridge University Press, Cambridge, UK, 1982.

[27]

P. Turney. Cost-sensitive learning bibliography. Online bibliography, NRC Institute for Information Technology, Ottawa, Canada, 1998. http://members.rogers.com/peter.turney/bibliographies/cost-sensitive.html.

[28]

WordNet 2.0: A lexical database for the English language, 2003. http://www.cogsci.princeton.edu/~wn/.

Cited By

Hu YChen JZhu Q(2025)Game-Theoretic Neyman-Pearson Detection to Combat Strategic EvasionIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.351583420(516-530)Online publication date: 2025
https://doi.org/10.1109/TIFS.2024.3515834
Shamshiri SLiu HSohn I(2025)Adversarial robust image processing in medical digital twinInformation Fusion10.1016/j.inffus.2024.102728115(102728)Online publication date: Mar-2025
https://doi.org/10.1016/j.inffus.2024.102728
Astorino AGaudioso MGorgone EManca B(2025)Difference of Convex programming in adversarial SVMJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116201454(116201)Online publication date: Jan-2025
https://doi.org/10.1016/j.cam.2024.116201
Show More Cited By

Recommendations

A Combined Classification Algorithm Based on C4.5 and NB
ISICA '08: Proceedings of the 3rd International Symposium on Advances in Computation and Intelligence

When our learning task is to build a model with accurate classification, C4.5 and NB are two very important algorithms for achieving this task because of their simplicity and high performance. In this paper, we present a combined classification ...
Cost-sensitive matrixized classification learning with information entropy
Abstract
Classifier design is one of the most significant fields in pattern recognition. Most classifiers are measured by classification accuracy, which assumes that all the misclassification cost are the same. In the real world, different ...
Highlights
- The CsMatMHKS combine the matrixized learning framework with cost-sensitive learning.
Learning Naive Bayes Classifiers for Music Classification and Retrieval
ICPR '10: Proceedings of the 2010 20th International Conference on Pattern Recognition

In this paper, we explore the use of naive Bayes classifiers for music classification and retrieval. The motivation is to employ all audio features extracted from local windows for classification instead of just using a single song-level feature vector ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

August 2004

874 pages

ISBN:1581138881

DOI:10.1145/1014052

General Chairs:
Won Kim
Cyber Database Solutions
,
Ronny Kohavi
Amazon.com
,
Program Chairs:
Johannes Gehrke
Cornell University
,
William DuMouchel
AT&T Labs Research

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2004

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

KDD04

Sponsor:

KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 22 - 25, 2004

WA, Seattle, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Sponsor:
sigkdd
sigkdd

The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 3 - 7, 2025

Toronto , ON , Canada

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

625
Total Citations
View Citations
3,815
Total Downloads

Downloads (Last 12 months)207
Downloads (Last 6 weeks)29

Reflects downloads up to 01 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Hu YChen JZhu Q(2025)Game-Theoretic Neyman-Pearson Detection to Combat Strategic EvasionIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.351583420(516-530)Online publication date: 2025
https://doi.org/10.1109/TIFS.2024.3515834
Shamshiri SLiu HSohn I(2025)Adversarial robust image processing in medical digital twinInformation Fusion10.1016/j.inffus.2024.102728115(102728)Online publication date: Mar-2025
https://doi.org/10.1016/j.inffus.2024.102728
Astorino AGaudioso MGorgone EManca B(2025)Difference of Convex programming in adversarial SVMJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116201454(116201)Online publication date: Jan-2025
https://doi.org/10.1016/j.cam.2024.116201
Indira MMohanasundaram KSaranya M(2024)A Survey of Machine Learning and Cryptography AlgorithmsInnovative Machine Learning Applications for Cryptography10.4018/979-8-3693-1642-9.ch006(105-118)Online publication date: 12-Apr-2024
https://doi.org/10.4018/979-8-3693-1642-9.ch006
Imam N(2024)Adversarial Examples on XAI-Enabled DT for Smart Healthcare SystemsSensors10.3390/s2421689124:21(6891)Online publication date: 27-Oct-2024
https://doi.org/10.3390/s24216891
Gallego VNaveiro RRedondo ARíos Insua DRuggeri F(2024)Protecting Classifiers from AttacksStatistical Science10.1214/24-STS92239:3Online publication date: 1-Aug-2024
https://doi.org/10.1214/24-STS922
Cen SAlur R(2024)From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI AuditingProceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization10.1145/3689904.3694711(1-14)Online publication date: 29-Oct-2024
https://dl.acm.org/doi/10.1145/3689904.3694711
Khadka ASthapit SEpiphaniou GMaple C(2024)Resilient Machine Learning: Advancement, Barriers, and Opportunities in the Nuclear IndustryACM Computing Surveys10.1145/364860856:9(1-29)Online publication date: 24-Apr-2024
https://dl.acm.org/doi/10.1145/3648608
Liu JXing YShi XSong FXu ZMing Z(2024)Abstraction and Refinement: Towards Scalable and Exact Verification of Neural NetworksACM Transactions on Software Engineering and Methodology10.1145/364438733:5(1-35)Online publication date: 3-Jun-2024
https://dl.acm.org/doi/10.1145/3644387
Woo SYoon DGim YPark EHong JPark J(2024)RAAD: Reinforced Adversarial Anomaly DetectorProceedings of the 39th ACM/SIGAPP Symposium on Applied Computing10.1145/3605098.3635920(883-891)Online publication date: 8-Apr-2024
https://dl.acm.org/doi/10.1145/3605098.3635920
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents