Article (DOI: 10.1145/1166160.1166191)

Content based SMS spam filtering

Published: 10 October 2006

Abstract

In recent years, we have witnessed a dramatic increase in the volume of spam email. Other related forms of spam are also emerging as significant problems, especially spam on Instant Messaging services (so-called SPIM) and Short Message Service (SMS) or mobile spam. Like email spam, the SMS spam problem can be approached with legal, economic, or technical measures. Among the wide range of technical measures, Bayesian filters play a key role in stopping email spam. In this paper, we analyze to what extent the Bayesian filtering techniques used to block email spam can be applied to the problem of detecting and stopping mobile spam. In particular, we have built two SMS spam test collections of significant size, one in English and one in Spanish. On these collections we have tested a number of message representation techniques and Machine Learning algorithms in terms of effectiveness. Our results demonstrate that Bayesian filtering techniques can be effectively transferred from email to SMS spam.
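
To make the idea of a Bayesian content filter concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier applied to short messages. It is not the authors' implementation: the word-level tokenizer, the Laplace smoothing constant `alpha`, the class name `NaiveBayesFilter`, and the four toy training messages are all assumptions made purely for illustration, and the abstract does not say which representation or smoothing the paper uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word/number tokens; an assumed, deliberately simple representation."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesFilter:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # smoothing constant (assumed value)
        self.word_counts = {}         # label -> Counter of token frequencies
        self.doc_counts = Counter()   # label -> number of training messages
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(token | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, invented for this sketch (not from the paper's collections).
train = [
    ("WIN a free prize now, text CLAIM to 80080", "spam"),
    ("Free entry: txt WIN to claim your cash prize", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("Call me when you get home", "ham"),
]
model = NaiveBayesFilter().fit([m for m, _ in train], [y for _, y in train])
print(model.predict("claim your free prize"))
print(model.predict("see you at lunch"))
```

Because SMS messages are very short, each message contributes only a handful of tokens, which is one reason the choice of message representation matters when moving such a filter from email to SMS.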


Published In

DocEng '06: Proceedings of the 2006 ACM symposium on Document engineering
October 2006
232 pages
ISBN: 1595935150
DOI: 10.1145/1166160
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bayesian filter
  2. junk
  3. receiver operating characteristic
  4. spam

Qualifiers

  • Article

Conference

DocEng06: ACM Symposium on Document Engineering
October 10 - 13, 2006
Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
