Learning to detect phishing emails

Published: 08 May 2007
DOI: 10.1145/1242572.1242660

Abstract

Each month, more attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, for the purpose of stealing account information, logon credentials, and identity information in general. This attack method, commonly known as "phishing," is most often initiated by sending out emails with links to spoofed websites that harvest information. We present a method for detecting these attacks, which in its most general form is an application of machine learning to a feature set designed to highlight user-targeted deception in electronic communication. This method is applicable, with slight modification, to the detection of phishing websites or of the emails used to direct victims to these sites. We evaluate this method on a set of approximately 860 phishing emails and 6950 non-phishing emails, and correctly identify over 96% of the phishing emails while misclassifying only on the order of 0.1% of the legitimate emails. We conclude with thoughts on the future of such techniques for identifying deception, particularly with respect to the evolutionary nature of the attacks and the information available.
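To make the idea of "machine learning on a feature set" more concrete, the following is a minimal Python sketch and not the authors' implementation. The feature names (num_links, has_ip_url, and so on), the helper functions, and the confusion-matrix counts are all hypothetical, chosen only to be roughly consistent with the corpus sizes and rates quoted in the abstract. It shows how an email body might be mapped to a few deception-oriented features and how the two reported rates would be computed from classifier outcomes.

```python
import re
from urllib.parse import urlparse

# Hypothetical helpers: these features are illustrative assumptions,
# not the feature set used in the paper.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>\[\]]+", re.IGNORECASE)
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(body: str) -> dict:
    """Map an email body to a small dictionary of numeric features."""
    urls = URL_PATTERN.findall(body)
    hosts = [urlparse(u).hostname or "" for u in urls]
    return {
        "num_links": len(urls),
        "num_distinct_hosts": len(set(hosts)),
        # 1 if any link points at a raw IPv4 address instead of a domain name
        "has_ip_url": int(any(IPV4_HOST.match(h) for h in hosts)),
        # Many dots in a host can indicate a spoofed subdomain chain
        "max_dots_in_host": max((h.count(".") for h in hosts), default=0),
    }


def rates(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Return (phishing detection rate, legitimate-mail misclassification rate)."""
    return tp / (tp + fn), fp / (fp + tn)


sample = (
    "Dear user, verify your account at http://192.168.0.1/login "
    "or http://paypal.com.example.org/x"
)
features = extract_features(sample)

# Hypothetical counts: 860 phishing emails (826 caught, 34 missed) and
# 6950 legitimate emails (7 wrongly flagged, 6943 passed).
detection_rate, false_positive_rate = rates(tp=826, fn=34, fp=7, tn=6943)
print(features)
print(f"{detection_rate:.3f} {false_positive_rate:.4f}")
```

In a full pipeline, feature dictionaries like this would be turned into vectors and passed to a trained classifier; that step is omitted here because the abstract does not specify the learner.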

Published In

WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN:9781595936547
DOI:10.1145/1242572
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. email
  2. filtering
  3. learning
  4. phishing
  5. semantic attacks
  6. spam

Qualifiers

  • Article

Conference

WWW'07
WWW'07: 16th International World Wide Web Conference
May 8 - 12, 2007
Banff, Alberta, Canada

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) Detection of Malicious Websites using Machine Learning. International Journal of Innovative Science and Research Technology (IJISRT), pp. 1409-1412. DOI: 10.38124/ijisrt/IJISRT24MAR1199. Online publication date: 29-Mar-2024
  • (2024) Analysis and Prevention of AI-Based Phishing Email Attacks. Electronics, 13:10 (1839). DOI: 10.3390/electronics13101839. Online publication date: 9-May-2024
  • (2024) Development of cyber security assessment tool for financial institutions. DOI: 10.20334/2024-023-M. Online publication date: 2024
  • (2024) SİBERUZAMDA SUÇ TİPOLOJİLERİ VE SİBER İLETİŞİM TABANLI ÇÖZÜMLEME MODELİNİN ANALİZİ. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 27:4 (1375-1400). DOI: 10.17780/ksujes.1477116. Online publication date: 3-Dec-2024
  • (2024) VeriSMS: A Message Verification System for Inclusive Patient Outreach against Phishing Attacks. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1-17. DOI: 10.1145/3613904.3642027. Online publication date: 11-May-2024
  • (2024) From WHOIS to RDAP: Are IP Lookup Services Getting any Better? NOMS 2024-2024 IEEE Network Operations and Management Symposium, pp. 1-10. DOI: 10.1109/NOMS59830.2024.10575906. Online publication date: 6-May-2024
  • (2024) Phishing Website Detection Using Machine Learning. 2024 2nd International Conference on Networking and Communications (ICNWC), pp. 1-5. DOI: 10.1109/ICNWC60771.2024.10537279. Online publication date: 2-Apr-2024
  • (2024) Enhancing Cybersecurity with Transformers: Preventing Phishing Emails and Social Media Scams. 2024 IEEE Conference on Dependable and Secure Computing (DSC), pp. 31-36. DOI: 10.1109/DSC63325.2024.00017. Online publication date: 6-Nov-2024
  • (2024) Look before you leap: Detecting phishing web pages by exploiting raw URL and HTML characteristics. Expert Systems with Applications, 236 (121183). DOI: 10.1016/j.eswa.2023.121183. Online publication date: Feb-2024
  • (2024) Analysing the email data using stylometric method and deep learning to mitigate phishing attack. International Journal of Information Technology. DOI: 10.1007/s41870-024-01839-5. Online publication date: 5-May-2024