DOI: 10.1145/3205977.3205992
short-paper
Public Access

"Kn0w Thy Doma1n Name": Unbiased Phishing Detection Using Domain Name Based Features

Published: 07 June 2018

Abstract

Phishing websites remain a persistent security threat. Thus far, machine learning approaches appear to have the best potential as defenses. However, existing machine learning approaches for phishing detection raise two main concerns. The first is the large number of training features used and the lack of validating arguments for these feature choices. The second is that the datasets used in the literature are inadvertently biased with respect to features based on the website URL or content. To address these concerns, we put forward the intuition that the domain name of a phishing website is the tell-tale sign of phishing and holds the key to successful phishing detection. Accordingly, we design features that model the relationships, visual as well as statistical, of the domain name to the key elements of a phishing website, which are used to snare end users. The main value of our feature design is that, to bypass detection, an attacker will find it very difficult to tamper with the visual content of the phishing website without arousing the suspicion of the end user. Our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97% on a sample dataset. Compared to the state-of-the-art work, our per-instance classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs, as they are likely to be biased towards specific datasets. We show the robustness of our learning algorithm by testing on unknown live phishing URLs and achieve a high detection accuracy of 99.7%.
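
To make the idea of relating a domain name to page elements more concrete, here is a minimal Python sketch. It is not the paper's feature set: the abstract does not list the seven features, so the two features below (whether the domain label appears in the visible text, and the share of absolute links that stay on the same domain) are illustrative assumptions, as are the helper names `registered_label`, `domain_in_text`, and `same_domain_link_ratio`. The domain extraction is also simplified and ignores multi-part public suffixes such as `co.uk`.

```python
from typing import Sequence
from urllib.parse import urlparse


def registered_label(url: str) -> str:
    """Return a rough 'domain name' label: the second-to-last host label.

    Simplifying assumption: multi-part suffixes (e.g. co.uk) and IP-address
    hosts are not handled specially.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = [part for part in host.split(".") if part]
    if len(labels) >= 2:
        return labels[-2]
    return labels[0] if labels else ""


def domain_in_text(url: str, text: str) -> bool:
    """Hypothetical feature: does the domain label occur in the page text?"""
    label = registered_label(url)
    return bool(label) and label in text.lower()


def same_domain_link_ratio(url: str, links: Sequence[str]) -> float:
    """Hypothetical feature: share of absolute links on the page's own domain."""
    label = registered_label(url)
    # Relative links have no hostname and are skipped.
    absolute = [link for link in links if urlparse(link).hostname]
    if not absolute:
        return 0.0
    same = sum(1 for link in absolute if registered_label(link) == label)
    return same / len(absolute)
```

The intuition matching the abstract is that a legitimate page tends to mention and link to its own domain, whereas a phishing page imitates a brand whose name and links do not match the hosting domain. A classifier would consume such values as a small feature vector.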

Published In

SACMAT '18: Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies
June 2018
271 pages
ISBN: 9781450356664
DOI: 10.1145/3205977
  • General Chair: Elisa Bertino
  • Program Chairs: Dan Lin, Jorge Lobo
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. biased datasets
  2. domain name
  3. machine learning
  4. phishing
  5. phishing detection

Qualifiers

  • Short-paper

Conference

SACMAT '18

Acceptance Rates

SACMAT '18 Paper Acceptance Rate 14 of 50 submissions, 28%;
Overall Acceptance Rate 177 of 597 submissions, 30%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 303
  • Downloads (Last 6 weeks): 36
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) Critical Strategies for Phishing Defense and Digital Asset Protection. Critical Phishing Defense Strategies and Digital Asset Protection, pp. 221-244. DOI: 10.4018/979-8-3693-8784-9.ch011. Online publication date: 28-Feb-2025
  • (2024) Phishing URL Detection Using BiLSTM With Attention Mechanism. Machine Intelligence Applications in Cyber-Risk Management, pp. 159-184. DOI: 10.4018/979-8-3693-7540-2.ch008. Online publication date: 22-Nov-2024
  • (2024) AntiPhishStack: LSTM-Based Stacked Generalization Model for Optimized Phishing URL Detection. Symmetry 16:2 (248). DOI: 10.3390/sym16020248. Online publication date: 17-Feb-2024
  • (2024) Utilizing Large Language Models with Human Feedback Integration for Generating Dedicated Warning for Phishing Emails. Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems, pp. 35-46. DOI: 10.1145/3665451.3665531. Online publication date: 2-Jul-2024
  • (2024) An Interpretable Fine-Tuned BERT Approach for Phishing URLs Detection: A Superior Alternative to Feature Engineering. 2024 11th International Conference on Social Networks Analysis, Management and Security (SNAMS), pp. 138-145. DOI: 10.1109/SNAMS64316.2024.10883775. Online publication date: 9-Dec-2024
  • (2024) Mitigating Bias in Machine Learning Models for Phishing Webpage Detection. 2024 16th International Conference on COMmunication Systems & NETworkS (COMSNETS), pp. 430-432. DOI: 10.1109/COMSNETS59351.2024.10427170. Online publication date: 3-Jan-2024
  • (2024) A State-of-the-Art Review on Phishing Website Detection Techniques. IEEE Access 12, pp. 187976-188012. DOI: 10.1109/ACCESS.2024.3514972. Online publication date: 2024
  • (2024) A Study on Adversarial Sample Resistance and Defense Mechanism for Multimodal Learning-Based Phishing Website Detection. IEEE Access 12, pp. 137805-137824. DOI: 10.1109/ACCESS.2024.3436812. Online publication date: 2024
  • (2024) Phishing URL detection with neural networks: an empirical study. Scientific Reports 14:1. DOI: 10.1038/s41598-024-74725-6. Online publication date: 24-Oct-2024
  • (2023) Adversarial Autoencoder Data Synthesis for Enhancing Machine Learning-Based Phishing Detection Algorithms. IEEE Transactions on Services Computing 16:4, pp. 2411-2422. DOI: 10.1109/TSC.2023.3234806. Online publication date: 1-Jul-2023
