"Kn0w Thy Doma1n Name": Unbiased Phishing Detection Using Domain Name Based Features

Authors:

Hossein Shirazi,

Bruhadeshwar Bezawada,

Indrakshi Ray

SACMAT '18: Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies

Pages 69 - 75

https://doi.org/10.1145/3205977.3205992

Published: 07 June 2018


Abstract

Phishing websites remain a persistent security threat. Thus far, machine learning approaches appear to have the best potential as defenses. However, existing machine learning approaches for phishing detection raise two main concerns. The first is the large number of training features used and the lack of validating arguments for these feature choices. The second is that the datasets used in the literature are inadvertently biased with respect to features based on the website URL or content. To address these concerns, we put forward the intuition that the domain name of a phishing website is the tell-tale sign of phishing and holds the key to successful phishing detection. Accordingly, we design features that model the relationships, visual as well as statistical, of the domain name to the key elements of a phishing website, which are used to snare end users. The main value of our feature design is that, to bypass detection, an attacker would have to tamper with the visual content of the phishing website, which is very difficult to do without arousing the suspicion of the end user. Our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97% on a sample dataset. Compared to the state-of-the-art work, our per-instance classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs, as they are likely to be biased towards specific datasets. We show the robustness of our learning algorithm by testing on unknown live phishing URLs and achieve a high detection accuracy of 99.7%.
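The abstract does not enumerate the seven features, so the following is only a minimal sketch of what "relating the domain name to key page elements" could look like. It is not the authors' feature set. The function names (`registered_label`, `domain_features`), the three example features, and the naive way of picking the domain label (second-to-last hostname label, which mishandles suffixes such as `co.uk`) are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _PageScan(HTMLParser):
    """Collects the <title> text and all <a href> values from a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def registered_label(url):
    """Hypothetical helper: naive guess at the domain name label.

    Takes the second-to-last hostname label; a real system would
    consult the Public Suffix List instead.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def domain_features(url, html):
    """Illustrative features tying the domain name to page content."""
    scan = _PageScan()
    scan.feed(html)
    name = registered_label(url)
    # Only absolute links carry a hostname; relative links are skipped.
    absolute = [h for h in scan.hrefs if urlparse(h).hostname]
    same = [h for h in absolute if registered_label(h) == name]
    return {
        "domain_length": len(name),
        "domain_in_title": name in scan.title.lower(),
        "same_domain_link_ratio": len(same) / len(absolute) if absolute else 0.0,
    }
```

A feature vector of this kind could then be fed to any standard classifier. The intuition from the abstract is that a phishing page imitating a brand will show that brand in its title and links while being served from an unrelated domain name, so features like `domain_in_title` tend to differ between legitimate and phishing pages.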
