GitHub - 1Hun0ter1/MGCF-Net: MGCF-Net for Phishing URLs Detection


Secure Cyber Systems: Phishing Detection Framework



A complete and modular system for phishing detection, combining traditional machine learning, deep learning, and handcrafted URL feature engineering.


🔗 Quick Start Website Experience

For a fast hands-on experience, check out the following website where you can test the phishing detection model:

URL Hunter Jack


📂 Project Structure

MGCF-Net/
│
├── dataset/               # Balanced dataset (phishing & legitimate URLs)
├── src/                   # Main codebase
│   ├── dl.py              # Main script (implements the deep learning architectures)
│   ├── ml.py              # Traditional ML models (SVM, RF, NB)
│   ├── dl_test.py         # Runs a pretrained model on the test set
│   ├── dl_run.py          # Runs the web frontend
│   └── ...                # Supporting modules
├── requirements.txt       # Python dependencies
└── README.md              # Project overview

⚙️ Installation

1. Clone the repository

git clone https://github.com/1Hun0ter1/MGCF-Net.git
cd MGCF-Net

2. Set up environment

Ensure you have Python ≥ 3.8, then create and activate the conda environment with the dependencies:

conda env create -f environment.yaml -n MGCF-Net
conda activate MGCF-Net

🚀 Usage

a. 🔬 Run Deep Learning Model

cd src

CUDA_VISIBLE_DEVICES=0 python dl.py \
    -ep 20 \
    -bs 1000 \
    -arch MGCF_Net \
    -wd 1e-3 \
    -feature "word-level" \
    -lr 'cosine' \
    -nw 50000 \
    -data "balanced_dataset" \
    -enhanced False

🔧 Parameters Explained

Parameter   Description
-ep         Number of training epochs (e.g., 20)
-bs         Batch size for both training and testing (e.g., 1000)
-arch       Architecture name, e.g., MGCF_Net, DeepCNN_Light_Hybrid, DeepCNN_Light_V2_2, rnn, brnn, cnn_base, etc.
-wd         Weight decay (L2 regularization strength), e.g., 1e-3
-feature    Feature extraction method: char-level, word-level, TF-IDF, n-grams
-lr         Learning rate scheduler: none, cosine, exponential
-nw         Number of words to consider as features (e.g., 50000)
-data       Dataset type, e.g., "balanced_dataset"
-enhanced   Enable adversarial data enhancement (True/False)

Note: The results will be automatically saved under test_results/custom/... with timestamped folders.
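
The flags listed above can be mirrored in a small argparse sketch. This is a hypothetical parser, not the one in dl.py: the defaults, help strings, and the `str_to_bool` helper are assumptions made for illustration. The helper is included because passing `-enhanced False` through a plain `type=bool` would yield `True` (any non-empty string is truthy).

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert 'True'/'False' style strings into real booleans."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; defaults are illustrative.
    parser = argparse.ArgumentParser(description="Phishing URL detector training (sketch)")
    parser.add_argument("-ep", type=int, default=20, help="number of training epochs")
    parser.add_argument("-bs", type=int, default=1000, help="batch size for training and testing")
    parser.add_argument("-arch", type=str, default="MGCF_Net", help="architecture name")
    parser.add_argument("-wd", type=float, default=1e-3, help="weight decay (L2 strength)")
    parser.add_argument("-feature", type=str, default="word-level",
                        choices=["char-level", "word-level", "TF-IDF", "n-grams"],
                        help="feature extraction method")
    parser.add_argument("-lr", type=str, default="cosine",
                        choices=["none", "cosine", "exponential"],
                        help="learning rate scheduler")
    parser.add_argument("-nw", type=int, default=50000, help="number of words used as features")
    parser.add_argument("-data", type=str, default="balanced_dataset", help="dataset type")
    parser.add_argument("-enhanced", type=str_to_bool, default=False,
                        help="enable adversarial data enhancement")
    return parser


# Parse the same values used in the example command above.
args = build_parser().parse_args(
    ["-ep", "20", "-bs", "1000", "-wd", "1e-3", "-lr", "cosine", "-enhanced", "False"]
)
print(args.ep, args.bs, args.wd, args.lr, args.enhanced)
```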

b. ⚙️ Run Machine Learning Models

cd src

CUDA_VISIBLE_DEVICES=0 python ml.py -model "SVM"

Options for -model:

  • "SVM"
  • "RandomForest"
  • "LogisticRegression"
  • "KNeighbors"

c. 🌐 Run the Website Frontend

cd src

python dl_run.py  

This launches a web page for checking URLs (single URL, batch, or file upload), with your trained model running in the backend.

d. 🧪 Quick Test Using Saved Model

cd src

python dl_test.py -m path/to/model_all.keras -r path/to/result_dir/

🧠 MGCF-Net: Multi-Granular Context Fusion Network


MGCF-Net is a custom-designed neural architecture for phishing URL detection. It fuses multi-granular textual features with handcrafted and domain-aware signals via a Cross-Attentive Fusion Mechanism.
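
To make the idea of cross-attentive fusion concrete, here is a minimal NumPy sketch of generic scaled dot-product cross-attention. It is not the repository's layer. The README does not say which stream supplies the queries, so the choice below (text features attend to domain features), the dimension `d_model = 8`, and the random inputs are all assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Scaled dot-product attention where one stream attends to another."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # shape: (n_queries, n_keys)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ values, weights


rng = np.random.default_rng(0)
d_model = 8  # assumed embedding size, for illustration only

# Assumption: 5 URL token representations act as queries.
text_features = rng.normal(size=(5, d_model))
# Assumption: each of the 9 handcrafted features is projected to d_model.
domain_features = rng.normal(size=(9, d_model))

fused, weights = cross_attention(text_features, domain_features, domain_features)
print(fused.shape, weights.shape)
```

Each row of `weights` indicates how strongly one text position draws on each domain signal. A trained layer would also apply learned projections to the queries, keys, and values.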

Key Innovations

  • 🧬 Multi-Granular Feature Fusion
    Combine char/word embeddings, manual heuristics, and domain reputation features.

  • 🌐 Local + Global Context Modeling
    Capture semantic signals via:

    • CNN: for local n-gram patterns
    • BiLSTM: for sequential URL dependencies
  • 🎯 Cross-Attention Fusion Layer
    Aligns semantic representations with domain-specific signals to strengthen detection of disguised or adversarial phishing URLs.

  • 🛡️ Adversarial Robustness
    Trains with auto-generated attack URLs (e.g., typo-squatting, fake subdomains) to simulate real-world attacks.
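
The two attack types named above (typo-squatting and fake subdomains) can be sketched with the standard library. This illustrates the general idea only and does not reproduce the repository's generator. The function names, the adjacent-character swap, and the placeholder domain `attacker.test` are assumptions.

```python
import random
from typing import List
from urllib.parse import urlsplit, urlunsplit


def typo_squat(host: str, rng: random.Random) -> str:
    """Swap two adjacent characters in the first label of the host name."""
    labels = host.split(".")
    first = labels[0]
    if len(first) < 2:
        return host  # nothing to swap
    i = rng.randrange(len(first) - 1)
    swapped = first[:i] + first[i + 1] + first[i] + first[i + 2:]
    return ".".join([swapped] + labels[1:])


def fake_subdomain(host: str, attacker_domain: str) -> str:
    """Make the legitimate host look like a subdomain of another domain."""
    return f"{host}.{attacker_domain}"


def perturb_url(url: str, rng: random.Random,
                attacker_domain: str = "attacker.test") -> List[str]:
    """Return one typo-squatted and one fake-subdomain variant of the URL."""
    parts = urlsplit(url)
    new_hosts = (
        typo_squat(parts.netloc, rng),
        fake_subdomain(parts.netloc, attacker_domain),
    )
    return [
        urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
        for host in new_hosts
    ]


rng = random.Random(0)
for variant in perturb_url("https://example.com/account", rng):
    print(variant)
```

The sketch treats the whole `netloc` as the host, so URLs with ports or credentials would need extra handling.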


🧱 Model Input Format

[ url_sequence , manual_domain_features ]
  • url_sequence: tokenized from raw URL (char-level / word-level)
  • manual_domain_features: 9D feature vector
    → includes heuristic + PageRank-derived metrics
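
As a rough illustration of this two-part input, the sketch below tokenizes a URL at char level and word level and pairs the encoded sequence with a 9-element vector. The tokenization rule, the vocabulary, the padding and unknown ids, and `max_len` are assumptions. The nine handcrafted features are not listed in this README, so a zero placeholder stands in for them.

```python
import re
from typing import Dict, List, Tuple


def char_tokens(url: str) -> List[str]:
    """Char-level tokenization: one token per character."""
    return list(url)


def word_tokens(url: str) -> List[str]:
    """Word-level tokenization: split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url) if tok]


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids (0 = padding, 1 = unknown), then truncate or pad."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def build_input(url: str, vocab: Dict[str, int],
                max_len: int = 16) -> Tuple[List[int], List[float]]:
    """Return [url_sequence, manual_domain_features] for one URL."""
    url_sequence = encode(word_tokens(url), vocab, max_len)
    # Placeholder only: the real 9 heuristic and PageRank-derived features
    # come from the repository's own feature extraction code.
    manual_domain_features = [0.0] * 9
    return url_sequence, manual_domain_features


# Hypothetical toy vocabulary for demonstration.
toy_vocab = {"http": 2, "example": 3, "com": 4, "login": 5}
sequence, features = build_input("http://example.com/login?user=1", toy_vocab, max_len=8)
print(sequence, len(features))
```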

📥 Additional Dataset and Checkpoints

The balanced phishing/legitimate URL dataset (collected from Common Crawl and URLhaus) and the pretrained deep learning model checkpoints are available from the following link:

🔗 Download Link (Baidu Cloud)

📦 Link: https://pan.baidu.com/s/1_4wVWxnYk4OoVasEJDXjnQ?pwd=5evu
Access Code: 5evu

🧾 Citation

@misc{SSS-CW2025Huang,
  title={Multi-Granular Context Fusion Network for Phishing URLs Detection},
  author={Hao Huang and Chuyu Zhao and Mingshu Tan and Zhuyi Li and Tianshu Wen and Zijie Chen and Yu Meng and Yitong Zhou},
  year={2025}
}

Acknowledgments

If you find this project useful, consider citing or starring 🌟 the repo.

This project also benefits from insights and ideas found in related open-source efforts in the phishing detection community, including prior work on dataset structuring and evaluation pipelines (e.g., dephides).
