GlgApr/Malware-Analyzer: Automation Batch Malware Analyzer using Ghidra headlessAnalyzer

Malware Analysis with Ghidra Headless

A comprehensive framework for automated static malware analysis using Ghidra's headless mode. This project extracts key features from malware binaries to create datasets for machine learning models and malware research.

Malware sources: vx-underground, VirusSign, MalwareBazaar, etc.

Ransomware: https://github.com/Cryakl/Ransomware-Database/

Overview

This project combines Python and Java scripts to analyze malware samples using NSA's Ghidra reverse engineering framework. The system:

  1. Automates batch analysis of multiple malware samples
  2. Extracts key static features from binaries
  3. Generates structured datasets (CSV) for machine learning
  4. Flags suspicious indicators such as dangerous API calls and signs of obfuscation

Architecture

The system operates in two modes:

  • GUI Mode: Interactive analysis through Ghidra's graphical interface
  • Headless Mode: Automated batch processing through command-line

The workflow consists of:

  1. Python script scans for malware files
  2. For each file, Ghidra headless analyzer is invoked
  3. Java script extracts features during analysis
  4. Results are collected into a CSV dataset
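
The workflow above can be sketched as a helper that builds the command line for one headless run. This is a minimal sketch, not the code in malware_analyzer.py: the function name build_headless_command and its parameters are hypothetical, while the launcher location (support/analyzeHeadless) and the -import, -scriptPath, -postScript, and -deleteProject options are standard Ghidra headless analyzer options.

```python
import os
from pathlib import Path


def build_headless_command(ghidra_path, project_path, project_name,
                           sample, script_name, script_dir):
    """Return the argv list for one analyzeHeadless run on a single sample.

    Hypothetical helper: the real script may assemble its command differently.
    """
    # Ghidra ships the launcher as a shell script, plus a .bat file on Windows.
    launcher = "analyzeHeadless.bat" if os.name == "nt" else "analyzeHeadless"
    return [
        str(Path(ghidra_path) / "support" / launcher),
        str(project_path),               # directory that holds the temporary project
        project_name,                    # name of the temporary project
        "-import", str(sample),          # binary to import and auto-analyze
        "-scriptPath", str(script_dir),  # where Ghidra looks for the script
        "-postScript", script_name,      # script to run after auto-analysis
        "-deleteProject",                # drop the project when the run ends
    ]
```

The returned list can be passed to subprocess.run, ideally with a timeout so that one stuck sample does not block the whole batch.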

Features Extracted

The system extracts the following features from binary files:

Feature           Description
FuncCount         Total number of functions
AvgInstrPerFunc   Average instructions per function
APIEntropy        Entropy score of API calls
SuspStringCount   Count of suspicious strings
MaxRefCount       Maximum reference count
ObfuscationScore  Degree of code obfuscation
IsAnomalous       Binary classification of anomalous patterns
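
The README does not say how APIEntropy is computed. As an illustration only, one common choice is the Shannon entropy of the API-call frequency distribution, normalized to the range 0 to 1. The sketch below assumes that definition; the function name normalized_entropy is hypothetical and the actual formula in ExtractMalwareFeatures.java may differ.

```python
import math
from collections import Counter


def normalized_entropy(api_calls):
    """Shannon entropy of API-name frequencies, scaled to [0, 1].

    Assumed definition for illustration; not taken from the project source.
    """
    counts = Counter(api_calls)
    total = sum(counts.values())
    # With zero or one distinct API there is no uncertainty to measure.
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Divide by the maximum possible entropy for this many distinct APIs.
    return entropy / math.log2(len(counts))
```

Under this definition, a binary that calls a few APIs very often scores close to 0, and one that spreads its calls evenly across many APIs scores close to 1.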

Setup Instructions

Prerequisites

  • Ghidra 11.3.1 or higher
  • Python 3.6+
  • Java Development Kit (JDK) supported by your Ghidra release (Ghidra 11.3.x requires JDK 21; a JRE alone is not sufficient)

Installation

  1. Clone this repository:

    git clone https://github.com/GlgApr/Malware-Analyzer.git
    cd Malware-Analyzer
  2. Configure paths in malware_analyzer.py:

    GHIDRA_PATH = "/path/to/your/ghidra/"
    PROJECT_PATH = "/path/to/store/ghidra/projects"
    MALWARE_PATH = "/path/to/malware/samples"
    OUTPUT_PATH = "/path/to/output/directory"
  3. Place ExtractMalwareFeatures.java in your Ghidra scripts directory or specify its path in the Python script.
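
Before starting a long batch, it can help to check the configured paths. The following is a minimal sketch under the assumption that the four settings above are plain directory paths; check_paths is a hypothetical helper and not part of malware_analyzer.py.

```python
from pathlib import Path


def check_paths(ghidra_path, project_path, malware_path, output_path):
    """Return a list of configuration problems (empty if none were found)."""
    problems = []
    support = Path(ghidra_path) / "support"
    launchers = ("analyzeHeadless", "analyzeHeadless.bat")
    if not any((support / name).is_file() for name in launchers):
        problems.append(f"no analyzeHeadless launcher under {support}")
    if not Path(malware_path).is_dir():
        problems.append(f"malware directory not found: {malware_path}")
    # Project and output directories are created on demand.
    for directory in (project_path, output_path):
        Path(directory).mkdir(parents=True, exist_ok=True)
    return problems
```

Calling check_paths(GHIDRA_PATH, PROJECT_PATH, MALWARE_PATH, OUTPUT_PATH) at startup and aborting on a non-empty result avoids failing on the first sample with a less obvious error.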

Usage

Running in Headless Mode

python3 malware_analyzer.py

This will:

  1. Scan the malware directory for supported file formats
  2. Process each file with Ghidra headless analyzer
  3. Extract features and save them to a CSV file
  4. Clean up temporary projects to save disk space

Supported File Formats

The analyzer supports multiple file formats including:

  • Windows executables (.exe, .dll)
  • Linux ELF files (.elf)
  • Scripts (.js, .ps1, .hta, .bat)
  • Archive files (.zip)
  • And more
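
The scan step can be sketched as a recursive search filtered by file extension. The extension set below contains only the formats named in the list above, since the full list used by the project is not given here. The names SUPPORTED_EXTENSIONS and find_samples are hypothetical.

```python
from pathlib import Path

# Only the extensions listed in this README; the real script may accept more.
SUPPORTED_EXTENSIONS = {".exe", ".dll", ".elf", ".js", ".ps1", ".hta", ".bat", ".zip"}


def find_samples(malware_dir, extensions=SUPPORTED_EXTENSIONS):
    """Return all files under malware_dir with a supported extension, sorted."""
    root = Path(malware_dir)
    return sorted(
        path for path in root.rglob("*")
        # Compare in lower case so SAMPLE.EXE matches ".exe".
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Matching on the extension is a simplification: malware samples are often renamed or stored without one, so checking file signatures would be more robust.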

Output

The system generates a CSV file (malware_features.csv) containing extracted features that can be used for:

  • Machine learning model training
  • Static analysis research
  • Malware family clustering
  • Threat intelligence

Example output format:

FileName,FuncCount,AvgInstrPerFunc,APIEntropy,SuspStringCount,MaxRefCount,ObfuscationScore,IsAnomalous
malware1.exe,245,18.7,0.78,12,87,0.65,True
malware2.dll,132,15.2,0.54,5,42,0.32,False
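
To use the CSV downstream, the columns have to be converted from strings to numeric and boolean types. The sketch below uses only the standard library and assumes the column names and value formats shown in the example above; load_features and parse_row are hypothetical names.

```python
import csv
import io


def parse_row(row):
    """Convert one CSV row (all strings) into typed values."""
    return {
        "FileName": row["FileName"],
        "FuncCount": int(row["FuncCount"]),
        "AvgInstrPerFunc": float(row["AvgInstrPerFunc"]),
        "APIEntropy": float(row["APIEntropy"]),
        "SuspStringCount": int(row["SuspStringCount"]),
        "MaxRefCount": int(row["MaxRefCount"]),
        "ObfuscationScore": float(row["ObfuscationScore"]),
        # The example output writes booleans as "True" / "False".
        "IsAnomalous": row["IsAnomalous"].strip().lower() == "true",
    }


def load_features(csv_file):
    """Read every row from an open CSV file object."""
    return [parse_row(row) for row in csv.DictReader(csv_file)]


# Usage with the two example rows from this README.
SAMPLE = (
    "FileName,FuncCount,AvgInstrPerFunc,APIEntropy,SuspStringCount,"
    "MaxRefCount,ObfuscationScore,IsAnomalous\n"
    "malware1.exe,245,18.7,0.78,12,87,0.65,True\n"
    "malware2.dll,132,15.2,0.54,5,42,0.32,False\n"
)
rows = load_features(io.StringIO(SAMPLE))
```

For a real run, open the file with open("output/malware_features.csv", newline="") and pass the file object to load_features.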

Advantages & Limitations

Advantages

  • Fully automated analysis pipeline
  • Comprehensive feature extraction
  • Integration with Ghidra's powerful decompilation
  • Batch processing of multiple files

Limitations

  • Static analysis only (no runtime behavior)
  • Heavily packed malware requires manual unpacking
  • Resource-intensive for large binaries (>100MB)

Future Developments

  • Parallel processing for faster analysis
  • Integration with dynamic analysis sandboxes
  • Interactive dashboard for visualization
  • Enhanced feature extraction

Data Examples

Sample output data is available in the repository under output/malware_features.csv. Additional logs and feature extraction results can be found in the output directory.

Project Files

  • malware_analyzer.py: Main Python script for batch processing
  • ExtractMalwareFeatures.java: Ghidra script for feature extraction
  • output/: Directory containing analysis results and CSV datasets

Resources

Project resources available at:

License

MIT License

Author
