Malware Analysis with Ghidra Headless

A comprehensive framework for automated static malware analysis using Ghidra's headless mode. This project extracts key features from malware binaries to create datasets for machine learning models and malware research.

Malware sources: vx-underground, VirusSign, MalwareBazaar, and similar sample repositories.

Ransomware: https://github.com/Cryakl/Ransomware-Database/

Overview

This project combines Python and Java scripts to analyze malware samples using NSA's Ghidra reverse engineering framework. The system:

Automates batch analysis of multiple malware samples
Extracts key static features from binaries
Generates structured datasets (CSV) for machine learning
Detects suspicious indicators like dangerous APIs, obfuscation techniques, and more

Architecture

The system operates in two modes:

GUI Mode: Interactive analysis through Ghidra's graphical interface

Headless Mode: Automated batch processing from the command line

The workflow consists of:

The Python script scans the sample directory for malware files
For each file, the script invokes Ghidra's headless analyzer
The Java script extracts features while Ghidra analyzes the file
The results are collected into a CSV dataset
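
The workflow above can be sketched in Python as follows. This is a minimal illustration, not the actual contents of malware_analyzer.py: the function names, the `tmp_` project naming scheme, and the `script_dir` parameter are assumptions made for the example. The `analyzeHeadless` launcher under Ghidra's `support` directory and the `-import`, `-scriptPath`, `-postScript`, and `-deleteProject` options are part of Ghidra's headless analyzer; on Windows the launcher is `analyzeHeadless.bat`.

```python
import subprocess
from pathlib import Path


def build_headless_command(ghidra_path, project_path, sample, script_dir,
                           script_name="ExtractMalwareFeatures.java"):
    """Return the analyzeHeadless argument list for a single sample."""
    launcher = Path(ghidra_path) / "support" / "analyzeHeadless"
    # Hypothetical naming scheme: one throwaway project per sample.
    project_name = f"tmp_{Path(sample).stem}"
    return [
        str(launcher),
        str(project_path), project_name,
        "-import", str(sample),          # load the binary into the project
        "-scriptPath", str(script_dir),  # where the Java script lives
        "-postScript", script_name,      # run after auto-analysis finishes
        "-deleteProject",                # drop the temporary project afterwards
    ]


def analyze(sample, **paths):
    """Run Ghidra headless on one sample and return the completed process."""
    cmd = build_headless_command(sample=sample, **paths)
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before any sample is processed.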

Features Extracted

The system extracts the following features from binary files:

Feature	Description
FuncCount	Total number of functions
AvgInstrPerFunc	Average instructions per function
APIEntropy	Entropy score of API calls
SuspStringCount	Count of suspicious strings
MaxRefCount	Maximum reference count
ObfuscationScore	Degree of code obfuscation
IsAnomalous	Binary classification of anomalous patterns
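
The table does not state how APIEntropy is computed. One plausible reading, given that the sample values in this README fall between 0 and 1, is a Shannon entropy over the distribution of called API names, normalized by its maximum. The sketch below illustrates that reading only; the function name and the normalization are assumptions, not the formula used in ExtractMalwareFeatures.java.

```python
import math
from collections import Counter


def normalized_entropy(api_calls):
    """Shannon entropy of the API-name distribution, scaled to [0, 1].

    Assumption: normalizing by log2(number of distinct names) is an
    illustrative choice, not the project's documented definition.
    """
    counts = Counter(api_calls)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no calls, or a single distinct API: no uncertainty
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))
```

Under this definition a binary that spreads its calls evenly across many APIs scores close to 1, while one dominated by a single API scores close to 0.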

Setup Instructions

Prerequisites

Ghidra 11.3.1 or higher
Python 3.6+
Java Development Kit (JDK) 21, 64-bit (Ghidra 11.3.x needs a full JDK, not only a Java runtime)

Installation

Clone this repository:

git clone https://github.com/GlgApr/Malware-Analyzer.git
cd Malware-Analyzer

Configure paths in malware_analyzer.py:

GHIDRA_PATH = "/path/to/your/ghidra/"
PROJECT_PATH = "/path/to/store/ghidra/projects"
MALWARE_PATH = "/path/to/malware/samples"
OUTPUT_PATH = "/path/to/output/directory"

Place ExtractMalwareFeatures.java in your Ghidra scripts directory or specify its path in the Python script.

Usage

Running in Headless Mode

python3 malware_analyzer.py

This will:

Scan the malware directory for supported file formats
Process each file with Ghidra headless analyzer
Extract features and save them to a CSV file
Clean up temporary projects to save disk space
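
The "save them to a CSV file" step could look roughly like the sketch below, which appends one row per analyzed sample and writes the header only when the file is new. The column names come from the example output in this README; the function name and the append-per-sample strategy are assumptions for illustration.

```python
import csv
from pathlib import Path

# Column order taken from the example output format in this README.
FIELDS = ["FileName", "FuncCount", "AvgInstrPerFunc", "APIEntropy",
          "SuspStringCount", "MaxRefCount", "ObfuscationScore", "IsAnomalous"]


def append_features(csv_path, row):
    """Append one feature row, writing the header if the file is empty."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending after each sample means a crash partway through a batch still leaves the rows collected so far on disk.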

Supported File Formats

The analyzer supports multiple file formats including:

Windows executables (.exe, .dll)
Linux ELF files (.elf)
Scripts (.js, .ps1, .hta, .bat)
Archive files (.zip)
And more
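
A directory scan restricted to these formats might be written as below. The extension set contains only the formats named in the list above (the "and more" entries are not known here), and `find_samples` is a hypothetical helper name rather than a function confirmed to exist in malware_analyzer.py.

```python
from pathlib import Path

# Only the extensions explicitly listed in this README.
SUPPORTED_EXTENSIONS = {".exe", ".dll", ".elf", ".js", ".ps1", ".hta", ".bat", ".zip"}


def find_samples(root, extensions=SUPPORTED_EXTENSIONS):
    """Recursively collect files whose suffix matches, case-insensitively."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Matching on the lowercased suffix catches samples named like `PAYLOAD.EXE`, which are common in malware corpora.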

Output

The system generates a CSV file (malware_features.csv) containing extracted features that can be used for:

Machine learning model training
Static analysis research
Malware family clustering
Threat intelligence

Example output format:

FileName,FuncCount,AvgInstrPerFunc,APIEntropy,SuspStringCount,MaxRefCount,ObfuscationScore,IsAnomalous
malware1.exe,245,18.7,0.78,12,87,0.65,True
malware2.dll,132,15.2,0.54,5,42,0.32,False
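
To consume this file from Python, the rows can be parsed with the standard `csv` module and converted to typed values. The sketch below embeds the two example rows shown above so that it runs on its own; `load_features` is an illustrative helper, not part of the project.

```python
import csv
import io

SAMPLE = """FileName,FuncCount,AvgInstrPerFunc,APIEntropy,SuspStringCount,MaxRefCount,ObfuscationScore,IsAnomalous
malware1.exe,245,18.7,0.78,12,87,0.65,True
malware2.dll,132,15.2,0.54,5,42,0.32,False
"""


def load_features(handle):
    """Parse feature rows into dictionaries with numeric and boolean types."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "FileName": rec["FileName"],
            "FuncCount": int(rec["FuncCount"]),
            "AvgInstrPerFunc": float(rec["AvgInstrPerFunc"]),
            "APIEntropy": float(rec["APIEntropy"]),
            "SuspStringCount": int(rec["SuspStringCount"]),
            "MaxRefCount": int(rec["MaxRefCount"]),
            "ObfuscationScore": float(rec["ObfuscationScore"]),
            "IsAnomalous": rec["IsAnomalous"].strip().lower() == "true",
        })
    return rows


rows = load_features(io.StringIO(SAMPLE))
anomalous = [r["FileName"] for r in rows if r["IsAnomalous"]]
print(anomalous)  # ['malware1.exe']
```

For a real run, pass an open file handle for malware_features.csv instead of the in-memory sample.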

Advantages & Limitations

Advantages

Fully automated analysis pipeline
Comprehensive feature extraction
Integration with Ghidra's powerful decompilation
Batch processing of multiple files

Limitations

Static analysis only (no runtime behavior)
Heavily packed malware requires manual unpacking
Resource-intensive for large binaries (>100MB)

Future Developments

Parallel processing for faster analysis
Integration with dynamic analysis sandboxes
Interactive dashboard for visualization
Enhanced feature extraction
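
As a rough idea of what the parallel processing item could involve, the sketch below fans samples out over a thread pool. Threads are sufficient here because the heavy work would happen in separate Ghidra processes rather than in Python. The names `analyze_all` and `analyze_one` are hypothetical, and this is not a planned design taken from the project; note that concurrent headless runs would each need their own project name, since Ghidra locks an open project.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_all(samples, analyze_one, max_workers=4):
    """Apply analyze_one to every sample concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_one, samples))
```

`max_workers` should stay small in practice, since each Ghidra instance can consume several gigabytes of memory on large binaries.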

Data Examples

Sample output data is available in the repository under output/malware_features.csv. Additional logs and feature extraction results can be found in the output directory.

Project Files

malware_analyzer.py: Main Python script for batch processing
ExtractMalwareFeatures.java: Ghidra script for feature extraction
output/: Directory containing analysis results and CSV datasets

Resources

Project resources available at:

License

MIT License

Author

GitHub: GlgApr
Contact: 0xAnarki@proton.me
