8000 GitHub - binsec/xyntia: Xyntia, the black-box deobfuscator
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

binsec/xyntia

Repository files navigation

Table of Contents

  1. Requirements
  2. Installation
  3. Usage
    1. Synthesizing functions from sampling files
    2. Synthesizing functions from a binary
    3. Grammar abbreviations
    4. Heuristic abbreviations
    5. Generating SyGUS problems
  4. Experiments
  5. Synthesize all blocks from a execution trace
  6. References

Requirements

System requirements

On debian like systems, run the following command:

sudo apt install libgmp3-dev gcc-multilib gdb python3 python3-pip python3-venv openjdk-17-jdk libgmp-dev pkg-config opam

You must also install Ghidra and add the GHIDRA environment variable with the installation directory of ghidra

export GHIDRA=<ghidra-directory>

Installation

The easiest way to install xyntia 8000 is to create an opam switch. It will automatically install xyntia and its dependencies:

$ cd <xyntia-directory>
$ opam switch create . 4.14.1 -y # or any version >= 4.14.1
$ eval $(opam env)

Usage

The help of xyntia is available through xyntia -help. In the following we will explain the two ways for using xyntia.

Synthesizing functions from sampling files

To synthesize a function from a sampling file, execute the following command:

$ xyntia [-ops <grammar>] [-time <time>] [-heur <heur>] <file.json>

where grammar is the abbreviation of the grammar used to define the search space (see below), heur is the abbreviation for the search heuristic to be used (see below), time is the time budget of the synthesis in seconds, and file is the path of the sampling file.

The sampling file should be in the format produced by Syntia's random sampling module. The file examples/samples/example.json is an example of a sampling file.

Synthesizing functions from a binary

You can let xyntia sample the output from a binary and synthesize them with the following command:

$ xyntia [-ops <grammar>] [-time <time>] [-heur <heur>] -bin <binary-file> -config <config-file> 

where binary-file is the path to the binary to analyze and config-file is a scriptig file to state which output to sample and how to sample.

The scripting language used in config-file is an extension of the Binsec scripting language. It adds the following new declarations and instructions:

  • sample N [ reg_1, ..., reg_n ], which specify to generate N samples for each output. A list of target register outputs can be set, in such a case, only these are sampled otherwise all detected outputs are;
  • set domain VAR [MIN, MAX] which specify the sampling domain for the VAR input. VAR can be a register or any DBA variable but not a memory cell. To specify the domain of a memory cell, see this example;
  • prune constant outputs which prunes all the constant outputs (i.e., with no input variables) from the set of sampled outputs;
  • set optimal sampling to use the sampling strategy depicted in the Xyntia paper.

For example to synthesize the eax output of the add function, run

$ cd examples/bin && make && cd -
$ xyntia -bin examples/bin/add -config examples/bin/add.ini

In this case, the file add.ini is:

$ cat examples/bin/add.ini
starting from <add>

set sample output stdout

explore all

hook <add:last> with 
    sample 100 eax
    halt
end

The starting from instruction specifies the start of the reverse window. The hook one is the end of the reverse window (remark: <add:last> stands for the last byte of the add function; it works because the last instruction of add is a ret, whose opcode is only 1 byte long).

Other examples of scripts can be found in the sampler/examples directory.

It is also possible to sample a symbolic expression directly. An example is given in sampler/examples/expr.ini. To sample it, use:

# The <() is here to replace the binary path. Indeed, in this case we do not need any binary (only an empty file)
xyntia -bin <() -config sampler/examples/expr.ini

Grammar abbreviations

grammar abbreviation
Mixed Boolean Arithmetic (MBA) mba
MBA+Division expr
MBA+Division+Mod+Shift full
MBA+Shift mba_shift
MBA+If then else mba_ite

Heuristic abbreviations

heuristic abbreviation
Iterated Local Search ils
Hill Climbing hc
Random Walk rw
Simulated Annealing sa
Metropolis-Hastings mh

Generating SyGUS problems

To easily compare other synthesizers with Xyntia or apply them to deobfuscation tasks, we provide a way to extract the the synthesis problem in the standard SyGUS format. To do so, just use the -sygus option.

For instance, to extract the sygus problem from a binary code, run:

$ xyntia -bin <binary-file> -config <config-file> -sygus

Experiments

All datasets and scripts are given to reproduce expriments presented in [1]. Especially, it contains the B1 dataset from the Syntia paper [2] (Thank you Tim Blazytko for sharing it with us), our B2 dataset and the datasets used to evaluate anti-black-box deobfuscation.

Python dependencies

To facilitate installation, we also give the requirements.txt to easily install the python dependencies.

To create and activate a python environment, execute the following commands (optional):

$ python3 -m venv <name-virtualenv> # create a virtual environment for python3
$ source <name-virtualenv>/bin/activate # active the virtual environment

Then, install dependencies:

$ pip install -r requirements.txt

Launch expriments

Datasets used in [1] can be found in the ./datasets directory. To launch Xyntia over a dataset (e.g., B2) with a given timeout (e.g., 1s) execute the following commands:

$ python3 ./scripts/bench/bench.py --dataset datasets/b2 --out results --parallel -- xyntia -check -time 1

The option and their meanings can be found through the --help option.

Synthesize all blocks from a execution trace

We also give ./scripts/utils/all_from_trace.sh which traces code execution with Ghidra or GDB, extracts each code block executed, samples and synthesizes them. The manual is available through ./scripts/utils/all_from_trace.sh --help and it can be run as follows:

$ XYNTIA="xyntia <options>" # the xyntia command to use
$ ./scripts/utils/all_from_trace.sh --outdir <resdir> --all -- binary arg1 arg2 ...

Here is an example:

$ cd examples/bin && make && cd -
$ ./scripts/utils/all_from_trace.sh --outdir <resdir> --all -- ./examples/bin/add

References

[1] Menguy, G., Bardin, S., Bonichon, R., & Lima, C. D. S. (2021, November). Search-Based Local Black-Box Deobfuscation: Understand, Improve and Mitigate. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (pp. 2513-2525).

[2] Blazytko, T., Contag, M., Aschermann, C., & Holz, T. (2017). Syntia: Synthesizing the semantics of obfuscated code. In 26th USENIX Security Symposium (USENIX Security 17) (pp. 643-659).

0