On debian like systems, run the following command:
sudo apt install libgmp3-dev gcc-multilib gdb python3 python3-pip python3-venv openjdk-17-jdk libgmp-dev pkg-config opam
You must also install Ghidra and add the GHIDRA environment variable with the installation directory of ghidra
export GHIDRA=<ghidra-directory>
The easiest way to install xyntia 8000 is to create an opam switch. It will automatically install xyntia and its dependencies:
$ cd <xyntia-directory>
$ opam switch create . 4.14.1 -y # or any version >= 4.14.1
$ eval $(opam env)
The help of xyntia is available through xyntia -help
. In the following we will explain the two ways for using xyntia.
To synthesize a function from a sampling file, execute the following command:
$ xyntia [-ops <grammar>] [-time <time>] [-heur <heur>] <file.json>
where grammar is the abbreviation of the grammar used to define the search space (see below), heur is the abbreviation for the search heuristic to be used (see below), time is the time budget of the synthesis in seconds, and file is the path of the sampling file.
The sampling file should be in the format produced by Syntia's random sampling module. The file examples/samples/example.json
is an example of a sampling file.
You can let xyntia sample the output from a binary and synthesize them with the following command:
$ xyntia [-ops <grammar>] [-time <time>] [-heur <heur>] -bin <binary-file> -config <config-file>
where binary-file is the path to the binary to analyze and config-file is a scriptig file to state which output to sample and how to sample.
The scripting language used in config-file is an extension of the Binsec scripting language. It adds the following new declarations and instructions:
sample N [ reg_1, ..., reg_n ]
, which specify to generate N samples for each output. A list of target register outputs can be set, in such a case, only these are sampled otherwise all detected outputs are;set domain VAR [MIN, MAX]
which specify the sampling domain for the VAR input. VAR can be a register or any DBA variable but not a memory cell. To specify the domain of a memory cell, see this example;prune constant outputs
which prunes all the constant outputs (i.e., with no input variables) from the set of sampled outputs;set optimal sampling
to use the sampling strategy depicted in the Xyntia paper.
For example to synthesize the eax
output of the add function, run
$ cd examples/bin && make && cd -
$ xyntia -bin examples/bin/add -config examples/bin/add.ini
In this case, the file add.ini
is:
$ cat examples/bin/add.ini
starting from <add>
set sample output stdout
explore all
hook <add:last> with
sample 100 eax
halt
end
The starting from
instruction specifies the start of the reverse window.
The hook
one is the end of the reverse window (remark: <add:last>
stands for the last byte of the add
function; it works because the last instruction of add
is a ret
, whose opcode is only 1 byte long).
Other examples of scripts can be found in the sampler/examples
directory.
It is also possible to sample a symbolic expression directly. An example is given in sampler/examples/expr.ini
. To sample it, use:
# The <() is here to replace the binary path. Indeed, in this case we do not need any binary (only an empty file)
xyntia -bin <() -config sampler/examples/expr.ini
grammar | abbreviation |
---|---|
Mixed Boolean Arithmetic (MBA) | mba |
MBA+Division | expr |
MBA+Division+Mod+Shift | full |
MBA+Shift | mba_shift |
MBA+If then else | mba_ite |
heuristic | abbreviation |
---|---|
Iterated Local Search | ils |
Hill Climbing | hc |
Random Walk | rw |
Simulated Annealing | sa |
Metropolis-Hastings | mh |
To easily compare other synthesizers with Xyntia or apply them to deobfuscation tasks, we provide a way to extract the
the synthesis problem in the standard SyGUS format. To do so, just use the -sygus
option.
For instance, to extract the sygus problem from a binary code, run:
$ xyntia -bin <binary-file> -config <config-file> -sygus
All datasets and scripts are given to reproduce expriments presented in [1]. Especially, it contains the B1 dataset from the Syntia paper [2] (Thank you Tim Blazytko for sharing it with us), our B2 dataset and the datasets used to evaluate anti-black-box deobfuscation.
To facilitate installation, we also give the requirements.txt
to easily install the python dependencies.
To create and activate a python environment, execute the following commands (optional):
$ python3 -m venv <name-virtualenv> # create a virtual environment for python3
$ source <name-virtualenv>/bin/activate # active the virtual environment
Then, install dependencies:
$ pip install -r requirements.txt
Datasets used in [1] can be found in the ./datasets
directory.
To launch Xyntia over a dataset (e.g., B2) with a given timeout (e.g., 1s) execute the following commands:
$ python3 ./scripts/bench/bench.py --dataset datasets/b2 --out results --parallel -- xyntia -check -time 1
The option and their meanings can be found through the --help
option.
We also give ./scripts/utils/all_from_trace.sh
which traces code execution with Ghidra or GDB, extracts each code block executed, samples and synthesizes them.
The manual is available through ./scripts/utils/all_from_trace.sh --help
and it can be run as follows:
$ XYNTIA="xyntia <options>" # the xyntia command to use
$ ./scripts/utils/all_from_trace.sh --outdir <resdir> --all -- binary arg1 arg2 ...
Here is an example:
$ cd examples/bin && make && cd -
$ ./scripts/utils/all_from_trace.sh --outdir <resdir> --all -- ./examples/bin/add
[1] Menguy, G., Bardin, S., Bonichon, R., & Lima, C. D. S. (2021, November). Search-Based Local Black-Box Deobfuscation: Understand, Improve and Mitigate. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (pp. 2513-2525).
[2] Blazytko, T., Contag, M., Aschermann, C., & Holz, T. (2017). Syntia: Synthesizing the semantics of obfuscated code. In 26th USENIX Security Symposium (USENIX Security 17) (pp. 643-659).