BinT5: Binary Code Summarisation Model
Creators
- 1. Delft University of Technology
- 2. University of California, Davis
Description
BinT5
This dataset is published as part of the paper: "Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries".
It includes the fine-tuned CodeT5 checkpoints, packaged in a single .zip file.
For each of the models, a `pytorch.bin` file is provided in its respective folder.
These models can be loaded into CodeT5 and used for inference or further training.
To utilise the models, download the reference CodeT5-base model from HuggingFace:
> GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Salesforce/codet5-base
- This clones the repository but skips downloading the `pytorch_model.bin` file, which will be replaced in the next step.
- Select the model that you wish to use from its respective directory. Copy its checkpoint file into the local `codet5-base` directory downloaded in the previous step, replacing the `pytorch_model.bin` there.
- Load the model from a local path instead of through HuggingFace. To do so, change line 66 in the `sh/exp_with_args.sh` file to the path of the local `codet5-base` directory that you downloaded and configured in the previous steps. The tokenizer does not need to be replaced.
- The model can now be run by executing `sh/run_exp.py`.
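The checkpoint replacement step can also be scripted. The following is a minimal sketch, not part of the released package: the function name `install_checkpoint` is hypothetical, and it only assumes that the chosen checkpoint is a single file that should end up as `pytorch_model.bin` inside the local `codet5-base` clone.

```python
import shutil
from pathlib import Path


def install_checkpoint(checkpoint: Path, codet5_dir: Path) -> Path:
    """Copy a fine-tuned checkpoint over pytorch_model.bin in a local clone.

    `checkpoint` is the file taken from the extracted archive and
    `codet5_dir` is the local codet5-base directory. Both are supplied by
    the caller; nothing about the archive layout is assumed here.
    """
    if not checkpoint.is_file():
        raise FileNotFoundError(f"checkpoint not found: {checkpoint}")
    if not codet5_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {codet5_dir}")

    # The target name is the one the clone step leaves to be replaced.
    target = codet5_dir / "pytorch_model.bin"
    shutil.copyfile(checkpoint, target)
    return target
```

Calling `install_checkpoint(Path("path/to/chosen/pytorch.bin"), Path("codet5-base"))` would overwrite whatever the clone left at `codet5-base/pytorch_model.bin`; the first path is a placeholder for whichever model folder was selected.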
License
Copyright 2022 ##########
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Files
Name | Size | Checksum |
---|---|---|
BinT5.zip | 4.5 GB | md5:fe42b04dcf8fa5d78de14ecd6cfed07c |
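Because the archive is large, it may be worth checking the download against the MD5 checksum listed for `BinT5.zip` before extracting it. The sketch below uses only the Python standard library; the helper names `md5_of` and `verify_archive` are illustrative, and the local file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed for BinT5.zip on this record.
EXPECTED_MD5 = "fe42b04dcf8fa5d78de14ecd6cfed07c"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the 4.5 GB archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Check whether the file at `path` matches the expected checksum."""
    return md5_of(path) == expected.lower()
```

For example, `verify_archive(Path("BinT5.zip"))` returns `True` only if the downloaded file matches the listed checksum; a `False` result suggests an incomplete or corrupted download.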