This is the repository related to our manuscript DepoScope: accurate phage depolymerase annotation and domain delineation using large language models, published in PLOS Computational Biology (link).
The easiest way to get started with DepoScope to predict depolymerases from your phage genomes or genes is to run the Google Colab that we provide here. You only need a zip file of your phage genomes to get started! Alternatively, go to the scripts_clean folder and run the VII.DpoDetectionTool.ipynb notebook.
To run the benchmarking against other depolymerase detection tools, go to the benchmark folder and run the benchmark_notebook.ipynb notebook.
To run DepoScope as a script, a few steps are required.
The following instructions are for a Unix-based system, but they can be adapted to Windows with minimal changes.
First, install uv with:
curl -LsSf https://astral.sh/uv/install.sh | sh
Then, make sure that clang is installed (it is required by phanotate) using which clang. If it is not installed, install it with sudo apt install clang (on Debian or Ubuntu; use your distribution's package manager otherwise).
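The two prerequisite checks above can be combined into one short snippet. This is only a convenience sketch, not part of the repository: it reports whether uv and clang are on your PATH and collects the missing ones in a variable named missing (a name chosen here for illustration).

```shell
# Report which of the required command-line tools are on PATH.
# $missing ends up as a space-separated list of tools that were not found.
missing=""
for tool in uv clang; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found"
        missing="$missing $tool"
    fi
done
```

If missing is non-empty afterwards, install the listed tools as described above before continuing.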
Now we need to download the pre-trained model weights. First we download and extract the fine-tuned ESM-2 model weights:
wget https://zenodo.org/records/10957073/files/esm2_t12_finetuned_depolymerases.zip
unzip esm2_t12_finetuned_depolymerases.zip
Then we download the pre-trained DepoScope model:
wget https://zenodo.org/records/10957073/files/Deposcope.esm2_t12_35M_UR50D.2203.full.model
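Before running predictions, it can help to confirm that both downloads are in place. The helper below is a minimal sketch; check_models is a hypothetical name, not part of DepoScope. It only tests that the checkpoint folder exists and that the model file is non-empty, and it does not validate their contents.

```shell
# check_models ESM2_DIR DPO_MODEL
# Returns 0 only if ESM2_DIR is a directory and DPO_MODEL is a non-empty file.
check_models() {
    esm2_dir="$1"
    dpo_model="$2"
    status=0
    if [ ! -d "$esm2_dir" ]; then
        echo "missing ESM-2 checkpoint folder: $esm2_dir" >&2
        status=1
    fi
    if [ ! -s "$dpo_model" ]; then
        echo "missing or empty DepoScope model file: $dpo_model" >&2
        status=1
    fi
    return "$status"
}

# Example call (the folder name is an assumption; use whatever unzip created):
# check_models ./esm2_t12_finetuned_depolymerases ./Deposcope.esm2_t12_35M_UR50D.2203.full.model
```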
Finally, clone the repository and run deposcope-predict.py through uv with the following command:
uv run deposcope-predict.py -i <input_fasta_file> -o <output_file_name> --esm2 <path_to_esm2_checkpoint> --Dpo <path_to_Dpo_model>
where:
<input_fasta_file> is the path to the input FASTA file with a single phage genome to be annotated.
<output_file_name> is the desired name for the output file.
<path_to_esm2_checkpoint> is the path to the folder containing the fine-tuned ESM-2 model checkpoint files.
<path_to_Dpo_model> is the path to the pre-trained DepoScope .model file.
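Since the script expects a single phage genome per input file, annotating several genomes means calling it once per file. The function below is a sketch of such a loop and is not part of the repository. The name run_batch, the DRY_RUN switch, the .fasta extension, and the output naming scheme are all assumptions made for illustration; it also assumes that -o accepts a path that includes a directory, which is not confirmed here. Only the flags -i, -o, --esm2 and --Dpo come from the command above.

```shell
# run_batch GENOME_DIR OUT_DIR ESM2_DIR DPO_MODEL
# Calls deposcope-predict.py once for every *.fasta file in GENOME_DIR.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_batch() {
    genome_dir="$1"
    out_dir="$2"
    esm2_dir="$3"
    dpo_model="$4"
    mkdir -p "$out_dir"
    for fasta in "$genome_dir"/*.fasta; do
        [ -e "$fasta" ] || continue            # glob matched nothing
        name=$(basename "$fasta" .fasta)       # output named after the genome file
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo uv run deposcope-predict.py -i "$fasta" -o "$out_dir/$name" \
                --esm2 "$esm2_dir" --Dpo "$dpo_model"
        else
            uv run deposcope-predict.py -i "$fasta" -o "$out_dir/$name" \
                --esm2 "$esm2_dir" --Dpo "$dpo_model"
        fi
    done
}
```

Run it from the cloned repository so that deposcope-predict.py is found, and try it with DRY_RUN=1 first to inspect the commands it would execute.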