MSNGO: Multi-species protein function annotation based on 3D protein structure and network propagation

This is the code repository for the protein function prediction model MSNGO.

MSNGO is a multi-species protein function prediction model based on structural features and heterogeneous network propagation. It provides a structure encoder and propagates structural features over a heterogeneous network to predict Gene Ontology terms.

[Figure: MSNGO overview]

Dependencies

  • The code was developed and tested with Python 3.8.
  • To install the Python dependencies, run: pip install -r requirements.txt. Some libraries may need to be installed via conda.
  • The CUDA version used is cudatoolkit==11.3.1.
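
A quick way to confirm that an environment matches the versions listed above is a small check script. The following is a minimal sketch and not part of the repository: the function names are hypothetical, and it assumes PyTorch is the framework whose CUDA build should be inspected (the check is skipped when `torch` is not importable).

```python
import importlib.util
import sys

EXPECTED_PYTHON = (3, 8)   # from the Dependencies list
EXPECTED_CUDA = "11.3"     # major.minor of cudatoolkit==11.3.1


def python_matches(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """Return True if the interpreter's major.minor equals the expected one."""
    return tuple(version_info[:2]) == tuple(expected)


def cuda_build_version():
    """Return the CUDA version PyTorch was built against, or None.

    Assumption: PyTorch is the deep learning framework in use. None is
    returned when torch is missing or is a CPU-only build.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.version.cuda


def environment_report():
    """Build a short human-readable summary of the two checks."""
    cuda = cuda_build_version()
    return "\n".join([
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        f"(expected 3.8): {'ok' if python_matches() else 'mismatch'}",
        f"CUDA build: {cuda or 'not detected'} (expected {EXPECTED_CUDA}.x)",
    ])


if __name__ == "__main__":
    print(environment_report())
```

A mismatch does not necessarily mean the code will fail; it only indicates that the environment differs from the one the authors tested.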

Data

The data used come from UniProt, STRING, GOA, GO, and AlphaFold.

We also provide a small dataset with fewer than 50 proteins for quickly testing the model. It can be found here.

For a detailed description of data files, please see here.

Train

See here for a quick start. To train on your own dataset, first download esm2_t33_650M_UR50D.pt to MSNGO/esm2_t33_650M_UR50D/.
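
Before starting a training run, it can help to verify that the ESM-2 checkpoint is where the instructions above place it. This is a minimal sketch under the assumption that `MSNGO/` is the repository root; the helper names are hypothetical and do not come from the repository.

```python
from pathlib import Path

# File and directory names are taken from the Train instructions.
WEIGHTS_DIR = "esm2_t33_650M_UR50D"
WEIGHTS_NAME = "esm2_t33_650M_UR50D.pt"


def esm2_weights_path(repo_root):
    """Return the expected checkpoint location under the repository root."""
    return Path(repo_root) / WEIGHTS_DIR / WEIGHTS_NAME


def weights_present(repo_root):
    """Return True if the checkpoint file exists at the expected location."""
    return esm2_weights_path(repo_root).is_file()


if __name__ == "__main__":
    root = "MSNGO"  # assumption: run from the directory containing the clone
    status = "found" if weights_present(root) else "missing"
    print(f"{esm2_weights_path(root)}: {status}")
```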

The preprocessing.sh script processes your raw data. Run it with the following command:

./scripts/preprocessing.sh

The mf, bp, and cc branches are each trained, used for prediction, and evaluated by the following scripts, respectively.

./scripts/run_mf.sh
./scripts/run_bp.sh
./scripts/run_cc.sh
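
To run all three branches one after another, the scripts above can be driven from a small wrapper. The following is a sketch, not something shipped with the repository: the function names and the `dry_run` flag are hypothetical, and invoking the scripts through `bash` assumes they are bash-compatible (it avoids relying on the executable bit).

```python
import subprocess

# Script paths as listed above, relative to the repository root.
BRANCH_SCRIPTS = {
    "mf": "scripts/run_mf.sh",  # molecular function
    "bp": "scripts/run_bp.sh",  # biological process
    "cc": "scripts/run_cc.sh",  # cellular component
}


def branch_command(branch):
    """Return the command that runs one branch's script."""
    # Assumption: the scripts can be executed with bash.
    return ["bash", BRANCH_SCRIPTS[branch]]


def run_branches(branches=("mf", "bp", "cc"), repo_root=".", dry_run=True):
    """Build the commands for each branch and optionally execute them.

    With dry_run=True nothing is executed and the commands are only
    returned. Otherwise each script runs in turn from repo_root, and
    check=True stops at the first script that exits with an error.
    """
    commands = [branch_command(b) for b in branches]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=repo_root, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_branches(dry_run=True):
        print(" ".join(cmd))
```

Running the branches sequentially keeps GPU memory use to one job at a time; whether they can safely run in parallel depends on the scripts themselves.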

Predict

Our trained model can be downloaded from here.

You can use the model directly to get predictions. Run the predict.py script on an input FASTA file (e.g. for MFO):

python predict.py --ontology mf -f your_test.fasta
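
A small pre-flight check on the input file can catch an empty or malformed FASTA before the model is loaded. The sketch below is hypothetical helper code, not part of the repository. Only the `--ontology` and `-f` flags and the `mf` value come from the command above; passing `bp` or `cc` as the ontology value is an assumption based on the names of the training scripts.

```python
import sys


def count_fasta_records(lines):
    """Count sequence records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def predict_command(ontology, fasta_path):
    """Build the predict.py command shown above for one ontology.

    Assumption: 'bp' and 'cc' are accepted alongside 'mf'.
    """
    return [sys.executable, "predict.py", "--ontology", ontology, "-f", str(fasta_path)]


if __name__ == "__main__":
    fasta = "your_test.fasta"
    try:
        with open(fasta) as handle:
            n_records = count_fasta_records(handle)
    except FileNotFoundError:
        n_records = 0
    if n_records == 0:
        print(f"{fasta}: no FASTA records found")
    else:
        print(f"{fasta}: {n_records} record(s)")
        print(" ".join(predict_command("mf", fasta)))
```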
