GitHub - WaymentSteeleLab/Dyna-1: Model for predicting micro-millisecond motions from protein sequence and/or structure

Model for predicting micro-millisecond motions from protein sequence and/or structure

Dyna-1

Requires Python 3.10+.

Dyna-1 is a model introduced in our paper, "Learning millisecond protein dynamics from what is missing in NMR spectra".

Given a sequence and/or structure, Dyna-1 will predict the probability that each residue experiences micro-millisecond motions.

Dyna-1 was built using the esm3-sm-open-v1 weights from ESM-3. Inference with this model is subject to the EvolutionaryScale Cambrian Non-Commercial License Agreement of the ESM-3 Model and requires read permission for the weights found here. We also provide an alternate version of Dyna-1 that uses ESM-2 embeddings; use of this model is subject to a Non-Commercial License Agreement.

To make Dyna-1 readily accessible for research purposes, we also provide a Google Colab.

We provide the curated datasets used to evaluate Dyna-1: "RelaxDB", a set of 133 R1/R2/NOE datasets, and "RelaxDB-CPMG", a set of 10 Carr-Purcell-Meiboom-Gill relaxation-dispersion datasets.

In this repository we provide:

If you have any questions not covered here, please create an issue and we will get back to you ASAP.

Installation

To run the scripts in this repository, we recommend using a conda environment. First, clone this repository. Then navigate to the root directory and run:

conda create -n dyna1 python=3.11
conda activate dyna1

This package requires PyTorch, ideally with GPU support. For more information, follow instructions from https://pytorch.org/get-started/locally/ for your system and CUDA version. We used PyTorch 2.5.0 with CUDA 12.4. To install all of the requirements:

pip install -r requirements.txt
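
After installing the requirements, it can help to confirm that PyTorch is importable and whether it sees a GPU. The snippet below is a minimal sketch, not part of this repository; the helper name describe_torch is hypothetical, and the check only reports what the local install exposes.

```python
def describe_torch() -> str:
    """Return a one-line summary of the local PyTorch install."""
    try:
        import torch  # imported lazily so the check itself never crashes
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version PyTorch was built against
        return (
            f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"GPU: {torch.cuda.get_device_name(0)}"
        )
    return f"PyTorch {torch.__version__}, CPU only"


if __name__ == "__main__":
    print(describe_torch())
```

If the summary reports "CPU only" on a machine with a GPU, the PyTorch build most likely does not match the installed CUDA version.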

Then, download the model weights and place them in the model/weights folder. The weights can be found on 🤗HuggingFace at gelnesr/Dyna-1. More information on how to download them can be found here.
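
One possible way to fetch the weights programmatically is the snapshot_download function from the huggingface_hub package. This is a sketch under two assumptions that this README does not confirm: that huggingface_hub is installed in the environment, and that the files in gelnesr/Dyna-1 can be copied into model/weights with their repository layout unchanged. The helper name download_weights is hypothetical.

```python
from pathlib import Path

REPO_ID = "gelnesr/Dyna-1"           # repository named in this README
WEIGHTS_DIR = Path("model/weights")  # target folder named in this README


def download_weights(repo_id: str = REPO_ID, local_dir: Path = WEIGHTS_DIR) -> Path:
    """Download every file in the Hugging Face repo into local_dir."""
    # Imported here so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the repo layout matches what the inference scripts expect.
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir


if __name__ == "__main__":
    print(f"Weights downloaded to {download_weights()}")
```

Run it from the repository root so that the relative path model/weights resolves correctly.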

Inference

The best-performing Dyna-1 is based on ESM-3. To run this version, you will have to request access to the ESM-3 esm3-sm-open-v1 weights at HuggingFace here. Follow the steps to agree to their License terms and receive your access token to the model weights.

Note

If this is your first time requesting access to the ESM-3 weights, you may need to set up your access token. For more information on how to set up an SSH token, please consult this tutorial. Alternatively, you can use the Hugging Face login prompt, which asks for the access token each time you re-instantiate Dyna-1. To enable it, add the following code to the inference script: from huggingface_hub import login; login()

To run inference using our best-performing model, run:

python dyna1.py --pdb <PDB CODE or PATH> --chain <CHAIN> --name <NAME> --use_pdb_seq --write_to_pdb

We provide three options for running inference: sequence and structure input (best performance!), sequence only, or structure only. Additionally, we make it possible to test different sequences for the same backbone. Examples on how to run each of these modes can be found in the scripts/ folder.
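
To run the sequence-and-structure mode over several proteins, the command shown above can be assembled in a small wrapper. The sketch below uses only the flags that appear in this README; the helper names build_dyna1_command and run_batch are hypothetical, and "1ake", "A" and "adk" are placeholder inputs. It assumes the script is run from the repository root.

```python
import subprocess
import sys


def build_dyna1_command(pdb: str, chain: str, name: str,
                        write_to_pdb: bool = True) -> list[str]:
    """Assemble the dyna1.py call from the flags documented in this README."""
    cmd = [sys.executable, "dyna1.py",
           "--pdb", pdb, "--chain", chain, "--name", name,
           "--use_pdb_seq"]
    if write_to_pdb:
        cmd.append("--write_to_pdb")
    return cmd


def run_batch(jobs: list[tuple[str, str, str]]) -> None:
    """Run inference for each (pdb, chain, name) job, stopping on the first failure."""
    for pdb, chain, name in jobs:
        subprocess.run(build_dyna1_command(pdb, chain, name), check=True)


if __name__ == "__main__":
    # Placeholder job list; replace with your own PDB codes or paths.
    print(" ".join(build_dyna1_command("1ake", "A", "adk")))
```

Passing the arguments as a list avoids shell quoting problems when a PDB path contains spaces.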

To output the probabilities to the input structure, make sure to pass the --write_to_pdb flag.
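
This README does not state where --write_to_pdb stores the probabilities or how they are scaled. The sketch below assumes they are written to the B-factor column of the output PDB file, which would be consistent with the putty visualization described in the Visualization section. Check your own output before relying on it. The helper name read_residue_bfactors is hypothetical.

```python
def read_residue_bfactors(pdb_text: str, chain: str) -> dict[int, float]:
    """Map residue number -> B-factor, read from the CA atom of each residue.

    Assumption: the per-residue probability is stored in the B-factor
    column (PDB columns 61-66). Insertion codes are ignored.
    """
    values: dict[int, float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        # Fixed-width PDB fields: atom name 13-16, chain 22, residue number 23-26.
        if line[21] != chain or line[12:16].strip() != "CA":
            continue
        values[int(line[22:26])] = float(line[60:66])
    return values
```

With the result in hand, residues above a cutoff of your choosing can be listed with a comprehension such as [r for r, p in values.items() if p > cutoff]; no recommended cutoff is given here.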

Alternatively, we also provide a version of Dyna-1 based on ESM-2. To run inference using this version of the model, run:

python dyna1-esm2.py --sequence <SEQUENCE> --name <NAME>
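
Before passing a sequence on the command line, it may be worth normalizing it. The sketch below assumes the script expects one-letter codes for the 20 standard amino acids; this README does not say how non-standard letters such as X or B are handled, so the check is deliberately strict. The helper name clean_sequence is hypothetical.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Strip whitespace, upper-case, and reject non-standard residue letters."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if bad:
        raise ValueError(f"non-standard residue letters: {''.join(bad)}")
    return seq
```

The cleaned string can then be substituted for <SEQUENCE> in the command above.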

Visualization

We visualize probabilities of exchange on protein structures using PyMOL. To re-create the putty visualization on your protein, load the PDB file into PyMOL and copy-paste the following commands into the PyMOL command line:

cartoon putty; set cartoon_putty_transform, 6; set cartoon_putty_radius, 0.25; set cartoon_putty_range, 0.1; set cartoon_putty_scale_max, 10

Annotated:

cartoon putty
set cartoon_putty_transform, 6  # scaled linear transformation
set cartoon_putty_radius, 0.25  # min radius for p=0
set cartoon_putty_range, 0.1  # min_radius / max_radius, sets max_radius=2.5
set cartoon_putty_scale_max, 10  # max_radius / min_radius
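
To avoid pasting the commands by hand for every structure, they can be written to a PyMOL script file. The sketch below only reproduces the settings listed above and prepends a load command; the helper name putty_script and the file names are hypothetical.

```python
PUTTY_SETTINGS = {
    "cartoon_putty_transform": 6,     # scaled linear transformation
    "cartoon_putty_radius": 0.25,     # min radius for p=0
    "cartoon_putty_range": 0.1,       # min_radius / max_radius
    "cartoon_putty_scale_max": 10,    # max_radius / min_radius
}


def putty_script(pdb_path: str) -> str:
    """Return PyMOL commands that load a structure and apply the putty settings."""
    lines = [f"load {pdb_path}", "cartoon putty"]
    lines += [f"set {key}, {value}" for key, value in PUTTY_SETTINGS.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; point pdb_path at your own Dyna-1 output.
    with open("putty.pml", "w") as handle:
        handle.write(putty_script("output.pdb"))
```

The resulting file can be opened from the PyMOL command line with @putty.pml.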

Datasets

RelaxDB contains 133 R1/R2/NOE datasets curated from the BMRB and from literature.

RelaxDB-CPMG contains motion labels derived from 10 CPMG relaxation-dispersion datasets curated from literature.

These datasets are made available on 🤗HuggingFace at datasets/gelnesr/RelaxDB.

In this repo, you can find:

  • data formatted for input into Dyna-1, in data/RelaxDB_pkls_22jan2025.zip
  • datasets in JSON format, in data/RelaxDB_datasets/
  • demo notebooks for visualizing and using datasets to evaluate model outputs, in analysis/
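
For a quick look at the JSON datasets outside the demo notebooks, the files can be loaded with the standard library. This is a sketch that assumes the folder contains files ending in .json directly under data/RelaxDB_datasets/; the schema of each file is not described in this README, so the code returns the parsed content unchanged. The helper name load_json_datasets is hypothetical.

```python
import json
from pathlib import Path


def load_json_datasets(root: str = "data/RelaxDB_datasets") -> dict[str, object]:
    """Parse every *.json file directly under root, keyed by file stem."""
    datasets: dict[str, object] = {}
    for path in sorted(Path(root).glob("*.json")):
        datasets[path.stem] = json.loads(path.read_text())
    return datasets


if __name__ == "__main__":
    loaded = load_json_datasets()
    print(f"Loaded {len(loaded)} dataset files: {list(loaded)[:5]}")
```

An empty result most likely means the script was not run from the repository root or the files are nested in subfolders.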

Training

Training code will be made available upon journal publication.

Citation

If you are using our code, datasets, or model, please use the following citation:

@article {Dyna-1,
    author = {Wayment-Steele, Hannah K. and El Nesr, Gina and Hettiarachchi, Ramith and Kariyawasam, Hasindu and Ovchinnikov, Sergey and Kern, Dorothee},
    title = {Learning millisecond protein dynamics from what is missing in NMR spectra},
    year = {2025},
    doi = {10.1101/2025.03.19.642801},
    journal = {bioRxiv}
}

Acknowledgements

We would like to acknowledge the Evolutionary Scale Team for their contributions to the field with ESM-3. The code in esm is imported from evolutionaryscale/esm with all modifications identified and includes the associated LICENSE terms for the ESM-3 model.

We would also like to acknowledge the FAIR Team for their contributions to the field with ESM-2. The ESM-2 model is called using the HuggingFace API call.

We thank Katie Henzler-Wildman, Magnus Wolf-Watz, Elan Eisenmesser, J. Patrick Loria, Marcellus Ubbelink, George Lisi, Sam Butcher, and Nicolas Doucet for sharing data. We thank Martin Stone for sharing the Indiana Dynamics Database data his group curated in 2000.
