Quantitative Biology > Biomolecules

arXiv:2306.12360 (q-bio)

[Submitted on 8 Jun 2023 (v1), last revised 15 Mar 2024 (this version, v2)]

Title:Protein Discovery with Discrete Walk-Jump Sampling

Authors:Nathan C. Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi

View PDF HTML (experimental)

Abstract:We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.

Comments:	ICLR 2024 oral presentation, top 1.2% of submissions; {ICLR 2023 Physics for Machine Learning, NeurIPS 2023 GenBio, MLCB 2023} Spotlight
Subjects:	Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Cite as:	arXiv:2306.12360 [q-bio.BM]
	(or arXiv:2306.12360v2 [q-bio.BM] for this version)
	https://doi.org/10.48550/arXiv.2306.12360

Submission history

From: Nathan Frey [view email]
[v1] Thu, 8 Jun 2023 17:03:46 UTC (2,740 KB)
[v2] Fri, 15 Mar 2024 19:16:01 UTC (1,689 KB)

Quantitative Biology > Biomolecules

Title:Protein Discovery with Discrete Walk-Jump Sampling

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Quantitative Biology > Biomolecules

Title:Protein Discovery with Discrete Walk-Jump Sampling

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators