Computer Science > Machine Learning

arXiv:2403.11353 (cs)

[Submitted on 17 Mar 2024 (v1), last revised 16 Dec 2024 (this version, v4)]

Title:TransPeakNet: Solvent-Aware 2D NMR Prediction via Multi-Task Pre-Training and Unsupervised Learning

Authors:Yunrui Li, Hao Xu, Ambrish Kumar, Duosheng Wang, Christian Heiss, Parastoo Azadi, Pengyu Hong

Abstract:Nuclear Magnetic Resonance (NMR) spectroscopy is essential for revealing molecular structure, electronic environment, and dynamics. Accurate NMR shift prediction allows researchers to validate structures by comparing predicted and observed shifts. While Machine Learning (ML) has improved one-dimensional (1D) NMR shift prediction, predicting 2D NMR remains challenging due to limited annotated data. To address this, we introduce an unsupervised training framework for predicting cross-peaks in 2D NMR, specifically Heteronuclear Single Quantum Coherence (HSQC).Our approach pretrains an ML model on an annotated 1D dataset of 1H and 13C shifts, then finetunes it in an unsupervised manner using unlabeled HSQC data, which simultaneously generates cross-peak annotations. Our model also adjusts for solvent effects. Evaluation on 479 expert-annotated HSQC spectra demonstrates our model's superiority over traditional methods (ChemDraw and Mestrenova), achieving Mean Absolute Errors (MAEs) of 2.05 ppm and 0.165 ppm for 13C shifts and 1H shifts respectively. Our algorithmic annotations show a 95.21% concordance with experts' assignments, underscoring the approach's potential for structural elucidation in fields like organic chemistry, pharmaceuticals, and natural products.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chemical Physics (physics.chem-ph)
Cite as:	arXiv:2403.11353 [cs.LG]
	(or arXiv:2403.11353v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.11353

Submission history

From: Yunrui Li [view email]
[v1] Sun, 17 Mar 2024 21:52:51 UTC (13,070 KB)
[v2] Tue, 28 May 2024 02:44:24 UTC (4,263 KB)
[v3] Thu, 30 May 2024 23:18:46 UTC (7,917 KB)
[v4] Mon, 16 Dec 2024 00:31:21 UTC (20,862 KB)

Computer Science > Machine Learning

Title:TransPeakNet: Solvent-Aware 2D NMR Prediction via Multi-Task Pre-Training and Unsupervised Learning

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:TransPeakNet: Solvent-Aware 2D NMR Prediction via Multi-Task Pre-Training and Unsupervised Learning

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators