Abstract
Retrosynthesis and reaction outcome prediction are fundamental problems in organic chemistry and computer-aided synthesis planning (CASP), and they are also crucial parts of computer-aided drug design. In recent years, deep learning has spawned a family of methods that solve these problems with machine translation frameworks operating on the SMILES representation. With the successive introduction of additional inverted reaction data and the molecular graph representation, the accuracy and validity of machine-translation-based approaches have been further improved. In this work, we propose a bidirectional graph-to-sequence model (BiG2S) that combines the benefits of inverted reaction training and graph representation. The proposed approach provides high-quality retrosynthesis and forward synthesis predictions simultaneously on various datasets: it achieves \(5.5\%\) top-1 accuracy with only \(0.1\%\) invalid results on the USPTO-50k retrosynthesis task, and maintains \(85.0\%\) top-1 accuracy for outcome prediction with the same model.
References
Blakemore DC, Castro L, Churcher I et al (2018) Organic synthesis provides opportunities to transform drug discovery. Nat Chem 10(4):383–394. https://doi.org/10.1038/s41557-018-0021-z
Coley CW, Green WH, Jensen KF (2018) Machine learning in computer-aided synthesis planning. Acc Chem Res 51(5):1281–1289. https://doi.org/10.1021/acs.accounts.8b00087
Segler MHS, Waller MP (2017) Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem Eur J 23(25):5966–5971. https://doi.org/10.1002/chem.201605499
Segler MHS, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555(7698):604–610. https://doi.org/10.1038/nature25978
Coley CW, Rogers L, Green WH et al (2017) Computer-assisted retrosynthesis based on molecular similarity. ACS Cent Sci 3(12):1237–1245. https://doi.org/10.1021/acscentsci.7b00355
Dai H, Li C, Coley C et al (2019) Retrosynthesis prediction with conditional graph logic network. Adv Neural Inf Process Syst 32
Chen S, Jung Y (2021) Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1(10):1612–1620. https://doi.org/10.1021/jacsau.1c00246
Liu B, Ramsundar B, Kawthekar P et al (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci 3(10):1103–1113. https://doi.org/10.1021/acscentsci.7b00303
Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. In: International Conference on Artificial Neural Networks, Springer, pp 817–830
Zheng S, Rao J, Zhang Z et al (2019) Predicting retrosynthetic reactions using self-corrected transformer neural networks. J Chem Inf Model 60(1):47–55. https://doi.org/10.1021/acs.jcim.9b00949
Schwaller P, Laino T, Gaudin T et al (2019) Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5(9):1572–1583. https://doi.org/10.1021/acscentsci.9b00576
Lin K, Xu Y, Pei J et al (2020) Automatic retrosynthetic route planning using template-free models. Chem Sci 11(12):3355–3364. https://doi.org/10.1039/c9sc03666k
Tetko IV, Karpov P, Van Deursen R et al (2020) State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis. Nat Commun 11(1):1–11. https://doi.org/10.1038/s41467-020-19266-y
Kim E, Lee D, Kwon Y et al (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. J Chem Inf Model 61(1):123–133. https://doi.org/10.1021/acs.jcim.0c01074
Tu Z, Coley CW (2022) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. J Chem Inf Model 62(15):3503–3513. https://doi.org/10.1021/acs.jcim.2c00321
Wan Y, Hsieh CY, Liao B et al (2022) Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In: International Conference on Machine Learning, PMLR, pp 22475–22490
Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Shi C, Xu M, Guo H et al (2020) A graph to graphs framework for retrosynthesis prediction. In: International Conference on Machine Learning, PMLR, pp 8818–8827
Yan C, Ding Q, Zhao P et al (2020) RetroXpert: Decompose retrosynthesis prediction like a chemist. Adv Neural Inf Process Syst 33:11248–11258
Somnath VR, Bunne C, Coley C et al (2021) Learning graph models for retrosynthesis prediction. Adv Neural Inf Process Syst 34:9405–9415
Sacha M, Błaz M, Byrski P et al (2021) Molecule edit graph attention network: Modeling chemical reactions as sequences of graph edits. J Chem Inf Model 61(7):3273–3284. https://doi.org/10.1021/acs.jcim.1c00537
Gilmer J, Schoenholz SS, Riley PF et al (2017) Neural message passing for quantum chemistry. In: International Conference on Machine Learning, PMLR, pp 1263–1272
Yang K, Swanson K, Jin W et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59(8):3370–3388. https://doi.org/10.1021/acs.jcim.9b00237
Song Y, Zheng S, Niu Z et al (2020) Communicative representation learning on attributed molecular graphs. In: International Joint Conference on Artificial Intelligence, pp 2831–2838, https://doi.org/10.24963/ijcai.2020/392
Hendrycks D, Gimpel K (2016) Gaussian error linear units (GELUs). https://doi.org/10.48550/arxiv.1606.08415
Cho K, Van Merriënboer B, Bahdanau D et al (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1724–1734
Ying C, Cai T, Luo S et al (2021) Do transformers really perform badly for graph representation? Adv Neural Inf Process Syst 34:28877–28888
Dauphin YN, Fan A, Auli M et al (2017) Language modeling with gated convolutional networks. In: International Conference on Machine Learning, PMLR, pp 933–941
Zhang B, Sennrich R (2019) Root mean square layer normalization. Adv Neural Inf Process Syst 32
Wang H, Ma S, Dong L et al (2022) DeepNet: Scaling transformers to 1,000 layers. https://doi.org/10.48550/arxiv.2203.00555
Su J, Lu Y, Pan S et al (2021) RoFormer: Enhanced transformer with rotary position embedding. https://doi.org/10.48550/arxiv.2104.09864
Lin TY, Goyal P, Girshick R et al (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2980–2988, https://doi.org/10.1109/iccv.2017.324
Szegedy C, Vanhoucke V, Ioffe S et al (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2818–2826
Müller R, Kornblith S, Hinton GE (2019) When does label smoothing help? Adv Neural Inf Process Syst 32
Klein G, Kim Y, Deng Y et al (2017) OpenNMT: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp 67–72
Wolf T, Debut L, Sanh V et al (2020) Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp 38–45, https://doi.org/10.18653/v1/2020.emnlp-demos.6
Lowe DM (2012) Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge. https://doi.org/10.17863/CAM.16293
Schneider N, Stiefl N, Landrum GA (2016) What’s what: The (nearly) definitive guide to reaction role assignment. J Chem Inf Model 56(12):2336–2346. https://doi.org/10.1021/acs.jcim.6b00564
Landrum G (2022) RDKit: Open-source cheminformatics software. https://rdkit.org/
Jin W, Coley CW, Barzilay R et al (2017) Predicting organic reaction outcomes with Weisfeiler-Lehman network. Adv Neural Inf Process Syst 30
Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. https://doi.org/10.48550/arxiv.1711.05101
Sun R, Dai H, Li L et al (2021) Towards understanding retrosynthesis by energy-based models. Adv Neural Inf Process Syst 34:10186–10194
Irwin R, Dimitriadis S, He J et al (2022) Chemformer: A pre-trained transformer for computational chemistry. Mach Learn Sci Technol 3(1):015022. https://doi.org/10.1088/2632-2153/ac3ffb
Wang X, Li Y, Qiu J et al (2021) RetroPrime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chem Eng J 420:129845. https://doi.org/10.1016/j.cej.2021.129845
Zhong W, Yang Z, Chen CYC (2023) Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing. Nat Commun 14(1):3009. https://doi.org/10.1038/s41467-023-38851-5
Seo SW, Song YY, Yang JY et al (2021) GTA: Graph truncated attention for retrosynthesis. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 531–539. https://doi.org/10.1609/aaai.v35i1.16131
ASKCOS (2022) ASKCOS: Software tools for organic synthesis. https://askcos.mit.edu/
Coley CW, Rogers L, Green WH et al (2018) SCScore: Synthetic complexity learned from a reaction corpus. J Chem Inf Model 58(2):252–261. https://doi.org/10.1021/acs.jcim.7b00622
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61976247) and the Fundamental Research Funds for the Central Universities (No. 2682023ZTPY057). The authors thank all members of the research team for their suggestions and contributions to the research ideas and directions of this work. We thank the AutoDL cloud computing service platform for providing GPU rental services. Finally, the authors thank Jianlin Su for his analysis and explanation of the mathematical theory related to the Transformer and various language models on his blog.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Feature and parameter settings
Table 10 summarizes the atom and bond features used in BiG2S; most of them are adapted from Graph2SMILES [15]. To support the dual-task capability of the model, an additional task label and optional reaction type information are added to the atom features.
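For concreteness, the following is a hypothetical sketch of how such dual-task atom features could be assembled with RDKit [39]; the feature set and encodings are illustrative and do not reproduce the exact BiG2S implementation in Table 10.

```python
# Hypothetical dual-task atom featurization in the spirit of Table 10:
# standard RDKit atom descriptors plus a task flag and an optional
# reaction-type id. Names and encodings are illustrative only.
from rdkit import Chem

def atom_features(atom: Chem.Atom, task_label: int, rxn_type: int = 0) -> list[int]:
    """Encode one atom as a list of integer features."""
    return [
        atom.GetAtomicNum(),          # element
        atom.GetTotalDegree(),        # number of bonded neighbors
        atom.GetFormalCharge() + 4,   # shifted so the value stays non-negative
        atom.GetTotalNumHs(),         # attached hydrogens
        int(atom.GetChiralTag()),     # chirality
        int(atom.GetIsAromatic()),    # aromaticity flag
        task_label,                   # 0: forward synthesis, 1: retrosynthesis
        rxn_type,                     # optional reaction class, 0 if unknown
    ]

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, as an example
feats = [atom_features(a, task_label=1) for a in mol.GetAtoms()]
```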
BiG2S adopts a dedicated initialization scheme, together with per-layer normalization in the Transformer, based on DeepNet [30]. Since the output of a GLU [28] has significantly smaller variance than that of a vanilla FFN [17] when the inputs follow the same distribution (e.g., a standard normal distribution), BiG2S initializes these two structures differently. The initialization and normalization methods are shown in Tables 11 and 12, where \(\textbf{W}_{query}\), \(\textbf{W}_{key}\), \(\textbf{W}_{value}\), and \(\textbf{W}_{out}\) denote the learnable weights of the query, key, value, and final output projections in the attention layer; \(\textbf{W}_{FFN}\) and \(\textbf{W}_{GLU}\) denote the learnable weights of all linear layers in these two structures; and N and M denote the number of encoder and decoder layers, respectively. A minimal sketch of this scheme is given below.
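The sketch below implements DeepNorm-style residual scaling and initialization, assuming the encoder-decoder coefficients reported in the DeepNet paper [30]; the exact per-structure values used by BiG2S (which further distinguish the FFN from the GLU and use RMSNorm [29]) are those in Tables 11 and 12.

```python
# A minimal sketch of DeepNet-style (DeepNorm) residual scaling and
# initialization for an encoder-decoder Transformer. Coefficients follow
# the published DeepNet formulas; BiG2S's exact values differ per Table 11/12.
import torch.nn as nn

def deepnorm_coeffs(num_enc: int, num_dec: int):
    """Return (alpha, beta) pairs for encoder and decoder sublayers."""
    enc_alpha = 0.81 * (num_enc ** 4 * num_dec) ** (1 / 16)
    enc_beta = 0.87 * (num_enc ** 4 * num_dec) ** (-1 / 16)
    dec_alpha = (3 * num_dec) ** (1 / 4)
    dec_beta = (12 * num_dec) ** (-1 / 4)
    return (enc_alpha, enc_beta), (dec_alpha, dec_beta)

class DeepNormResidual(nn.Module):
    """x -> Norm(alpha * x + sublayer(x)), the DeepNorm residual connection."""
    def __init__(self, d_model: int, alpha: float):
        super().__init__()
        self.alpha = alpha
        self.norm = nn.LayerNorm(d_model)  # BiG2S uses RMSNorm here instead

    def forward(self, x, sublayer):
        return self.norm(self.alpha * x + sublayer(x))

def init_sublayer_weight(linear: nn.Linear, beta: float):
    """Xavier init scaled by beta, applied to W_value, W_out, and the
    FFN/GLU weights; W_query and W_key keep gain 1. For a GLU, beta would
    be adjusted because its output variance is smaller than a vanilla FFN's."""
    nn.init.xavier_normal_(linear.weight, gain=beta)
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)
```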
Additionally, the hyperparameter settings for each dataset are shown in Table 13. Batches in BiG2S are primarily sized by the number of chemical reactions, with additional caps on the reactant and product token counts to bound the batch size, since USPTO-full contains extremely long reactions; a bucketing sketch follows.
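A minimal sketch of this batch loading rule, with hypothetical limits and names, might look as follows.

```python
# Illustrative batch bucketing that mirrors the loading rule described
# above: batches are capped by a reaction count and, additionally, by
# reactant/product token totals so that very long USPTO-full reactions
# cannot blow up memory. All names and limits here are hypothetical.
def make_batches(reactions, max_rxns=64, max_src_tokens=4096, max_tgt_tokens=4096):
    """reactions: iterable of (num_src_tokens, num_tgt_tokens, sample) tuples."""
    batch, src_total, tgt_total = [], 0, 0
    for n_src, n_tgt, sample in reactions:
        over = (len(batch) + 1 > max_rxns
                or src_total + n_src > max_src_tokens
                or tgt_total + n_tgt > max_tgt_tokens)
        if batch and over:
            yield batch  # flush the current batch before it exceeds any cap
            batch, src_total, tgt_total = [], 0, 0
        batch.append(sample)
        src_total += n_src
        tgt_total += n_tgt
    if batch:
        yield batch
```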
Appendix B: Statistical results during training and visualization results in dual-task inference
Training accuracy for each molecular structure, and the variation of performance on the validation set throughout training, are shown in Figs. 4 and 5.
We additionally perform reaction outcome prediction on the products and retrosynthesis on the reactants, and evaluate the results with ASKCOS [47] together with their SCScore [48]; the visualizations are shown in Figs. 6 and 7. Note that the evaluation criterion differs between the two directions. When evaluating the retrosynthesis results of the reactants, a prediction is marked as "Highly ranked in ASKCOS" if, after its predicted precursors are fed back into ASKCOS for forward synthesis prediction, the original molecule ranks in the top-5 of the returned results. When evaluating the retrosynthesis results of the product, the product is instead required to be the top-1 result of the ASKCOS forward synthesis prediction (Fig. 2); a sketch of this round-trip check follows. Since the training of BiG2S only covers retrosynthesis of products and reaction outcome prediction for reactants, the quality of the model outputs drops significantly on these two additional tasks; in particular, when several reactant molecules are analyzed simultaneously in the retrosynthesis task, the model can hardly produce reasonable predictions.
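The following is a hedged sketch of this round-trip labeling; `forward_predict` is a hypothetical callable standing in for the actual ASKCOS interface, returning a ranked list of product SMILES for a given set of precursor SMILES.

```python
# Round-trip evaluation sketch: a retrosynthesis prediction is labeled by
# where the original molecule reappears among forward predictions made
# from its predicted precursors. `forward_predict` is a hypothetical
# stand-in for ASKCOS forward synthesis prediction.
def round_trip_label(precursors: str, original: str, forward_predict, top_k: int = 5) -> str:
    ranked = forward_predict(precursors)   # ranked list of product SMILES
    if ranked and ranked[0] == original:
        return "top-1"                     # criterion used for products
    if original in ranked[:top_k]:
        return "Highly ranked in ASKCOS"   # top-5 criterion used for reactants
    return "not recovered"
```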
From the retrosynthesis results of the reactants, it is noticeable that the model attempts to generate a prediction for each input molecule (such as the decomposition of 1-bromobut-2-yne in Fig. 7). However, because this task is absent from training, a considerable fraction of the predicted molecules are identical to the inputs, or the product of the original reaction is directly returned as one of the results. The retrosynthesis results are more reasonable when each reactant molecule is input separately, and some of them achieve a high ranking in the ASKCOS evaluation. When performing forward synthesis prediction on each product alone, the results depend mainly on the biases toward particular reaction types, reaction centers, and functional groups that the model acquired during training, owing to the lack of constraints and guidance from the other reactants.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, H., Jiang, Y., Yang, Y. et al. BiG2S: A dual task graph-to-sequence model for the end-to-end template-free reaction prediction. Appl Intell 53, 29620–29637 (2023). https://doi.org/10.1007/s10489-023-05048-8