Abstract
Quantitative structure–activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term ‘deep QSAR’. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
£14.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
£139.00 per year
only £11.58 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
References
Hansch, C., Maloney, P., Fujita, T. & Muir, R. Correlation of biological activity of phenoxyacetic acids with hammett substituent constants and partition coefficients. Nature 194, 178–180 (1962).
Cherkasov, A. et al. QSAR modeling: where have you been? Where are you going to? J. Med. Chem. 57, 4977–5010 (2014).
Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).
Ivakhnenko, A. G. & Lapa, V. G. Cybernetics and Forecasting Techniques (American Elsevier Co, 1967).
Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today 23, 1241–1250 (2018).
Yang, X., Wang, Y., Byrne, R., Schneider, G. & Yang, S. Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 119, 10520–10594 (2019).
Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).
Pandey, M. et al. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4, 211–221 (2022).
Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2012).
Real, E., Aggarwal, A., Huang, Y. & Le, Q. V. Regularized evolution for image classifier architecture search. Preprint at:arXiv https://doi.org/10.48550/arXiv.1802.01548 (2018).
Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey.J. Mach. Learn. Res. 20, 1–21 (2019).
Li, X. & Fourches, D. Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J. Cheminform. 12, 27 (2020).
Xu, Y., Ma, J., Liaw, A., Sheridan, R. P. & Svetnik, V. Demystifying multitask deep neural networks for quantitative structure–activity relationships. J. Chem. Inf. Model. 57, 2490–2504 (2017).
Moon, C. & Kim, D. Prediction of drug-target interactions through multi-task learning. Sci. Rep. 12, 18323 (2022).
Fourches, D., Muratov, E. & Tropsha, A. Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 50, 1189–1204 (2010).
Fourches, D. et al. Trust, but verify II: a practical guide to chemogenomics data curation. J. Chem. Inf. Model. 56, 1243–1252 (2016).
Fourches, D., Muratov, E. & Tropsha, A. Curation of chemogenomics data. Nat. Chem. Biol. 11, 535 (2015).
Alves, V. M. et al. Curated data in — trustworthy in silico models out: the impact of data quality on the reliability of artificial intelligence models as alternatives to animal testing. Altern. Lab. Anim. 49, 73–82 (2021).
Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 29, 476–488 (2010).
Golbraikh, A., Muratov, E., Fourches, D. & Tropsha, A. Data set modelability by QSAR. J. Chem. Inf. Model. 54, 1–4 (2014).
Maggiora, G. M. On outliers and activity cliffs — why QSAR often disappoints. J. Chem. Inf. Model. 46, 1535 (2006).
Aldeghi, M. et al. Roughness of molecular property landscapes and its impact on modellability. J. Chem. Inf. Model. 62, 4660–4671 (2022).
Bosc, N. et al. Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J. Cheminform. 11, 4 (2019).
Varnek, A. & Tropsha, A. Chemoinformatics Approaches to Virtual Screening. https://doi.org/10.1039/9781847558879 (Royal Society of Chemistry, 2008).
Schneider, G. & Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discov. 4, 649–663 (2005).
Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).
Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2019).
Schneider, G. Mind and machine in drug design. Nat. Mach. Intell. 1, 128–130 (2019).
Schneider, G. & Clark, D. E. Automated de novo drug design: are we nearly there yet? Angew. Chem. Int. Ed. Engl. 58, 10792–10803 (2019).
Hartenfeller, M. et al. DOGS: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 8, e1002380 (2012).
Tong, X. et al. Generative models for de novo drug design. J. Med. Chem. 64, 14011–14027 (2021).
Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).
Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).
Blaschke, T., Olivecrona, M., Engkvist, O., Bajorath, J. & Chen, H. Application of generative autoencoder in de novo molecular design. Mol. Inform. 37, 1700123 (2018).
Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inf. Model. 58, 1194–1204 (2018).
Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell. 3, 1023–1032 (2021).
Button, A., Merk, D., Hiss, J. A. & Schneider, G. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat. Mach. Intell. 1, 307–315 (2019).
Grisoni, F. Chemical language models for de novo drug design: challenges and opportunities. Curr. Opin. Struct. Biol. 79, 102527 (2023).
Kotsias, P. C. et al. Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nat. Mach. Intell. 2, 254–265 (2020).
Korshunova, M. et al. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds. Commun. Chem. 5, 129 (2022).
Baskin, I. I. Is one-shot learning a viable option in drug discovery? Expert Opin. Drug Discov. 14, 601–603 (2019).
Simões, R. S., Maltarollo, V. G., Oliveira, P. R. & Honorio, K. M. Transfer and multi-task learning in QSAR modeling: advances and challenges. Front. Pharmacol. 9, 74 (2018).
Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. Engl. 60, 19477–19482 (2021).
Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).
Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model. 60, 5918–5922 (2020).
Grisoni, F. & Schneider, G. De novo molecular design with chemical language models. Methods Mol. Biol. 2390, 207–232 (2022).
Chen, H. Can generative-model-based drug design become a new normal in drug discovery? J. Med. Chem. 65, 100–102 (2022).
Lam, L. & Suen, C. Y. Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Syst. Man Cybern. Part. A Syst. Hum. 27, 553–568 (1997).
Nippa, D. F. et al. Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning. Preprint at: ChemRxiv https://doi.org/10.26434/CHEMRXIV-2022-GKXM6 (2022).
Clark, K., Luong, M.-T., Le, Q. V. & Manning, C. D. ELECTRA: pre-training text encoders as discriminators rather than generators. Preprint at:arXiv https://doi.org/10.48550/arxiv.2003.10555 (2020).
Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).
Corey, E. J. General methods for the construction of complex molecules. Pure Appl. Chem. 14, 19–38 (1967).
Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry 23, 5966–5971 (2017).
Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. Engl. 55, 5904–5937 (2016).
Genheden, S. et al. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminform. 12, 70 (2020).
Jin, W., Coley, C. W., Barzilay, R. & Jaakkola, T. In: NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems 2608–2617 (Neural Information Processing Systems Foundation, 2017).
Sutskever, I., Vinyals, O. & Le, Q. V. In: Proceedings of the 27th International Conference on Neural Information Processing Systems 2, 3104–3112 (Neural Information Processing Systems Foundation, 2014).
Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).
Schwaller, P., Gaudin, T., Lányi, D., Bekas, C. & Laino, T. “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).
Wołos, A. et al. Computer-designed repurposing of chemical wastes into drugs. Nature 604, 668–676 (2022).
Patel, H. et al. SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules. Sci. Data 7, 384 (2020).
Zabolotna, Y. et al. SynthI: a new open-source tool for synthon-based library design. J. Chem. Inf. Model. 62, 2151–2163 (2022).
Bonnet, P. Is chemical synthetic accessibility computationally predictable for drug and lead-like molecules? A comparative assessment between medicinal and computational chemists. Eur. J. Med. Chem. 54, 679–689 (2012).
Boda, K., Seidel, T. & Gasteiger, J. Structure and reaction based evaluation of synthetic accessibility. J. Comput. Aided Mol. Des. 21, 311–325 (2007).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. SCScore: synthetic complexity learned from a reaction corpus. J. Chem. Inf. Model. 58, 252–261 (2018).
Hoonakker, F., Lachiche, N., Varnek, A. & Wagner, A. A representation to apply usual data mining techniques to chemical reactions — illustration on the rate constant of S(N)2 reactions in water. Int. J. Artif. Intell. Tools 20, 253–270 (2010).
Gimadiev, T. et al. Bimolecular nucleophilic substitution reactions: predictive models for rate constants and molecular reaction pairs analysis. Mol. Inform. 38, 1800104 (2019).
Baskin, I. I., Madzhidov, T. I., Antipin, I. S. & Varnek, A. A. Artificial intelligence in synthetic chemistry: achievements and prospects. Russ. Chem. Rev. 86, 1127–1156 (2017).
Glavatskikh, M. et al. predictive models for kinetic parameters of cycloaddition reactions. Mol. Inform. 38, 1800077 (2019).
Gimadiev, T. R. et al. Assessment of tautomer distribution using the condensed reaction graph approach. J. Comput. Aided Mol. Des. 32, 401–414 (2018).
Granda, J. M., Donina, L., Dragone, V., Long, D. L. & Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377–381 (2018).
Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
Skoraczyñski, G. et al. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? Sci. Rep. 7, 3582 (2017).
Probst, D., Schwaller, P. & Reymond, J.-L. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digit. Discov. 1, 91–97 (2022).
Marcou, G. et al. Expert system for predicting reaction conditions: the Michael reaction case. J. Chem. Inf. Model. 55, 239–250 (2015).
Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).
Afonina, V. A. et al. Prediction of optimal conditions of hydrogenation reaction using the likelihood ranking approach. Int. J. Mol. Sci. 23, 248 (2021).
Lin, A. I. et al. Automatized assessment of protective group reactivity: a step toward big reaction data analysis. J. Chem. Inf. Model. 56, 2140–2148 (2016).
Schneider, G. Automating drug discovery. Nat. Rev. Drug Discov. 17, 97–113 (2018).
Abolhasani, M. & Kumacheva, E. The rise of self-driving labs in chemical and materials sciences. Nat. Synth. 2, 483–492 (2023).
Reutlinger, M., Rodrigues, T., Schneider, P. & Schneider, G. Combining on-chip synthesis of a focused combinatorial library with computational target prediction reveals imidazopyridine GPCR ligands. Angew. Chem. Int. Ed. Engl. 53, 582–585 (2014).
Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).
Genheden, S., Norrby, P. O. & Engkvist, O. AiZynthTrain: robust, reproducible, and extensible pipelines for training synthesis prediction models. J. Chem. Inf. Model. 63, 1841–1846 (2023).
Ton, A.-T., Gentile, F., Hsing, M., Ban, F. & Cherkasov, A. Rapid identification of potential inhibitors of SARS- CoV-2 main protease by deep docking of 1.3 billion compounds. Mol. Inform. 39, e2000028 (2020).
Cherkasov, A., Ban, F., Li, Y., Fallahi, M. & Hammond, G. L. Progressive docking: a hybrid QSAR/docking approach for accelerating in silico high throughput screening. J. Med. Chem. 49, 7466–7478 (2006).
Hilpert, K., Fjell, C. D. & Cherkasov, A. Peptide-based drug design. Methods Mol. Biol. 494, 127–159 (2008).
Durrant, J. D. & McCammon, J. A. NNScore 2.0: a neural-network receptor-ligand scoring function. J. Chem. Inf. Model. 51, 2897–2903 (2011).
Svensson, F., Norinder, U. & Bender, A. Improving screening efficiency through iterative screening using docking and conformal prediction. J. Chem. Inf. Model. 57, 439–444 (2017).
Ahmed, L. et al. Efficient iterative virtual screening with Apache Spark and conformal prediction. J. Cheminform. 10, 8 (2018).
Rossetti, G. G. et al. Non-covalent SARS-CoV-2 Mpro inhibitors developed from in silico screen hits. Sci. Rep. 12, 2505 (2022).
Gentile, F. et al. Automated discovery of noncovalent inhibitors of SARS-CoV-2 main protease by consensus deep docking of 40 billion small molecules. Chem. Sci. 12, 15960–15974 (2021).
Garland, O. et al. Large-scale virtual screening for the discovery of SARS-CoV-2 papain-like protease (PLpro) non-covalent inhibitors. J. Chem. Inf. Model. 63, 2158–2169 (2023).
Radaeva, M. et al. Discovery of novel Lin28 Inhibitors to suppress cancer cell stemness. Cancers 14, 5687 (2022).
Gentile, F. et al. Deep docking: a deep learning platform for augmentation of structure based drug discovery. ACS Cent. Sci. 6, 939–949 (2020).
Gorgulla, C. et al. VirtualFlow Ants — ultra-large virtual screenings with artificial intelligence driven docking algorithm based on ant colony optimization. Int. J. Mol. Sci. 22, 5807 (2021).
Charifson, P. S., Corkery, J. J., Murcko, M. A. & Walters, W. P. Consensus scoring: a method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J. Med. Chem. 42, 5100–5109 (1999).
Palacio-Rodríguez, K., Lans, I., Cavasotto, C. N. & Cossio, P. Exponential consensus ranking improves the outcome in docking and receptor ensemble docking. Sci. Rep. 9, 5142 (2019).
Ban, F. et al. Best practices of computer-aided drug discovery: lessons learned from the development of a preclinical candidate for prostate cancer with a new mechanism of action. J. Chem. Inf. Model. 57, 1018–1028 (2017).
Liu, Z. et al. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 381–385 https://doi.org/10.1109/BIBM52615.2021.9669513 (2021).
McNutt, A. T. & Koes, D. R. Improving ΔΔG predictions with a multitask convolutional siamese network. J. Chem. Inf. Model. 62, 1819–1829 (2022).
Wang, J. & Dokholyan, N. V. Yuel: improving the generalizability of structure-free compound-protein interaction prediction. J. Chem. Inf. Model. 62, 463–471 (2022).
Li, X. et al. Deep learning enhancing kinome-wide polypharmacology profiling: model construction and experiment validation. J. Med. Chem. 63, 8723–8737 (2020).
Li, Z. et al. KinomeX: a web application for predicting kinome-wide polypharmacology effect of small molecules. Bioinformatics 35, 5354–5356 (2019).
Krishnan, S. R., Bung, N., Bulusu, G. & Roy, A. Accelerating de novo drug design against novel proteins using deep learning. J. Chem. Inf. Model. 61, 621–630 (2021).
Gentile, F. et al. Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking. Nat. Protoc. 17, 672–697 (2022).
LeGrand, S. et al. In: BCB ‘20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics https://doi.org/10.1145/3388440.3412472 (Association for Computing Machinery, Inc., 2020).
Gorgulla, C. et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature 580, 663–668 (2020).
Venkatraman, V. et al. Drugsniffer: an open source workflow for virtually screening billions of molecules for binding affinity to protein targets. Front. Pharmacol. 13, 1389 (2022).
Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).
Häse, F., Roch, L. M. & Aspuru-Guzik, A. Next-generation experimentation with self-driving laboratories. Trends Chem. 1, 282–291 (2019).
Zubatiuk, T. & Isayev, O. Development of multimodal machine learning potentials: toward a physics-aware artificial intelligence. Acc. Chem. Res. 54, 1575–1585 (2021).
Behler, J. Four generations of high-dimensional neural network potentials. Chem. Rev. 121, 10037–10072 (2021).
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
Devereux, C. et al. Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens. J. Chem. Theory Comput. 16, 4192–4202 (2020).
Galvelis, R., Doerr, S., Damas, J. M., Harvey, M. J. & De Fabritiis, G. A scalable molecular force field parameterization method based on density functional theory and quantum-level machine learning. J. Chem. Inf. Model. 59, 3485–3493 (2019).
Rufa, D. A. et al. Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning / molecular mechanics potentials. Preprint at: bioRxiv https://doi.org/10.1101/2020.07.29.227959 (2020).
Wang, L. et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695–2703 (2015).
Zubatyuk, R., Smith, J. S., Leszczynski, J. & Isayev, O. Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network. Sci. Adv. 5, eaav6490 (2019).
Matta, C. F. & Boyd, R. J. An introduction to the quantum theory of atoms in molecules. The Quantum Theory of Atoms in Molecules https://doi.org/10.1002/9783527610709.ch1 (2007).
Gokcan, H. & Isayev, O. Prediction of protein pKa with representation learning. Chem. Sci. 13, 2462–2474 (2022).
Bas, D. C., Rogers, D. M. & Jensen, J. H. Very fast prediction and rationalization of pKa values for protein-ligand complexes. Proteins 73, 765–783 (2008).
Lam, Y. H. et al. Applications. Org. Process. Res. Dev. 24, 1496–1507 (2020).
Hassanzadeh, P. Towards the quantum of quantum chemistry in pharmaceutical process development: current state and opportunities-enabled technologies for development of drugs or delivery systems. J. Control. Rel. 324, 260–279 (2020).
Li, Q. et al. The role of UNC5C in Alzheimer’s disease. Ann. Transl. Med. 6, 178 (2018).
Cao, Y., Romero, J. & Aspuru-Guzik, A. Potential of quantum computing for drug discovery. IBM J. Res. Dev. 62, 10.1147/JRD.2018.2888987 (2018).
Kirsopp, J. J. M. et al. Quantum computational quantification of protein-ligand interactions. Int. J. Quantum Chem. 122, e26975 (2022).
Outeiral, C. et al. The prospects of quantum computing in computational molecular biology. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1481 (2021).
Li, J. et al. Drug discovery approaches using quantum machine learning. Preprint at: arXiv https://doi.org/10.48550/arxiv.2104.00746 (2021).
Romero, J., Olson, J. P. & Aspuru-Guzik, A. Quantum autoencoders for efficient compression of quantum data. Quantum Sci. Technol. 2, 045001 (2017).
Cavasotto, C. N. Binding free energy calculation using quantum mechanics aimed for drug lead optimization. Methods Mol. Biol. 2114, 257–268 (2020).
Heinen, S. et al. Predicting toxicity by quantum machine learning. J. Phys. Commun. 4, 125012 (2020).
Jayatunga, M. K. P., Xie, W., Ruder, L., Schulze, U. & Meier, C. AI in small-molecule drug discovery: a coming wave? Nat. Rev. Drug Discov. 21, 175–176 (2022).
Pyzer-Knapp, E. O. Using Bayesian optimization to accelerate virtual screening for the discovery of therapeutics appropriate for repurposing for COVID-19. Preprint at: arXiv https://doi.org/10.48550/arxiv.2005.07121 (2020).
Jastrzębski, S. et al. Emulating docking results using a deep neural network: a new perspective for virtual screening. J. Chem. Inf. Model. 60, 4246–4262 (2020).
Graff, D. E., Shakhnovich, E. I. & Coley, C. W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 12, 7866–7881 (2020).
Martin, L. J. State of the art iterative docking with logistic regression and Morgan fingerprints. ChemRxiv https://doi.org/10.26434/chemrxiv.14348117.v1 (2021).
Berenger, F., Kumar, A., Zhang, K. Y. J. & Yamanishi, Y. Lean-docking: exploiting ligands’ predicted docking scores to accelerate molecular docking. J. Chem. Inf. Model. 61, 2341–2352 (2021).
Kalliokoski, T. Machine learning boosted docking (HASTEN): an open-source tool to accelerate structure-based virtual screening campaigns. Mol. Inform. 40, 2100089 (2021).
Mehta, S. et al. MEMES: machine learning framework for enhanced molecular screening. Chem. Sci. 12, 11710–11721 (2021).
Yang, Y. et al. Efficient exploration of chemical space with docking and deep learning. J. Chem. Theory Comput. 17, 7106–7119 (2021).
Choi, J. & Lee, J. V-Dock: fast generation of novel drug-like molecules using machine-learning-based docking score and molecular optimization. Int. J. Mol. Sci. 22, 11635 (2021).
Bucinsky, L. et al. Machine learning prediction of 3CLpro SARS-CoV-2 docking scores. Comput. Biol. Chem. 98, 107656 (2022).
Sha, C. M., Wang, J. & Dokholyan, N. V. NeuralDock: rapid and conformation-agnostic docking of small molecules. Front. Mol. Biosci. 9, 244 (2022).
Morris, C. J., Stern, J. A., Stark, B., Christopherson, M. & Della Corte, D. MILCDock: machine learning enhanced consensus docking for virtual screening in drug discovery. J. Chem. Inf. Model. 62, 5342–5350 (2022).
García-Ortegón, M. et al. DOCKSTRING: easy molecular docking yields better benchmarks for ligand design. J. Chem. Inf. Model. 62, 3486–3502 (2022).
Qiu, Y. et al. Development and benchmarking of open force field v1.0.0 — the parsley small-molecule force field. J. Chem. Theory Comput. 17, 6262–6280 (2021).
Tingle, B. I. et al. ZINC-22 — a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).
Babuji, Y. Targeting SARS-CoV-2 with AI- and HPC-enabled lead generation: a first data release. Preprint at: arXiv https://doi.org/10.48550/arXiv.2006.02431 (2020).
Warr, W. A., Nicklaus, M. C., Nicolaou, C. A. & Rarey, M. Exploration of ultralarge compound collections for drug discovery. J. Chem. Inf. Model. 62, 2021–2034 (2022).
Ruddigkeit, L., van Deursen, R., Blum, L. C. & Reymond, J. L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17.J. Chem. Inf. Model. 52, 2864–2875 (2012).
Oprea, T. I. & Gottfries, J. Chemography: the art of navigating in chemical space. J. Comb. Chem. 3, 157–166 (2001).
Medina-Franco, J., Martinez-Mayorga, K., Giulianotti, M., Houghten, R. & Pinilla, C. Visualization of the chemical space in drug discovery. Curr. Comput. Aided Drug Des. 4, 322–333 (2008).
Kireeva, N. et al. Generative topographic mapping (GTM): universal tool for data visualization, structure-activity modeling and dataset comparison. Mol. Inform. 31, 301–312 (2012).
Zabolotna, Y. et al. Chemography: searching for hidden treasures. J. Chem. Inf. Model. 61, 179–188 (2021).
Casciuc, I. et al. Virtual screening with generative topographic maps: how many maps are required? J. Chem. Inf. Model. 59, 564–572 (2019).
Zabolotna, Y. et al. ChemSpace Atlas: multiscale chemography of ultralarge libraries for drug discovery. J. Chem. Inf. Model. 62, 4537–4548 (2022).
Sattarov, B. et al. De novo molecular design by combining deep autoencoder recurrent neural networks with generative topographic mapping. J. Chem. Inf. Model. 59, 1182–1196 (2019).
Bort, W. et al. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci. Rep. 11, 3178 (2021).
Acknowledgements
The authors acknowledge support of their studies by the National Institutes of Health (grant R01GM140154) for A.T. and National Science Foundation (grant CHE-2154447) for O.I.
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Drug Discovery thanks Esben Jannik Bjerrum, Eric Martin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
BioSolveIT KnowledgeSpace: https://www.biosolveit.de/infiniSee/#knowledgespace
CAS. AI drug discovery: assessing the first AI-designed drug candidates to go into human clinical trials: https://www.cas.org/resources/cas-insights/drug-discovery/ai-designed-drug-candidates
CHEMRriya: https://chemriya.com/
Enamine REAL Database: https://enamine.net/compound-collections/real-compounds/real-database
Enamine REAL Space on-demand library: https://enamine.net/compound-collections/real-compounds/real-space-navigator
eXplore from eMolecules: https://www.biosolveit.de/2022/09/12/introducing-explore-trillion-sized-chemical-space-by-emolecules/
Gartner hype cycle: https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2022-gartner-hype-cycle
Insilico Medicine. From start to phase 1 in 30 months: https://insilico.com/phase1
Kaggle. Merck Molecular Activity Challenge: https://www.kaggle.com/c/MerckActivity
KNIME: https://sites.google.com/site/dtclabdc/
MolDB from Deepcure: https://deepcure.ai/technology/
WuXi AppTec: https://www.wuxiapptec.com/
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tropsha, A., Isayev, O., Varnek, A. et al. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 23, 141–155 (2024). https://doi.org/10.1038/s41573-023-00832-0
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41573-023-00832-0
This article is cited by
-
QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool
Journal of Cheminformatics (2024)
-
Understanding predictions of drug profiles using explainable machine learning models
BioData Mining (2024)
-
Utilizing machine learning-based QSAR model to overcome standalone consensus docking limitation in beta-lactamase inhibitors screening: a proof-of-concept study
BMC Chemistry (2024)
-
HBCVTr: an end-to-end transformer with a deep neural network hybrid model for anti-HBV and HCV activity predictor from SMILES
Scientific Reports (2024)
-
Machine learning in preclinical drug discovery
Nature Chemical Biology (2024)