Fragment Library of Natural Products and Compound Databases for Drug Discovery
Figure 1. Unique and overlapping compounds and fragments from COCONUT, FooDB, DCM, CAS, and 3CLP. Compounds and fragments are color-coded: yellow (COCONUT), violet (FooDB), purple (DCM), green (CAS), and lime (3CLP).
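The caption above separates compounds and fragments that are exclusive to one database from those shared between databases. A minimal sketch of that bookkeeping, assuming each dataset has already been reduced to a set of canonical SMILES strings so that string equality stands in for structural identity. The function `partition_overlaps` and the toy SMILES are hypothetical illustrations, not the authors' code.

```python
from itertools import combinations


def partition_overlaps(datasets: dict[str, set[str]]) -> dict:
    """Split per-database sets of canonical SMILES into exclusive and shared items."""
    # Items that appear in one database and in no other.
    unique = {}
    for name, items in datasets.items():
        others = set().union(*(s for other, s in datasets.items() if other != name))
        unique[name] = items - others
    # Items shared by each pair of databases (e.g. COCONUT and FooDB).
    pairwise = {(a, b): datasets[a] & datasets[b] for a, b in combinations(datasets, 2)}
    # Items present in every database.
    shared_by_all = set.intersection(*datasets.values())
    return {"unique": unique, "pairwise": pairwise, "all": shared_by_all}


# Hypothetical toy input: canonical SMILES per database.
datasets = {
    "COCONUT": {"Oc1ccccc1", "CCO", "C1CCOC1"},
    "FooDB": {"CCO", "CC(=O)O"},
    "DCM": {"Oc1ccccc1", "CCO", "c1ccncc1"},
}
result = partition_overlaps(datasets)
print({name: len(items) for name, items in result["unique"].items()})
print(sorted(result["all"]))
```

The same function applies unchanged to fragment sets, since only set membership matters.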
Figure 2. The 10 most frequent and unique COCONUT fragments. Frequency (regular font) and proportion (bold font) are listed below the chemical structures.
Figure 3. The 10 most frequent and unique FooDB fragments. Frequency (regular font) and proportion (bold font) are listed below the chemical structures.
Figure 4. The 10 most frequent and unique DCM fragments. Frequency (regular font) and proportion (bold font) are listed below the chemical structures.
Figure 5. The 10 most frequent and unique CAS fragments. Frequency (regular font) and proportion (bold font) are listed below the chemical structures.
Figure 6. The 10 most frequent and unique 3CLP fragments. Frequency (regular font) and proportion (bold font) are listed below the chemical structures.
Figure 7. Overlapping fragments between COCONUT, FooDB, DCM, CAS, and 3CLP. The sum of frequencies of each fragment in all databases is indicated in bold font.
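The fragment captions above rank, per database, the most frequent fragments not found elsewhere, and report for shared fragments the sum of their frequencies over all databases. A minimal sketch with `collections.Counter`, assuming "frequency" is the number of compounds yielding a fragment, "proportion" is that frequency divided by the database's processed compound count, and "overlapping" means present in every database; these definitions, the function names, and the toy fragment strings are hypothetical.

```python
from collections import Counter


def top_unique_fragments(counts: dict[str, Counter], name: str,
                         n_compounds: int, k: int = 10) -> list[tuple[str, int, float]]:
    """Rank fragments found only in `name`; proportion = frequency / n_compounds (assumed)."""
    others = set().union(*(set(c) for db, c in counts.items() if db != name))
    exclusive = Counter({frag: f for frag, f in counts[name].items() if frag not in others})
    return [(frag, f, f / n_compounds) for frag, f in exclusive.most_common(k)]


def summed_overlap_frequencies(counts: dict[str, Counter]) -> Counter:
    """Sum the frequencies of fragments that occur in every database."""
    shared = set.intersection(*(set(c) for c in counts.values()))
    return Counter({frag: sum(c[frag] for c in counts.values()) for frag in shared})


# Hypothetical fragment counts; "[*]" marks an attachment point.
counts = {
    "COCONUT": Counter({"[*]c1ccccc1": 5, "[*]C(C)=O": 3, "[*]OC": 2}),
    "FooDB": Counter({"[*]c1ccccc1": 4, "[*]N": 1}),
}
top = top_unique_fragments(counts, "COCONUT", n_compounds=10, k=2)
overlap = summed_overlap_frequencies(counts)
print(top)
print(overlap)
```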
Figure 8. Visualization of the chemical space of the compound datasets generated with Tree Maps (TMAP). Datasets are color-coded: COCONUT (cyan), DCM (gray), FooDB (orange), CAS (pink), and inhibitors of the main protease of SARS-CoV-2 (3CLP; olive). Overlapping compounds in COCONUT–FooDB (purple), COCONUT–CAS (black), COCONUT–DCM (green), and COCONUT–3CLP (magenta) are indicated.
Figure 9. Visualization of the chemical space of fragments generated with Tree Maps (TMAP). Datasets are color-coded: COCONUT (cyan), DCM (gray), FooDB (orange), CAS (pink), and inhibitors of the main protease of SARS-CoV-2 (3CLP; olive). Overlapping fragments in COCONUT–FooDB (purple), COCONUT–CAS (black), COCONUT–DCM (green), and COCONUT–3CLP (magenta) are indicated.
Figure 10. Visualization of the chemical space of CAS compounds (pink), DCM compounds (gray), and overlapping DCM–CAS compounds (green).
Figure 11. Visualization of the chemical space of CAS fragments (pink), DCM fragments (gray), and overlapping DCM–CAS fragments (green).
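The Tree Map figures above were produced with TMAP, which, as described by Probst and Reymond, indexes MinHash fingerprints in an LSH Forest, builds an approximate k-nearest-neighbor graph, and lays out a minimum spanning tree of that graph. A minimal sketch of only the tree step, assuming a tiny set of hypothetical integer-bitset fingerprints and exact Jaccard distances over all pairs (which TMAP avoids for large data). It uses Kruskal's algorithm with a union-find and does not call the TMAP library.

```python
from itertools import combinations


def jaccard_distance(a: int, b: int) -> float:
    """1 - Tanimoto similarity for fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    return 0.0 if union == 0 else 1.0 - bin(a & b).count("1") / union


def minimum_spanning_tree(fps: list[int]) -> list[tuple[int, int, float]]:
    """Kruskal's algorithm over the complete distance graph of `fps`."""
    parent = list(range(len(fps)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (jaccard_distance(fps[i], fps[j]), i, j)
        for i, j in combinations(range(len(fps)), 2)
    )
    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components: keep it
            parent[root_i] = root_j
            tree.append((i, j, dist))
    return tree


fps = [0b1111, 0b1110, 0b0111, 0b110000]
tree = minimum_spanning_tree(fps)
print(tree)  # [(0, 1, 0.25), (0, 2, 0.25), (0, 3, 1.0)]
```

A tree keeps only n - 1 edges, so each compound stays attached to its closest neighbors while the layout remains readable for very large sets; the fourth fingerprint, which shares no bits with the others, hangs off the tree through a single distance-1.0 edge.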
Abstract
1. Introduction
2. Materials and Methods
2.1. Compound Databases
2.2. Data Curation
2.3. Generation of Unique Fragments Using the RECAP Algorithm
2.4. Structural Diversity and Complexity
2.5. Chemical Space Visualization
3. Results and Discussion
3.1. Overlapping Fragments and Compounds
3.2. Fragment Analysis
3.3. Structural Diversity and Complexity
3.4. Chemical Space Visualization
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Prieto-Martínez, F.D.; Norinder, U.; Medina-Franco, J.L. Cheminformatics explorations of natural products. In Progress in the Chemistry of Organic Natural Products 110: Cheminformatics in Natural Product Research; Kinghorn, A.D., Falk, H., Gibbons, S., Kobayashi, J., Asakawa, Y., Liu, J.-K., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–35. ISBN 978-3-030-14632-0.
- Newman, D.J.; Cragg, G.M. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 2020, 83, 770–803.
- López-Vallejo, F.; Giulianotti, M.A.; Houghten, R.A.; Medina-Franco, J.L. Expanding the medicinally relevant chemical space with compound libraries. Drug Discov. Today 2012, 17, 718–726.
- Ganesan, A. Natural products as a hunting ground for combinatorial chemistry. Curr. Opin. Biotechnol. 2004, 15, 584–590.
- Christoforow, A.; Wilke, J.; Binici, A.; Pahl, A.; Ostermann, C.; Sievers, S.; Waldmann, H. Design, synthesis, and phenotypic profiling of pyrano-furo-pyridone pseudo natural products. Angew. Chem. Int. Ed. 2019, 58, 14715–14723.
- Medina-Franco, J.L. Chapter 21: Discovery and development of lead compounds from natural sources using computational approaches. In Evidence-Based Validation of Herbal Medicine; Mukherjee, P.K., Harwansh, R.K., Bahadur, S., Banerjee, S., Kar, A., Eds.; Elsevier: Boston, MA, USA, 2015; pp. 455–475. ISBN 978-0-12-800874-4.
- Prachayasittikul, V.; Worachartcheewan, A.; Shoombuatong, W.; Songtawee, N.; Simeon, S.; Prachayasittikul, V.; Nantasenamat, C. Computer-aided drug design of bioactive natural products. Curr. Top. Med. Chem. 2015, 15, 1780–1800.
- Chen, Y.; Kirchmair, J. Cheminformatics in natural product-based drug discovery. Mol. Inf. 2020.
- Medina-Franco, J.L. Towards a unified Latin American natural products database: LANaPD. Future Sci. OA 2020, 6, FSO468.
- Chávez-Hernández, A.L.; Sánchez-Cruz, N.; Medina-Franco, J.L. A fragment library of natural products and its comparative chemoinformatic characterization. Mol. Inf. 2020.
- Santini, A.; Cicero, N. Development of food chemistry, natural products, and nutrition research: Targeting new frontiers. Foods 2020, 9, 482.
- Martinez-Mayorga, K.; Medina-Franco, J.L. Foodinformatics: Applications of Chemical Information to Food Chemistry; Springer: Berlin/Heidelberg, Germany, 2014; ISBN 3319102265.
- Wassermann, A.M.; Lounkine, E.; Hoepfner, D.; Le Goff, G.; King, F.J.; Studer, C.; Peltier, J.M.; Grippo, M.L.; Prindle, V.; Tao, J.; et al. Dark chemical matter as a promising starting point for drug lead discovery. Nat. Chem. Biol. 2015, 11, 958–966.
- Santibáñez-Morán, M.G.; López-López, E.; Prieto-Martínez, F.D.; Sánchez-Cruz, N.; Medina-Franco, J.L. Consensus virtual screening of dark chemical matter and food chemicals uncover potential inhibitors of SARS-CoV-2 main protease. RSC Adv. 2020, 10, 25089–25099.
- Tang, B.; He, F.; Liu, D.; Fang, M.; Wu, Z.; Xu, D. AI-aided design of novel targeted covalent inhibitors against SARS-CoV-2. bioRxiv 2020.
- Sorokina, M.; Steinbeck, C. Review on natural products databases: Where to find data in 2020. J. Cheminform. 2020, 12, 20.
- The Metabolomics Innovation Centre. The Metabolomics Innovation Centre: FooDB (Version 1). Available online: https://foodb.ca/ (accessed on 19 May 2020).
- American Chemical Society: CAS COVID-19 Antiviral Candidate Compounds Dataset. Available online: https://www.cas.org/covid-19-antiviral-compounds-dataset (accessed on 19 May 2020).
- RDKit toolkit. Available online: http://rdkit.org (accessed on 21 May 2020).
- MolVS. Available online: https://molvs.readthedocs.io/en/latest/ (accessed on 21 May 2020).
- Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
- Lewell, X.Q.; Judd, D.B.; Watson, S.P.; Hann, M.M. RECAP–Retrosynthetic combinatorial analysis procedure: A powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inf. Comput. Sci. 1998, 38, 511–522.
- Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754.
- Durant, J.L.; Leland, B.A.; Henry, D.R.; Nourse, J.G. Reoptimization of MDL Keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273–1280.
- Agrafiotis, D.K. A constant time algorithm for estimating the diversity of large chemical libraries. J. Chem. Inf. Comput. Sci. 2001, 41, 159–167.
- Probst, D.; Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 2020, 12, 12.
- TMAP. Available online: https://tmap.gdb.tools/ (accessed on 18 August 2020).
- Sánchez-Cruz, N.; Pilón-Jiménez, B.A.; Medina-Franco, J.L. Functional group and diversity analysis of BIOFACQUIM: A Mexican natural product database. F1000Research 2020, 8.
- Sayed, A.M.; Khattab, A.R.; AboulMagd, A.M.; Hassan, H.M.; Rateb, M.E.; Zaid, H.; Abdelmohsen, U.R. Nature as a treasure trove of potential anti-SARS-CoV drug leads: A structural/mechanistic rationale. RSC Adv. 2020, 10, 19790–19802.
- Gentile, D.; Patamia, V.; Scala, A.; Sciortino, M.T.; Piperno, A.; Rescifina, A. Putative inhibitors of SARS-CoV-2 main protease from a library of marine natural products: A virtual screening and molecular modeling study. Mar. Drugs 2020, 18, 225.
- Chen, Y.; de Lomana, G.M.; Friedrich, N.-O.; Kirchmair, J. Characterization of the chemical space of known and readily obtainable natural products. J. Chem. Inf. Model. 2018, 58, 1518–1532.
- Feher, M.; Schmidt, J.M. Property distributions: Differences between drugs, natural products, and molecules from combinatorial chemistry. J. Chem. Inf. Comput. Sci. 2003, 43, 218–227.
- Cremosnik, G.S.; Liu, J.; Waldmann, H. Guided by evolution: From biology oriented synthesis to pseudo natural products. Nat. Prod. Rep. 2020.
Dataset | Original Compounds | Processed Compounds | Generated Fragments | Reference |
---|---|---|---|---|
COCONUT | 432,706 | 382,248 | 52,630 | [16] |
FooDB | 23,883 | 21,319 | 3186 | [17] |
Dark Chemical Matter (DCM) | 139,352 | 139,326 | 14,001 | [13] |
Chemical Abstracts Service (CAS) set focused on COVID-19 | 48,876 | 44,692 | 8432 | [18] |
Inhibitors of the main protease of SARS-CoV-2 (3CLP) | 280 | 256 | 108 | [15] |
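The table above supports two quick ratios: the fraction of each database that survives curation, and the number of generated fragments per processed compound. For example, curation removes only 26 of the 139,352 DCM entries but about 12% of COCONUT. A small sketch of the arithmetic; the counts are transcribed from the table, and the helper name `curation_ratios` is hypothetical.

```python
# Counts transcribed from the table above: (original, processed, fragments).
TABLE_1 = {
    "COCONUT": (432_706, 382_248, 52_630),
    "FooDB": (23_883, 21_319, 3_186),
    "DCM": (139_352, 139_326, 14_001),
    "CAS": (48_876, 44_692, 8_432),
    "3CLP": (280, 256, 108),
}


def curation_ratios(table: dict[str, tuple[int, int, int]]) -> dict[str, tuple[float, float]]:
    """Return (fraction retained after curation, fragments per processed compound)."""
    return {
        name: (processed / original, fragments / processed)
        for name, (original, processed, fragments) in table.items()
    }


ratios = curation_ratios(TABLE_1)
for name, (retained, per_compound) in ratios.items():
    print(f"{name:8s} retained {retained:6.1%}  fragments/compound {per_compound:.3f}")
```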
Structural Feature | COCONUT | FooDB | DCM | CAS | 3CLP |
---|---|---|---|---|---|
Carbon atoms | 25.640 | 26.563 | 18.059 | 22.496 | 25.828 |
Oxygen atoms | 6.167 | 7.343 | 3.252 | 5.773 | 4.922 |
Nitrogen atoms | 1.445 | 0.668 | 2.859 | 4.157 | 3.582 |
Heavy atoms | 33.611 | 34.942 | 25.139 | 33.535 | 35.352 |
Fraction of sp3 carbons | 0.506 | 0.620 | 0.342 | 0.489 | 0.291 |
Fraction of chiral carbons | 0.154 | 0.152 | 0.028 | 0.145 | 0.069 |
Rings | 3.962 | 2.243 | 2.881 | 3.628 | 3.617 |
Aliphatic rings | 2.250 | 1.426 | 0.791 | 1.372 | 0.645 |
Aromatic rings | 1.712 | 0.817 | 2.089 | 2.256 | 2.973 |
Heterocycles | 1.711 | 1.020 | 1.408 | 2.056 | 1.500 |
Aliphatic heterocycles | 1.166 | 0.770 | 0.619 | 0.865 | 0.363 |
Aromatic heterocycles | 1.712 | 0.817 | 2.089 | 2.256 | 2.973 |
Spiro atoms | 0.167 | 0.051 | 0.018 | 0.019 | 0.000 |
Bridgehead atoms | 0.493 | 0.137 | 0.056 | 0.254 | 0.023 |
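The structural-feature table above includes the fraction of sp3 carbons (commonly defined as sp3-hybridized carbons divided by all carbons) and the fraction of chiral carbons. A minimal sketch, assuming each table entry is a per-dataset mean of per-molecule values and assuming the chiral fraction also uses the carbon count as its denominator; neither assumption is stated in the table. The `Atom` record and the toy molecules are hypothetical; with a toolkit such as RDKit one would typically call `rdMolDescriptors.CalcFractionCSP3` instead of hand-annotating atoms.

```python
from statistics import mean
from typing import NamedTuple


class Atom(NamedTuple):
    element: str         # element symbol, e.g. "C"
    hybridization: str   # e.g. "SP3", "SP2"
    stereocenter: bool   # True if the atom is a stereocenter


def carbon_fractions(atoms: list[Atom]) -> tuple[float, float]:
    """Per-molecule (fraction of sp3 carbons, fraction of chiral carbons)."""
    carbons = [a for a in atoms if a.element == "C"]
    if not carbons:
        return 0.0, 0.0
    n = len(carbons)
    fsp3 = sum(a.hybridization == "SP3" for a in carbons) / n
    fchiral = sum(a.stereocenter for a in carbons) / n
    return fsp3, fchiral


def dataset_means(molecules: list[list[Atom]]) -> tuple[float, float]:
    """Average the per-molecule fractions over a dataset."""
    fractions = [carbon_fractions(m) for m in molecules]
    return mean(f[0] for f in fractions), mean(f[1] for f in fractions)


# Hypothetical toy molecules (hydrogens omitted).
ethanol = [Atom("C", "SP3", False), Atom("C", "SP3", False), Atom("O", "SP3", False)]
toy = [Atom("C", "SP2", False), Atom("C", "SP2", False),
       Atom("C", "SP3", True), Atom("C", "SP3", False)]
print(dataset_means([ethanol, toy]))  # (0.75, 0.125)
```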
Structural Feature | COCONUT | FooDB | DCM | CAS | 3CLP | Overlapping Fragments |
---|---|---|---|---|---|---|
Carbon atoms | 18.504 | 12.991 | 10.181 | 9.904 | 8.926 | 5.179 |
Oxygen atoms | 3.524 | 3.173 | 1.748 | 3.678 | 1.556 | 1.107 |
Nitrogen atoms | 0.795 | 0.394 | 1.475 | 0.883 | 0.713 | 0.107 |
Heavy atoms | 23.034 | 16.760 | 14.057 | 15.532 | 11.537 | 6.464 |
Fraction of sp3 carbons | 0.557 | 0.615 | 0.330 | 0.656 | 0.298 | 0.318 |
Fraction of chiral carbons | 0.189 | 0.199 | 0.054 | 0.240 | 0.071 | 0.062 |
Rings | 2.999 | 1.739 | 1.686 | 1.496 | 1.398 | 0.571 |
Aliphatic rings | 2.013 | 1.237 | 0.447 | 0.837 | 0.398 | 0.071 |
Aromatic rings | 0.986 | 0.503 | 1.239 | 0.660 | 1.000 | 0.500 |
Heterocycles | 1.087 | 0.577 | 0.899 | 0.787 | 0.574 | 0.179 |
Aliphatic heterocycles | 0.751 | 0.390 | 0.313 | 0.573 | 0.176 | 0.036 |
Aromatic heterocycles | 0.986 | 0.503 | 1.239 | 0.660 | 1.000 | 0.500 |
Spiro atoms | 0.190 | 0.085 | 0.013 | 0.010 | 0.000 | 0.000 |
Bridgehead atoms | 0.507 | 0.288 | 0.043 | 0.109 | 0.056 | 0.000 |
Dataset | Morgan2 ᵃ (1024 bits) | MACCS Keys ᵃ (166 bits) |
---|---|---|
COCONUT | 0.107 | 0.380 |
FooDB | 0.092 | 0.322 |
DCM | 0.136 | 0.407 |
CAS | 0.117 | 0.473 |
3CLP inhibitors | 0.127 | 0.403 |
Fragment Dataset | Morgan2 ᵃ (1024 bits) | MACCS Keys ᵃ (166 bits) |
---|---|---|
COCONUT | 0.111 | 0.300 |
FooDB | 0.106 | 0.241 |
DCM | 0.125 | 0.243 |
CAS | 0.095 | 0.222 |
3CLP inhibitors | 0.147 | 0.214 |
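The two diversity tables above compare datasets using Morgan2 (1024-bit) and MACCS keys (166-bit) fingerprints; the footnote marked a, which defines the tabulated statistic, is not reproduced here. Assuming the entries summarize pairwise Tanimoto similarity (lower values meaning a more diverse set), a minimal sketch of exact and sampled pairwise similarity follows. The reference list cites Agrafiotis's constant-time estimator for large libraries; the sampling below is plain random pair sampling, not that algorithm. Fingerprints are hypothetical integer bitsets; in practice they would come from a toolkit such as RDKit (for example `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)` and `MACCSkeys.GenMACCSKeys(mol)`).

```python
import random
from itertools import combinations
from statistics import mean, median
from typing import Optional


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    # Convention chosen here: two all-zero fingerprints score 0.0.
    return 0.0 if union == 0 else bin(a & b).count("1") / union


def pairwise_similarities(fps: list[int], n_pairs: Optional[int] = None,
                          seed: int = 0) -> list[float]:
    """All pairwise similarities, or a random sample of n_pairs distinct-index pairs."""
    if n_pairs is None:
        return [tanimoto(a, b) for a, b in combinations(fps, 2)]
    rng = random.Random(seed)
    sims = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(fps)), 2)
        sims.append(tanimoto(fps[i], fps[j]))
    return sims


# Hypothetical 4-bit fingerprints; real Morgan2/MACCS vectors are far longer.
fps = [0b1111, 0b1110, 0b0111]
exact = pairwise_similarities(fps)
sampled = pairwise_similarities(fps, n_pairs=100, seed=42)
print(f"exact mean {mean(exact):.3f}, median {median(exact):.3f}")
print(f"sampled mean {mean(sampled):.3f}")
```

Exact enumeration grows quadratically with dataset size (about 7.3 × 10^10 pairs for 382,248 COCONUT compounds), which is why sampling or dedicated estimators are attractive for the larger databases.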
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chávez-Hernández, A.L.; Sánchez-Cruz, N.; Medina-Franco, J.L. Fragment Library of Natural Products and Compound Databases for Drug Discovery. Biomolecules 2020, 10, 1518. https://doi.org/10.3390/biom10111518
Chávez-Hernández AL, Sánchez-Cruz N, Medina-Franco JL. Fragment Library of Natural Products and Compound Databases for Drug Discovery. Biomolecules. 2020; 10(11):1518. https://doi.org/10.3390/biom10111518
Chicago/Turabian Style: Chávez-Hernández, Ana L., Norberto Sánchez-Cruz, and José L. Medina-Franco. 2020. "Fragment Library of Natural Products and Compound Databases for Drug Discovery" Biomolecules 10, no. 11: 1518. https://doi.org/10.3390/biom10111518
APA Style: Chávez-Hernández, A. L., Sánchez-Cruz, N., & Medina-Franco, J. L. (2020). Fragment Library of Natural Products and Compound Databases for Drug Discovery. Biomolecules, 10(11), 1518. https://doi.org/10.3390/biom10111518