Abstract
Identifying the possible targets of a known compound from its molecular representation alone is one of the most important tasks in drug design and development. This work proposes a methodology for target identification based on supervised machine learning. To predict drug binding targets, classification models across targets were built with the k-nearest neighbors (k-NN) algorithm, integrating multiple data types. Two groups of descriptors were used: 1) Morgan fingerprints and 2) general molecular properties of interest. The k-NN classification models achieved a higher F1-score with the molecular-property descriptors (0.7) than with the Morgan fingerprint descriptors (0.57) or the fusion of both (0.58).
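The workflow summarized in the abstract (k-NN classification over molecular descriptors, scored with F1) can be illustrated with a minimal sketch. It is not the authors' implementation. The Tanimoto distance, the choice of k = 3, macro averaging of the F1-score, and the toy fingerprints and target labels are all assumptions made for illustration. Fingerprints are represented as sets of on-bit indices, standing in for Morgan fingerprint bits.

```python
from collections import Counter


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def knn_predict(train, query, k=3):
    """Majority vote over the k training fingerprints closest to the query."""
    ranked = sorted(train, key=lambda item: tanimoto_distance(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (the averaging mode is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: (fingerprint bits, target label). Not taken from the paper.
train = [
    ({1, 2, 3, 4}, "target_A"),
    ({1, 2, 3, 5}, "target_A"),
    ({2, 3, 4, 6}, "target_A"),
    ({10, 11, 12, 13}, "target_B"),
    ({10, 11, 12, 14}, "target_B"),
    ({11, 12, 13, 15}, "target_B"),
]
test = [
    ({1, 2, 3, 6}, "target_A"),
    ({10, 11, 13, 14}, "target_B"),
]

y_true = [label for _, label in test]
y_pred = [knn_predict(train, fp, k=3) for fp, _ in test]
score = macro_f1(y_true, y_pred)
print(y_pred, score)
```

In practice the fingerprints would come from a cheminformatics toolkit and the property-based descriptors would be numeric vectors compared with a different distance. The sketch only shows how the classifier and the metric fit together.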
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jimenes-Vargas, K., Perez-Castillo, Y., Tejera, E., Munteanu, C.R. (2023). Exploring Target Identification for Drug Design with K-Nearest Neighbors’ Algorithm. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science, vol 14126. Springer, Cham. https://doi.org/10.1007/978-3-031-42508-0_20
DOI: https://doi.org/10.1007/978-3-031-42508-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42507-3
Online ISBN: 978-3-031-42508-0
eBook Packages: Computer Science (R0)