Abstract
The China Conference on Knowledge Graph and Semantic Computing (CCKS 2022) posed an evaluation task on chemical element knowledge graph construction and compound property prediction.
For this task, we generated vector representations of chemical molecules from molecular descriptors and pharmacophore fingerprints, and additionally obtained learned representations by unsupervised pre-training on large-scale molecular data. We then compared how the representations produced by the different methods perform on molecular property prediction. The representations from the different methods were concatenated and fed into an ensemble model for prediction. Our approach achieved a score of 0.8985 on the test dataset and won first place.
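The abstract only sketches the pipeline, so the following is a minimal illustrative example (not the authors' actual implementation) of how descriptor features and pharmacophore fingerprints might be computed with RDKit, concatenated, and passed to a scikit-learn ensemble; the descriptor set, the Gobbi pharmacophore definitions, the omitted pre-trained embeddings, and the random forest stand-in for the competition ensemble are all assumptions here.

```python
# Minimal sketch (assumptions: RDKit descriptors + Morgan and Gobbi 2D
# pharmacophore fingerprints as hand-crafted features, and a scikit-learn
# random forest as a stand-in for the competition ensemble).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors
from rdkit.Chem.Pharm2D import Generate, Gobbi_Pharm2D
from sklearn.ensemble import RandomForestClassifier


def featurize(smiles: str) -> np.ndarray:
    """Concatenate descriptor, Morgan, and pharmacophore features for one molecule."""
    mol = Chem.MolFromSmiles(smiles)

    # Physicochemical descriptors (RDKit's full descriptor list).
    desc = np.array([fn(mol) for _, fn in Descriptors.descList], dtype=np.float32)

    # Morgan (ECFP-like) fingerprint, 2048 bits.
    morgan = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    morgan_arr = np.zeros(2048, dtype=np.float32)
    morgan_arr[list(morgan.GetOnBits())] = 1.0

    # 2D pharmacophore fingerprint (Gobbi feature definitions).
    ph = Generate.Gen2DFingerprint(mol, Gobbi_Pharm2D.factory)
    ph_arr = np.zeros(ph.GetNumBits(), dtype=np.float32)
    ph_arr[list(ph.GetOnBits())] = 1.0

    # In the paper, learned embeddings from unsupervised pre-training would be
    # concatenated here as well; they are omitted in this sketch.
    return np.concatenate([desc, morgan_arr, ph_arr])


# Toy usage: two molecules with hypothetical binary property labels.
X = np.stack([featurize(s) for s in ["CCO", "c1ccccc1O"]])
y = np.array([0, 1])
model = RandomForestClassifier(n_estimators=100).fit(X, y)
print(model.predict(X))
```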
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, W., Zou, J., Yin, L. (2022). Compound Property Prediction Based on Multiple Different Molecular Features and Ensemble Learning. In: Zhang, N., Wang, M., Wu, T., Hu, W., Deng, S. (eds) CCKS 2022 - Evaluation Track. CCKS 2022. Communications in Computer and Information Science, vol 1711. Springer, Singapore. https://doi.org/10.1007/978-981-19-8300-9_7
DOI: https://doi.org/10.1007/978-981-19-8300-9_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8299-6
Online ISBN: 978-981-19-8300-9
eBook Packages: Computer Science (R0)