Abstract
Proteins exert significant biological effects when they bind to other substances, and binding to DNA is particularly crucial. Accurate identification of protein-DNA binding residues is therefore important for understanding the mechanism of protein-DNA interaction. Most current state-of-the-art methods are two-step approaches: the first step uses a sliding window technique to extract features for each residue, and the second step feeds each residue to the model separately for prediction. This design reduces both prediction efficiency and ease of use. In this study, we propose a sequence-to-sequence (seq2seq) model that takes an entire protein sequence of variable length as input and applies several modules, namely a Transformer Encoder Module, a Feature Fusion Module, and a Feature Extraction Module, for multi-layer feature processing. The Transformer Encoder Module extracts global features, while the Feature Extraction Module extracts local features, further improving the recognition capability of the model. Comparison results on two benchmark datasets, PDNA-543 and PDNA-41, demonstrate the effectiveness of our method in identifying protein-DNA binding residues. The code is available at https://github.com/HaipengZZhao/Prediction-of-Residues.
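As an illustration of the sliding window step used by two-step approaches, the following minimal Python sketch builds one fixed-length window per residue. It is not taken from the authors' repository: the function name `sliding_windows`, the padding symbol `X`, and the window sizes are assumptions chosen for the example.

```python
from typing import List

PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def sliding_windows(sequence: str, window_size: int = 5) -> List[str]:
    """Return one fixed-length window per residue, centered on that residue."""
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2
    # Pad both ends so the first and last residues also get full windows.
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


# A toy 5-residue sequence yields 5 windows, i.e. 5 separate model inputs.
windows = sliding_windows("MKRLA", window_size=3)
print(windows)
```

In a two-step pipeline each of these windows would be passed to the classifier on its own, so a protein of length L requires L model inputs. A seq2seq model of the kind described above would instead receive the whole sequence once and return one label per residue.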
Acknowledgement
This work was supported by the National Natural Science Foundation of China (62073231, 62176175, 61902271), the National Research Project (2020YFC2006602), the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University (KJS2166), and the Opening Topic Fund of Big Data Intelligent Engineering Laboratory of Jiangsu Province (SDGC2157).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhao, H., Zhu, B., Jiang, T., Cui, Z., Wu, H. (2023). A Transformer-Based Deep Learning Approach with Multi-layer Feature Processing for Accurate Prediction of Protein-DNA Binding Residues. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_47
DOI: https://doi.org/10.1007/978-981-99-4749-2_47
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4748-5
Online ISBN: 978-981-99-4749-2
eBook Packages: Computer Science, Computer Science (R0)