Abstract
Protein acetylation is the process of adding acetyl groups (CH3CO-) to lysine residues on protein chains. As one of the most common protein post-translational modifications, lysine acetylation plays an important role in different organisms. In this study, we developed a human-specific prediction method that uses a cascade classifier of complex-valued polynomial models (CVPM), combined with sequence and structural feature descriptors, to address the imbalance between positive and negative samples. Complex-valued gene expression programming and differential evolution are used to search for the optimal CVPM. We also carried out a systematic and comprehensive analysis of the acetylation data and the prediction results. The proposed method achieves 79.15% specificity (Sp), 78.17% sensitivity (Sn), 78.66% accuracy (ACC), 78.76% F1, and a Matthews correlation coefficient (MCC) of 0.5733, outperforming other state-of-the-art methods.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), the Xuzhou Science and Technology Plan Project (KC21047), the Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016), Young Talents of Science and Technology in Jiangsu, the Key Research Program of the Science Foundation of Shandong Province (ZR2020KE001), the talent project of “Qingtan Scholar” of Zaozhuang University, the PhD research startup foundation of Zaozhuang University (No. 2014BS13), and the Zaozhuang University Foundation (No. 2015YY02).
Author information
Wenzheng Bao received his PhD degree in Computer Science from Tongji University, China in 2018. He is an associate professor and master’s supervisor at the School of Information Engineering, Xuzhou University of Technology, China. His research interests include bioinformatics and machine learning.
Bin Yang received his PhD degree in Computer Science from Shandong University, China in 2014. He is a professor and master’s supervisor at the School of Information Science and Engineering, Zaozhuang University, China. His research interests include bioinformatics and machine learning.
Cite this article
Bao, W., Yang, B. Protein acetylation sites with complex-valued polynomial model. Front. Comput. Sci. 18, 183904 (2024). https://doi.org/10.1007/s11704-023-2640-9
DOI: https://doi.org/10.1007/s11704-023-2640-9