Abstract
Pre-mRNA splicing is carried out through the interaction of RNA sequence elements with a variety of RNA splicing-related proteins (SRPs), such as spliceosome components and splicing factors. Alternative splicing, an important mechanism of post-transcriptional regulation in eukaryotes, gives rise to multiple mature mRNA isoforms that encode functionally diverse proteins. However, the regulation of RNA splicing is not yet fully elucidated, partly because SRPs have not been exhaustively identified and their experimental identification is labor-intensive. We were therefore motivated to design a new method for identifying SRPs together with their functional roles in the regulation of RNA splicing. Experimentally verified SRPs were manually curated from research articles. According to the functional annotation of the Splicing Related Gene Database, the collected SRPs were further categorized into four functional groups: small nuclear ribonucleoprotein, splicing factor, splicing regulation factor and novel spliceosome protein. The composition of amino acid pairs differs markedly among the four functional groups of SRPs. Support vector machines (SVMs) were then used to learn predictive models for identifying SRPs as well as their functional roles. Cross-validation shows that SVM models trained with significant amino acid pairs and functional domains provide better predictive performance. In addition, independent testing demonstrates that the proposed method can accurately identify SRPs in mammals and plants and can effectively distinguish SRPs from RNA-binding proteins. This investigation provides a practical means of identifying potential SRPs and a perspective for exploring the regulation of RNA splicing.
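As an illustration of the amino acid pair composition feature mentioned above, the following minimal sketch computes a 400-dimensional vector of adjacent-pair frequencies for a protein sequence. It rests on assumptions not stated in the abstract: pairs are taken as adjacent residues only, frequencies are normalized by the number of valid pairs, and pairs containing non-standard residues are skipped. The names `AMINO_ACIDS`, `PAIRS` and `pair_composition` are hypothetical and are not taken from the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered pairs, in a fixed order so every vector is comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(sequence):
    """Return the relative frequency of each adjacent amino acid pair.

    Assumption: only adjacent residues form a pair, and pairs that
    contain a non-standard residue (e.g. 'X') are ignored.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]
```

A vector of this kind, possibly concatenated with binary indicators for functional domains, could serve as input to an SVM library such as LIBSVM; which pairs count as significant would have to be decided by a separate feature selection step.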
Acknowledgments
The authors sincerely thank the National Science Council of the Republic of China for financially supporting this research under Contract Nos. 101-2628-E-155-002-MY2 and 102-2221-E-155-069.
Additional information
Justin Bo-Kai Hsu and Kai-Yao Huang have contributed equally to this work.
Cite this article
Hsu, J.BK., Huang, KY., Weng, TY. et al. Incorporating significant amino acid pairs and protein domains to predict RNA splicing-related proteins with functional roles. J Comput Aided Mol Des 28, 49–60 (2014). https://doi.org/10.1007/s10822-014-9706-6
DOI: https://doi.org/10.1007/s10822-014-9706-6