
Sequence-Based Prediction of microRNA-Binding Residues in Proteins Using Cost-Sensitive Laplacian Support Vector Machines

Published: 01 May 2013

Abstract

Recognizing microRNA (miRNA)-binding residues in proteins helps explain how miRNAs silence their target genes. Existing computational methods are difficult to apply to this task because few training examples are available. To address this issue, unlabeled data may be exploited to help construct a computational model. Semisupervised learning covers methods that automatically exploit unlabeled data in addition to labeled data to improve learning performance, with no human intervention assumed. In addition, miRNA-binding proteins almost always contain far fewer binding residues than nonbinding residues, and cost-sensitive learning is regarded as a good solution to this class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences, using a cost-sensitive extension of Laplacian support vector machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information from the amino acid sequence (position-specific scoring matrices), conservation information about three biochemical properties (HKM), and mutual interaction propensities in protein-miRNA complex structures. The CS-LapSVM achieves an F1 score of $(26.23 \pm 2.55)\%$ and an AUC value of $0.805 \pm 0.020$, outperforming existing approaches for the recognition of RNA-binding residues. A web server called SARS has been built and is freely available for academic use.
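
To make the combination of cost-sensitive and semisupervised learning described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It swaps the SVM hinge loss for a squared loss (a cost-weighted Laplacian regularized least squares), because that variant has a closed-form solution: it minimizes $\sum_i c_i (y_i - f(x_i))^2 + \gamma_A \|f\|_K^2 + \gamma_I \mathbf{f}^\top L \mathbf{f}$, where $c_i$ is a per-class cost for labeled points and zero for unlabeled ones, and $L$ is a graph Laplacian built over all points. The function names, the RBF kernel, the k-NN graph, and every default parameter value are assumptions chosen for illustration.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * sq_dists(A, B))


def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W of a symmetrized k-NN graph."""
    n = X.shape[0]
    d = sq_dists(X, X)
    W = np.zeros((n, n))
    for i in range(n):
        # Position 0 of the sorted row is the point itself (assumes no duplicates).
        neighbours = np.argsort(d[i])[1:k + 1]
        W[i, neighbours] = 1.0
    W = np.maximum(W, W.T)  # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def fit_cs_laprls(X, y, cost_pos=5.0, cost_neg=1.0,
                  gamma_a=1e-2, gamma_i=1e-1, k=5, gamma=1.0):
    """Cost-weighted Laplacian regularized least squares (illustrative only).

    y holds +1 (binding), -1 (nonbinding), or 0 (unlabeled).
    Setting the gradient of the objective to zero gives
    ((C + gamma_i * L) K + gamma_a * I) alpha = C y, with C = diag(c).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    L = knn_laplacian(X, k)
    # Per-example misclassification cost; unlabeled points get zero cost.
    c = np.where(y > 0, cost_pos, np.where(y < 0, cost_neg, 0.0))
    A = (np.diag(c) + gamma_i * L) @ K + gamma_a * np.eye(n)
    return np.linalg.solve(A, c * y)


def decision_function(X_train, alpha, X_new, gamma=1.0):
    """Real-valued scores f(x) = sum_j alpha_j k(x, x_j); the sign is the class."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Toy usage: one labeled and one unlabeled point in each of two distant groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
y = np.array([1, 0, -1, 0])
alpha = fit_cs_laprls(X, y, k=1)
print(np.sign(decision_function(X, alpha, X)))
```

In the toy usage each unlabeled point receives the sign of its labeled neighbour, which is the effect the Laplacian term is meant to produce. Raising `cost_pos` relative to `cost_neg` penalizes errors on the rare positive class more heavily, which is the cost-sensitive part of the sketch.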



Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 10, Issue 3
May 2013
272 pages

Publisher

IEEE Computer Society Press
Washington, DC, United States


Author Tags

1. Laplacian support vector machine
2. cost-sensitive learning
3. evolutionary information
4. miRNA-binding residues
5. mutual interaction propensities


Cited By

• (2022) Robust Multi-view Classification with Sample Constraints. Neural Processing Letters 54:4 (2589-2612). DOI: 10.1007/s11063-021-10483-0. Online publication date: 1-Aug-2022
• (2022) Robust SVM for Cost-Sensitive Learning. Neural Processing Letters 54:4 (2737-2758). DOI: 10.1007/s11063-021-10480-3. Online publication date: 1-Aug-2022
• (2019) Robust SVM with adaptive graph learning. World Wide Web 23:3 (1945-1968). DOI: 10.1007/s11280-019-00766-x. Online publication date: 27-Dec-2019
• (2018) RNA Secondary Structure Prediction Based on Long Short-Term Memory Model. Intelligent Computing Theories and Application (595-599). DOI: 10.1007/978-3-319-95930-6_59. Online publication date: 15-Aug-2018
• (2017) A fuzzy multi-objective hybrid TLBO-PSO approach to select the associated genes with breast cancer. Signal Processing 131:C (58-65). DOI: 10.1016/j.sigpro.2016.07.035. Online publication date: 1-Feb-2017
• (2016) Model-Based Oversampling for Imbalanced Sequence Classification. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (1009-1018). DOI: 10.1145/2983323.2983784. Online publication date: 24-Oct-2016
• (2015) Discriminative cost sensitive Laplacian score for face recognition. Neurocomputing 152:C (333-344). DOI: 10.1016/j.neucom.2014.10.059. Online publication date: 25-Mar-2015
