
Protein Inference from the Integration of Tandem MS Data and Interactome Networks

Published: 01 November 2017

Abstract

Because proteins are digested into a mixture of peptides during the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. Recent studies exploit additional information, beyond tandem MS data and peptide identification information, to infer proteins. Unlike methods that first infer proteins from tandem MS data alone and then refine the result with network information, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. Because two interacting proteins should co-exist, it is reasonable to assume that if one of them is confidently inferred in a sample, its interacting partners are also likely to be present in that sample. Therefore, the neighborhood information of a protein in an interactome network can be used to adjust the probability that a shared peptide belongs to that protein. In TMSIN, a multi-weighted graph is constructed by combining interactome network information with a bipartite graph built from the peptide identification information. Based on this multi-weighted graph, TMSIN adopts an iterative workflow to infer proteins. At each iteration, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law, based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that the AUC scores yielded by TMSIN are 0.742 and 0.874 on the yeast and human datasets, respectively, and that TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
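
The abstract does not give TMSIN's formulas, so the following is only a minimal sketch of the general idea of one iteration step, not the authors' implementation. It assumes, purely for illustration, that a protein's neighbor support score is the sum of the current probabilities of its interaction partners, that the prior over a peptide's candidate proteins is uniform, and that a small smoothing constant avoids division by zero. The names `neighbor_support`, `shared_peptide_posteriors`, and `smoothing` are hypothetical.

```python
from collections import defaultdict


def neighbor_support(protein, neighbors, protein_prob):
    """Hypothetical support score: summed probability of a protein's partners."""
    return sum(protein_prob.get(n, 0.0) for n in neighbors[protein])


def shared_peptide_posteriors(peptide_to_proteins, interactions, protein_prob,
                              smoothing=1e-9):
    """One illustrative step: P(protein | peptide) via Bayes' rule.

    Assumptions (not from the paper): uniform prior over candidate proteins,
    likelihood proportional to the neighbor support score plus `smoothing`.
    """
    # Build an undirected adjacency map from the interactome edge list.
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    posteriors = {}
    for peptide, candidates in peptide_to_proteins.items():
        if not candidates:
            continue  # a peptide with no candidate protein carries no evidence
        prior = 1.0 / len(candidates)
        # Unnormalized posterior: prior times (assumed) likelihood.
        joint = {
            p: prior * (neighbor_support(p, neighbors, protein_prob) + smoothing)
            for p in candidates
        }
        total = sum(joint.values())
        posteriors[peptide] = {p: w / total for p, w in joint.items()}
    return posteriors


# Toy data: pep2 is shared by proteins A and B.
peptide_to_proteins = {"pep1": ["A"], "pep2": ["A", "B"]}
interactions = [("A", "C"), ("B", "D")]
protein_prob = {"A": 0.8, "B": 0.5, "C": 0.9, "D": 0.1}

result = shared_peptide_posteriors(peptide_to_proteins, interactions, protein_prob)
print(result["pep2"])
```

In this toy example the shared peptide is assigned mostly to protein A, because A's interaction partner C is inferred with much higher confidence than B's partner D. In an iterative workflow such as the one the abstract describes, the protein probabilities would then be re-estimated from these assignments and the step repeated.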

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 14, Issue 6
November 2017
287 pages

Publisher

IEEE Computer Society Press
Washington, DC, United States
