DOI: 10.1145/2492517.2500245
research-article

A high performance algorithm for clustering of large-scale protein mass spectrometry data using multi-core architectures

Published: 25 August 2013

Abstract

High-throughput mass spectrometers can produce thousands of peptide spectra from a single complex protein sample in a short amount of time. These data sets contain substantial redundancy, because the same peptide may be selected and identified multiple times in a single liquid chromatography tandem mass spectrometry (LC-MS/MS) experiment. They also contain a substantial number of spectra with a low signal-to-noise (S/N) ratio that may go uninterpreted because of their poor quality. Recently, we presented a graph-theoretic algorithm, CAMS (Clustering Algorithm for Mass Spectra), for clustering mass spectrometry data. CAMS uses a novel metric, called an F-set, that identifies similar spectra with much higher accuracy and sensitivity than single-peak comparisons. In this paper we present a multithreaded algorithm, called P-CAMS, for clustering mass spectral data on multicore machines. The algorithm relies on intelligent matrix completion for graph construction and on a load-balancing scheme to achieve substantial speedups. We study the scalability of the proposed parallel algorithm on a multicore machine using synthetically generated spectra whose parameters were carefully chosen to mimic real-world mass spectrometry data sets. Real experimental data sets were also generated to assess the quality of the clustering results. The results show that the proposed algorithm has scalable runtime performance and produces clustering results similar to those of the serial algorithm. The study also provides insight into the design of high-performance algorithms for irregular problems in proteomics on many-core architectures.
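The abstract names three ingredients of P-CAMS: a pairwise similarity metric between spectra, graph construction over the similarity matrix, and a load-balancing scheme across threads. The sketch below shows one way such a pipeline could be organized; it is not the authors' implementation, and several pieces are assumptions. The hypothetical `shared_peaks` function (a count of peaks matching within a tolerance `tol`) stands in for the F-set metric, whose definition is not given here. The "matrix completion" step is approximated by computing only the upper triangle of the symmetric similarity matrix. Clusters are assumed to be the connected components of the thresholded graph, and the load-balancing scheme is assumed to pair short and long rows of the triangle. The names `balanced_row_blocks`, `edges_for_rows`, `cluster`, and `min_shared` are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shared_peaks(a, b, tol=0.5):
    """Hypothetical stand-in for the F-set metric (NOT the CAMS definition):
    count one-to-one peak matches whose m/z values differ by at most `tol`."""
    a, b = sorted(a), sorted(b)
    i = j = count = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def balanced_row_blocks(n, workers):
    """Assumed load-balancing scheme: row i of the upper triangle holds
    n - 1 - i comparisons, so pairing row i with row n - 1 - i gives each
    pair n - 1 comparisons. Pairs are dealt round-robin to workers."""
    tasks = [[] for _ in range(workers)]
    lo, hi, k = 0, n - 1, 0
    while lo <= hi:
        tasks[k % workers].extend([lo] if lo == hi else [lo, hi])
        lo, hi, k = lo + 1, hi - 1, k + 1
    return tasks


def edges_for_rows(rows, spectra, min_shared):
    """Compute only the upper triangle (j > i) for the given rows, since the
    similarity is symmetric; return edges whose score meets the threshold."""
    edges = []
    for i in rows:
        for j in range(i + 1, len(spectra)):
            if shared_peaks(spectra[i], spectra[j]) >= min_shared:
                edges.append((i, j))
    return edges


def cluster(spectra, min_shared=3, workers=4):
    """Build the similarity graph with a thread pool, then report its
    connected components (found with union-find) as clusters."""
    n = len(spectra)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(edges_for_rows, rows, spectra, min_shared)
                   for rows in balanced_row_blocks(n, workers)]
        # Unions run on the calling thread, so `parent` is never
        # modified by the worker threads.
        for future in futures:
            for i, j in future.result():
                parent[find(i)] = find(j)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


if __name__ == "__main__":
    spectra = [
        [100.0, 200.0, 300.0, 400.0],
        [100.2, 200.1, 300.3, 500.0],
        [150.0, 250.0, 350.0],
        [150.1, 250.2, 350.4, 450.0],
        [900.0],
    ]
    print(cluster(spectra, min_shared=3, workers=2))  # [[0, 1], [2, 3], [4]]
```

Pairing row i with row n - 1 - i keeps each task at roughly n - 1 comparisons, which avoids giving one thread all the long rows at the top of the triangle. In CPython the global interpreter lock prevents these threads from running the pure-Python comparisons in parallel, so this sketch illustrates the work partitioning rather than the speedup; measuring scalability would require native threads, for example in C++.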

Cited By

  • (2018) Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson’s Correlation Coefficients for Time Series Data—fMRI Study. High-Throughput, 7(2), 11. DOI: 10.3390/ht7020011. Online publication date: 20-Apr-2018
  • (2014) CAMS-RS. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(1), 128-141. DOI: 10.1109/TCBB.2013.152. Online publication date: 1-Jan-2014
  • (2014) Exploiting thread-level and instruction-level parallelism to cluster mass spectrometry data using multicore architectures. Network Modeling Analysis in Health Informatics and Bioinformatics, 3(1). DOI: 10.1007/s13721-014-0054-1. Online publication date: 15-Apr-2014

Published In

ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August 2013
1558 pages
ISBN:9781450322409
DOI:10.1145/2492517
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASONAM '13
ASONAM '13: Advances in Social Networks Analysis and Mining 2013
August 25-28, 2013
Niagara, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 116 of 549 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 1
Reflects downloads up to 19 Dec 2024
