Abstract
Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its amino-acid sequence structure. The successful analysis of such an experiment typically relies on amino-acid sequence databases to provide a set of biologically relevant peptides to examine. A key sub-problem, then, for amino-acid sequence database search engines that analyze tandem mass spectra is to efficiently generate all the peptide candidates from a sequence database with mass equal to one of a large set of observed peptide masses. We demonstrate that to solve the problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible, with some preprocessing and memory overhead, to solve the peptide candidate generation problem in time asymptotically proportional to the size of the sequence database and the number of peptide candidates output.
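To make the candidate generation problem concrete, the following is a minimal sketch of a naive baseline, not the method developed in the paper. It scans every start position of a sequence and extends the substring until its mass exceeds the largest observed mass. The function name `peptide_candidates`, the use of nominal integer residue masses, and exact (tolerance-free) matching are all assumptions made for illustration.

```python
# Nominal (integer) residue masses in daltons. This is an illustrative
# simplification: a real search engine would use monoisotopic or average
# masses together with a mass tolerance.
NOMINAL_RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18  # a free peptide carries one extra water beyond its residues


def peptide_candidates(sequence, observed_masses):
    """Return (start, end, peptide, mass) for every substring of `sequence`
    whose nominal mass equals one of `observed_masses`.

    Hypothetical helper: assumes `sequence` contains only the 20 standard
    amino-acid letters (any other letter raises KeyError).
    """
    targets = set(observed_masses)  # hash set gives O(1) expected lookup
    if not targets:
        return []
    max_target = max(targets)

    candidates = []
    for start in range(len(sequence)):
        mass = WATER
        for end in range(start, len(sequence)):
            mass += NOMINAL_RESIDUE_MASS[sequence[end]]
            if mass > max_target:
                break  # masses only grow, so no longer substring can match
            if mass in targets:
                candidates.append((start, end + 1, sequence[start:end + 1], mass))
    return candidates


if __name__ == "__main__":
    for hit in peptide_candidates("GASGA", [146, 233]):
        print(hit)
```

In this sketch the peptide "GA" is reported once for each of its two occurrences in "GASGA", a small instance of the substring redundancy that the abstract says must be handled. The scan also costs time proportional to the sequence length multiplied by the longest admissible peptide, rather than the bound stated in the abstract.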
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Edwards, N., Lippert, R. (2002). Generating Peptide Candidates from Amino-Acid Sequence Databases for Protein Identification via Mass Spectrometry. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_6
DOI: https://doi.org/10.1007/3-540-45784-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive