DOI: 10.1145/1862344.1862359
research-article

An inverted index for mass spectra similarity query and comparison with a metric-space method: case study

Published: 18 September 2010

Abstract

Query performance is a determining factor in the adoption of an indexing method for similarity queries. Metric-space indexing methods are valued for their general applicability. However, it is usually hard for a general method to perform well in every domain. It is therefore of interest to compare the performance of metric-space methods with that of domain-specific methods on a particular domain. This paper describes such an investigation for proteomic mass spectra. An inverted index method is proposed that exploits the sparsity of mass spectra in binary format and acts as a coarse filter before fine ranking; it is empirically compared with an existing metric-space indexing method. Results show that the inverted index method yields greater search efficiency and outperforms the metric-space method in query speed and index size.
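
The coarse-filter-then-rank idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each spectrum has already been reduced to a set of occupied m/z bin numbers (the "binary format"), and the function names, the `min_shared` threshold, and the cosine score used for fine ranking are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def build_inverted_index(spectra):
    """Map each occupied bin to the ids of the spectra that have a peak in it."""
    index = defaultdict(list)
    for spec_id, bins in spectra.items():
        for b in bins:
            index[b].append(spec_id)
    return index


def coarse_filter(index, query_bins, min_shared):
    """Count shared bins per spectrum; keep spectra sharing at least min_shared."""
    counts = defaultdict(int)
    for b in query_bins:
        # Only posting lists for bins present in the query are touched,
        # which is where the sparsity of the binary vectors pays off.
        for spec_id in index.get(b, ()):
            counts[spec_id] += 1
    return {s: c for s, c in counts.items() if c >= min_shared}


def fine_rank(spectra, candidates, query_bins):
    """Rank surviving candidates by cosine similarity of their binary vectors."""
    q = len(query_bins)
    scored = [
        (shared / (q * len(spectra[s])) ** 0.5, s)
        for s, shared in candidates.items()
    ]
    return sorted(scored, reverse=True)


# Toy data (assumed): spectrum id -> set of occupied bins.
spectra = {
    "a": {100, 250, 431},
    "b": {100, 251, 600},
    "c": {100, 250, 431, 777},
}
query = {100, 250, 431}

index = build_inverted_index(spectra)
candidates = coarse_filter(index, query, min_shared=2)
ranked = fine_rank(spectra, candidates, query)
print(ranked)
```

In this toy run, spectrum "b" shares only one bin with the query and is discarded by the coarse filter, so the more expensive ranking step sees two candidates instead of three.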

Cited By

  • (2013) Building an Information Retrieval System: Global Indexing or Local Indexing? Software Engineering and Applications, 02:01 (6-14). DOI: 10.12677/SEA.2013.21002. Online publication date: 2013
  • (2012) On optimizing the non-metric similarity search in tandem mass spectra by clustering. Proceedings of the 8th international conference on Bioinformatics Research and Applications (189-200). DOI: 10.1007/978-3-642-30191-9_18. Online publication date: 21-May-2012
  • (2011) Element detection relying on information retrieval techniques applied to laser spectroscopy. Proceedings of the Fourth International Conference on SImilarity Search and APplications (89-95). DOI: 10.1145/1995412.1995429. Online publication date: 30-Jun-2011

Published In

SISAP '10: Proceedings of the Third International Conference on SImilarity Search and APplications
September 2010
130 pages
ISBN:9781450304207
DOI:10.1145/1862344
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Bilkent University
  • Mexican Computer Science Society

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. inverted index
  2. mass spectra
  3. metric-space indexing
  4. similarity query
  5. sparse matrix

Qualifiers

  • Research-article

Conference

SISAP '10
Sponsor:
  • Bilkent University
