[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

A graph approach to the threshold all-against-all substring matching problem

Published: 12 June 2008 Publication History

Abstract

We present a novel graph model and an efficient algorithm for solving the “threshold all against all” problem, which involves searching two strings (with length M and N, respectively) for all maximal approximate substring matches of length at least S, with up to K differences. Our algorithm solves the problem in time O(MNK3), which is a considerable improvement over the previous known bound for this problem. We also provide experimental evidence that, in practice, our algorithm exhibits a better performance than its worst-case running time.

References

[1]
Abouelhoda, M. I. and Ohlebusch, E. 2005. Chaining algorithms for multiple genome comparison. J. Discrete Algorithms 3, 2--4, 321--341.]]
[2]
Arslan, A. N., Egecioglu, Ö., and Pevzner, P. A. 2001. A new approach to sequence comparison: normalized sequence alignment. Bioinformatics 17, 4, 327--337.]]
[3]
Baeza-Yates, R. A. and Gonnet, G. H. 1990. All-against-all sequence matching. Tech. rep., Department of Computer Science, Universidad de Chile.]]
[4]
Baeza-Yates, R. A. and Gonnet, G. H. 1999. A fast algorithm on average for all-against-all sequence matching. In SPIRE/CRIWG. IEEE, Los Alamitos, CA. 16--23.]]
[5]
Bailey, T. L. and Elkan, C. 1995. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 1--2, 51--80.]]
[6]
Barsky, M., Stege, U., Thomo, A., and Upton, C. 2006. A new algorithm for fast all-against-all substring matching. In SPIRE, F. Crestani, P. Ferragina, and M. Sanderson, Eds. Lecture Notes in Computer Science, vol. 4209. Springer, New York. 360--366.]]
[7]
Burkhardt, S., Crauser, A., Ferragina, P., Lenhof, H.-P., Rivals, E., and Vingron, M. 1999. q-gram based database searching using a suffix array (quasar). In RECOMB '99: Proceedings of the 3rd Annual International Conference on Computational Molecular Biology. ACM, New York. 77--83.]]
[8]
Eppstein, D., Galil, Z., Giancarlo, R., and Italiano, G. F. 1992a. Sparse dynamic programming i: Linear cost functions. J. ACM 39, 3, 519--545.]]
[9]
Eppstein, D., Galil, Z., Giancarlo, R., and Italiano, G. F. 1992b. Sparse dynamic programming ii: Convex and concave cost functions. J. ACM 39, 3, 546--567.]]
[10]
Gusfield, D. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge.]]
[11]
Höhl, M., Kurtz, S., and Ohlebusch, E. 2002. Efficient multiple genome alignment. Bioinformatics 18, 312--320.]]
[12]
Kececioglu, J. D. 1993. The maximum weight trace problem in multiple sequence alignment. In CPM, A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, Eds. Lecture Notes in Computer Science, vol. 684. Springer, New York. 106--119.]]
[13]
Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., and Wootton, J. 1993. Detecting subtle sequence signals: A gibbs sampling strategy for multiple alignment. Science 262, 208--214.]]
[14]
Rasmussen, K. R., Stoye, J., and Myers, E. W. 2005. Efficient q-gram filters for finding all epsilon-matches over a given length. In RECOMB, S. Miyano, J. P. Mesirov, S. Kasif, S. Istrail, P. A. Pevzner, and M. S. Waterman, Eds. Lecture Notes in Computer Science, vol. 3500. Springer, New York. 189--203.]]
[15]
Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., and Kececioglu, J. D. 1997. A branch-and-cut algorithm for multiple sequence alignment. In RECOMB '97: Proceedings of the 1st Annual International Conference on Computational Molecular Biology. ACM, New York. 241--250.]]
[16]
Rocke, E. 2000. Using suffix trees for gapped motif discovery. In CPM, R. Giancarlo and D. Sankoff, Eds. Lecture Notes in Computer Science, vol. 1848. Springer, New Yorlk. 335--349.]]
[17]
Roth, F., Hughes, D., Estep, E., and Church, M. 1998. Finding dna regulatory motifs within unaligned non-coding sequences clustered by whole genome mrna quantization. Nature Biotechnology 16, 10, 939--945.]]
[18]
Sankoff, D. and Kruskal, J. 1983. Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparisons. Addison-Wesley, Reading, MA.]]
[19]
Ukkonen, E. 1985. Algorithms for approximate string matching. Information and Control 64, 1--3, 100--118.]]
[20]
Vilo, J. 2002. Pattern discovery from biosequences. Ph.D. thesis, University of Helsinki, Finland.]]

Cited By

View all
  • (2010)JaPaFi: A Novel Program for the Identification of Highly Conserved DNA SequencesViruses10.3390/v20918672:9(1867-1885)Online publication date: 31-Aug-2010

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Journal of Experimental Algorithmics
ACM Journal of Experimental Algorithmics  Volume 12, Issue
2008
507 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/1227161
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 June 2008
Accepted: 01 September 2007
Revised: 01 June 2007
Received: 01 February 2007
Published in JEA Volume 12

Author Tags

  1. String matching
  2. bioinformatics
  3. complexity

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2010)JaPaFi: A Novel Program for the Identification of Highly Conserved DNA SequencesViruses10.3390/v20918672:9(1867-1885)Online publication date: 31-Aug-2010

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media