[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Approximate k-Mer matching using fuzzy hash maps

Published: 01 January 2014 Publication History

Abstract

We present a fuzzy technique for approximate k-mer matching that combines the speed of hashing with the sensitivity of dynamic programming. Our approach exploits the collision detection mechanism used by hash maps, unifying the two phases of "seed and extend" into a single operation that executes in close to O(1) average time.

References

[1]
W. Pearson and D. Lipman, "Improved Tools for Biological Sequence Comparison," Proc. Nat'l Academy of Sciences USA, vol. 85, pp. 2444-2448, 1988.
[2]
X. Yang, K.S. Dorman, and S. Aluru, "Reptile: Representative Tiling for Short Read Error Correction," Bioinformatics, vol. 26, pp. 2526-2533, 2010.
[3]
X. Li and M.S. Waterman, "Estimating the Repeat Structure and Length of DNA Sequences Using l-Tuples," Genome Research, vol. 13, pp. 1916-1922, 2003.
[4]
R.M. Idury and M.S. Waterman, "A New Algorithm for DNA Sequence Assembly," J. Computational Biology, vol. 2, pp. 291-306, 1995.
[5]
J. Butler, I. MacCallum, M. Kleber, I. Shlyakhter, M.K. Belmonte, E.S. Lander, C. Nusbaum, and D.B. Jaffe, "ALLPATHS: De Novo Assembly of Whole-Genome Shotgun Microreads," Genome Research, vol. 18, pp. 810-820, 2008.
[6]
T.F. Smith and M.S. Waterman, "Identification of Common Molecular Subsequences," J. Molecular Biology, vol. 147, pp. 195-197, 1981.
[7]
J.R. Miller, S. Koren, and G. Sutton, "Assembly Algorithms for Next-Generation Sequencing Data," Genomics, vol. 95, pp. 315-327, 2010.
[8]
S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, "Basic Local Alignment Search Tool," J. Molecular Biology, vol. 215, pp. 403-410, 1990.
[9]
W.J. Kent, "BLAT - The BLAST-Like Alignment Tool," Genome Research, vol. 12, pp. 656-664, 2002.
[10]
W.-P. Lee, M.P. Stromberg, A. Ward, C. Stewart, E.P. Garrison et al. "MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping," PLoS ONE, vol. 9, no. 3, p. e90581, 2014.
[11]
B. Ma, J. Tromp, and M. Li, "PatternHunter: Faster and More Sensitive Homology Search," Bioinformatics, vol. 18, pp. 440-445, 2002.
[12]
M. Li, B. Ma, D. Kisman, and J. Tromp, "Patternhunter II: Highly Sensitive and Fast Homology Search," J. Bioinformatics and Computational Biology, vol. 2, pp. 417-439, 2004.
[13]
L. Noé and G. Kucherov, "YASS: Enhancing the Sensitivity of DNA Similarity Search," Nucleic Acids Research, vol. 33, pp. W540-W543, 2005.
[14]
D. Mak, Y. Gelfand, and G. Benson, "Indel Seeds for Homology Search," Bioinformatics, vol. 22, pp. e341-e349, 2006.
[15]
H. Li and N. Homer, "A Survey of Sequence Alignment Algorithms for Next-Generation Sequencing," Briefings in Bioinformatics, vol. 11, pp. 473-483, 2010.
[16]
P. Flicek and E. Birney, "Sense from Sequence Reads: Methods for Alignment and Assembly," Nature Methods, vol. 6, pp. S6-S12, 2009.
[17]
S. Misra, A. Agrawal, W.K. Liao, and A. Choudhary, "Anatomy of a Hash-Based Long Read Sequence Mapping Algorithm for Next Generation DNA Sequencing," Bioinformatics, vol. 27, pp. 189-195, 2011.
[18]
J. Healy and D. Chambers, "Fast and Accurate Genome Anchoring Using Fuzzy Hash Maps," Proc. Fifth Int'l Conf. Practical Applications of Computational Biology & Bioinformatics (PACBB '11), pp. 149-156, 2011.
[19]
J. Healy and D. Chambers, "De Novo Draft Genome Assembly Using Fuzzy K-mers," Proc. Third Int'l Conf. Bioinformatics, Biocomputational Systems and Biotechnologies (BIOTECHNO '11), pp. 104-109, 2011.
[20]
T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2001.
[21]
V. Topac, "Efficient Fuzzy Search Enabled Hash Map," Proc. Fourth Int'l Workshop Soft Computing Applications, pp. 39-44, 2010.
[22]
K. Tanaka and T. Niimura, An Introduction to Fuzzy Logic for Practical Applications. Springer, 1997.
[23]
V. Levenshtein, "Binary Codes Capable of Correcting Deletions, Insertions, and Reversals," Soviet Physics Doklady, vol. 10, pp. 707-710, 1966.
[24]
J. Gosling, B. Joy, G. Steele, and G. Bracha, Java (TM) Language Specification, The (Java (Addison-Wesley)). Addison-Wesley Professional, 2005.
[25]
R.W. Hamming, "Error Detecting and Error Correcting Codes," Bell System Technical J., vol. 29, pp. 147-160, 1950.
[26]
Oracle, "The Java Collections Framework," Java Language Application Programming Interface, 2011.

Cited By

View all
  • (2021)Cross-layer Design for Computing-in-MemoryProceedings of the 26th Asia and South Pacific Design Automation Conference10.1145/3394885.3431617(132-139)Online publication date: 18-Jan-2021
  • (2020)Seed-and-vote based in-memory accelerator for DNA read mappingProceedings of the 39th International Conference on Computer-Aided Design10.1145/3400302.3415651(1-9)Online publication date: 2-Nov-2020
  • (2016)Nonlinear approach for estimating WCET during programming phaseCluster Computing10.1007/s10586-016-0606-519:3(1449-1459)Online publication date: 1-Sep-2016

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE/ACM Transactions on Computational Biology and Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 11, Issue 1
January/February 2014
265 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Accepted: 21 February 2014
Published: 01 January 2014
Revised: 17 December 2013
Received: 21 June 2013
Published in TCBB Volume 11, Issue 1

Author Tags

  1. biology and genetics
  2. fuzzy set

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)3
Reflects downloads up to 05 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Cross-layer Design for Computing-in-MemoryProceedings of the 26th Asia and South Pacific Design Automation Conference10.1145/3394885.3431617(132-139)Online publication date: 18-Jan-2021
  • (2020)Seed-and-vote based in-memory accelerator for DNA read mappingProceedings of the 39th International Conference on Computer-Aided Design10.1145/3400302.3415651(1-9)Online publication date: 2-Nov-2020
  • (2016)Nonlinear approach for estimating WCET during programming phaseCluster Computing10.1007/s10586-016-0606-519:3(1449-1459)Online publication date: 1-Sep-2016

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media