[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping

  • Conference paper
Algorithms in Bioinformatics (WABI 2011)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 6833))

Included in the following conference series:

Abstract

Spaced seeds have been shown to be superior to continuous seeds for efficient and sensitive homology search based on the seed-and-extend paradigm. Much the same is true in genome mapping of high-throughput short-read data. However, a highly sensitive search with multiple spaced patterns often requires the use of a great amount of index data. We propose a novel seed-set construction method for efficient and sensitive genome mapping of short reads with relatively high error rates, which uses only continuous seeds of variable length allowing a few errors. The seed lengths and allowable error positions are optimized on the basis of entropy, which is a measure of ambiguity or repetitiveness of mapping positions. These seeds can be searched efficiently using the Burrows-Wheeler transform of the reference genome. Evaluation using actual biological SOLiD sequence data demonstrated that our method was competitive in speed and sensitivity using much less memory and disk space in comparison to spaced-seed methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 35.99
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 44.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Adjeroh, D., et al.: The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer Science+Business Media, LLC, Heidelberg (2008)

    Book  Google Scholar 

  2. Breu, H.: A theoretical understanding of 2 base color codes and its application to annotation, error detection, and error correction. White Paper, Applied Biosystems (2010), publication 139WP01-02 CO13982

    Google Scholar 

  3. Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)

    Google Scholar 

  4. Clark, D.: Compact Pat Trees. Ph.D. thesis, the University of Waterloo (1996)

    Google Scholar 

  5. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41th Annual IEEE Symposium on Foundations of Computer Science, pp. 390–398 (2000)

    Google Scholar 

  6. González, R., et al.: Practical implementation of rank and select queries. In: Proceedings of the 4th International Workshop on Experimental and Efficient Algorithms (WEA 2005), pp. 27–38. CTI Press and Ellinika Grammata, Greece (2005)

    Google Scholar 

  7. Holt, R., Jones, S.: The new paradigm of flow cell sequencing. Genome Research 18(6), 839–846 (2008)

    Article  Google Scholar 

  8. Homer, N., Merriman, B., Nelson, S.: BFAST: An alignment tool for large scale genome resequencing. PLoS ONE 4(11), e7767 (2009)

    Article  Google Scholar 

  9. Kimura, K., et al.: Computation of rank and select functions on hierarchical binary string and its application to genome mapping problems for short-read DNA sequences. Journal of Computational Biology 16(11), 1601–1613 (2009)

    Article  Google Scholar 

  10. Langmead, B., et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(3), R25 (2009)

    Article  Google Scholar 

  11. Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)

    Article  Google Scholar 

  12. Li, H., Ruan, J., Durbin, R.: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research 18(8), 1851–1858 (2008)

    Article  Google Scholar 

  13. Li, M., et al.: PatternHunter II: Highly sensitive and fast homology search. In: Proceedings of the 14th International Conference on Genome Informatics 2003. Genome Informatics Series, pp. 164–175. World Scientific, Singapore (2003)

    Google Scholar 

  14. Li, R., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15), 1966–1967 (2009)

    Article  Google Scholar 

  15. Lippert, R.: Space-efficient whole genome comparisons with Burrows-Wheeler transforms. Journal of Computational Biology 12(4), 407–415 (2005)

    Article  Google Scholar 

  16. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)

    Article  Google Scholar 

  17. Myers, G.: A fast bit-vector algorithm for approximate string matching based dynamic programming. Journal of the ACM 46(3), 395–415 (1999)

    Article  MATH  Google Scholar 

  18. Schwartz, S., Kent, W., et al.: Humanmouse alignments with BLASTZ. Genome Research 13, 103–107 (2003)

    Article  Google Scholar 

  19. Trapnell, C., Salzberg, S.: How to map billions of short reads onto genomes. Nature Biotechnology 27(5), 455–457 (2009)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kimura, K., Koike, A., Nakai, K. (2011). Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science(), vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-23038-7_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23037-0

  • Online ISBN: 978-3-642-23038-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics