Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping

Kouichi Kimura²¹,
Asako Koike²¹ &
Kenta Nakai²²

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 6833))

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

1070 Accesses
1 Citations

Abstract

Spaced seeds have been shown to be superior to continuous seeds for efficient and sensitive homology search based on the seed-and-extend paradigm. Much the same is true in genome mapping of high-throughput short-read data. However, a highly sensitive search with multiple spaced patterns often requires the use of a great amount of index data. We propose a novel seed-set construction method for efficient and sensitive genome mapping of short reads with relatively high error rates, which uses only continuous seeds of variable length allowing a few errors. The seed lengths and allowable error positions are optimized on the basis of entropy, which is a measure of ambiguity or repetitiveness of mapping positions. These seeds can be searched efficiently using the Burrows-Wheeler transform of the reference genome. Evaluation using actual biological SOLiD sequence data demonstrated that our method was competitive in speed and sensitivity using much less memory and disk space in comparison to spaced-seed methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 35.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis

Article Open access 19 January 2016

Context-aware seeds for read mapping

Article Open access 23 May 2020

Accurate high throughput alignment via line sweep-based seed processing

Article Open access 26 April 2019

References

Adjeroh, D., et al.: The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer Science+Business Media, LLC, Heidelberg (2008)
Book Google Scholar
Breu, H.: A theoretical understanding of 2 base color codes and its application to annotation, error detection, and error correction. White Paper, Applied Biosystems (2010), publication 139WP01-02 CO13982
Google Scholar
Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)
Google Scholar
Clark, D.: Compact Pat Trees. Ph.D. thesis, the University of Waterloo (1996)
Google Scholar
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41th Annual IEEE Symposium on Foundations of Computer Science, pp. 390–398 (2000)
Google Scholar
González, R., et al.: Practical implementation of rank and select queries. In: Proceedings of the 4th International Workshop on Experimental and Efficient Algorithms (WEA 2005), pp. 27–38. CTI Press and Ellinika Grammata, Greece (2005)
Google Scholar
Holt, R., Jones, S.: The new paradigm of flow cell sequencing. Genome Research 18(6), 839–846 (2008)
Article Google Scholar
Homer, N., Merriman, B., Nelson, S.: BFAST: An alignment tool for large scale genome resequencing. PLoS ONE 4(11), e7767 (2009)
Article Google Scholar
Kimura, K., et al.: Computation of rank and select functions on hierarchical binary string and its application to genome mapping problems for short-read DNA sequences. Journal of Computational Biology 16(11), 1601–1613 (2009)
Article Google Scholar
Langmead, B., et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(3), R25 (2009)
Article Google Scholar
Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)
Article Google Scholar
Li, H., Ruan, J., Durbin, R.: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research 18(8), 1851–1858 (2008)
Article Google Scholar
Li, M., et al.: PatternHunter II: Highly sensitive and fast homology search. In: Proceedings of the 14th International Conference on Genome Informatics 2003. Genome Informatics Series, pp. 164–175. World Scientific, Singapore (2003)
Google Scholar
Li, R., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15), 1966–1967 (2009)
Article Google Scholar
Lippert, R.: Space-efficient whole genome comparisons with Burrows-Wheeler transforms. Journal of Computational Biology 12(4), 407–415 (2005)
Article Google Scholar
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
Article Google Scholar
Myers, G.: A fast bit-vector algorithm for approximate string matching based dynamic programming. Journal of the ACM 46(3), 395–415 (1999)
Article MATH Google Scholar
Schwartz, S., Kent, W., et al.: Humanmouse alignments with BLASTZ. Genome Research 13, 103–107 (2003)
Article Google Scholar
Trapnell, C., Salzberg, S.: How to map billions of short reads onto genomes. Nature Biotechnology 27(5), 455–457 (2009)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Central Research Laboratory, Hitachi Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo, 185-8601, Japan
Kouichi Kimura & Asako Koike
The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo, 108-8639, Japan
Kenta Nakai

Authors

Kouichi Kimura
View author publications
You can also search for this author in PubMed Google Scholar
Asako Koike
View author publications
You can also search for this author in PubMed Google Scholar
Kenta Nakai
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

National Center for Biotechnology Information, U.S. National Library of Medicine, 8600 Rockville Pike, 20894, Bethesda, MD, USA
Teresa M. Przytycka
Institut National de Recherche en Informatique et en Automatique (INRIA) and Université Lyon 1 (UCBL), 43 bd du 11 Novembre 1918, 69622, Villeurbanne cedex, France
Marie-France Sagot

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kimura, K., Koike, A., Nakai, K. (2011). Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science(), vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_14

Download citation

DOI: https://doi.org/10.1007/978-3-642-23038-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23037-0
Online ISBN: 978-3-642-23038-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis

Context-aware seeds for read mapping

Accurate high throughput alignment via line sweep-based seed processing

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis

Context-aware seeds for read mapping

Accurate high throughput alignment via line sweep-based seed processing

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation