Searching for Supermaximal Repeats in Large DNA Sequences

Chen Na Lian¹,
Mihail Halachev¹ &
Nematollaah Shiri¹

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Included in the following conference series:

International Conference on Bioinformatics Research and Development

Abstract

We study the problem of finding supermaximal repeats in large DNA sequences. For this, we propose an algorithm called SMR, which uses an auxiliary index structure (POL) that is derived from and replaces the suffix tree index STTD64 [1]. The results of our numerous experiments on the 24 human chromosomes indicate that SMR outperforms the solution provided as part of the Vmatch [2] software tool. In searching for supermaximal repeats of at least 10 bases, SMR is twice as fast as Vmatch; for a minimum length of 25 bases, SMR is 7 times faster; and for repeats of length at least 200, SMR is about 9 times faster. We also study the cost of POL in terms of time and space requirements.
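
To make the problem statement concrete, the following is a minimal brute-force sketch of what a supermaximal repeat is, using the standard textbook definition (as in Gusfield's book in the reference list): a substring that occurs at least twice and is not contained in any other repeated substring. It is not the SMR algorithm and does not use POL or STTD64; the function name `supermaximal_repeats` and the parameter `min_len` are illustrative assumptions, and the quadratic enumeration is only practical for very short sequences.

```python
from collections import defaultdict


def supermaximal_repeats(seq, min_len=1):
    """Brute-force illustration only; NOT the SMR algorithm from the chapter.

    Returns a dict mapping each supermaximal repeat of length >= min_len
    to the sorted list of its 0-based start positions in seq.
    """
    # Record the start positions of every substring of length >= min_len.
    positions = defaultdict(list)
    n = len(seq)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            positions[seq[i:j]].append(i)

    # A repeat is any substring with at least two (possibly overlapping)
    # occurrences.
    repeats = {s: p for s, p in positions.items() if len(p) >= 2}

    # A repeat is supermaximal if no other repeat contains it.
    result = {}
    for s, p in repeats.items():
        if not any(s != t and s in t for t in repeats):
            result[s] = p
    return result


if __name__ == "__main__":
    # "ACG" occurs at positions 0 and 5; "T" occurs at positions 3 and 4.
    print(supermaximal_repeats("ACGTTACGA"))
    # Raising the minimum length filters out the single-base repeat.
    print(supermaximal_repeats("ACGTTACGA", min_len=2))
```

In this toy sequence, "A", "C", "G", "AC" and "CG" also repeat, but each is contained in the longer repeat "ACG", so none of them is supermaximal. The `min_len` argument plays the role of the minimum repeat length (10, 25, 200 bases) mentioned in the abstract.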

References

1. Halachev, M., Shiri, N., Thamildurai, A.: Efficient and scalable indexing techniques for biological sequence data. In: Hochreiter, S., Wagner, R. (eds.) BIRD 2007. LNCS (LNBI), vol. 4414, pp. 464–479. Springer, Heidelberg (2007)
2. Vmatch: large scale sequence analysis software, http://www.vmatch.de
3. Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, New York (1997)
4. Korf, B.R.: Human Genetics: A Problem-Based Approach. Blackwell, Boston (2000)
5. Watson, J., Hopkins, N., Roberts, J., Steitz, J., Weiner, A.: Molecular Biology of the Gene, 6th edn. Benjamin-Cummings, Menlo Park (2007)
6. Kurtz, S.: Reducing the space requirement of suffix trees. Software Practice and Experience 29(13), 1149–1171 (1999)
7. Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing 35(2), 378–407 (2005)
8. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: 41st IEEE Symposium on Foundations of Computer Science, pp. 390–398 (2000)
9. Valimaki, N., Gerlach, W., Dixit, K., Makinen, V.: Compressed suffix tree - a basis for genome-scale sequence analysis. Bioinformatics 23(5), 629–630 (2007)
10. Hon, W.-K., Lam, T.W., Sung, W.-K., Tse, W.-L., Wong, C.-K., Yiu, S.-M.: Practical Aspects of Compressed Suffix Arrays and FM-index in Searching DNA Sequences. In: 6th Workshop on Algorithm Engineering and Experiments, pp. 31–38 (2004)
11. Kurtz, S., Schleiermacher, C.: REPuter: Fast Computation of Maximal Repeats in Complete Genomes. Bioinformatics 15, 426–427 (1999)
12. RepeatMatch, http://mummer.sourceforge.net/manual/#repeat
13. RepeatMasker, http://repeatmasker.org/
14. Bedell, J.A., Korf, I., Gish, W.: MaskerAid: a Performance Enhancement to RepeatMasker. Bioinformatics 16(11), 1040–1041 (2000)
15. Gotoh, O.: An Improved Algorithm for Matching Biological Sequences. Journal of Molecular Biology 162(3), 705–708 (1982)
16. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2(1), 53–86 (2004)
17. Miki, B.L., Neelin, J.M.: DNA repeat lengths of erythrocyte chromatins differing in content of histones H1 and H5. Nucleic Acids Res. 8(3), 529–542 (1980)
18. National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov

Author information

Authors and Affiliations

Dept. of Computer Science & Software Engineering, Concordia University, Montreal, Quebec, Canada
Chen Na Lian, Mihail Halachev & Nematollaah Shiri

Editor information

Mourad Elloumi, Josef Küng, Michal Linial, Robert F. Murphy, Kristan Schneider, Cristian Toma

About this paper

Cite this paper

Lian, C.N., Halachev, M., Shiri, N. (2008). Searching for Supermaximal Repeats in Large DNA Sequences. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_7

DOI: https://doi.org/10.1007/978-3-540-70600-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer Science, Computer Science (R0)
