[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Space-Efficient Parallel Construction of Succinct Representations of Suffix Tree Topologies

Published: 28 January 2017 Publication History

Abstract

A compressed suffix tree usually consists of three components: a compressed suffix array, a compressed LCP-array, and a succinct representation of the suffix tree topology. There are parallel algorithms that construct the suffix array and the LCP-array, but none for the third component. In this article, we present parallel algorithms on shared memory architectures that construct the balanced parentheses sequence (BPS), an explicit succinct representation of the suffix tree topology, as well as the enhanced balanced parentheses representation (eBPR), an implicit succinct representation of the suffix tree topology. For both representations, this article presents a sequential construction algorithm (a new one for the BPS), a linear work and O(log n) time parallel construction algorithm, and a heuristic parallel construction algorithm that works very well in practice. The experimental results show that our methods are well suited for real-world applications.

References

[1]
M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. 2004. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2, 53--86.
[2]
U. Baier, T. Beller, and E. Ohlebusch. 2015. Parallel construction of succinct representations of suffix tree topologies. In Proceedings of the 22nd International Symposium on String Processing and Information Retrieval, Lecture Notes in Computer Science, Vol. 9309. Springer, Berlin, 234--245.
[3]
U. Baier, T. Beller, and E. Ohlebusch. 2016. Graphical pan-genome analysis with compressed suffix trees and the burrows Wheeler transform. Bioinformatics 32, 4, 497--504.
[4]
D. Belazzougui. 2014. Linear time construction of compressed text indices in compact space. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC’14). ACM, New York, NY, 148--193.
[5]
O. Berkman, B. Schieber, and U. Vishkin. 1993. Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values. Journal of Algorithms 14, 3, 344--370.
[6]
R. P. Brent. 1974. The parallel evaluation of general arithmetic expressions. Journal of the ACM 21, 2, 201--206.
[7]
M. Deo and S. Keely. 2013. Parallel suffix array and least common prefix for the GPU. ACM SIGPLAN Notices 48, 8, 197--206.
[8]
J. Fayolle and M. D. Ward. 2005. Analysis of the average depth in a suffix tree under a Markov model. In Proceedings of the 2005 International Conference on Analysis of Algorithms, Vol. AD. Discrete Mathematics and Theoretical Computer Science, Nancy, France, 95--104.
[9]
J. Fischer. 2010. Optimal succinctness for range minimum queries. In Proceedings of the 9th Latin American Theoretical Informatics Symposium, Lecture Notes in Computer Science, Vol. 6034. Springer, Berlin, 158--169.
[10]
J. Fischer. 2011. Combined data structure for previous- and next-smaller-values. Theoretical Computer Science 412, 22, 2451--2456.
[11]
J. Fischer, V. Mäkinen, and G. Navarro. 2009. Faster entropy-bounded compressed suffix trees. Theoretical Computer Science 410, 51, 5354--5364.
[12]
P. Flick and S. Aluru. 2015. Parallel distributed memory construction of suffix and longest common prefix arrays. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15). ACM, New York, NY, 16:1--16:10.
[13]
S. Gog. 2011. Compressed Suffix Trees: Design, Construction, and Applications. Ph.D. Dissertation. University of Ulm, Ulm, Germany.
[14]
S. Gog, T. Beller, A. Moffat, and M. Petri. 2014. From theory to practice: Plug and play with succinct data structures. In Proceedings of the 13th International Symposium on Experimental Algorithms, Lecture Notes in Computer Science, Vol. 8504. Springer, Berlin, 326--337.
[15]
W.-K. Hon. 2004. On the Construction and Application of Compressed Text Indexes. Ph.D. Dissertation. University of Hong Kong, Hong Kong.
[16]
W.-K. Hon and K. Sadakane. 2002. Space-economical algorithms for finding maximal unique matches. In Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 2373. Springer, Berlin, 144--152.
[17]
J. Jaja. 1992. Introduction to Parallel Algorithms. Addison-Wesley Professional, Boston, MA.
[18]
J. Kärkkäinen and D. Kempa. 2016a. Faster external memory LCP array construction. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA’16), Vol. 57. Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 61:1--61:16.
[19]
J. Kärkkäinen and D. Kempa. 2016b. LCP array construction using O(sort(n)) (or Less) I/Os. In Proceedings of the 23rd International Symposium on String Processing and Information Retrieval, Lecture Notes in Computer Science, Vol. 9954. Springer, Berlin, 204--217.
[20]
J. Kärkkäinen, D. Kempa, and S. J. Puglisi. 2015. Parallel external memory suffix sorting. In Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 9133. Springer, Berlin, 329--342.
[21]
J. Kärkkäinen, G. Manzini, and S. J. Puglisi. 2009. Permuted longest-common-prefix array. In Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 5577. Springer, Berlin, 181--192.
[22]
T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching Lecture Notes in Computer Science, Vol. 2089. Springer, Berlin, 181--192.
[23]
S. Kurtz. 1999. Reducing the space requirement of suffix trees. Software—Practice and Experience 29, 13, 1149--1171.
[24]
J. Labeit, J. Shun, and G. E. Blelloch. 2016. Parallel lightweight wavelet tree, suffix array and FM-index construction. In Proceedings of the 26th IEEE Data Compression Conference (DCC’16). IEEE Press, Piscataway, NJ, 33--43.
[25]
M. Léonard, L. Mouchard, and M. Salson. 2012. On the number of elements to reorder when updating a suffix array. Journal of Discrete Algorithms 11, 87--99.
[26]
U. Manber and E. W. Myers. 1990. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’90). Society for Industrial and Applied Mathematics, Philadelphia, PA, 319--327.
[27]
G. Navarro and V. Mäkinen. 2007. Compressed full-text indexes. Computer Surveys 39, 1, Article 2, 61 pages.
[28]
E. Ohlebusch. 2013. Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. Oldenbusch Verlag, Bremen, Germany.
[29]
E. Ohlebusch, J. Fischer, and S. Gog. 2010. CST++. In Proceedings of the 17th International Symposium on String Processing and Information Retrieval, Lecture Notes in Computer Science, Vol. 6393. Springer, Berlin, 322--333.
[30]
E. Ohlebusch and S. Gog. 2009. A compressed enhanced suffix array supporting fast string matching. In Proceedings of the 16th International Symposium on String Processing and Information Retrieval, Lecture Notes in Computer Science, Vol. 5721. Springer, Berlin, 51--62.
[31]
S. J. Puglisi, W. F. Smyth, and A. Turpin. 2007. A taxonomy of suffix array construction algorithms. Computer Surveys 39, 2, Article 4, 31 pages.
[32]
K. Sadakane. 2002. Succinct representations of LCP information and improvements in the compressed suffix arrays. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02). Society for Industrial and Applied Mathematics, Philadelphia, PA, 225--232.
[33]
K. Sadakane. 2007. Compressed suffix trees with full functionality. Theory of Computing Systems 41, 589--607.
[34]
J. Shun. 2014. Fast parallel computation of longest common prefixes. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, 387--398.
[35]
J. Shun and G. E. Blelloch. 2014. A simple parallel Cartesian tree algorithm and its application to suffix tree construction. ACM Transactions on Parallel Computing 1, 1, Article 8, 20 pages.
[36]
N. Välimäki, V. Mäkinen, W. Gerlach, and K. Dixit. 2009. Engineering a compressed suffix tree implementation. ACM Journal of Experimental Algorithmics 14, Article 2, 23 pages.
[37]
P. Weiner. 1973. Linear pattern matching algorithms. In Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory. IEEE Press, Piscataway, NJ, 1--11.

Cited By

View all
  • (2023)Scalable Text Index ConstructionAlgorithms for Big Data10.1007/978-3-031-21534-6_14(252-284)Online publication date: 18-Jan-2023
  • (2020)Parallel computation of the Burrows Wheeler Transform in compact spaceTheoretical Computer Science10.1016/j.tcs.2019.09.030812(123-136)Online publication date: Apr-2020

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Journal of Experimental Algorithmics
ACM Journal of Experimental Algorithmics  Volume 22, Issue
2017
175 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/3047249
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 January 2017
Accepted: 01 December 2016
Revised: 01 October 2016
Received: 01 May 2016
Published in JEA Volume 22

Author Tags

  1. Algorithm engineering
  2. balanced parentheses sequence
  3. compressed text index
  4. parallel algorithms
  5. succinct tree representation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • DFG

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Scalable Text Index ConstructionAlgorithms for Big Data10.1007/978-3-031-21534-6_14(252-284)Online publication date: 18-Jan-2023
  • (2020)Parallel computation of the Burrows Wheeler Transform in compact spaceTheoretical Computer Science10.1016/j.tcs.2019.09.030812(123-136)Online publication date: Apr-2020

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media