[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Efficient algorithms for three variants of the LPF table

Published: 01 February 2012 Publication History

Abstract

The concept of a longest previous factor (LPF) is inherent to Ziv-Lempel factorization of strings in text compression, as well as in statistics of repetitions and symmetries. It is expressed in the form of a table - LPF[i] is the maximum length of a factor starting at position i, that also appears earlier in the given text. We show how to compute efficiently three new tables storing different variants of previous factors (past segments) of a string. The longest previous non-overlapping factor, for a given position i, is the longest factor starting at i which has an exact copy occurring entirely before, while the longest previous non-overlapping reverse factor for a given position i is the longest factor starting at i, such that its reverse copy occurs entirely before. In both problems the previous copies of the factors are required to occur within the prefix ending at position i-1. The longest previous (possibly overlapping) reverse factor is the longest factor starting at i, such that its reverse copy starts before i. These problems have not been explicitly considered before, but they have several applications and they are natural extensions of the longest previous factor problem, which has been extensively studied. Moreover, the newly introduced tables store additional information on the structure of the string, helpful to improve, for example, gapped palindrome detection and text compression using reverse factors.

References

[1]
Bell, T.C., Clearly, J.G. and Witten, I.H., Text Compression. 1990. Prentice Hall Inc., New Jersey.
[2]
Bender, M.A. and Farach-Colton, M., The LCA problem revisited. In: Gonnet, G.H., Panario, D., Viola, A. (Eds.), Lecture Notes in Computer Science, vol. 1776. Springer. pp. 88-94.
[3]
Böckenhauer, H.-J. and Bongartz, D., Algorithmic Aspects of Bioinformatics. 2007. Springer, Berlin.
[4]
Chairungsee, S. and Crochemore, M., Efficient computing of longest previous reverse factors. In: Shoukourian, Y. (Ed.), Proc. Seventh International Conference on Computer Science and Information Technologies, The National Academy of Sciences of Armenia Publishers, Yerevan, Armenia. pp. 27-30.
[5]
Chen, G., Puglisi, S.J. and Smyth, W.F., Lempel-Ziv factorization using less time and space. Mathematics in Computer Science. v1 i4. 605-623.
[6]
Crochemore, M., Transducers and repetitions. Theoretical Computer Science. v45 i1. 63-86.
[7]
Crochemore, M., Hancart, C. and Lecroq, T., Algorithms on Strings. 2007. Cambridge University Press, Cambridge, UK.
[8]
Crochemore, M. and Ilie, L., Computing longest previous factor in linear time and applications. Information Processing Letters. v106 i2. 75-80.
[9]
Crochemore, M., Ilie, L., Iliopoulos, C., Kubica, M., Rytter, W. and Waleń, T., LPF computation revisited. In: Fiala, J., Kratochvíl, J., Miller, M. (Eds.), Lecture Notes in Computer Science, vol. 5874. Springer, Berlin. pp. 158-169.
[10]
Crochemore, M., Ilie, L. and Smyth, W.F., A simple algorithm for computing the Lempel-Ziv factorization. In: Storer, J.A., Marcellin, M.W. (Eds.), Proc. 18th Data Compression Conference, IEEE Computer Society, Los Alamitos, CA. pp. 482-488.
[11]
Fischer, J. and Heun, V., Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In: Lewenstein, M., Valiente, G. (Eds.), Lecture Notes in Computer Science, vol. 4009. Springer. pp. 36-48.
[12]
Fischer, J. and Heun, V., A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In: Chen, B., Paterson, M., Zhang, G. (Eds.), Lecture Notes in Computer Science, vol. 4614. Springer-Verlag. pp. 459-470.
[13]
Franek, F., Holub, J., Smyth, W.F. and Xiao, X., Computing quasi suffix arrays. Journal of Automata, Languages and Combinatorics. v8 i4. 593-606.
[14]
H. Gabow, J. Bentley, R. Tarjan, Scaling and related techniques for geometry problems, in: Symposium on the Theory of Computing (STOC), 1984, pp. 135-143.
[15]
S. Grumbach, F. Tahi, Compression of DNA sequences, in: Proc. Data Compression Conference, 1993, pp. 340-350.
[16]
Hartman, A. and Rodeh, M., Optimal parsing of strings. In: Apostolico, A., Galil, Z. (Eds.), Computer and System Sciences, vol. 12. Springer, Berlin. pp. 155-167.
[17]
Kärkkäinen, J. and Sanders, P., Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (Eds.), Lecture Notes in Computer Science, vol. 2719. Springer. pp. 943-955.
[18]
Kasai, T., Lee, G., Arimura, H., Arikawa, S. and Park, K., Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Lecture Notes in Computer Science, vol. 2089. Springer. pp. 181-192.
[19]
Kim, D.K., Sim, J.S., Park, H. and Park, K., Linear-time construction of suffix arrays. In: Lecture Notes in Computer Science, vol. 2676. Springer. pp. 186-199.
[20]
Ko, P. and Aluru, S., Space efficient linear time construction of suffix arrays. In: Lecture Notes in Computer Science, vol. 2676. Springer. pp. 200-210.
[21]
R.M. Kolpakov, G. Kucherov, Finding maximal repetitions in a word in linear time, in: Proc. Symposium on Foundations of Computer Science (FOCS), 1999, pp. 596-604.
[22]
Kolpakov, R.M. and Kucherov, G., Searching for gapped palindromes. In: Ferragina, P., Landau, G.M. (Eds.), Lecture Notes in Computer Science, vol. 5029. Springer, Berlin. pp. 18-30.
[23]
Main, M.G., Detecting leftmost maximal periodicities. Discrete Applied Mathematics. v25. 145-153.
[24]
Manacher, G.K., A new linear-time "on-line" algorithm for finding the smallest initial palindrome of a string. Journal of the ACM. v22 i3. 346-351.
[25]
Manber, U. and Myers, E.W., Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing. v22 i5. 935-948.
[26]
Nong, G., Zhang, S. and Chan, W.H., Linear time suffix array construction using D-critical substrings. In: Kucherov, G., Ukkonen, E. (Eds.), Lecture Notes in Computer Science, vol. 5577. Springer. pp. 54-67.
[27]
Sadakane, K., Succinct data structures for flexible text retrieval systems. Journal of Discrete Algorithms. v5 i1. 12-22.
[28]
G. Tischler, Personal communication, 2009.
[29]
J. Ziv, A. Lempel, A universal algorithm for sequential data compression, in: IEEE Transactions on Information Theory, 1977, pp. 337-343.

Cited By

View all
  • (2023)The Effective of Algorithms on Web Application Development to Detect Repetitive DNAProceedings of the 2023 11th International Conference on Information Technology: IoT and Smart City10.1145/3638985.3638992(43-48)Online publication date: 14-Dec-2023
  • (2023)Algorithm of Palindrome Detection ToolProceedings of the 2023 11th International Conference on Information Technology: IoT and Smart City10.1145/3638985.3638990(31-36)Online publication date: 14-Dec-2023
  • (2021)Indexing Highly Repetitive String Collections, Part IACM Computing Surveys10.1145/343439954:2(1-31)Online publication date: 5-Mar-2021
  • Show More Cited By
  1. Efficient algorithms for three variants of the LPF table

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Journal of Discrete Algorithms
    Journal of Discrete Algorithms  Volume 11, Issue
    February, 2012
    101 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Publication History

    Published: 01 February 2012

    Author Tags

    1. Longest previous factor
    2. Longest previous non-overlapping factor
    3. Longest previous non-overlapping reverse factor
    4. Longest previous reverse factor
    5. Palindrome
    6. Runs
    7. Suffix array
    8. Text compression

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 15 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)The Effective of Algorithms on Web Application Development to Detect Repetitive DNAProceedings of the 2023 11th International Conference on Information Technology: IoT and Smart City10.1145/3638985.3638992(43-48)Online publication date: 14-Dec-2023
    • (2023)Algorithm of Palindrome Detection ToolProceedings of the 2023 11th International Conference on Information Technology: IoT and Smart City10.1145/3638985.3638990(31-36)Online publication date: 14-Dec-2023
    • (2021)Indexing Highly Repetitive String Collections, Part IACM Computing Surveys10.1145/343439954:2(1-31)Online publication date: 5-Mar-2021
    • (2019)A Linear Time Algorithm for Finding Tandem Repeat in DNA SequencesProceedings of the 2019 7th International Conference on Information Technology: IoT and Smart City10.1145/3377170.3377203(426-429)Online publication date: 20-Dec-2019
    • (2019)On the Computation of Longest Previous Non-overlapping FactorsString Processing and Information Retrieval10.1007/978-3-030-32686-9_26(372-381)Online publication date: 7-Oct-2019
    • (2017)Approximate Tandem Repeats ComputationProceedings of the 2017 International Conference on Information Technology10.1145/3176653.3176660(107-111)Online publication date: 27-Dec-2017
    • (2013)The Forward Stem MatrixProceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics10.1145/2506583.2506638(575-584)Online publication date: 22-Sep-2013

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media