[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

New tabulation and sparse dynamic programming based techniques for sequence similarity problems

Published: 30 October 2016 Publication History

Abstract

Calculating the length ź of a longest common subsequence (LCS) of two strings, A of length n and B of length m, is a classic research topic, with many known worst-case oriented results. We present three algorithms for LCS length calculation with respectively O ( m n lg lg n / lg 2 n ), O ( m n / lg 2 n + r ) and O ( n + r ) time complexity, where the second one works for r = o ( m n / ( lg n lg lg n ) ), and the third one for r = ź ( m n / lg k n ), for a real constant 1 ź k ź 3, and ź = O ( n / ( lg k - 1 n ( lg lg n ) 2 ) ), where r is the number of matches in the dynamic programming matrix. We also describe conditions for a given problem sufficient to apply our techniques, with several concrete examples presented, namely the edit distance, the longest common transposition-invariant subsequence (LCTS) and the merged longest common subsequence (MerLCS) problems.

References

[1]
A. Apostolico, String editing and longest common subsequences, in: Linear Modeling: Background and Application, vol. 2, Springer, 1997, pp. 361-398.
[2]
L. Bergroth, H. Hakonen, T. Raita, A survey of longest common subsequence algorithms, in: SPIRE, IEEE Computer Society, 2000, pp. 39-48.
[3]
P. Bille, M. Farach-Colton, Fast and compact regular expression matching, Theoret. Comput. Sci., 409 (2008) 486-496.
[4]
M. Crochemore, G.M. Landau, M. Ziv-Ukelson, A subquadratic sequence alignment algorithm for unrestricted scoring matrices, SIAM J. Comput., 32 (2003) 1654-1673.
[5]
S. Deorowicz, Speeding up transposition-invariant string matching, Inform. Process. Lett., 100 (2006) 14-20.
[6]
S. Deorowicz, A. Danek, Bit-parallel algorithms for the merged longest common subsequence problem, Internat. J. Found. Comput. Sci., 24 (2013) 1281-1298.
[7]
S. Deorowicz, S. Grabowski, Subcubic algorithms for the sequence excluded LCS problem, in: Man-Machine Interactions, vol. 3, 2014, pp. 503-510.
[8]
D. Eppstein, Z. Galil, R. Giancarlo, G.F. Italiano, Sparse dynamic programming I: Linear cost functions, J. ACM, 39 (1992) 519-545.
[9]
P. Gawrychowski, Faster algorithm for computing the edit distance between slp-compressed strings, in: SPIRE, 2012, pp. 229-236.
[10]
S. Grabowski, New tabulation and sparse dynamic programming based techniques for sequence similarity problems, in: Proceedings of the Prague Stringology Conference, PSC, 2014, Czech Technical University in Prague, Czech Republic, 2014, pp. 202-211.
[11]
D.S. Hirschberg, An information-theoretic lower bound for the longest common subsequence problem, Inform. Process. Lett., 7 (1978) 40-41.
[12]
K.-S. Huang, C.-B. Yang, K.-T. Tseng, H.-Y. Ann, Y.-H. Peng, Efficient algorithm for finding interleaving relationship between sequences, Inform. Process. Lett., 105 (2008) 188-193.
[13]
J.W. Hunt, T.G. Szymanski, A fast algorithm for computing longest common subsequences, Commun. ACM, 20 (1977) 350-353.
[14]
H. Hyyrö, Bit-parallel LCS-length computation revisited, in: AWOCA, University of Sydney, Australia, 2004, pp. 16-27.
[15]
W. Masek, M. Paterson, A faster algorithm computing string edit distances, J. Comput. System Sci., 20 (1980) 18-31.
[16]
V. Mäkinen, G. Navarro, E. Ukkonen, Transposition invariant string matching, J. Algorithms, 56 (2005) 124-153.
[17]
G. Navarro, S. Grabowski, V. Mäkinen, S. Deorowicz, Improved time and space complexities for transposition invariant string matching, in: Technical Report TR/DCC-2005-4, Department of Computer Science, University of Chile, 2005.
[18]
Y.-H. Peng, C.-B. Yang, K.-S. Huang, C.-T. Tseng, C.-Y. Hor, Efficient sparse dynamic programming for the merged lcs problem with block constraints, Inf. Control, 6 (2010) 1935-1947.
[19]
Y. Sakai, A fast on-line algorithm for the longest common subsequence problem with constant alphabet, IEICE Trans., 95-A (2012) 354-361.
[20]
D. Salomon, Springer, 2007.
[21]
C.K. Wong, A.K. Chandra, Bounds for the string editing problem, J. ACM, 23 (1976) 13-16.

Cited By

View all
  • (2024)Almost Linear Size Edit Distance SketchProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649783(956-967)Online publication date: 10-Jun-2024
  • (2023)Locally Consistent Decomposition of Strings with Applications to Edit Distance SketchingProceedings of the 55th Annual ACM Symposium on Theory of Computing10.1145/3564246.3585239(219-232)Online publication date: 2-Jun-2023
  • (2021)Fully dynamic approximation of LIS in polylogarithmic timeProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing10.1145/3406325.3451137(654-667)Online publication date: 15-Jun-2021
  • Show More Cited By
  1. New tabulation and sparse dynamic programming based techniques for sequence similarity problems

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Discrete Applied Mathematics
    Discrete Applied Mathematics  Volume 212, Issue C
    October 2016
    127 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Publication History

    Published: 30 October 2016

    Author Tags

    1. Longest common subsequence
    2. Sequence similarity
    3. Sparse dynamic programming
    4. Tabulation

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 31 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Almost Linear Size Edit Distance SketchProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649783(956-967)Online publication date: 10-Jun-2024
    • (2023)Locally Consistent Decomposition of Strings with Applications to Edit Distance SketchingProceedings of the 55th Annual ACM Symposium on Theory of Computing10.1145/3564246.3585239(219-232)Online publication date: 2-Jun-2023
    • (2021)Fully dynamic approximation of LIS in polylogarithmic timeProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing10.1145/3406325.3451137(654-667)Online publication date: 15-Jun-2021
    • (2020)Approximating Edit Distance Within Constant Factor in Truly Sub-quadratic TimeJournal of the ACM10.1145/342282367:6(1-22)Online publication date: 29-Oct-2020
    • (2020)Constant factor approximations to edit distance on far input pairs in nearly linear timeProceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing10.1145/3357713.3384307(699-712)Online publication date: 22-Jun-2020
    • (2019)Approximating LCS in linear timeProceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3310435.3310507(1181-1200)Online publication date: 6-Jan-2019
    • (2019)Fine-grained complexity meets IP = PSPACEProceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3310435.3310436(1-20)Online publication date: 6-Jan-2019
    • (2018)Dynamic Time Warping and Geometric Edit DistanceACM Transactions on Algorithms10.1145/323073414:4(1-17)Online publication date: 21-Aug-2018
    • (2018)A hardness result and new algorithm for the longest common palindromic subsequence problemInformation Processing Letters10.1016/j.ipl.2017.08.006129:C(11-15)Online publication date: 1-Jan-2018

    View Options

    View options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media