[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3519935.3519990acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article
Open access

Almost-optimal sublinear-time edit distance in the low distance regime

Published: 10 June 2022 Publication History

Abstract

We revisit the task of computing the edit distance in sublinear time. In the (k,K)-gap edit distance problem we are given oracle access to two strings of length n and the task is to distinguish whether their edit distance is at most k or at least K. It has been established by Goldenberg, Krauthgamer and Saha (FOCS ’19), with improvements by Kociumaka and Saha (FOCS ’20), that the (k,k2)-gap problem can be solved in time O(n/k + poly(k)). One of the most natural questions in this line of research is whether the (k,k2)-gap is best-possible for the running time O(n/k + poly(k)).
In this work we answer this question by significantly improving the gap. Specifically, we show that in time O(n/k + poly(k)) we can even solve the (k,k1+o(1))-gap problem. This is the first algorithm that breaks the (k,k2)-gap in this running time. Our algorithm is almost optimal in the following sense: In the low distance regime (kn0.19) our running time becomes O(n/k), which matches a known n/k1+o(1) lower bound for the (k,k1+o(1))-gap problem up to lower order factors.
Our result also reveals a surprising similarity of Hamming distance and edit distance in the low distance regime: For both, the (k,k1+o(1))-gap problem has time complexity n/ko(1) for small k.
In contrast to previous work, which employed a subsampled variant of the Landau-Vishkin algorithm, we instead build upon the algorithm of Andoni, Krauthgamer and Onak (FOCS ’10) which approximates the edit distance in almost-linear time O(n1+ε) within a polylogarithmic factor. We first simplify their approach and then show how to to effectively prune their computation tree in order to obtain a sublinear-time algorithm in the given time bound. Towards that, we use a variety of structural insights on the (local and global) patterns that can emerge during this process and design appropriate property testers to effectively detect these patterns.

References

[1]
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. 2015. Tight Hardness Results for LCS and Other Sequence Similarity Measures. In Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science (FOCS ’15). IEEE Computer Society, 59–78.
[2]
Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. 2016. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th ACM Symposium on Theory of Computing (STOC ’16). ACM, 375–388.
[3]
Alexandr Andoni. 2017. High frequency moments via max-stability. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017. IEEE, 6364–6368.
[4]
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. 2010. Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity. In Proceedings of the 51st IEEE Annual Symposium on Foundations of Computer Science (FOCS ’10). IEEE Computer Society, 377–386.
[5]
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. 2011. Streaming Algorithms via Precision Sampling. In Proceedings of the 52nd IEEE Annual Symposium on Foundations of Computer Science (FOCS ’11). IEEE Computer Society, 363–372.
[6]
Alexandr Andoni and Huy L. Nguyen. 2010. Near-Optimal Sublinear Time Algorithms for Ulam Distance. In Proceedings of the 21st ACM-SIAM Symposium on Discrete Algorithms (SODA ’10). SIAM, 76–86.
[7]
Alexandr Andoni and Negev Shekel Nosatzki. 2020. Edit Distance in Near-Linear Time: it’s a Constant Factor. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS ’20). IEEE Computer Society, 990–1001.
[8]
Alexandr Andoni and Krzysztof Onak. 2012. Approximating Edit Distance in Near-Linear Time. SIAM J. Comput., 41, 6 (2012), 1635–1648.
[9]
Arturs Backurs and Piotr Indyk. 2015. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the 47th ACM Symposium on Theory of Computing (STOC ’15). ACM, 51–58.
[10]
Ziv Bar-Yossef, S. Thathachar Jayram, Robert Krauthgamer, and Ravi Kumar. 2004. Approximating Edit Distance Efficiently. In Proceedings of the 45th IEEE Annual Symposium on Foundations of Computer Science (FOCS ’04). IEEE Computer Society, 550–559.
[11]
Tugkan Batu, Funda Ergün, Joe Kilian, Avner Magen, Sofya Raskhodnikova, Ronitt Rubinfeld, and Rahul Sami. 2003. A sublinear algorithm for weakly approximating edit distance. In Proceedings of the 35th ACM Symposium on Theory of Computing (STOC ’03). ACM, 316–324.
[12]
Tugkan Batu, Funda Ergun, and Cenk Sahinalp. 2006. Oblivious string embeddings and edit distance approximations. In Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms (SODA ’06). SIAM, 792–801.
[13]
Joshua Brakensiek and Aviad Rubinstein. 2020. Constant-factor approximation of near-linear edit distance in near-linear time. In Proceedings of the 52nd ACM Symposium on Theory of Computing (STOC ’20). ACM, 685–698.
[14]
Karl Bringmann and Marvin Künnemann. 2015. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping. In Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science (FOCS ’15). IEEE Computer Society, 79–97.
[15]
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. 2020. Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time. J. ACM, 67, 6 (2020), 22 pages.
[16]
Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, and Barna Saha. 2021. Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal. CoRR, abs/2111.12706 (2021).
[17]
Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. 2019. Sublinear Algorithms for Gap Edit Distance. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS ’20). IEEE Computer Society, 1101–1120.
[18]
Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press.
[19]
Donald E. Knuth, James H. Morris, and Vaughan R. Pratt. 1977. Fast Pattern Matching in Strings. SIAM J. Comput., 6, 2 (1977), 323–350.
[20]
Tomasz Kociumaka and Barna Saha. 2020. Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS ’20). IEEE Computer Society, 1168–1179.
[21]
Michal Koucký and Michael E. Saks. 2020. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proceedings of the 52nd ACM Symposium on Theory of Computing (STOC ’20). ACM, 699–712.
[22]
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. 1998. Incremental String Comparison. SIAM J. Comput., 27, 2 (1998), 557–582.
[23]
Gad M. Landau and Uzi Vishkin. 1988. Fast String Matching with k Differences. J. Comput. Syst. Sci., 37, 1 (1988), 63–78.
[24]
Vladimir Iosifovich Levenshtein. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10, 8 (1966), 707–710.
[25]
Rafail Ostrovsky and Yuval Rabani. 2007. Low distortion embeddings for edit distance. J. ACM, 54, 5 (2007), 23.
[26]
T. K. Vintsyuk. 1968. Speech discrimination by dynamic programming. Cybernetics, 4, 1 (1968), 52–57. Russian Kibernetika 4(1):81-88 (1968)
[27]
Robert A. Wagner and Michael J. Fischer. 1974. The String-to-String Correction Problem. J. ACM, 21, 1 (1974), 168–173.

Cited By

View all
  • (2023)Approximating Edit Distance in the Fully Dynamic Model2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00098(1628-1638)Online publication date: 6-Nov-2023
  • (2022)Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS54457.2022.00070(674-685)Online publication date: Oct-2022

Index Terms

  1. Almost-optimal sublinear-time edit distance in the low distance regime

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    STOC 2022: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
    June 2022
    1698 pages
    ISBN:9781450392648
    DOI:10.1145/3519935
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 June 2022

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Edit Distance
    2. Precision Sampling
    3. Sublinear Algorithms

    Qualifiers

    • Research-article

    Conference

    STOC '22
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Upcoming Conference

    STOC '25
    57th Annual ACM Symposium on Theory of Computing (STOC 2025)
    June 23 - 27, 2025
    Prague , Czech Republic

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)59
    • Downloads (Last 6 weeks)10
    Reflects downloads up to 31 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Approximating Edit Distance in the Fully Dynamic Model2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00098(1628-1638)Online publication date: 6-Nov-2023
    • (2022)Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS54457.2022.00070(674-685)Online publication date: Oct-2022

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media