[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3564246.3585239acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article
Open access

Locally Consistent Decomposition of Strings with Applications to Edit Distance Sketching

Published: 02 June 2023 Publication History

Abstract

In this paper we provide a new locally consistent decomposition of strings. Each string x is decomposed into blocks that can be described by grammars of size O(k) (using some amount of randomness). If we take two strings x and y of edit distance at most k then their block decomposition uses the same number of grammars and the i-th grammar of x is the same as the i-th grammar of y except for at most k indexes i. The edit distance of x and y equals to the sum of edit distances of pairs of blocks where x and y differ. Our decomposition can be used to design a sketch of size O(k2) for edit distance, and also a rolling sketch for edit distance of size O(k2). The rolling sketch allows to update the sketched string by appending a symbol or removing a symbol from the beginning of the string.

References

[1]
Alexandr Andoni and Negev Shekel Nosatzki. 2020. Edit Distance in Near-Linear Time: it’s a Constant Factor. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, Sandy Irani (Ed.). IEEE, 990–1001. https://doi.org/10.1109/FOCS46700.2020.00096
[2]
Arturs Backurs and Piotr Indyk. 2015. Edit Distance Cannot Be Computed in Strongly Subquadratic Time (Unless SETH is False). In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC ’15). ACM, New York, NY, USA. 51–58. isbn:978-1-4503-3536-2
[3]
Tuğkan Batu, Funda Ergun, and Cenk Sahinalp. 2006. Oblivious String Embeddings and Edit Distance Approximations. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm (SODA ’06). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. 792–801. isbn:0-89871-605-5
[4]
Djamal Belazzougui and Qin Zhang. 2016. Edit Distance: Sketching, Streaming, and Document Exchange. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). 51–60. https://doi.org/10.1109/FOCS.2016.15
[5]
Sudatta Bhattacharya and Michal Koucký. 2023. Locally consistent decomposition of strings with applications to edit distance sketching. CoRR, abs/2302.04475 (2023), arxiv:2302.04475.
[6]
Or Birenzwige, Shay Golan, and Ely Porat. 2020. Locally Consistent Parsing for Text Indexing in Small Space. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, Shuchi Chawla (Ed.). SIAM, 607–626. https://doi.org/10.1137/1.9781611975994.37
[7]
Joshua Brakensiek and Aviad Rubinstein. 2020. Constant-factor approximation of near-linear edit distance in near-linear time. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy (Eds.). ACM, 685–698. https://doi.org/10.1145/3357713.3384282
[8]
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. 2018. Approximating Edit Distance within Constant Factor in Truly Sub-Quadratic Time. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018. 979–990. https://doi.org/10.1109/FOCS.2018.00096
[9]
Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. 2016. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016. 712–725.
[10]
Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. 2019. The streaming k-mismatch problem. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019. SIAM, 1106–1125. https://doi.org/10.1137/1.9781611975482.68
[11]
Richard Cole and Uzi Vishkin. 1986. Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms. In Proceedings of the eighteenth annual ACM symposium on Theory of computing (STOC). 206–219. https://doi.org/10.1145/12130.12151
[12]
Graham Cormode and S. Muthukrishnan. 2002. The string edit distance matching problem with moves. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA. 667–676.
[13]
Joan Feigenbaum, Yuval Ishai, Tal Malkin, Kobbi Nissim, Martin J Strauss, and Rebecca N Wright. 2006. Secure multiparty computation of approximations. ACM transactions on Algorithms (TALG), 2, 3 (2006), 435–472.
[14]
Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, and Barna Saha. 2022. How Compression and Approximation Affect Efficiency in String Distance Measures. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA. 2867–2919. https://doi.org/10.1137/1.9781611977073.112
[15]
Szymon Grabowski. 2016. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discrete Applied Mathematics, 212 (2016), 96–103.
[16]
Ce Jin, Jelani Nelson, and Kewen Wu. 2021. An Improved Sketching Algorithm for Edit Distance. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, (LIPIcs, Vol. 187). 45:1–45:16. https://doi.org/10.4230/LIPIcs.STACS.2021.45
[17]
Hossein Jowhari. 2012. Efficient Communication Protocols for Deciding Edit Distance. In Algorithms - ESA 2012 - 20th Annual European Symposium, Ljubljana, Slovenia, September 10-12, 2012. Proceedings. 648–658.
[18]
Richard M. Karp and Michael O. Rabin. 1987. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31, 2 (1987), 249–260. https://doi.org/10.1147/rd.312.0249
[19]
Tomasz Kociumaka, Ely Porat, and Tatiana Starikovskaya. 2021. Small-space and streaming pattern matching with k edits. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). 885–896. https://doi.org/10.1109/FOCS52979.2021.00090
[20]
Michal Koucký and Michael E. Saks. 2020. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy (Eds.). ACM, 699–712. https://doi.org/10.1145/3357713.3384307
[21]
Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the thirtieth annual ACM symposium on Theory of computing. 614–623.
[22]
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. 1998. Incremental String Comparison. SIAM J. Comput., 27, 2 (1998), April, 557–582. issn:0097-5397
[23]
Nathan Linial. 1987. Distributive Graph Algorithms-Global Solutions from Local Data. In 28th Annual Symposium on Foundations of Computer Science,FOCS. IEEE Computer Society, 331–335. https://doi.org/10.1109/SFCS.1987.20
[24]
Nathan Linial. 1992. Locality in Distributed Graph Algorithms. SIAM J. Comput., 21, 1 (1992), 193–201. https://doi.org/10.1137/0221015
[25]
William J. Masek and Michael S. Paterson. 1980. A faster algorithm computing string edit distances. J. Comput. System Sci., 20, 1 (1980), 18 – 31. issn:0022-0000
[26]
Rafail Ostrovsky and Yuval Rabani. 2007. Low distortion embeddings for edit distance. J. ACM, 54, 5 (2007), 23. https://doi.org/10.1145/1284320.1284322
[27]
Ely Porat and Ohad Lipsky. 2007. Improved Sketching of Hamming Distance with Error Correcting. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM. 4580, Springer, 173–182. https://doi.org/10.1007/978-3-540-73437-6_19
[28]
Süleyman Cenk Sahinalp and Uzi Vishkin. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, 23-25 May 1994, Montréal, Québec, Canada. ACM, 300–309. https://doi.org/10.1145/195058.195164
[29]
Robert A. Wagner and Michael J. Fischer. 1974. The String-to-String Correction Problem. J. ACM, 21, 1 (1974), Jan., 168–173. issn:0004-5411
[30]
Haoyu Zhang and Qin Zhang. 2019. MinJoin: Efficient Edit Similarity Joins via Local Hash Minima. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’19). Association for Computing Machinery, New York, NY, USA. 1093–1103. isbn:9781450362016 https://doi.org/10.1145/3292500.3330853

Cited By

View all
  • (2023)Optimal Algorithms for Bounded Weighted Edit Distance2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00135(2177-2187)Online publication date: 6-Nov-2023

Index Terms

  1. Locally Consistent Decomposition of Strings with Applications to Edit Distance Sketching

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing
    June 2023
    1926 pages
    ISBN:9781450399135
    DOI:10.1145/3564246
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 June 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Edit distance
    2. locally consistent parsing
    3. sketching
    4. string decomposition

    Qualifiers

    • Research-article

    Funding Sources

    • Czech Science Foundation
    • Marie Sk?odowska-Curie Actions

    Conference

    STOC '23
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Upcoming Conference

    STOC '25
    57th Annual ACM Symposium on Theory of Computing (STOC 2025)
    June 23 - 27, 2025
    Prague , Czech Republic

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)147
    • Downloads (Last 6 weeks)19
    Reflects downloads up to 21 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Optimal Algorithms for Bounded Weighted Edit Distance2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00135(2177-2187)Online publication date: 6-Nov-2023

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media