[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Efficient Enumeration of Distinct Factors Using Package Representations

  • Conference paper
  • First Online:
String Processing and Information Retrieval (SPIRE 2020)

Abstract

We investigate properties and applications of a new compact representation of string factors: families of packages. In a string T, each package \((i,\ell ,k)\) represents the factors of T of length \(\ell \) that start in the interval \([i,i+k]\). A family \(\mathcal {F}\) of packages represents the set \(\mathsf {Factors}(\mathcal {F})\) defined as the union of the sets of factors represented by individual packages in \(\mathcal {F}\). We show how to efficiently enumerate \(\mathsf {Factors}(\mathcal {F})\) and showcase that this is a generic tool for enumerating important classes of factors of T, such as powers and antipowers. Our approach is conceptually simpler than problem-specific methods and provides a unifying framework for such problems, which we hope can be further exploited.

We also consider a special case of the problem in which every occurrence of every factor represented by \(\mathcal {F}\) is captured by some package in \(\mathcal {F}\). For both applications mentioned above, we construct an efficient package representation that satisfies this property.

We develop efficient algorithms that, given a family \(\mathcal {F}\) of m packages in a string of length n, report all distinct factors represented by these packages in \(\mathcal {O}(n\log ^2 n+m\log n+|\mathsf {Factors}(\mathcal {F})|)\) time for the general case and in the optimal \(\mathcal {O}(n+m+|\mathsf {Factors}(\mathcal {F})|)\) time for the special case. We can also compute \(|\mathsf {Factors}(\mathcal {F})|\) in \(\mathcal {O}(n \log ^2 n + m \log n)\) time in the general case and in \(\mathcal {O}(n+m)\) time in the special case.

In particular, we improve over the state-of-the-art \(\mathcal {O}(nk^4\log k \log n)\)-time algorithm for computing the number of distinct k-antipower factors, by providing an algorithm that runs in \(\mathcal {O}(nk^2)\) time, and we obtain an alternative linear-time algorithm to enumerate distinct squares.

P. Charalampopoulos—Partially supported by ERC grant TOTAL under the EU’s Horizon 2020 Research and Innovation Programme (agreement no. 677651).

T. Kociumaka—Supported by ISF grants no. 1278/16 and 1926/19, by a BSF grant no. 2018364, and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064).

J. Radoszewski, T. Waleń and W. Zuba—Supported by the Polish National Science Center, grant no. 2018/31/D/ST6/03991.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 35.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 44.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    The workhorse of Lemma 10 is computing the size of the union of certain 1D-intervals. For the reporting version, we simply have to report all elements of this union.

References

  1. Alamro, H., Badkobeh, G., Belazzougui, D., Iliopoulos, C.S., Puglisi, S.J.: Computing the antiperiod(s) of a string. In: 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). LIPIcs, vol. 128, pp. 32:1–32:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). https://doi.org/10.4230/LIPIcs.CPM.2019.32

  2. Alzamel, M., et al.: Online algorithms on antipowers and antiperiods. In: Brisaboa, N.R., Puglisi, S.J. (eds.) SPIRE 2019. LNCS, vol. 11811, pp. 175–188. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32686-9_13

    Chapter  Google Scholar 

  3. Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. Theor. Comput. Sci. 395(2–3), 298–310 (2008). https://doi.org/10.1016/j.tcs.2008.01.006

    Article  MathSciNet  MATH  Google Scholar 

  4. Badkobeh, G., Fici, G., Puglisi, S.J.: Algorithms for anti-powers in strings. Inf. Process. Lett. 137, 57–60 (2018). https://doi.org/10.1016/j.ipl.2018.05.003

    Article  MathSciNet  MATH  Google Scholar 

  5. Bannai, H., Tomohiro, I., Inenaga, S., Nakashima, Y., Takeda, M., Tsuruta, K.: The “runs” theorem. SIAM J. Comput. 46(5), 1501–1514 (2017). https://doi.org/10.1137/15M1011032

    Article  MathSciNet  MATH  Google Scholar 

  6. Bannai, H., Inenaga, S., Köppl, D.: Computing all distinct squares in linear time for integer alphabets. In: 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). LIPIcs, vol. 78, pp. 22:1–22:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.22

  7. Barton, C., Kociumaka, T., Liu, C., Pissis, S.P., Radoszewski, J.: Indexing weighted sequences: neat and efficient. Inf. Comput. 270 (2020). https://doi.org/10.1016/j.ic.2019.104462

  8. Charalampopoulos, P., Iliopoulos, C.S., Liu, C., Pissis, S.P.: Property suffix array with applications in indexing weighted sequences. ACM J. Exp. Algorithm. 25(1) (2020). https://doi.org/10.1145/3385898

  9. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press (2007)

    Google Scholar 

  10. Crochemore, M., Ilie, L.: Computing longest previous factor in linear time and applications. Inf. Process. Lett. 106(2), 75–80 (2008). https://doi.org/10.1016/j.ipl.2007.10.006

    Article  MathSciNet  MATH  Google Scholar 

  11. Crochemore, M., Iliopoulos, C.S., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Extracting powers and periods in a word from its runs structure. Theor. Comput. Sci. 521, 29–41 (2014). https://doi.org/10.1016/j.tcs.2013.11.018

    Article  MathSciNet  MATH  Google Scholar 

  12. Deza, A., Franek, F., Thierry, A.: How many double squares can a string contain? Discrete Appl. Math. 180, 52–69 (2015). https://doi.org/10.1016/j.dam.2014.08.016

    Article  MathSciNet  MATH  Google Scholar 

  13. Farach, M.: Optimal suffix tree construction with large alphabets. In: 38th Annual Symposium on Foundations of Computer Science (FOCS 1997), pp. 137–143. IEEE Computer Society (1997). https://doi.org/10.1109/SFCS.1997.646102

  14. Fici, G., Postic, M., Silva, M.: Abelian antipowers in infinite words. Adv. Appl. Math. 108, 67–78 (2019). https://doi.org/10.1016/j.aam.2019.04.001

    Article  MathSciNet  MATH  Google Scholar 

  15. Fici, G., Restivo, A., Silva, M., Zamboni, L.Q.: Anti-powers in infinite words. J. Comb. Theory Ser. A 157, 109–119 (2018). https://doi.org/10.1016/j.jcta.2018.02.009

    Article  MathSciNet  MATH  Google Scholar 

  16. Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16(1), 109–114 (1965). https://doi.org/10.2307/2034009

    Article  MathSciNet  MATH  Google Scholar 

  17. Fraenkel, A.S., Simpson, J.: How many squares can a string contain? J. Comb. Theory Ser. A 82(1), 112–120 (1998). https://doi.org/10.1006/jcta.1997.2843

    Article  MathSciNet  MATH  Google Scholar 

  18. Gabow, H.N., Tarjan, R.E.: A linear-time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci. 30(2), 209–221 (1985). https://doi.org/10.1016/0022-0000(85)90014-5

    Article  MathSciNet  MATH  Google Scholar 

  19. Gawrychowski, P., I, T., Inenaga, S., Köppl, D., Manea, F.: Tighter bounds and optimal algorithms for all maximal \(\alpha \)-gapped repeats and palindromes. Theory Comput. Syst. 62(1), 162–191 (2017). https://doi.org/10.1007/s00224-017-9794-5

    Article  MATH  Google Scholar 

  20. Gusfield, D., Stoye, J.: Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci. 69(4), 525–546 (2004). https://doi.org/10.1016/j.jcss.2004.03.004

    Article  MathSciNet  MATH  Google Scholar 

  21. Hon, W., Patil, M., Shah, R., Thankachan, S.V.: Compressed property suffix trees. Inf. Comput. 232, 10–18 (2013). https://doi.org/10.1016/j.ic.2013.09.001

    Article  MathSciNet  MATH  Google Scholar 

  22. Kempa, D., Kociumaka, T.: String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In: 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC 2019), pp. 756–767. ACM (2019). https://doi.org/10.1145/3313276.3316368

  23. Kociumaka, T.: Efficient data structures for internal queries in texts. Ph.D. thesis, University of Warsaw (2018). https://mimuw.edu.pl/~kociumaka/files/phd.pdf

  24. Kociumaka, T., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: A linear-time algorithm for seeds computation. ACM Trans. Algorithms 16(2) (2020). https://doi.org/10.1145/3386369

  25. Kociumaka, T., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Efficient representation and counting of antipower factors in words. In: Martín-Vide, C., Okhotin, A., Shapira, D. (eds.) LATA 2019. LNCS, vol. 11417, pp. 421–433. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13435-8_31

    Chapter  MATH  Google Scholar 

  26. Kociumaka, T., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Efficient representation and counting of antipower factors in words (2020). https://arxiv.org/abs/1812.08101v3

  27. Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in a text and applications. In: 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), pp. 532–551. SIAM (2015). https://doi.org/10.1137/1.9781611973730.36

  28. Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: String powers in trees. Algorithmica 79(3), 814–834 (2017). https://doi.org/10.1007/s00453-016-0271-3

    Article  MathSciNet  MATH  Google Scholar 

  29. Kolpakov, R.: Some results on the number of periodic factors in words. Inf. Comput. 270 (2020). https://doi.org/10.1016/j.ic.2019.104459

  30. Kolpakov, R.M., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Annual Symposium on Foundations of Computer Science (FOCS 1999), pp. 596–604. IEEE Computer Society (1999). https://doi.org/10.1109/SFFCS.1999.814634

  31. Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press (2002). https://doi.org/10.1017/cbo9781107326019

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wiktor Zuba .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Charalampopoulos, P., Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T., Zuba, W. (2020). Efficient Enumeration of Distinct Factors Using Package Representations. In: Boucher, C., Thankachan, S.V. (eds) String Processing and Information Retrieval. SPIRE 2020. Lecture Notes in Computer Science(), vol 12303. Springer, Cham. https://doi.org/10.1007/978-3-030-59212-7_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-59212-7_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59211-0

  • Online ISBN: 978-3-030-59212-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics