Abstract
We investigate properties and applications of a new compact representation of string factors: families of packages. In a string T, each package \((i,\ell ,k)\) represents the factors of T of length \(\ell \) that start in the interval \([i,i+k]\). A family \(\mathcal {F}\) of packages represents the set \(\mathsf {Factors}(\mathcal {F})\) defined as the union of the sets of factors represented by individual packages in \(\mathcal {F}\). We show how to efficiently enumerate \(\mathsf {Factors}(\mathcal {F})\) and showcase that this is a generic tool for enumerating important classes of factors of T, such as powers and antipowers. Our approach is conceptually simpler than problem-specific methods and provides a unifying framework for such problems, which we hope can be further exploited.
We also consider a special case of the problem in which every occurrence of every factor represented by \(\mathcal {F}\) is captured by some package in \(\mathcal {F}\). For both applications mentioned above, we construct an efficient package representation that satisfies this property.
We develop efficient algorithms that, given a family \(\mathcal {F}\) of m packages in a string of length n, report all distinct factors represented by these packages in \(\mathcal {O}(n\log ^2 n+m\log n+|\mathsf {Factors}(\mathcal {F})|)\) time for the general case and in the optimal \(\mathcal {O}(n+m+|\mathsf {Factors}(\mathcal {F})|)\) time for the special case. We can also compute \(|\mathsf {Factors}(\mathcal {F})|\) in \(\mathcal {O}(n \log ^2 n + m \log n)\) time in the general case and in \(\mathcal {O}(n+m)\) time in the special case.
In particular, we improve over the state-of-the-art \(\mathcal {O}(nk^4\log k \log n)\)-time algorithm for computing the number of distinct k-antipower factors, by providing an algorithm that runs in \(\mathcal {O}(nk^2)\) time, and we obtain an alternative linear-time algorithm to enumerate distinct squares.
P. Charalampopoulos—Partially supported by ERC grant TOTAL under the EU’s Horizon 2020 Research and Innovation Programme (agreement no. 677651).
T. Kociumaka—Supported by ISF grants no. 1278/16 and 1926/19, by a BSF grant no. 2018364, and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064).
J. Radoszewski, T. Waleń and W. Zuba—Supported by the Polish National Science Center, grant no. 2018/31/D/ST6/03991.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
The workhorse of Lemma 10 is computing the size of the union of certain 1D-intervals. For the reporting version, we simply have to report all elements of this union.
References
Alamro, H., Badkobeh, G., Belazzougui, D., Iliopoulos, C.S., Puglisi, S.J.: Computing the antiperiod(s) of a string. In: 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). LIPIcs, vol. 128, pp. 32:1–32:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). https://doi.org/10.4230/LIPIcs.CPM.2019.32
Alzamel, M., et al.: Online algorithms on antipowers and antiperiods. In: Brisaboa, N.R., Puglisi, S.J. (eds.) SPIRE 2019. LNCS, vol. 11811, pp. 175–188. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32686-9_13
Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. Theor. Comput. Sci. 395(2–3), 298–310 (2008). https://doi.org/10.1016/j.tcs.2008.01.006
Badkobeh, G., Fici, G., Puglisi, S.J.: Algorithms for anti-powers in strings. Inf. Process. Lett. 137, 57–60 (2018). https://doi.org/10.1016/j.ipl.2018.05.003
Bannai, H., Tomohiro, I., Inenaga, S., Nakashima, Y., Takeda, M., Tsuruta, K.: The “runs” theorem. SIAM J. Comput. 46(5), 1501–1514 (2017). https://doi.org/10.1137/15M1011032
Bannai, H., Inenaga, S., Köppl, D.: Computing all distinct squares in linear time for integer alphabets. In: 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). LIPIcs, vol. 78, pp. 22:1–22:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017). https://doi.org/10.4230/LIPIcs.CPM.2017.22
Barton, C., Kociumaka, T., Liu, C., Pissis, S.P., Radoszewski, J.: Indexing weighted sequences: neat and efficient. Inf. Comput. 270 (2020). https://doi.org/10.1016/j.ic.2019.104462
Charalampopoulos, P., Iliopoulos, C.S., Liu, C., Pissis, S.P.: Property suffix array with applications in indexing weighted sequences. ACM J. Exp. Algorithm. 25(1) (2020). https://doi.org/10.1145/3385898
Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press (2007)
Crochemore, M., Ilie, L.: Computing longest previous factor in linear time and applications. Inf. Process. Lett. 106(2), 75–80 (2008). https://doi.org/10.1016/j.ipl.2007.10.006
Crochemore, M., Iliopoulos, C.S., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Extracting powers and periods in a word from its runs structure. Theor. Comput. Sci. 521, 29–41 (2014). https://doi.org/10.1016/j.tcs.2013.11.018
Deza, A., Franek, F., Thierry, A.: How many double squares can a string contain? Discrete Appl. Math. 180, 52–69 (2015). https://doi.org/10.1016/j.dam.2014.08.016
Farach, M.: Optimal suffix tree construction with large alphabets. In: 38th Annual Symposium on Foundations of Computer Science (FOCS 1997), pp. 137–143. IEEE Computer Society (1997). https://doi.org/10.1109/SFCS.1997.646102
Fici, G., Postic, M., Silva, M.: Abelian antipowers in infinite words. Adv. Appl. Math. 108, 67–78 (2019). https://doi.org/10.1016/j.aam.2019.04.001
Fici, G., Restivo, A., Silva, M., Zamboni, L.Q.: Anti-powers in infinite words. J. Comb. Theory Ser. A 157, 109–119 (2018). https://doi.org/10.1016/j.jcta.2018.02.009
Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16(1), 109–114 (1965). https://doi.org/10.2307/2034009
Fraenkel, A.S., Simpson, J.: How many squares can a string contain? J. Comb. Theory Ser. A 82(1), 112–120 (1998). https://doi.org/10.1006/jcta.1997.2843
Gabow, H.N., Tarjan, R.E.: A linear-time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci. 30(2), 209–221 (1985). https://doi.org/10.1016/0022-0000(85)90014-5
Gawrychowski, P., I, T., Inenaga, S., Köppl, D., Manea, F.: Tighter bounds and optimal algorithms for all maximal \(\alpha \)-gapped repeats and palindromes. Theory Comput. Syst. 62(1), 162–191 (2017). https://doi.org/10.1007/s00224-017-9794-5
Gusfield, D., Stoye, J.: Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci. 69(4), 525–546 (2004). https://doi.org/10.1016/j.jcss.2004.03.004
Hon, W., Patil, M., Shah, R., Thankachan, S.V.: Compressed property suffix trees. Inf. Comput. 232, 10–18 (2013). https://doi.org/10.1016/j.ic.2013.09.001
Kempa, D., Kociumaka, T.: String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In: 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC 2019), pp. 756–767. ACM (2019). https://doi.org/10.1145/3313276.3316368
Kociumaka, T.: Efficient data structures for internal queries in texts. Ph.D. thesis, University of Warsaw (2018). https://mimuw.edu.pl/~kociumaka/files/phd.pdf
Kociumaka, T., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: A linear-time algorithm for seeds computation. ACM Trans. Algorithms 16(2) (2020). https://doi.org/10.1145/3386369
Kociumaka, T., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Efficient representation and counting of antipower factors in words. In: Martín-Vide, C., Okhotin, A., Shapira, D. (eds.) LATA 2019. LNCS, vol. 11417, pp. 421–433. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13435-8_31
Kociumaka, T., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Efficient representation and counting of antipower factors in words (2020). https://arxiv.org/abs/1812.08101v3
Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in a text and applications. In: 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), pp. 532–551. SIAM (2015). https://doi.org/10.1137/1.9781611973730.36
Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: String powers in trees. Algorithmica 79(3), 814–834 (2017). https://doi.org/10.1007/s00453-016-0271-3
Kolpakov, R.: Some results on the number of periodic factors in words. Inf. Comput. 270 (2020). https://doi.org/10.1016/j.ic.2019.104459
Kolpakov, R.M., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Annual Symposium on Foundations of Computer Science (FOCS 1999), pp. 596–604. IEEE Computer Society (1999). https://doi.org/10.1109/SFFCS.1999.814634
Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press (2002). https://doi.org/10.1017/cbo9781107326019
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Charalampopoulos, P., Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T., Zuba, W. (2020). Efficient Enumeration of Distinct Factors Using Package Representations. In: Boucher, C., Thankachan, S.V. (eds) String Processing and Information Retrieval. SPIRE 2020. Lecture Notes in Computer Science(), vol 12303. Springer, Cham. https://doi.org/10.1007/978-3-030-59212-7_18
Download citation
DOI: https://doi.org/10.1007/978-3-030-59212-7_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59211-0
Online ISBN: 978-3-030-59212-7
eBook Packages: Computer ScienceComputer Science (R0)