Abstract
We propose a new LZ78-style grammar compression algorithm, named LZ-ABT, which is a simple online algorithm to create, given a string of length N over an alphabet of size \(\sigma \), an \(\alpha \)-balanced grammar in \(O(N \log N \log \sigma )\) time and O(n) space in addition to the input string, where n is the grammar size to output. LZ-ABT can avoid the lower-bound of \(\varOmega (N^{5/4})\) time of the naive algorithms for LZMW and LZD, other LZ78-style compression algorithms, which was observed in [Badkobeh et al. SPIRE 2017, pp. 51–67]. We also show that the algorithm can be executed in compressed space, i.e., without storing the whole input string explicitly in memory: in \(O(N \log ^2 N \log \sigma )\) time and O(n) space, or \(O(N \log N \log \sigma )\) time and \(O(n \log ^{*} N)\) space. We implement LZ-ABT running in \(O(N \log N \log \sigma )\) time and O(N) space and empirically show that its performance is competitive to LZD. This is the first practical implementation of \(\alpha \)-balanced grammar compression to the best of our knowledge.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
We remark that the \(\log \sigma \) multiplicative factor in the running time is the cost to conduct a binary search at internal nodes in the Patricia tree, and can be removed by using hash function if we allow its non-deterministic behavior.
- 2.
Of course, we ignore any trivial input string of length one or zero.
- 3.
Since \(S_{\ell }\) is represented in \(\mathcal {T}_{ V }\), we can shortcut by starting the traversal from the node representing \(S_{\ell }\), but it does not change the complexity.
- 4.
- 5.
- 6.
- 7.
References
Badkobeh, G., Gagie, T., Inenaga, S., Kociumaka, T., Kosolobov, D., Puglisi, S.J.: On two LZ78-style grammars: compression bounds and compressed-space computation. In: Fici, G., Sciortino, M., Venturini, R. (eds.) SPIRE 2017. LNCS, vol. 10508, pp. 51–67. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67428-5_5
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28332-1_21
Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD Factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 219–230. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_19
Hucke, D., Lohrey, M., Reh, C.P.: The smallest grammar problem revisited. In: Inenaga, S., Sadakane, K., Sakai, T. (eds.) SPIRE 2016. LNCS, vol. 9954, pp. 35–49. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46049-9_4
Jez, A.: Approximation of grammar-based compression via recompression. Theor. Comput. Sci. 592, 115–134 (2015)
Jez, A.: A really simple approximation of smallest grammar. Theor. Comput. Sci. 616, 141–150 (2016)
Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Data Compression Conference, DCC 1999, pp. 296–305 (1999)
Lohrey, M.: Algorithmics on SLP-compressed strings: a survey. Groups Complex. Cryptol. 4(2), 241–299 (2012)
Miller, V.S., Wegman, M.N.: Variations on a theme by Ziv and Lempel. In: Apostolico, A., Galil, Z. (eds.) Combinatorial Algorithms on Words. NATO ASI Series, vol. 12, pp. 131–140. Springer, Heidelberg (1985)
Nelson, G., Kieffer, J., Cosman, P.: An interesting hierarchical lossless data compression algorithm. In: IEEE Information Theory Society Workshop (1995)
Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical strcture in sequences: a linear-time algorithm. J. Artif. Intell. Res. (JAIR) 7, 67–82 (1997)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)
Sakamoto, H.: A fully linear-time approximation algorithm for grammar-based compression. J. Discrete Algorithms 3(2–4), 416–430 (2005)
Storer, J.A., Szymanski, T.G.: The macro model for data compression (extended abstract). In: Proceedings of the 10th Annual ACM Symposium on Theory of Computing, pp. 30–39 (1978)
Takabatake, Y., I, T., Sakamoto, H.: A space-optimal grammar compression. In: Proceedings of ESA 2017, pp. 67:1–67:15 (2017)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)
Acknowledgments
This work was supported by JST CREST (Grant Number JPMJCR1402), and KAKENHI (Grant Numbers 18K18111, 17H01791 and 16K16009).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ohno, T., Goto, K., Takabatake, Y., I, T., Sakamoto, H. (2018). LZ-ABT: A Practical Algorithm for \(\alpha \)-Balanced Grammar Compression. In: Iliopoulos, C., Leong, H., Sung, WK. (eds) Combinatorial Algorithms. IWOCA 2018. Lecture Notes in Computer Science(), vol 10979. Springer, Cham. https://doi.org/10.1007/978-3-319-94667-2_27
Download citation
DOI: https://doi.org/10.1007/978-3-319-94667-2_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94666-5
Online ISBN: 978-3-319-94667-2
eBook Packages: Computer ScienceComputer Science (R0)