Large alphabets and incompressibility

Author: Travis Gagie

Information Processing Letters, Volume 99, Issue 6

Pages 246 - 251

https://doi.org/10.1016/j.ipl.2006.04.008

Published: 30 September 2006

Abstract

We briefly survey some concepts related to empirical entropy (normal numbers, de Bruijn sequences and Markov processes) and investigate how well empirical entropy approximates Kolmogorov complexity. Our results suggest that lth-order empirical entropy stops being a reasonable complexity metric for almost all strings of length m over alphabets of size n at about the point where n^l surpasses m.
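
The threshold described in the abstract can be illustrated with a small experiment. The sketch below assumes the standard definition of lth-order empirical entropy, H_l(s) = (1/m) * sum over length-l contexts w of |s_w| * H_0(s_w), where s_w collects the symbols that follow occurrences of w in s; the abstract does not spell out the paper's exact definition, so treat this as an assumption. The function names and the random-string experiment are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def zeroth_order_entropy(symbols):
    """Zeroth-order empirical entropy, in bits per symbol, of a sequence."""
    m = len(symbols)
    if m == 0:
        return 0.0
    counts = Counter(symbols)
    return sum((c / m) * math.log2(m / c) for c in counts.values())


def empirical_entropy(s, l):
    """lth-order empirical entropy of s, in bits per symbol.

    Assumed definition: group every symbol by the l symbols preceding it,
    take the zeroth-order entropy of each group, and average the groups
    weighted by their size, normalising by the full length m.
    """
    m = len(s)
    if m == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(l, m):
        # With l == 0 the context is the empty tuple, which gives H_0.
        followers[tuple(s[i - l:i])].append(s[i])
    total = sum(len(f) * zeroth_order_entropy(f) for f in followers.values())
    return total / m


def random_string_entropies(m, n, l, seed=0):
    """Return (H_0, H_l) for a uniformly random string of length m
    over the alphabet {0, ..., n-1}. Hypothetical helper for illustration."""
    rng = random.Random(seed)
    s = [rng.randrange(n) for _ in range(m)]
    return empirical_entropy(s, 0), empirical_entropy(s, l)


if __name__ == "__main__":
    # Small alphabet: n^l is far below m, so H_l stays near log2(n).
    print(random_string_entropies(m=10_000, n=4, l=2))
    # Large alphabet: n^l reaches m, so most contexts occur only once
    # and H_l falls well below H_0 even though the string is random.
    print(random_string_entropies(m=1_000, n=1_000, l=1))
```

A uniformly random string is incompressible with high probability, so a sharp drop in H_l for such a string indicates that the measure has stopped tracking complexity, which is the regime the abstract places at roughly n^l > m.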

Published In

Information Processing Letters, Volume 99, Issue 6, 30 September 2006, 37 pages

ISSN: 0020-0190

Publisher: Elsevier North-Holland, Inc., United States
