[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Large alphabets and incompressibility

Published: 30 September 2006 Publication History

Abstract

We briefly survey some concepts related to empirical entropy--normal numbers, de Bruijn sequences and Markov processes-- and investigate how well it approximates Kolmogorov complexity. Our results suggest lth-order empirical entropy stops being a reasonable complexity metric for almost all strings of length m over alphabets of size n about when nl surpasses m.

References

[1]
{1} V. Becher, S. Figueira, An example of a computable absolutely normal number, Theoretical Computer Science 270 (2002) 947-958.
[2]
{2} É. Borel, Les probabilités dénombrables et leur applications arithmétiques, Rendiconti del Circolo Matematico di Palermo 27 (1909) 247-271.
[3]
{3} M. Burrows, D.J. Wheeler, A block-sorting lossless data compression algorithm, Technical Report 24, Digital Equipment Corporation, 1994.
[4]
{4} G.J. Chaitin, On the length of programs for computing finite binary sequences: Statistical considerations, Journal of the ACM 16 (1969) 145-159.
[5]
{5} D.G. Champernowne, The construction of decimals normal in the scale of 10, Journal of the London Mathematical Society 8 (1933) 254-260.
[6]
{6} A.H. Copeland, P. Erdös, Note on normal numbers, Bulletin of the American Mathematical Society 52 (1946) 857-860.
[7]
{7} N.G. de Bruijn, A combinatorial problem, Koninklijke Nederlandse Akademie van Wetenschappen 49 (1946) 758-764.
[8]
{8} P. Ferragina, G. Manzini, V. Mäkinen, G. Navarro, Compressed representations of sequences and full-text indexes, ACM Transactions on Algorithms, submitted for publication.
[9]
{9} T. Gagie, Compressing probability distributions, Information Processing Letters 97 (2006) 133-136.
[10]
{10} R. Grossi, A. Gupta, J.S. Vitter, An algorithmic framework for compression and text indexing, submitted for publication.
[11]
{11} T. Hagerup, C. Rüb, A guided tour of Chernoff bounds, Information Processing Letters 33 (1990) 305-308.
[12]
{12} G. Kalai, S. Safra, Threshold phenomena and influence, in: A.G. Percus, G. Istrate, C. Moore (Eds.), Computational Complexity and Statistical Physics, Oxford University Press, Oxford, 2006.
[13]
{13} A.N. Kolmogorov, Three approaches to the quantitative definition of information, Problems in Information Transmission 1 (1965) 1-7.
[14]
{14} S. Kullback, R.A. Leibler, On information and sufficiency, Annals of Mathematical Statistics 22 (1951) 79-86.
[15]
{15} M. Li, P. Vitányi, An Introduction to Kolmogorov Complexity and its Applications, second ed., Springer-Verlag, Berlin, 1997.
[16]
{16} G. Manzini, An analysis of the Burrows-Wheeler Transform, Journal of the ACM 48 (2001) 407-430.
[17]
{17} R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge University Press, Cambridge, 1995.
[18]
{18} J.I. Munro, P.M. Spira, Sorting and searching in multisets, SIAM Journal on Computing 5 (1976) 1-8.
[19]
{19} V.R. Rosenfeld, Enumerating De Bruijn sequences, MATCH Communications in Mathematical and in Computer Chemistry 45 (2002) 71-83.
[20]
{20} C.E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (1948) 379-423, 623-656.
[21]
{21} M.W. Sierpinski, Démonstration élémentaire du théorème de M. Borel sur les nombres absolument normaux et détermination d'un tel nombre, Bulletin de la Société Mathtématiques de France 45 (1917) 127-132.
[22]
{22} D.D. Sleator, R.E. Tarjan, Self-adjusting binary search trees, Journal of the ACM 32 (1985) 652-686.
[23]
{23} R.J. Solomonoff, A formal theory of inductive inference, Information and Control 7 (1964) 1-22, 224-254.
[24]
{24} A.M. Turing, A note on normal numbers, in: J.L. Britton (Ed.), Collected Works of A.M. Turing: Pure Mathematics, North-Holland, Amsterdam, 1992, pp. 117-119.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Information Processing Letters
Information Processing Letters  Volume 99, Issue 6
30 September 2006
37 pages

Publisher

Elsevier North-Holland, Inc.

United States

Publication History

Published: 30 September 2006

Author Tags

  1. Kolmogorov complexity
  2. Markov processes
  3. Shannon's entropy
  4. analysis of algorithms
  5. birthday paradox
  6. data compression
  7. de Bruijn sequences
  8. empirical entropy
  9. normal numbers
  10. relative entropy
  11. self-information
  12. threshold phenomena

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Toward a Definitive Compressibility Measure for Repetitive SequencesIEEE Transactions on Information Theory10.1109/TIT.2022.322438269:4(2074-2092)Online publication date: 1-Apr-2023
  • (2021)Practical Wavelet Tree ConstructionACM Journal of Experimental Algorithmics10.1145/345719726(1-67)Online publication date: 9-Jul-2021
  • (2021)Indexing Highly Repetitive String Collections, Part IACM Computing Surveys10.1145/343439954:2(1-31)Online publication date: 5-Mar-2021
  • (2021)Entropy Bounds for Grammar-Based Tree CompressorsIEEE Transactions on Information Theory10.1109/TIT.2021.311267667:11(7596-7615)Online publication date: 1-Nov-2021
  • (2020)Lempel–Ziv-Like Parsing in Small SpaceAlgorithmica10.1007/s00453-020-00722-682:11(3195-3215)Online publication date: 1-Nov-2020
  • (2020)Fast Compressed Self-indexes with Deterministic Linear-Time ConstructionAlgorithmica10.1007/s00453-019-00637-x82:2(316-337)Online publication date: 1-Feb-2020
  • (2020)A Comparison of Empirical Tree EntropiesString Processing and Information Retrieval10.1007/978-3-030-59212-7_17(232-246)Online publication date: 13-Oct-2020
  • (2019)Entropy Bounds for Grammar-Based Tree Compressors2019 IEEE International Symposium on Information Theory (ISIT)10.1109/ISIT.2019.8849372(1687-1691)Online publication date: 7-Jul-2019
  • (2018)At the roots of dictionary compression: string attractorsProceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing10.1145/3188745.3188814(827-840)Online publication date: 20-Jun-2018
  • (2018)LZ77 Computation Based on the Run-Length Encoded BWTAlgorithmica10.1007/s00453-017-0327-z80:7(1986-2011)Online publication date: 1-Jul-2018
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media