Abstract
Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries (“ternary search tries”). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely, size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.
Similar content being viewed by others
References
Allen, B., and Munro, I. Self-organizing binary search trees.Journal of the ACM 25(4) (Oct. 1978), 526–535.
Bedford, T., Keane, M., and Series, C.Ergodic Theory, Symbolic Dynamics and Hyperbolic Spaces. Oxford University Press, Oxford, 1991.
Bentley, J., and Sedgewick, R. Fast algorithms for sorting and searching strings. InProceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (New Orleans, January 1997), pp. 360–369. SIAM Press, Philadelphia, PA, 1997.
Bogomolny, E.B., and Carioli, M. Quantum maps from transfer operators.Physica D 67 (1993), 88–112.
Burge, W. H. An analysis of binary search trees formed from sequences of nondistinct keys.Journal of the ACM 23(3) (July 1976), 451–454.
Clampett, H. A. Randomized binary searching with tree structures.Communications of the ACM 7(3) (March 1964), 163–165.
Clément, J., Flajolet, P., and Vallée, B. The analysis of hybrid trie structures. InProceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (Philadelphia, 1998), pp. 531–539. SIAM Press, Philadelphia, PA, 1988.
Clément, J., and Zimmermann, P. An efficient spell-checker based on ternary search tries (provisional title). Research report, Institut National de Recherche en Informatique et en Automatique, 2000. In preparation.
Daudé, H., Flajolet, P., and Vallée, B. An average-case analysis of the Gaussian algorithm for lattice reduction.Combinatorics, Probability and Computing 6 (1997), 397–433.
De Bruijn, N. G.Asymptotic Methods in Analysis. Dover, 1981. first edition, A reprint of the third North-Holland edition, Amsterdam, 1970 (1958).
Devroye, L. A probabilistic analysis of the height of tries and of the complexity of triesort.Acta Informatica 21 (1984), 229–237.
Devroye, L. A study of trie-like structures under the density model.The Annals of Applied Probability 2(2) (1992), 402–434.
Efrat, I. Dynamics of the continued fraction map and the spectral theory ofSL(2,Z).Inventiones Mathematicae 114 (1993), 207–218.
Fagin, R., Nievergelt, J., Pippenger, N., and Strong, R. Extendible hashing: a fast access method for dynamic files.ACM Transactions on Database Systems 4 (1979), 315–344.
Faivre, C. Distribution of Lévy’s constants for quadratic numbers.Acta Arithmetica 61(1) (1992), 13–34.
Fayolle, G., Flajolet, P., and Hofri, M. On a functional equation arising in the analysis of a protocol for a multi-accessbroadcast channel.Advances in Applied Probability 18 (1986), 441–472.
Flajolet, P. On the performance evaluation of extendible hashing and trie searching.Acta Informatica 20 (1983), 345–369.
Flajolet, P., Gardy, D., and Thimonier, L. Birthday paradox, coupon collectors, caching algorithms, and self-organizing search.Discrete Applied Mathematics 39 (1992), 207–229.
Flajolet, P., Gourdon, X., and Dumas, P. Mellin transforms and asymptotics: harmonic sums.Theoretical Computer Science 144(1–2) (June 1995), 3–58.
Flajolet, P., and Puech, C. Partial match retrieval of multidimensional data.Journal of the ACM 33(2) (1986), 371–407.
Flajolet, P., and Steyaert, J.-M. A branching process arising in dynamic hashing, trie searching and polynomial factorization. InAutomata, Languages and Programming, M. Nielsen and E. M. Schmidt, eds., vol. 140 of Lecture Notes in Computer Science, pp. 239–251. Springer-Verlag, Berlin, 1982. Proceedings of 9th ICALP Colloquium, Aarhus, Denmark, July 1982.
Flajolet, P., and Vallée, B. Continued fraction algorithms, functional operators and structure constants.Theoretical Computer Science 194 (1998), 1–34.
Gonnet, G. H., and Baeza-Yates, R.Handbook of Algorithms and Data Structures: in Pascal and C, second edn. Addison-Wesley, Reading, MA, 1991.
Grothendieck, A.Produits tensoriels topologiques et espaces nucléaires. No. 16 in Memoirs of the American Mathematical Society. A.M.S., Providence, RI, 1955.
Grothendieck, A. La théorie de Fredholm.Bulletin de la Société Mathématique de France 84 (1956), 319–384.
Jacquet, P., and Régnier, M. Trie partitioning process: limiting distributions. InCAAP ’86, P. Franchi-Zanetacchi, ed., vol. 214 of Lecture Notes in Computer Science, pp. 196–210. Springer-Verlag, Berlin, 1986. Proceedings of the 11th Colloquium on Trees in Algebra and Programming, Nice France, March 1986.
Jacquet, P., and Szpankowski, W. Analysis of digital tries with Markovian dependency.IEEE Transactions on Information Theory 37(5) (1991), 1470–1475.
Jacquet, P., and Szpankowski, W. Autocorrelation on words and its applications: analysis of suffix trees by string-ruler approach.Journal of Combinatorial Theory. Series A 66(2) (1994), 237–269.
Jacquet, P., and Szpankowski, W. Asymptotic behavior of the Lempel-Ziv parsing scheme and digital search trees.Theoretical Computer Science 144(1–2) (1995), 161–197.
Jacquet, P., and Szpankowski, W. Analytical de-Poissonization and its applications.Theoretical Computer Science 201(1–2) (1998), 1–62.
Janson, S. Private communication.
Kato, T.Perturbation Theory for Linear Operators. Springer-Verlag, New York, 1980.
Kirschenhofer, P., and Prodinger, H. Eine Anwendung der Theorie der Modulfunktionen in der Informatik.Sitzungsberichte der Österreichischen Akademie der Wissenschaften 197 (1988), 339–366.
Kirschenhofer, P., and Prodinger, H. On some applications of formulæ of Ramanujan in the analysis of algorithms.Mathematika 38 (1991), 14–33.
Kirschenhofer, P., Prodinger, H., and Szpankowski, W. Analysis of a splitting process arising in probabilistic counting and other related algorithms.Random Structures & Algorithms 9 (1996), 379–402.
Knuth, D. E.The Art of Computer Programming, third edn., vol. 3: Sorting and Searching. Addison-Wesley, Reading, MA, 1998.
Krasnoselskii, M.Positive Solutions of Operator Equations. P. Noordhoff, Groningen, 1964.
Larson, P. A. Dynamic hashing.BIT 18 (1978), 184–201.
Lind, D., and Marcus, B.An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, Cambridge, 1995.
Lothaire, M.Combinatorics on Words, vol. 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, MA, 1983.
Mahmoud, H.Evolution of Random Search Trees. Wiley, New York, 1992.
Mayer, D. H. Private communication.
Mayer, D. H. Spectral properties of certain composition operators arising in statistical mechanics.Communications in Mathematical Physics 68 (1979), 1–8.
Mayer, D. H. On composition operators on Banach spaces of holomorphic functions.Journal of Functional Analysis 35 (1980), 191–206.
Pittel, B. Paths in a random digital tree: limiting distributions.Advances in Applied Probability 18(1) (1986), 139–155.
Pollicott, M. A complex Ruelle-Perron-Frobenius theorem and two counterexamples.Ergodic Theory and Dynamical Systems 4 (1984), 135–146.
Prodinger, H. Combinatorics of geometrically distributed random variables: left-to-right maxima.Discrete Mathematics 153 (1996), 253–270.
Régnier, M. On the average height of trees in in digital search and dynamic hashing.Information Processing Letters 13 (1982), 64–66.
Rivest, R. L. Partial match retrieval algorithms.SIAM Journal on Computing 5 (1976), 19–50.
Ruelle, D.Thermodynamic Formalism. Addison-Wesley, Reading, MA, 1978.
Ruelle, D.Dynamical Zeta Functions for Piecewise Monotone Maps of the Interval, vol. 4 of CRM Monograph Series. American Mathematical Society, Providence, RI, 1994.
Schwartz, H. Composition operators in ℋp. Ph.D. thesis, University of Toledo, 1969.
Sedgewick, R.Algorithms in C: Fundamentals, Data Structures, Sorting, Searching, third edn. Addison-Wesley, Reading, MA, 1988.
Shapiro, J. Compact composition operators on spaces of boundary regular holomorphic functions.Proceedings of the AMS 100 (1997), 49–57.
Shapiro, J., and Taylor, P. Compact, nuclear, and Hilbert-Schmidt composition operators on ℋ2.Indiana University Mathematics Journal 23 (1973), 471–496.
Szpankowski, W. Some results onV-ary asymmetric tries.Journal of Algorithms 9 (1988), 224–244.
Szpankowski, W. On the height of digital trees and related problems.Algorithmica 6(2) (1991), 256–277.
Titchmarsh, E. C., and Heath-Brown, D. R.The Theory of the Riemann Zeta-Function, second edn. Oxford Science Publications, Oxford, 1986.
Trabb Pardo, L. Set representation and set intersection. Technical report, Stanford University, 1978.
Vallée, B. Algorithms for computing signs of 2×2 determinants: dynamics and average-case analysis. InAlgorithms—ESA ’97, G. W. R. Burkhard, ed., no. 1136 in Lecture Notes in Computer Science, pp. 486–499. Springer-Verlag, Berlin, 1997. Proceedings of the Fifth European Symposium on Algorithms, Graz, September 1997.
Vallée, B. Opérateurs de Ruelle-Mayer généralisés et analyse des algorithmes d’Euclide et de Gauss.Acta Arithmetica 81(2) (1997), 101–144.
Vallée, B. Dynamics of the binary euclidean algorithm: functional analysis and operators.Algorithmica 22(4) (1998), 660–685.
Vallée, B. Dynamical sources in information theory: fundamental intervals and words prefixes.Algorithmica, this issue.
Vallée, B., and Lemée, C. Average-case analyses of three algorithms for computing the jacobi symbol. Rapport de Recherche de l’Université de Caen, Les Cahiers du GREYC # 1, 1999.
Yao, A. C.-C. A note on the analysis of extendible hashing.Information Processing Letters 11 (1980), 84–86.
Author information
Authors and Affiliations
Additional information
Communicated by H. Prodinger and W. Szpankowski.
This work was supported in part by the Long Term Research Project ALCOM-IT (# 20244) of the European Union.
Online publication October 6, 2000.
Rights and permissions
About this article
Cite this article
Clément, J., Flajolet, P. & Vallée, B. Dynamical sources in information theory: A general analysis of trie structures. Algorithmica 29, 307–369 (2001). https://doi.org/10.1007/BF02679623
Received:
Revised:
Issue Date:
DOI: https://doi.org/10.1007/BF02679623