Abstract
Hash functions are usually composed of a mode of operation on top of a concrete primitive with fixed input-length and fixed output-length, such as a block cipher or a permutation. In practice, the mode is often sequential, although parallel (or tree) hashing modes are also possible. The former requires less memory, while the latter has several advantages such as its inherent parallelism and a lower cost of hash value recomputation when only a small part of the input changes. In this paper, we consider the general case of (tree or sequential) hashing modes that make use of an underlying hash function, which may in turn be sequential. We formulate a set of three simple conditions for such a (tree or sequential) hashing mode to be sound. By sound, we mean that the advantage in differentiating a hash function obtained by applying a tree hashing mode to an ideal underlying hash function from an ideal monolithic hash function is upper bounded by \(q^2/2^{n+1}\) with \(q\) the number of queries to the underlying hash function and \(n\) the length of the chaining values. We provide a proof of soundness in the indifferentiability framework. The conditions we formulate are easy to implement and to verify and can be used by the practitioner to build a tree hashing mode on top of an existing hash function. We show how to apply tree hashing modes to sequential hash functions in an optimal way, demonstrate the applicability of our conditions with two efficient and simple tree hashing modes and provide a simple method to take the union of tree hashing modes that preserves soundness. It turns out that sequential hashing modes using a compression function (i.e., a hash function with fixed input-length) can be considered as particular cases and, as a by-product, our results also apply to them. We discuss the different techniques for satisfying the three conditions, thereby shedding a new light on several published modes.
Similar content being viewed by others
References
Andreeva, E., Mennink, B., Preneel, B.: Security reductions of the second round SHA-3 candidates. Cryptology ePrint Archive, Report 2010/381, 2010, http://eprint.iacr.org/
Bagheri, N., Gauravaram, P., Knudsen, L.R., Zenner, E.: The suffix-free-prefix-free hash function construction and its indifferentiability security analysis. Int. J. Inf. Secur. 11(6), 419–434 (2012)
Bellare, M., Canetti, R., Krawczyk, H.: Keying hash functions for message authentication. In: Koblitz, N. (ed.) Advances in Cryptology—Crypto ’96, LNCS, no. 1109, pp. 1–15. Springer (1996)
Bellare, M., Ristenpart, T.: Multi-property-preserving hash domain extension and the EMD transform. In: Lai, X. and Chen, K. (eds.) Advances in Cryptology—Asiacrypt 2006, LNCS, no. 4284, pp. 299–314. Springer (2006)
Bellare, M., Rogaway, P.: Random oracles are practical: A paradigm for designing efficient protocols. In: ACM (ed.) ACM Conference on Computer and Communications Security 1993, pp. 62–73 (1993)
Bertoni, G., Daemen, J., Peeters, M., Van Assche, G., Van Keer, R.: Keccak implementation overview, May 2012, http://keccak.noekeon.org/
Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: On the indifferentiability of the sponge construction. In: Smart, N.P. (eds.) Advances in Cryptology—Eurocrypt 2008. Lecture Notes in Computer Science, vol. 4965, pp. 181–197. Springer (2008) http://sponge.noekeon.org/
Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Sakura: A flexible coding for tree hashing. Cryptology ePrint Archive, Report 2013/231, 2013, http://eprint.iacr.org/
Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Sponge functions. Ecrypt Hash Workshop 2007, May 2007, also available as public comment to NIST from http://www.csrc.nist.gov/pki/HashWorkshop/Public_Comments/2007_May.html
Bertoni, G., Daemen, J., Peeters, M., Van Assche, G.: Sufficient conditions for sound tree hashing modes. In: Handschuh, H., Lucks, S., Preneel, B., Rogaway, P. (eds.) Symmetric Cryptography (Dagstuhl, Germany), Dagstuhl Seminar Proceedings, no. 09031, Schloss Dagstuhl—Leibniz-Zentrum fuer Informatik, Germany (2009)
Bitcoin Portal, Bitcoin protocol specification. 2013, https://en.bitcoin.it/wiki/Protocol_specification
Chang, D., Nandi, M.: Improved indifferentiability security analysis of chopMD hash function, In: Nyberg, K. (ed.) Fast Software Encryption. Lecture Notes in Computer Science, vol. 5086, Springer, pp. 429–443 (2008)
Chapweske, J., Mohr, G.: Tree Hash EXchange format (THEX). 2003, http://adc.sourceforge.net/draft-jchapweske-thex-02.html
Coron, J., Dodis, Y., Malinaud, C., Puniya, P.: Merkle-Damgård revisited: How to construct a hash function. In: Shoup, V. (ed.) Advances in Cryptology—Crypto 2005, LNCS, no. 3621, pp. 430–448. Springer (2005)
Damgård, I.: A design principle for hash functions. In: Brassard, G. (ed.) Advances in Cryptology—Crypto ’89, LNCS, no. 435, pp. 416–427. Springer (1989)
Dodis, Y., Reyzin, L., Rivest, R., Shen, E.: Indifferentiability of permutation-based compression functions and tree-based modes of operation, with applications to MD6. In: O. Dunkelman, (ed.) Fast Software Encryption. Lecture Notes in Computer Science, vol. 5665, pp. 104–121. Springer (2009)
Hirose, S., Park, J., Yun, A.: A simple variant of the Merkle-Damgård scheme with a permutation. Asiacrypt, pp. 113–129 (2007)
Maurer, U., Renner, R., Holenstein, C.: Indifferentiability, impossibility results on reductions, and applications to the random oracle methodology. In: Naor, M. (ed.) Theory of Cryptography—TCC 2004. Lecture Notes in Computer Science, no. 2951, pp. 21–39. Springer (2004)
Merkle, R.C.: Secrecy, authentication, and public key systems, PhD thesis. UMI Research Press (1982)
Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system (2008) http://bitcoin.org/bitcoin.pdf
NIST, Federal information processing standard 180–2, secure hash standard. August 2002
Ristenpart, T., Shacham, H., Shrimpton, T.: Careful with composition: Limitations of the indifferentiability framework. In: Paterson, K.G. (ed.) Eurocrypt 2011. Lecture Notes in Computer Science, vol. 6632, pp. 487–506. Springer (2011)
Rivest, R., Agre, B., Bailey, D.V., Cheng, S., Crutchfield, C., Dodis, Y., Fleming, K.E., Khan, A., Krishnamurthy, J., Lin, Y., Reyzin, L., Shen, E., Sukha, J., Sutherland, D., Tromer, E., Yin, Y.L.: The MD6 hash function—a proposal to NIST for SHA-3. Submission to NIST, (2008) http://groups.csail.mit.edu/cis/md6/
Sarkar, P., Schellenberg, P.J.: A parallelizable design principle for cryptographic hash functions. Cryptology ePrint Archive, Report 2002/031, 2002, http://eprint.iacr.org/
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix A: Illustrations
In this section, we illustrate two undesired properties of tree hashing modes explained in Sect. 4 to introduce two of the three conditions for sound tree hashing. We give some figures of templates generated by some mode of use. The way these templates have been generated by the mode of use are out of scope of this section. Note also that these templates illustrate undesired properties, and hence, the modes of use that would produce them are per definition not sound.
We use the following conventions. Instead of depicting individual bits, we depict message-, chaining- or frame blocks, where a block is just a sequence of consecutive bits. Frame blocks are depicted by white rectangles with its value indicated, message blocks by light gray rectangles and their position in the message indicated, and chaining blocks by dark gray rectangles with an indication of their child. An output is depicted by a rounded rectangle. The nodes are identified with their indices and the relation between the nodes is additionally indicated by arrows, symbolizing the application of \(\mathcal {F}\) during template execution for a concrete input \(M\).
The first property is related to the existence of inner collisions in the absence of collisions in the output of \(\mathcal {F}\) and is illustrated in Fig. 4. The figure depicts two templates that are generated by a mode of use \(\mathcal {T}\) for two different message lengths. All nodes have as first two bits frame bits with value 01. The template on the left has four nodes: three leaf nodes of height 1 and a final node that takes an input block and the chaining values corresponding to the three leaf nodes. The template on the right has three nodes: two leaf nodes of height 1 and a final node that takes an input block and the chaining values corresponding to the two leaf nodes. Note that the final node of the right template has a message block (indicated by \(M'_0\)) in the place where the final node of the left template has the concatenation of a message block \(M_0\) and a chaining block \(CV_2\). We can exploit this fact to construct an inner collision from any message \(M\) with length matching the left template. As can be seen in the figure, it suffices to form \(M'\) by replacing in \(M\) the block \(M_1\) by \(\mathcal {F}(01|M_1)\).
The second property, a generalization of length extension to tree hashing, is illustrated in Fig. 5. Given the output of \(h = \mathcal {T}[\mathcal {F}](M)\) of some message \(M\), length extension is the possibility to compute the output of \(\mathcal {T}[\mathcal {F}](M')\) with \(M\) a substring of \(M'\), only knowing the output \(h\) and not \(M\) itself. Figure 5 depicts two templates corresponding with two different message lengths. The templates have a binary tree structure. The template at the left has three nodes: two leaf nodes and a final node containing the chaining values corresponding to the two leaf nodes. The template at the right has seven nodes: four leaf nodes, two intermediate nodes each containing the chaining values corresponding to two leaf nodes and a final node containing the chaining values of the intermediate nodes. Note that the chaining block \(CV_0\) in the final node of the right template corresponds with the hashing output of the left template. As can be seen in the figure, given the hash output \(h\) of a message \(M\) with length matching the left template, one can compute the hash output of any message \(M' = M | M_2 | M_3\) with length matching the right template without knowledge of \(M\).
Appendix B: Remarks on the cost
The cost measure introduced in Sect. 5.2 aims at counting on an equal footing both queries to \(\mathcal {H}\) and queries to \(\mathcal {I}\). We wish to illustrate this by comparing two examples of distinguisher.
The first distinguisher uses only the \(\mathcal {I}\) interface to produce a collision in \(\mathcal {F}_n\) (or in the simulator). Assuming a collision is produced, two messages can be built, so as to turn this collision into an inner collision in \(\mathcal {T}\) but not in \(\mathcal {G}\). This attack takes about \(2^{n/2}\) queries. (If after \(2^{n/2}\) attempts no collision has been found, the distinguisher may suspect it is not querying \(\mathcal {F}\) but a simulator.)
The second distinguisher uses only the \(\mathcal {H}\) interface and attempts to exhibit an inner collision directly. When talking to \(\mathcal {T}\), such an inner collision can occur, but when talking to \(\mathcal {G}\), an inner collision does not even exist (with the requested output-length sufficiently large to detect such an inner collision with arbitrary certainty). More specifically, the distinguisher queries the \(\mathcal {H}\) interface with equal tree parameters \(A\) and messages \(M_i\) that vary only in one leaf, which is chosen to have the maximum height \(H\) in the tree. To obtain an inner collision, it is sufficient to get a collision at any of the \(H\) nodes on the way from the leaf to the final node. The distinguisher needs about \(2^{n/2}/H\) queries to hit an inner collision. Hence, in this context a query to \(\mathcal {H}\) appears to be a factor \(H\) more powerful than a query to \(\mathcal {I}\).
The cost function that counts calls to \(\mathcal {F}\) and discards duplicate queries as one brings the two distinguishers to a more equal footing. The first distinguisher succeeds at a cost of about \(2^{n/2}\). The queries of the second distinguisher could be performed at the level of the \(\mathcal {I}\) interface, the tree mode \(\mathcal {T}\) being simulated by the distinguisher. In this case, each query to \(\mathcal {H}\) translates into \(f_\mathcal {T}(|M|,A)\) queries to \(\mathcal {I}\). However, the strategy of the second distinguisher is such that only \(H\) \(Q_\mathcal {I}\) queries differ for each of the \(2^{n/2}/H\) \(Q_\mathcal {H}\) queries. Hence, the cost of \(Q_\mathcal {I}\) for the second distinguisher is also about \(2^{n/2}\).
Rights and permissions
About this article
Cite this article
Bertoni, G., Daemen, J., Peeters, M. et al. Sufficient conditions for sound tree and sequential hashing modes. Int. J. Inf. Secur. 13, 335–353 (2014). https://doi.org/10.1007/s10207-013-0220-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10207-013-0220-y