Abstract
Catena is a password-scrambling framework characterized by its high flexibility. The user (defender) can simply adapt the underlying (cryptographic) primitives, the underlying memory-hard function, and the time (\(\lambda \)) and memory (garlic) parameters, to render it suitable for a wide range of applications. This enables Catena to maximize the defense against specific adversaries, their capabilities and goals, and to cope with a high variation of hardware and constraints on the side of the defender. Catena has obtained special recognition of the Password Hashing Competition (PHC), alongside of the winner Argon2.
In addition to the default instantiations presented in the PHC submission, we want to use this document to introduce further variants of Catena, or rather, further instantiations of the Catena framework. Our instantiations use different hash functions, and we evaluate their influence on the computational time and the throughput. Next, we discuss how instantiations of the memory-hard graph-based algorithm influence the computational time and resistance against low-memory attacks. Furthermore, we introduce possible extensions of Catena accommodating strong resistance against GPU- and ASIC-based adversaries, e.g., by providing sequential memory-hardness due to a data-dependent indexing function. At the end, we combine particular instantiations discussed so far to construct full-fledged variants of Catena for certain goals. Hence, this document can be seen as an additional guide to the PHC submission of Catena when considering its usage under certain restrictions.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Note that the tweak is given as the hash value of an instantiation-specific identifier, a mode, \(\lambda \), the output size and the size of the salt in bit, and the associated data. Thus, the tweak is unique for each system and each user within a system.
References
Alwen, J., Serbinenko, V.: High parallel complexity graphs and memory-hard functions. In: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14–17, 2015, pp. 595–603 (2015)
Aumasson, J.-P.: Password Hashing Competition (2015). https://password-hashing.net/call.html. Accessed 3 September 2015
Aumasson, J.-P.: Password Hashing Competition - Candidates. https://password-hashing.net/candidates.html
Aumasson, J.-P., Neves, S., Wilcox-O’Hearn, Z., Winnerlein, C.: BLAKE2: Simpler, Smaller, Fast as MD5. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) ACNS 2013. LNCS, vol. 7954, pp. 119–135. Springer, Heidelberg (2013)
Bernstein, D.J.: Cache-timing attacks on AES (2005)
Biryukov, A., Dinu, D., Khovratovich, D.: Argon2. Password Hashing Competition, Winner (2015). https://www.cryptolux.org/index.php/Argon2
Biryukov, A., Khovratovich, D.: Tradeoff cryptanalysis of Catena. PHC mailing list: discussions@password-hashing.net
Biryukov, A., Khovratovich, D.: Tradeoff cryptanalysis of memory-hard functions. IACR Cryptol. ePrint Arch. 2015, 227 (2015)
Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million passwords. In: 2012 IEEE Symposium on Security and Privacy, May 2012
Brent, R.P., Gaudry, P., Thomé, E., Zimmermann, P.: Faster Multiplication in GF(2) [x]. In: ANTS, pp. 153–166 (2008)
Cox, B.: TwoCats (and SkinnyCat): A Compute Time and Sequential Memory Hard Password Hashing Scheme (2014). https://password-hashing.net/submissions/specs/TwoCats-v0.pdf
Forler, C., List, E., Lucks, S., Wenzel, J.: Overview of the candidates for the password hashing competition - and their resistance against garbage-collector attacks. IACR Cryptol. ePrint Arch. 2014, 881 (2014)
Forler, C., Lucks, S., Wenzel, J.: Catena: A Memory-Consuming Password Scrambler. Cryptology ePrint Archive, Report 2013/525 (2013). http://eprint.iacr.org/
Forler, Christian, Lucks, Stefan, Wenzel, Jakob: Memory-demanding password scrambling. In: Sarkar, Palash, Iwata, Tetsu (eds.) ASIACRYPT 2014, Part II. LNCS, vol. 8874, pp. 289–305. Springer, Heidelberg (2014)
Forler, C., Lucks, S., Wenzel, J.: The Catena Password-Scrambling Framework. Password Hashing Competition, 2nd round submission (2015). https://password-hashing.net/submissions/specs/Catena-v3.pdf
funkysash. catena-variants (2015). https://github.com/medsec/catena-variants
Gray, F.: Pulse Code Communication. Bell Telephone Labor Inc., New York (1953). US Patent 2,632,058,
Gueron, S., Kounavis, M.E.: Intel carry-less multiplication instruction and its usage for computing the GCM Mode - Rev 2.01. Intel White Paper. Technical report, Intel corporation, September 2012
Harris, B.: Replacement index function for data-independent schemes (Catena) (2015). http://article.gmane.org/gmane.comp.security.phc/2457/match=grey
HPSchilling. catena-variants (2015). https://github.com/HPSchilling/catena-variants
Kaliski, B.: RFC 2898 - PKCS #5: Password-Based cryptography specification Version 2.0. Technical report, IETF (2000)
Lengauer, T., Tarjan, R.E.: Asymptotically tight bounds on time-space trade-offs in a pebble game. J. ACM 29(4), 1087–1130 (1982)
Lystad, T.A.: Leaked password lists and dictionaries - The Password Project. http://thepasswordproject.com/leaked_password_lists_and_dictionaries. Accessed 16 May 2013
McGrew, D.A., Viega, J.: The security and performance of the Galois/Counter Mode (GCM) of operation. In: Canteaut, A., Viswanathan, K. (eds.) INDOCRYPT 2004. LNCS, vol. 3348, pp. 343–355. Springer, Heidelberg (2004)
Percival, C.: Stronger Key Derivation via Sequential Memory-Hard Functions. presented at BSDCan 2009, May 2009
Peslyak, A.: yescrypt - a Password Hashing Competition submission (2015). https://password-hashing.net/submissions/specs/yescrypt-v1.pdf
Pornin, T.: The MAKWA Password Hashing Function (2015). https://password-hashing.net/submissions/specs/Makwa-v1.pdf
Provos, N., Mazières, D.: A future-adaptable password scheme. In: USENIX Annual Technical Conference, FREENIX Track, pp. 81–91. USENIX (1999)
Shand, M., Bertin, P., Vuillemin, J.: Hardware speedups in long integer multiplication. In: SPAA, pp. 138–145 (1990)
Simplicio, M., Almeida, L., dos Santos, P., Barreto, P.: The Lyra2 reference guide. Password Hashing Competition, 2nd round submission (2015). https://password-hashing.net/submissions/specs/Lyra2-v2.pdf
Soderquist, P., Leeser, M.: An area/performance comparison of subtractive and multiplicative divide/square root implementations. In: 12th Symposium on Computer Arithmetic ARITH-12 1995, July 19–21, 1995, Bath, England, UK, pp. 132–139 (1995)
Cox, B.: MultHash - A simple multiplication speed limited hash function (2014). https://github.com/medsec/catena/blob/3a3ce823d4c54f2da33757bf8f6389488c31bd93/src/catena-multhash.c. (waywardgeek)
Acknowledgement
We would like to thank S. Schmidt and H. Schilling for their work on the reference implementation of Catena as well as on the tool Catena-Variants, E. List for his helpful comments and fruitful discussions, and H. Schilling for his analysis of the underlying graph-based structures. Furthermore, we would like to thank the reviewers of the Passwords 2015 for their helpful comments.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendices
A Memory-Hardness and Garbage-Collector Attacks
1.1 A.1 Memory-Hardness
In this part of the paper we show the definition of memory-hardness as it was described in [15]. The intuition is that for any parallelized attack (using \(b \) cores) the required memory is decreased by a factor of \(1/b \) per core, and vice versa.
Definition 1
(Memory-Hard Function). Let g denote the memory cost factor. For all \(\alpha >0\), a memory-hard function f can be computed on a Random Access Machine using S(g) space and T(g) operations, where \(S(g) \in \varOmega (T(g)^{1-\alpha })\).
Thus, for \(S\cdot T=G^2\) with \(G = 2^g\), using \(b \) cores, we have
A formal generalization of this notion is given in the following definition.
Definition 2
(\(\lambda -\)Memory-Hard Function). Let g denote the memory cost factor. For a \(\lambda -\)memory-hard function f, which is computed on a Random Access Machine using S(g) space and T(g) operations with \(G = 2^g\), it holds that
Thus, we have
Since we also consider instantiations of Catena which fulfill the definition of sequential memory-hardness (see [25]), we highly recommend the reader to have a look at the discussion \(\lambda \)-Memory-Hard vs. Sequential Memory-Hard given in Sect. 2.2 of [15].
1.2 A.2 (Weak) Garbage-Collector Attacks
The motivation behind these attacks is to exploit the fact that a password scrambler may leave its password-dependent internal state in memory for a long time (during its invocation). An adversary then gains benefits if either the memory access to the internal state depends on the password (Garbage-Collector Attack) or if a value directly derived from the password (e.g., using a fast hash function) remains in memory (Weak Garbage-Collector Attack). Based on such attacks, an adversary is able to filter password candidates using significant less time/memory in comparison to the original algorithm.
Definition 3
(Garbage-Collector Attack). Let \({P}_{G}(\cdot ) \) be a memory-consuming password scrambler that depends on a memory-cost parameter G and let Q be a positive constant. Furthermore, let v denote the internal state of \({P}_{G}(\cdot ) \) after its termination. Let \(\mathcal {A}\) be a computationally unbounded but always-halting adversary conducting a garbage-collector attack. We say that \(\mathcal {A}\) is successful if some knowledge about v reduces the runtime of \(\mathcal {A}\) for testing a password candidate x from \(\mathcal {O}({P}_{G}(x))\) to \(\mathcal {O}(f(x))\) with \(\mathcal {O}(f(x)) \lll \mathcal {O}({P}_{G}(x))/Q, \forall x \in \{0,1\}^{*}\).
In the following we define the Weak Garbage-Collector (WGC) Attack.
Definition 4
(Weak Garbage-Collector Attack). Let \({P}_{G}(\cdot ) \) be a password scrambler that depends on a memory-cost parameter G, and let \(R(\cdot )\) be an underlying function of \({P}_{G}(\cdot ) \) that can be computed efficiently. We say that an adversary \(\mathcal {A}\) is successful in terms of a weak garbage-collector attack if a value \(y=R({\textit{pwd}})\) remains in memory during (almost) the entire runtime of \({P}_{G}({\textit{pwd}})\), where pwd denotes the secret input.
An adversary that is capable of reading the internal memory of a password scrambler during its invocation gains knowledge about y. Thus, it can reduce the effort for filtering invalid password candidates by just computing \(y' = R(x)\) and checking whether \(y = y'\), where x denotes the current password candidate. Note that the function R can also be the identity function. Then, the plain password remains in memory, rendering WGC attacks trivial.
B Hash-Function Instantiations
1.1 B.1 Compression Function of Argon2
Here we consider the underlying compression function of the PHC winner Argon2 [6]. The function CF (within the specification of Argon2, this function is called G, but since we use G already for the variants of the round function of BLAKE2b, we rename it here to CF) is built upon a permutation P using the BLAKE2b round function \(G_L\) as defined in Lyra2 and is formally defined in Algorithm 4. Note that the function \(G_L\), in comparison to the original one of BLAKE2b (\(G_O\)), does neither consider the message schedule nor handling of the message input. The input state words are directly written to the internal state (a, b, c, d) and then processed as shown in Algorithm 5 (left).
The function CF, using \(G_L\), takes two 8192-bit values X, Y as input and outputs a 8192-bit value Z. The value \(R = X \oplus Y\) (Line 1) is represented as an \(8\times 8\) matrix consisting of 64 128-bit values \(R_0,\ldots ,R_{63}\) ordered from top to down and left to right. First, all rows are updated using the permutation P generating an intermediate matrix Q (Lines 2–4). Then, all columns of Q are updated using P (Lines 5–7). The output of CF is then given by the XOR operation of the input matrix R and the intermediate matrix Q. Based on the large state of CF, it is possible to fill memory quite fast, e.g., about 1 GB memory is filled in less than 1 s [6]. For a formal definition of P we refer to the specification of Argon2 [6].
Even if the function CF requires two calls to the underlying round function of BLAKE2b to process 1024 bit, it has a significant speed-up in comparison to the way BLAKE2b-1 is used within Catena (see Table 2). The speed-up comes from the fact that CF processes 16384 bit per invocation whereas BLAKE2b-1 processes only 1024 bit. Thus, Catena requires significant more loading operations in comparison to Argon2 when processing inputs of equal size.
Note that we provide another instantiation of \(H '\) in Sect. 3 called \(\text {P}_{\text {compress}}\). This is a modification of the permutation P, where we split the output of P into two 512-bit halves and combine them using the XOR operation. This is a simple way to build a compression function out of the permutation P. We introduce \(\text {P}_{\text {compress}}\) for comparison with BLAKE2b-1, which includes the regular message schedule, the initialization, and finalization of the round function of BLAKE2b (as shown in Algorithm 6), whereas \(\text {P}_{\text {compress}}\) does not. Nevertheless, we do not consider it as a recommended instantiation of \(H '\) since, due to the smaller size of the state words, it will never achieve the same throughput as the function CF.
1.2 B.2 BlaMka
The function \(G_B\) was introduced by the authors of Lyra2 [30]. It described a slightly adjusted variant of \(G_L\) (see Algorithm 5 for a comparison) and is there used as the underlying permutation of a sponge-based structure. Due to the missing cryptanalysis of BlaMka, the authors do not recommend its usage. Nevertheless, since it consists of a neat combination of ARX (Addition, Rotation, XOR) and integer multiplication, we also consider it as an alternative to \(G_L\).
The only difference between the functions \(G_L\) (see Algorithm 5 (left)) and \(G_B\) (see Algorithm 5 (right)) is that each addition \(a+b\) is replaced by a variant of the latin-square operation, namely \(a+b+2\cdot lsw(a)\cdot lsw(b)\), where lsw(x) denotes the least significant 32-bit word of x. The idea behind the so called multiplication.-hardening is to provide a similar runtime for hardware- and software-based adversaries [29, 31].
Note that for comparison, we restate BLAKE2b-1 as used in the default instantiations of Catena (see Algorithm 6). It conducts \(G_O\) ensuring that 12 invocations of BLAKE2b-1 are quite similar to one invocation of full BLAKE2b.
1.3 B.3 Galois-Field Multiplication
We took the Galois-Field multiplication into account since it allows to be used as a really fast compression function, especially when considering binary Galois Fields with an order that is a power of 2, since its coefficients can be represented by bits and addition/subtraction is equivalent to the XOR operation. We decided to conduct the multiplication in \(GF(2^{128})\) using the reduction polynom \(f(x) = x^{128} + x^7 + x^2 + x + 1\) from the well-analyzed and widely used Galois/Counter Mode (GCM) [24]. We provide the optimized version of the right-to-left multiplication as suggested in [10] or, if available, a version that uses the pclmulqdq instruction described in [18]. Note that in contrast to the multiplication used within GCM, we do not use bit reflection.
1.4 B.4 MultHash
The function MultHash is an experimental hash-function design by Cox [32] with the goal to max out the memory bandwidth. It processes two \(\ell \)-bit inputs A and B by splitting them into \(\ell /32\)-bit chunks and then conducts \(\ell /32\) 32-bit multiplications of the form \(C_i = (C_{i-1} \cdot (A\ |\ 3) \cdot B)\,{\text {mod}}\,2^{32}\), where \(C_{i-1}\) denotes the former computed chaining value. Note that, especially in comparison to the Galois-Field multiplication presented before, the MultHash operation should be considered insecure. Nevertheless, due to its neat property of filling the internal state significantly fast, we include it in the comparison of possible instantiations of \(H '\) even if it will not be part of our recommended variants of Catena.
C Extensions of Catena
1.1 C.1 Password-Independent Random Layer
This part of the section is devoted to the function \(\varGamma \), which reflects a graph-based structure generated by a password-independent indexing function R as shown in Algorithm 7 (left). As defined in the specification of Catena, it overwrites \(2^{\lceil 3g/4 \rceil }\) randomly chosen state words. In respect of the memory requirement in Table 5, it is intuitive to say that the penalty for a \(\text {GR}\ell _{\lambda }^{g}\) will not increase significantly since almost the whole state is already stored. On the other hand, when considering a \(\text {BRG}_\lambda ^g\), one would expect an increased penalty.
First, we wanted to know if the currently designated position of \(\varGamma \) (invoked before \(F\), see Algorithm 2) does provide the best possible resistance (in comparison to other chosen positions) against the precomputation method discussed in the former section. Therefore, we integrated the function (layer) \(\varGamma \) into \(F\), fixed \(g = 12\) (for time reasons), and recomputed the penalties for different positions of \(\varGamma \) within \(F\), different values of \(\lambda \), and the graph instantiations presented in Sect. 4. Further, we set the public input \(\gamma \) to the salt and fixed it to a constant value (\(\gamma = 0\)). We see this as a valid approach since if R is instantiated with a reasonable strong pseudorandom number generator, all indices computed during the invocation of \(\varGamma \) are randomly distributed values. The penalties depending on the positions of \(\varGamma \) are shown in Table 8, where it is easy to see that placing the random layer \(\varGamma \) at the third-last position leads, in general, to the highest penalty.
Thus, we assumed this configuration when computing the impact of the random layer for the recommended values of the garlic, i.e., \(g \in \{18,21\}\). Our first intuition was confirmed by the results shown in Table 9. Note that due to time issues, the values for the precomputation method with \(\lambda = 3\) and \(g = 21\) are a rough estimation based on the values for \(\lambda = 3\) in Table 6.
By comparing the values presented in Tables 6 and 9, we observed that the penalties only clearly increase for a \(\text {BRG}_\lambda ^g\). Thus, we can conclude that the additional invocation of \(\varGamma \) does help substantially against TMTO attacks based on the precomputation method, since the penalties given for a \(\text {BRG}_\lambda ^g\) are still much smaller than that of a \(\text {GR}\ell _{\lambda }^{g}\). Nevertheless, the main aspect of including \(\varGamma \) in our core function \(flap\) was the increased resistance against ASIC-based adversaries, and not to thwart TMTO attacks. Furthermore, the penalties presented in Table 9 only hold if an adversary does not have the additional \(2^{3g/4} \cdot n\) bit of memory available.
1.2 C.2 Password-Dependent Random Layer
This extension is described by introducing a further graph-based layer below the call to \(F\) (see Algorithm 8, Line 7). In general, the function \(\varPhi \) is used to update the internal state v of size \(2^g\cdot n\) bit by sequentially updating each state word. An update of a state word \(v_i\) depends on two inputs (see Algorithm 7 (right)): First, the immediate predecessor \(v_{i-1}\) (or \(v_{2^g-1}\) if \(i = 0\)) and second, a value chosen uniformly at random from the state, where the index is determined by the indexing function R.
More detailed, the first update depends on the secret value \(\mu \), which we set per default to the value \(v_{2^g-1}\) of the output state of \(F\). All subsequent updates dependent on the previous computed state value \(v_{i-1}\) (see Line 4 of Algorithm 7 (right)). Thus, we basically follow a slightly more generic approach in comparison to the ROMix function used within scrypt [25]. The function R can, for example, be given by xorshift1024star as used in the default instantiations of Catena [15], or a similar function to the one proposed by the designers of Argon2 [6], which uses the g least significant bits of the previous computed value to determine the new index.
It is easy to see that \(\varPhi \) conducts sequential writes, whereas \(\varGamma \) conducts writes to randomly determined words of the state. Thus, by calling \(\varPhi \), Catena provides sequential memory-hardness as defined in [25]. Note that if the extension \(\varPhi \) is added to \(flap\), the function \(F\) must return the full state v (see Line 6 of Algorithm 8) instead of a single value.
D Penalties Caused by Shifting Sampling Points
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lucks, S., Wenzel, J. (2016). Catena Variants. In: Stajano, F., Mjølsnes, S.F., Jenkinson, G., Thorsheim, P. (eds) Technology and Practice of Passwords. PASSWORDS 2015. Lecture Notes in Computer Science(), vol 9551. Springer, Cham. https://doi.org/10.1007/978-3-319-29938-9_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-29938-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29937-2
Online ISBN: 978-3-319-29938-9
eBook Packages: Computer ScienceComputer Science (R0)