Abstract
In this article, we present WARP, a lightweight 128-bit block cipher with a 128-bit key. It aims at small-footprint circuit in the field of 128-bit block ciphers, possibly for a unified encryption and decryption functionality. The overall structure of WARP is a variant of 32-nibble Type-2 Generalized Feistel Network (GFN), with a permutation over nibbles designed to optimize the security and efficiency. We conduct a thorough security analysis and report comprehensive hardware and software implementation results. Our hardware results show that WARP is the smallest 128-bit block cipher for most of typical hardware implementation strategies. A serialized circuit of WARP achieves around 800 Gate Equivalents (GEs), which is much smaller than previous state-of-the-art implementations of lightweight 128-bit ciphers (they need more than 1, 000 GEs). While our primary metric is hardware size, WARP also enjoys several other features, most notably low energy consumption. This is somewhat surprising, since GFN generally needs more rounds than substitution permutation network (SPN), and thus GFN has been considered to be less advantageous in this regard. We show a multi-round implementation of WARP is quite low-energy. Moreover, WARP also performs well on software: our SIMD implementation is quite competitive to known hardware-oriented 128-bit lightweight ciphers for long input, and even much better for small inputs due to the small number of parallel blocks. On 8-bit microcontrollers, the results of our assembly implementations show that WARP is flexible to achieve various performance characteristics.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Alternatively, we could use beyond-birthday-bound (BBB) secure modes, however they are generally more complex than the birthday-secure ones, and using complex modes may nullify the merit of using lightweight primitive.
- 2.
- 3.
- 4.
The name comes from the resemblance of the cipher structure to strings in a loom.
References
Andreeva, E., et al.: COLM v1. a CAESAR portfolio (2016)
Avanzi, R.: The QARMA block cipher family. IACR Trans. Symmetric Cryptol. 2017(1), 4–44 (2017)
Banik, S., et al.: Midori: a block cipher for low energy. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. Part II. LNCS, vol. 9453, pp. 411–436. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48800-3_17
Banik, S., Bogdanov, A., Luykx, A., Tischhauser, E.: SUNDAE: small universal deterministic authenticated encryption for the internet of things. IACR Trans. Symmetric Cryptol. 2018(3), 1–35 (2018)
Banik, S., et al.: SUNDAE-GIFT. A Submission to NIST Lightweight Cryptography Project (2019)
Banik, S., Bogdanov, A., Regazzoni, F.: Exploring energy efficiency of lightweight block ciphers. In: Dunkelman, O., Keliher, L. (eds.) SAC 2015. LNCS, vol. 9566, pp. 178–194. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31301-6_10
Banik, S., Bogdanov, A., Regazzoni, F.: Atomic-AES: a compact implementation of the AES encryption/decryption core. In: Dunkelman, O., Sanadhya, S.K. (eds.) INDOCRYPT 2016. LNCS, vol. 10095, pp. 173–190. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49890-4_10
Banik, S., et al.: GIFT-COFB. A Submission to NIST Lightweight Cryptography Project (2019)
Banik, S., et al.: Towards low energy stream ciphers. IACR Trans. Symmetric Cryptol. 2018(2), 1–19 (2018)
Banik, S., Pandey, S.K., Peyrin, T., Sasaki, Y., Sim, S.M., Todo, Y.: GIFT: a small present. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 321–345. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_16
Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.: The SIMON and SPECK families of lightweight block ciphers. Cryptology ePrint Archive, report 2013/404 (2013). http://eprint.iacr.org/2013/404
Beierle, C., Canteaut, A., Leander, G., Rotella, Y.: Proving resistance against invariant attacks: how to choose the round constants. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. LNCS, vol. 10402, pp. 647–678. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63715-0_22
Beierle, C., et al.: The SKINNY family of block ciphers and its low-latency variant MANTIS. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. Part II. LNCS, vol. 9815, pp. 123–153. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53008-5_5
Beierle, C., et al.: The SKINNY family of block ciphers and its low-latency variant MANTIS. Cryptology ePrint Archive, report 2016/660 (2016). http://eprint.iacr.org/2016/660
Beierle, C., Leander, G., Moradi, A., Rasoolzadeh, S.: CRAFT: lightweight tweakable block cipher with efficient protection against DFA attacks. IACR Trans. Symmetric Cryptol. 2019(1), 5–45 (2019)
Benadjila, R., Guo, J., Lomné, V., Peyrin, T.: Implementing lightweight block ciphers on x86 architectures. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 324–351. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_17
Berger, T.P., Francq, J., Minier, M., Thomas, G.: Extended generalized Feistel networks using matrix representation to propose a new lightweight block cipher: Lilliput. IEEE Trans. Comput. 65(7), 2074–2089 (2016)
Berger, T.P., Minier, M., Thomas, G.: Extended generalized Feistel networks using matrix representation. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 289–305. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_15
Bhargavan, K., Leurent, G.: On the practical (in-)security of 64-bit block ciphers: collision attacks on HTTP over TLS and OpenVPN. In: Weippl, E.R., Katzenbeisser, S., Kruegel, C., Myers, A.C., Halevi, S. (eds.) ACM CCS 2016, pp. 456–467. ACM Press, October 2016
Biham, E., Biryukov, A., Shamir, A.: Cryptanalysis of Skipjack reduced to 31 rounds using impossible differentials. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 12–23. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_2
Biham, E., Shamir, A.: Differential cryptanalysis of the full 16-round DES. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 487–496. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-48071-4_34
Bilgin, B., Nikova, S., Nikov, V., Rijmen, V., Stütz, G.: Threshold implementations of all 3 \(\times \) 3 and 4 \(\times \) 4 S-boxes. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, pp. 76–91. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33027-8_5
Biryukov, A., Derbez, P., Perrin, L.: Differential analysis and meet-in-the-middle attack against round-reduced TWINE. In: Leander, G. (ed.) FSE 2015. LNCS, vol. 9054, pp. 3–27. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48116-5_1
Biryukov, A., Nikolić, I.: Complementing Feistel ciphers. In: Moriai, S. (ed.) FSE 2013. LNCS, vol. 8424, pp. 3–18. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43933-3_1
Bogdanov, A., et al.: PRESENT: an ultra-lightweight block cipher. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450–466. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74735-2_31
Borghoff, J., et al.: PRINCE – a low-latency block cipher for pervasive computing applications. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol. 7658, pp. 208–225. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34961-4_14
Boura, C., Naya-Plasencia, M., Suder, V.: Scrutinizing and improving impossible differential attacks: applications to CLEFIA, Camellia, LBlock and Simon. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 179–199. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8_10
Cauchois, V., Gomez, C., Thomas, G.: General diffusion analysis: how to find optimal permutations for generalized type-II Feistel schemes. IACR Trans. Symmetric Cryptol. 2019(1), 264–301 (2019)
Chakraborti, A., Iwata, T., Minematsu, K., Nandi, M.: Blockcipher-based authenticated encryption: how small can we go? In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 277–298. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_14
Daemen, J., Knudsen, L., Rijmen, V.: The block cipher Square. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 149–165. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0052343
Daemen, J., Peeters, M., Van Assche, G., Rijmen, V.: Nessie proposal: NOEKEON (2000). http://gro.noekeon.org/Noekeon-spec.pdf
De Cannière, C., Dunkelman, O., Knežević, M.: KATAN and KTANTAN—a family of small and efficient hardware-oriented block ciphers. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 272–288. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04138-9_20
Derbez, P.: Note on impossible differential attacks. In: Peyrin, T. (ed.) FSE 2016. LNCS, vol. 9783, pp. 416–427. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-52993-5_21
Derbez, P., Fouque, P.-A., Lambin, B., Mollimard, V.: Efficient search for optimal diffusion layers of generalized Feistel networks. IACR Trans. Symmetric Cryptol. 2019(1), 218–240 (2019)
Dinu, D., Le Corre, Y., Khovratovich, D., Perrin, L., Großschädl, J., Biryukov, A.: Triathlon of lightweight block ciphers for the internet of things. J. Cryptogr. Eng. 9(3), 283–302 (2019). https://doi.org/10.1007/s13389-018-0193-x
Grosso, V., Leurent, G., Standaert, F.-X., Varıcı, K.: LS-designs: bitslice encryption for efficient masked software implementations. In: Cid, C., Rechberger, C. (eds.) FSE 2014. LNCS, vol. 8540, pp. 18–37. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46706-0_2
Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.: The LED block cipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 326–341. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23951-9_22
Gupta, K.C., Pandey, S.K., Venkateswarlu, A.: Almost involutory recursive MDS diffusion layers. Des. Codes Cryptogr. 87(2), 609–626 (2018). https://doi.org/10.1007/s10623-018-0582-2
Hong, D., et al.: HIGHT: a new block cipher suitable for low-resource device. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 46–59. Springer, Heidelberg (2006). https://doi.org/10.1007/11894063_4
Standard for cryptographic protection of data on block-oriented storage devices
Gurobi Optimization Inc.: Gurobi optimizer 6.5 (2015). http://www.gurobi.com/
Iwata, T., Minematsu, K., Guo, J., Morioka, S.: CLOC: authenticated encryption for short input. In: Cid, C., Rechberger, C. (eds.) FSE 2014. LNCS, vol. 8540, pp. 149–167. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46706-0_8
Jean, J., Moradi, A., Peyrin, T., Sasdrich, P.: Bit-sliding: a generic technique for bit-serial implementations of SPN-based primitives. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 687–707. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_33
Knudsen, L., Leander, G., Poschmann, A., Robshaw, M.J.B.: PRINTcipher: a block cipher for IC-printing. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 16–32. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15031-9_2
Knudsen, L., Wagner, D.: Integral cryptanalysis. In: Daemen, J., Rijmen, V. (eds.) FSE 2002. LNCS, vol. 2365, pp. 112–127. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45661-9_9
Kölbl, S.: AVX implementation of the Skinny block cipher (2019). https://github.com/kste/skinny_avx
Krovetz, T., Rogaway, P.: The software performance of authenticated-encryption modes. In: Joux, A. (ed.) FSE 2011. LNCS, vol. 6733, pp. 306–327. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21702-9_18
Matsui, M.: Linear cryptanalysis method for DES cipher. In: Helleseth, T. (ed.) EUROCRYPT 1993. LNCS, vol. 765, pp. 386–397. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48285-7_33
Mikhalev, V., Armknecht, F., Müller, C.: On ciphers that continuously access the non-volatile key. IACR Trans. Symmetric Cryptol. 2016(2), 52–79 (2016). http://tosc.iacr.org/index.php/ToSC/article/view/565
Moradi, A., Poschmann, A., Ling, S., Paar, C., Wang, H.: Pushing the limits: a very compact and a threshold implementation of AES. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 69–88. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20465-4_6
Mouha, N., Wang, Q., Gu, D., Preneel, B.: Differential and linear cryptanalysis using mixed-integer linear programming. In: Wu, C.-K., Yung, M., Lin, D. (eds.) Inscrypt 2011. LNCS, vol. 7537, pp. 57–76. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34704-7_5
Naito, Y., Matsui, M., Sugawara, T., Suzuki, D.: SAEB: a lightweight blockcipher-based AEAD mode of operation. IACR TCHES 2018(2), 192–217 (2018). https://tches.iacr.org/index.php/TCHES/article/view/885
Nyberg, K.: Generalized Feistel networks. In: Kim, K., Matsumoto, T. (eds.) ASIACRYPT 1996. LNCS, vol. 1163, pp. 91–104. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0034838
Poschmann, A., Moradi, A., Khoo, K., Lim, C.-W., Wang, H., Ling, S.: Side-channel resistant crypto for less than 2,300 GE. J. Cryptol. 24(2), 322–345 (2011). https://doi.org/10.1007/s00145-010-9086-6
Sasaki, Yu., Aoki, K.: Finding preimages in full MD5 faster than exhaustive search. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 134–152. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01001-9_8
Sasaki, Y., Todo, Y.: New impossible differential search tool from design and cryptanalysis aspects. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. Part III. LNCS, vol. 10212, pp. 185–215. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56617-7_7
Shibutani, K., Isobe, T., Hiwatari, H., Mitsuda, A., Akishita, T., Shirai, T.: Piccolo: an ultra-lightweight blockcipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 342–357. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23951-9_23
Shirai, T., Shibutani, K., Akishita, T., Moriai, S., Iwata, T.: The 128-bit blockcipher CLEFIA (extended abstract). In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 181–195. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74619-5_12
Sun, S., Hu, L., Wang, P., Qiao, K., Ma, X., Song, L.: Automatic security evaluation and (related-key) differential characteristic search: application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. Part I. LNCS, vol. 8873, pp. 158–178. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8_9
Suzaki, T., Minematsu, K.: Improving the generalized Feistel. In: Hong, S., Iwata, T. (eds.) FSE 2010. LNCS, vol. 6147, pp. 19–39. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13858-4_2
Suzaki, T., Minematsu, K., Morioka, S., Kobayashi, E.: TWINE: a lightweight block cipher for multiple platforms. In: Knudsen, L.R., Wu, H. (eds.) SAC 2012. LNCS, vol. 7707, pp. 339–354. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35999-6_22
Todo, Y.: Structural evaluation by generalized integral property. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. Part I. LNCS, vol. 9056, pp. 287–314. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_12
Wingers, L.: SUPERCOP: SUPERCOP-20190110/crypto\_stream/simon128128ctr/avx2 (2019). https://bench.cr.yp.to/supercop/supercop-20190110.tar.xz
Wu, W., Zhang, L.: LBlock: a lightweight block cipher. In: Lopez, J., Tsudik, G. (eds.) ACNS 2011. LNCS, vol. 6715, pp. 327–344. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21554-4_19
Xiang, Z., Zhang, W., Bao, Z., Lin, D.: Applying MILP method to searching integral distinguishers based on division property for 6 lightweight block ciphers. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016. Part I. LNCS, vol. 10031, pp. 648–678. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53887-6_24
Zheng, Y., Matsumoto, T., Imai, H.: On the construction of block ciphers provably secure and not relying on any unproved hypotheses. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 461–480. Springer, New York (1990). https://doi.org/10.1007/0-387-34805-0_42
Acknowledgement
Subhadeep Banik is supported by the Swiss National Science Foundation (SNSF) through the Ambizione Grant PZ00P2_179921. Zhenzhen Bao is partially supported by Nanyang Technological University in Singapore under Grant 04INS000397C230, and Singapore’s Ministry of Education under Grants RG18/19 and MOE2019-T2-1-060. Takanori Isobe is supported by Grant-in-Aid for Scientific Research (B) (KAKENHI 19H02141) for Japan Society for the Promotion of Science, Support Center for Advanced Telecommunications Technology Research (SCAT), and SECOM science and technology foundation. Kosei Sakamoto is supported by Grant-in-Aid for JSPS Fellows (KAKENHI 20J23526) for Japan Society for the Promotion of Science.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Appendices
A Test Vector
Table 6 shows the test vectors of WARP.
B Security Evaluation
In this section, we provide the security evaluations of WARP against differential, linear, integral, impossible differential, invariant and meet-in-the-middle attacks.
1.1 B.1 Differential/Linear Attack
Differential cryptanalsis [21] and linear cryptanalsis [48] are among the most powerful techniques available for block ciphers. To evaluate the security against differential and linear attacks, we compute the lower bound for the number of differentially and linearly active S-boxes with a MILP-aided automatic search method, which was proposed by Mouha et al. [51]. We use Gurobi [41] as the solver and search for all nibble-wise truncated differential and linear characteristics.
Table 7 shows the minimum number of differentially and linearly active S-boxes for up to 19 rounds in the single-key setting, where \(AS_D\) and \(AS_L\) denote the number of differentially and linearly active S-boxes, respectively. It can be observed from Table 7 that WARP has more than 64 active S-boxes after 19 rounds. Since the maximum differential probability and absolute linear bias of the S-box of WARP are both \(2^{-2}\) and the nibble-wise full diffusion requires 10 rounds, even with a 19-round differential distinguisher, we expect that an effective key-recovery attack cannot reach up to \(19+12=31\) rounds. In a word, the full-round WARP is secure against differential and linear attacks.
1.2 B.2 Impossible Differential Attack
Generally, an impossible differential attack [20] is one of the most powerful attacks against Feistel-type ciphers. The impossible differential attack exploits a pair of input-output difference denoted by \(\varDelta _{in}\) and \(\varDelta _{out}\) such that \(\varDelta _{in}\) will never reach \(\varDelta _{out}\) after several rounds.
As mentioned in Sect. 3, WARP achieves the full diffusion after 10 rounds at both the encryption and decryption sides in nibble-wise. Based on a more detailed investigation, we found that the full diffusion requires 12 rounds at both the encryption and decryption sides in bit-wise. Hence, there should be no probability-1 bit-wise impossible differential over 24 rounds.
In order to obtain the longest impossible differential distinguisher, we utilize an impossible differential search tool based on MILP designed by Sasaki and Todo [56]. Specifically, we evaluate the search space such that the plaintext difference and ciphertext difference activate only one bit, respectively. To model the propagation of differences through the 4-bit S-box, we take into account the differential distribution table for the 4-bit S-box. Based on the method as proposed in [59], it can be modeled with the linear inequalities.
As a result, we find the following 21-round impossible differential distinguisher.
According to Boura et al.’s work [27] in ASIACRYPT 2014 and the corresponding interpretation [33] by Derbez in FSE 2016, when extending the 21-round impossible differential distinguisher for 10 rounds, the required time complexity of the key-recovery attack is almost close to a pure exhaustive key search. Therefore, we do not expect an effective key-recovery attack on up to 32 rounds of WARP by using this 21-round impossible differential distinguisher. In a word, we expect that the full-round WARP is secure against the impossible differential attack.
1.3 B.3 Integral Attack
The integral attack was first proposed by Daemen et al. [30] and it was later formalized to the integral property by Knudsen and Wagner [45]. We define the four states for a set of \(2^n\) n-bit cell: A: if \(\forall i, j\) \(i \ne j \Leftrightarrow x_i \ne x_j\), C: if \(\forall i, j\) \(i \ne j \Leftrightarrow x_i = x_j\), B: \(\bigoplus ^{2^n -1}_i x_i \), and U: Other. The integral attack was further generalized to the division property by Todo [62], which can exploit the hidden feature between A and B states.
To evaluate the nibble-based division property, we use a MILP-aided automatic search method proposed by Xiang et al. [65], which enables us to efficiently explore the propagation of the division property. Specifically, we evaluate all the cases where 1, 2, 3 nibbles out of 32 nibbles are C and the others are all \(\mathbf{A}\) in plaintexts. Thus, we need to evaluate \(2^{23.2} \left( = \left( {\begin{array}{c}32\\ 1\end{array}}\right) + \left( {\begin{array}{c}32\\ 2\end{array}}\right) + \left( {\begin{array}{c}32\\ 3\end{array}}\right) \right) \) nibble-wise patterns.
In this way, we find the following 20-round integral distinguisher.
However, due to the high data complexity of this integral distinguisher, one can extend only 1 round to achieve a key-recovery attack and its time complexity is almost close to an exhaustive key search. One may find an integral distinguisher by using a lower data complexity, but the number of rounds will be reduced. Considering that the nibble-wise full diffusion requires 10 rounds and that a common integral distinguisher covering larger rounds always requires a higher data complexity, we expect that the full-round WARP is secure against integral attacks.
1.4 B.4 Meet-in-the Middle Attack
We evaluate the security against the meet-in-the-middle attack following the method appeared in the self-evaluation of MIDORI [3] and CRAFT [15]. The 10-round full diffusion property guarantees that any inserted key-bit non-linearly affects all branches after the 10 rounds in the forward and the backward directions, respectively. Thus, the possible number of rounds used for the partial matching (PM) [55] is estimated as 19 \(( = (10 - 1) + (10 - 1) + 1)\). The condition for the initial structure (IS) [55] is that key differential trails in the forward direction and those in the backward direction do not share active non-linear components. For WARP, since any key differential affects all 16 S-boxes after at least 10 rounds in the forward and the backward directions, there is no such differential which shares active S-box in more than 10 rounds. Thus, the number of rounds used for IS is upper bounded by 9. Assuming that the splice-and-cut technique allows an attacker to add more 3 rounds in the worst case, at most 32-round (19 + 10 + 3) MitM attack may be feasible. However, because of the iterated key insertions of \(K_0\) and \(K_1\) for every two round, we consider that it is difficult to mount a 32-round attack on WARP.
1.5 B.5 Invariant Subspace Attack
We use LFSR-based round constants in each round. Following the notions presented in [12], we first tried to find the smallest L-invariant subspace that contains all roundkey differences. Here, L denotes the transformation that describes the linear layer in WARP. Since WARP adds two key halves in an alternating fashion, the master keys repeat every other round, the set of roundkey differences in the even/odd rounds are given by
where is the 128-bit vector defined as \((0^4\,\Vert \,RC_0^r\,\Vert \,0^4\,\Vert \,RC_1^r\,\Vert \,0^{112})\). Denoting \(D=D_{even}\cup D_{odd}\), we try to find \(W_L(D)\), which denotes the smallest L-invariant subspace containing D. We found that \(W_L(D)\) is a subspace of dimension 124, which does not automatically guarantee resistance to subspace attacks. As the invariant attack applies only if there is a non-trivial invariant g for the S-box layer such that \(W_L (D) \subset LS(g)\), where LS(g) is the subspace of all linear structures of the function g. We ran Algorithm 1 of [12] on \(Z=W_L(D)\) first to see if S(Z) hits all the cosets of Z. Experimentally we found that it does indeed, leading us to conclude that g is the constant function which guarantees security against subspace attacks.
C32-Branch Permutations with 9-Round Full Diffusion [34]
Table 8 shows four equivalent classes of 32-branch permutations of \(\pi '_{0}(x)\), \(\pi '_{1}(x)\), \(\pi '_{2}(x)\), and \(\pi '_{3}(x)\) achieving 9-round full diffusion found by Derbez et al. [34].
Table 9 is a comparison of lower bounds on the number of active S-boxes for WARP and four permutations by our MILP-based Active S-boxes counting. As shown in Table 9, the number of active S-boxes of \(\pi '_{0}(x)\), \(\pi '_{1}(x)\), \(\pi '_{2}(x)\), and \(\pi '_{3}(x)\) grows much slower than WARP. Specifically, \(\pi '_{0}(x)\), \(\pi '_{1}(x)\) and \(\pi '_{2}(x)\), \(\pi '_{3}(x)\) require at least 32 and 48 rounds for achieving AS-box of \({\ge } 64\), respectively.
D More Details About Hardware Implementations
1.1 D.1 Bit Serial Architecture
The nibble serial architecture can be converted to a bit serial architecture, with some simple circuit-level transformations. The first is explained in Fig. 8. Any nibblewise scan flip-flop can be serialized as shown in Fig. 8, so that only one scan flip-flop per nibble is utilized. Whereas in the nibble serial architecture, the circuit can transfer one of the 2 nibble signals in one clock cycle, the same can be done over 4 cycles in the bit-serial architecture. Thus the bit-serial circuit can perform the same set of round operations in \(48\times 4=192\) cycles, in 3 sets of 64 cycle operations as in the nibble serial circuit. We save \(18\times 3=54\) scan flip-flops in the bit-serial architecture, and also 4-bit xor gates and multiplexers can be replaced with corresponding single bit gates.
1.2 D.2 Unified Circuit for Encryption and Decryption
Implementing the functionalities of encryption and decryption (ED) on the same circuit can be beneficial in some instances. Various modes of operations like CBC, XTS, OCB and COLM [1], that use block ciphers as the underlying primitive, require access to both its encryption and decryption functionalities. Thus it is useful to have an implementation that achieves both functionalities of a block cipher with minimal overhead. There are several features in the Feistel network structure, that make it easier to construct the ED architecture. Some of them are as follows:
-
1.
SPN structures generally require involutive S-boxes to ensure efficient ED implementation [3, 15]. If not, they require the circuits for the forward and inverse S-box to be implemented together, which increases area [7]. However the inverse round function in Feistel networks can be described with the forward S-box only. This gives us more freedom to search for S-boxes with lightweight characteristics.
-
2.
Some SPN block ciphers, e.g., [15] require the decryption key to be equal to \(L\cdot K\), where L denotes the matrix that forms the linear layer of the cipher. Thus additional circuit for matrix multiplication is required. Also a multiplexer is required to filter these keys for encryption/decryption. In the architecture for WARP, this is not necessary.
-
3.
Let \(F_K\) denote the function that performs S-box function and the roundkey addition on the left branch. Then by slight abuse of notation we can write the round function as
$$\begin{aligned} Y_a= P(F_K(X_a))\oplus (X_b\lll 6), Y_b=X_a \end{aligned}$$(1)where P is the function that performs the nibblewise shuffle of the left branch by moving the i-th nibble to \(\uppi [i]\). Then it is easy to see that the inverse round function is \(X_a=Y_b, X_b=( P(F_K(Y_b))\oplus (Y_a))\ggg 6\). However we do omit the left-right swap in the last round, and as a result, decryption can be computed by iterating the following round function 40 times, followed by a “swapless” final round:
$$\begin{aligned} X_a=( P(F_K(Y_a))\oplus (Y_b))\ggg 6, X_b=Y_a. \end{aligned}$$(2)Equations (1) and (2) are similar except that in encryption, the right branch is left rotated by 6 nibbles before addition, whereas in decryption, rotation done is after xoring left and right branches, this time by 6 nibbles towards the right. Since WARP uses each half of the master key in alternate rounds for key addition, it has been designed to have odd number of rounds. This means the first and last round encryption keys are the same, which implies that the encryption and decryption uses the left and right halves of the key in the same order.
Thus the only real overhead in the ED circuit for WARP is to accommodate left and right rotation by six nibbles in different times of the decryption cycle, and arrange for round constants to be generated in the reverse direction, which only requires some strategically placed multiplexers to accommodate the timing of these operations in the decryption cycle.
In essence, one approach would be to not rotate the right branch during cycles 0 to 15, and do rotation only during cycles 31 to 47 when the xoring of right and left branches has been completed. However, this approach is slightly problematic to adopt, as cycles 31 to 47 are used to not only swap left and right branches but also to apply \(\uppi ^{-1}\) to the left branch as it is being moved to bottom rows of flip-flops. In such a situation the bottom rows cannot accommodate two different types of permutation operations at the same time.
As a result, we need to exercise some fine-grained control over the ED circuit. For decryption, we rotate the right branch by 10 nibbles left in cycles 0–15 (same as 6 nibbles right rotation), although this rotation is not required as per Eq. (2). Thus to maintain functionality, in cycles 16–31, we drive nibbles out through register 11 to do xor between left and right branches (this was done through flip-flop 17 during encryption). After xor, the incoming nibbles are driven in through flip-flop 12 (this was done through 00 during encryption). This method has the added advantage that after round 31, the flip-flops \(17-00\) already contain \(( P(F_K(Y_a))\oplus (Y_b))\ggg 6\). This allows us to have the decryption operations in cycles 32–47 exactly the same in encryption.
For completeness, we discuss two more issues. First, for decryption we choose, cycles 0 and 1 for round constant addition, as this operation has to precede the non-linear operations. To rotate left by 10 nibbles, we need to freeze rotation for 6 cycles. Like in encryption, we divide the bottom row into groups of 6, 6, 2, 2 flip-flops and do internal rotation in these for 6 cycles. To accommodate this operation we need to replace 3 normal flip-flop nibbles with scan flip-flop nibbles.
1.3 D.3 Threshold Implementations
The s-box belongs to the cubic class \(\mathcal {C}_{266}\) as per the classification in [22] and as such it can be decomposed into 2 quadratic s-boxes \(F~\circ ~ G\), where
Since a minimum of \(d+1\) shares are required to implement the 1st order threshold implementation (TI) of a degree d s-box, we can thus implement a 3 share 1st-order TI in the manner shown in [54]. The idea is to implement the TI of G and F separated by a register bank in between, which suppresses the glitches produced by the TI of G.
Since the s-box has degree 3, a straightforward 4-share TI can also be implemented using a direct sharing approach. With regards to circuit architecture, the 4 share versions would only consist of 4 copies of the unshared circuit combined through a shared s-box. The 3-share circuit is slightly complicated owing to the fact that the shared G, F functions have to be executed one after the other. We implemented it in the manner shown in Fig. 9. In the unprotected circuit described earlier, the S-box layer is computed in cycles 16 to 31. And so in the shared circuit, we implement the shared function G in cycles 15 to 30 and the shared function F in cycles 16 to 31. However this creates another problem, as the entire left branch is overwritten by the output of the shared G layer before being fed back into register 20. Since the current left branch is required to serve the role of the right branch in the subsequent round, we need to invert the G layer before we proceed to the next round. This is done by implementing a shared implementation of the quadratic s-box \(G^{-1}\) between registers 20 and 21, which is operated from cycles 17 to 32.
Table 10 shows the performance results for the 3-share and 4-share implementations of WARP. The smallest 3-share implementation stands at 1964 GE which is smaller than known implementations of SKINNY and PRESENT (although these are computed at different level of serialization).
E More Details of Software Implementations
1.1 E.1 Details of Software Implementations on 8-Bit AVR
Our implementations of WARP on 8-bit AVR are in assembly and compiled using AVR macro assembler 2.2.7 in Atmel Studio 7.0. All implementations use the LBlock-type structure (see Fig. 4). Detailed implementation choices are: 1) One-round or two-round unrolling: combining two rounds save the shuffle operation between left and right branches, trading ROM for CPU cycles; 2) One-S-box or two-S-box: combining two S-boxes can reduce times of memory accesses, trading ROM or RAM for CPU cycles; 3) Storing the LUT of the S-box in RAM or ROM. Table 11 presents the results and comparisons.
The target device is ATmega128; The scenario is encryption/decryption of 128 bytes of data in CBC mode [35]. For ROM, that consumed by the main function for initializing data and calling the enc/dec functions are subtracted. For RAM, that required for storing the data to be processed, the master key, and the initialization vector are subtracted. WARP [1R] is for the one-round-based implementation storing the LUT of two-S-box in ROM. WARP [2R] is for the two-round-unrolled implementation storing the LUT of two-S-box in RAM. Results of other ciphers are from https://www.cryptolux.org/index.php/FELICS_Block_Ciphers_Brief_Results.
1.2 E.2 Details of Software Implementations on x64 CPUs
Similar to TWINE [61], WARP can be transformed into an equivalent form, in which only half branches go through a nibble shuffle per round. Accordingly, our SIMD implementations use this equivalent form. Besides, to take advantage of the pipelined execution unit on modern CPUs, we provide additional options on parallelism besides the double-block using 256-bit registers – compute each atomic step on quadruple/octuple data blocks. Our benchmark results of WARP, together with that of SIMON and SKINNY, are reported in Fig. 10.
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Banik, S. et al. (2021). WARP : Revisiting GFN for Lightweight 128-Bit Block Cipher. In: Dunkelman, O., Jacobson, Jr., M.J., O'Flynn, C. (eds) Selected Areas in Cryptography. SAC 2020. Lecture Notes in Computer Science(), vol 12804. Springer, Cham. https://doi.org/10.1007/978-3-030-81652-0_21
Download citation
DOI: https://doi.org/10.1007/978-3-030-81652-0_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81651-3
Online ISBN: 978-3-030-81652-0
eBook Packages: Computer ScienceComputer Science (R0)