ABSTRACT
Recent processor architectures such as Intel Westmere (and later) and ARMv8 include instruction-level support for the Advanced Encryption Standard (AES), for the Secure Hash Standard (SHA-1, SHA-2), and for carry-less multiplication. These crypto-instruction sets provide specialized hardware processing at the top of the memory hierarchy and deliver significant performance improvements over general-purpose software for common cryptographic operations. We propose a crypto-instruction set for the Keccak cryptographic sponge and for the Keccak duplex construction. Our design is integrated on a 128-bit SIMD interface and is applicable to the ARM NEON and Intel AVX (128-bit) architectures. The proposed instruction set is optimized for flexibility and supports multiple variants of the Keccak-f[b] permutation, for b equal to 200, 400, 800, or 1600 bits. We investigate the performance of the design using the gem5 micro-architecture simulator. Compared to the latest hand-optimized results, we demonstrate a performance improvement of 2 times (over NEON programming) to 6 times (over assembly programming). For example, an optimized NEON implementation of SHA3-512 computes a hash at 48.1 instructions per byte, while our design uses 21.9 instructions per byte. The NEON-optimized version of the Lake Keyak AEAD uses 13.4 instructions per byte, while our design uses 7.7 instructions per byte. We provide a comprehensive performance evaluation for multiple configurations of the Keccak-f[b] permutation in multiple applications (hash, encryption, AEAD). We also analyze the hardware cost of the proposed instructions in gate equivalents (GE) of 90 nm standard cells, and show that the proposed instructions require only 4658 GE, a fraction of the cost of a full ARM Cortex-A9.
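To make the four Keccak-f[b] variants concrete, the following is a minimal software sketch of the permutation as publicly specified (lane width w = b/25, and 12 + 2·log2(w) rounds). It is not the instruction set proposed in the paper and says nothing about its encoding or performance; it only illustrates the computation such instructions would accelerate. The names `ROUNDS`, `rol`, and `keccak_f` are hypothetical, chosen for this illustration.

```python
# Illustrative software model of Keccak-f[b]; NOT the paper's instruction set.
# The state is a 5x5 array lanes[x][y] of w-bit integers, with w = b / 25.

# Number of rounds per variant: 12 + 2*l, where w = 2**l.
ROUNDS = {b: 12 + 2 * ((b // 25).bit_length() - 1) for b in (200, 400, 800, 1600)}


def rol(a, n, w):
    """Rotate the w-bit value a left by n positions (n is reduced mod w)."""
    n %= w
    return ((a << n) | (a >> (w - n))) & ((1 << w) - 1)


def keccak_f(lanes, b=1600):
    """Return Keccak-f[b] applied to lanes; the input is left unmodified."""
    w = b // 25
    lanes = [column[:] for column in lanes]
    lfsr = 1  # LFSR state that generates the round-constant bits
    for _ in range(ROUNDS[b]):
        # theta: mix each lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol(c[(x + 1) % 5], 1, w) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol(current, (t + 1) * (t + 2) // 2, w)
        # chi: the only non-linear step, applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0), truncated to w bits
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2 and (1 << j) - 1 < w:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes
```

A call such as `keccak_f([[0] * 5 for _ in range(5)], b=400)` permutes an all-zero 400-bit state using 16-bit lanes. Only the lane width and the round count change between variants, which is the property a flexible instruction set of the kind described above can exploit.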