
SIMD Instruction Set Extensions for Keccak with Applications to SHA-3, Keyak and Ketje

Published: 18 June 2016

ABSTRACT

Recent processor architectures such as Intel Westmere (and later) and ARMv8 include instruction-level support for the Advanced Encryption Standard (AES), for the Secure Hash Standard (SHA-1, SHA-2) and for carry-less multiplication. These crypto-instruction sets provide specialized hardware processing at the top of the memory hierarchy, and provide significant performance improvements over general-purpose software for common cryptographic operations. We propose a crypto-instruction set for the Keccak cryptographic sponge and for the Keccak duplex construction. Our design is integrated on a 128-bit SIMD interface, applicable to the ARM NEON and Intel AVX (128-bit) architectures. The proposed instruction set is optimized for flexibility and supports multiple variants of the Keccak-f[b] permutation, for b equal to 200, 400, 800, or 1600 bits. We investigate the performance of the design using the GEM5 micro-architecture simulator. Compared to the latest hand-optimized results, we demonstrate a performance improvement of 2 times (over NEON programming) to 6 times (over assembly programming). For example, an optimized NEON implementation of SHA3-512 computes a hash at 48.1 instructions per byte, while our design uses 21.9 instructions per byte. The NEON-optimized version of the Lake Keyak AEAD uses 13.4 instructions per byte, while our design uses 7.7 instructions per byte. We provide a comprehensive performance evaluation for multiple configurations of the Keccak-f[b] permutation in multiple applications (hash, encryption, AEAD). We also analyze the hardware cost of the proposed instructions in gate equivalents (GE) of 90 nm standard cells, and show that the proposed instructions require only 4658 GE, a fraction of the cost of a full ARM Cortex-A9.
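To make the Keccak-f[b] parameterization mentioned in the abstract concrete, the following is a minimal software sketch of the permutation, written from the public Keccak specification. It is not the proposed instruction set, and it makes no attempt at performance. The function names (`rol`, `rc_bit`, `rounds`, `keccak_f`) and the `lanes[x][y]` state layout are assumptions chosen for illustration. The state holds 25 lanes of w = b/25 bits, so b = 200, 400, 800, 1600 corresponds to w = 8, 16, 32, 64, and the round count is 12 + 2·log2(w).

```python
def rol(v, n, w):
    """Rotate the w-bit value v left by n positions (n is reduced modulo w)."""
    n %= w
    return ((v << n) | (v >> (w - n))) & ((1 << w) - 1)


def rc_bit(t):
    """Bit t of the round-constant LFSR, polynomial x^8 + x^6 + x^5 + x^4 + 1."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


def rounds(w):
    """Number of rounds of Keccak-f[25*w]: 12 + 2*l with w = 2**l."""
    return 12 + 2 * (w.bit_length() - 1)


def keccak_f(lanes, w):
    """Apply Keccak-f[25*w] to a 5x5 state of w-bit lanes, indexed lanes[x][y].

    Returns a new state; the input is left unmodified.
    """
    l = w.bit_length() - 1
    a = [column[:] for column in lanes]
    for rnd in range(rounds(w)):
        # theta: mix each lane with the parities of two neighbouring columns
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol(c[(x + 1) % 5], 1, w) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = a[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, a[x][y] = a[x][y], rol(current, (t + 1) * (t + 2) // 2, w)
        # chi: the only non-linear step, applied along each row
        for y in range(5):
            row = [a[x][y] for x in range(5)]
            for x in range(5):
                a[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add the round constant, truncated to w bits, to lane (0, 0)
        for j in range(l + 1):
            if rc_bit(j + 7 * rnd):
                a[0][0] ^= 1 << ((1 << j) - 1)
    return a


if __name__ == "__main__":
    # Run each width named in the abstract on an all-zero state.
    for w in (8, 16, 32, 64):
        out = keccak_f([[0] * 5 for _ in range(5)], w)
        print(f"b={25 * w:4d} rounds={rounds(w)} lane(0,0)={out[0][0]:0{w // 4}x}")
```

All four widths share the same five step mappings; only the lane width, the rotation amounts modulo w, the truncation of the round constants, and the round count change.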

  • Published in

    HASP '16: Proceedings of the Hardware and Architectural Support for Security and Privacy 2016
    June 2016
    96 pages
    ISBN: 9781450347693
    DOI: 10.1145/2948618

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 9 of 13 submissions, 69%
