Question

For example, x86 has an instruction set extension for hardware-accelerated AES. Are there any x86 instructions to accelerate SHA (SHA1/2/256/512) hashing, and which library is the fastest for computing SHA on x86?


Solution 2

Are there any x86 instructions to accelerate SHA (SHA1/2/256/512) encoding?

It's November 2016 and the answer is finally Yes. But it's only SHA-1 and SHA-256 (and by extension, SHA-224).

Intel CPUs with SHA extensions hit the market recently. It looks like the processors that support them use the Goldmont microarchitecture:

  • Pentium J4205 (desktop)
  • Pentium N4200 (mobile)
  • Celeron J3455 (desktop)
  • Celeron J3355 (desktop)
  • Celeron N3450 (mobile)
  • Celeron N3350 (mobile)

I looked through the offerings at Amazon for machines with that architecture or those processor numbers, but I did not find any available (yet). I believe Acer had one laptop with a Pentium N4200, expected to be available in December 2016, that would meet testing needs.

For some of the technical details on why it's only SHA-1, SHA-224 and SHA-256, see "crypto: arm64/sha256 - add support for SHA256 using NEON instructions" on the kernel crypto mailing list. The short answer is that, above SHA-256, things are not easily parallelizable.


You can find source code for both Intel SHA intrinsics and ARMv8 SHA intrinsics at Noloader GitHub | SHA-Intrinsics. They are C source files, and provide the compress function for SHA-1, SHA-224 and SHA-256. The intrinsic-based implementations increase throughput approximately 3× to 4× for SHA-1, and approximately 6× to 12× for SHA-224 and SHA-256.

OTHER TIPS

Intel has upcoming instructions for accelerating the calculation of SHA-1 and SHA-256 hashes.


You can read more about them, how to detect whether your CPU supports them, and how to use them here.

(But not SHA-512; you'll still need to manually vectorize that with regular SIMD instructions. AVX512 should help for SHA-512 (and for SHA-1 / SHA-256 on CPUs with AVX512 but not SHA extensions), since it provides SIMD rotates as well as shifts; see for example https://github.com/minio/sha256-simd)

It was hoped that Intel's Skylake microarchitecture would have them, but it doesn't. The Intel CPUs with them are the low-power Goldmont in 2016, then Goldmont Plus in 2017. Intel's first mainstream CPU with SHA extensions will be Cannon Lake. Skylake / Kaby Lake / Coffee Lake do not have them.

AMD Ryzen (2017) has the SHA extensions.

A C/C++ programmer is probably best off using OpenSSL, which will use whatever CPU features it can to hash quickly. (Including SHA extensions on CPUs that have them, if your version of OpenSSL is new enough.)
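The same "let the library pick the fastest path" idea can be tried quickly from a script. The sketch below uses Python's hashlib, which is usually built on top of OpenSSL (that is an assumption about your Python build; some builds fall back to a bundled implementation). The helper name hex_digest is made up for illustration.

```python
import hashlib


def hex_digest(algorithm: str, data: bytes) -> str:
    """Return the hex digest of `data` using the named hashlib algorithm.

    `hex_digest` is a hypothetical helper, not part of hashlib itself.
    """
    h = hashlib.new(algorithm)  # e.g. "sha1", "sha256", "sha512"
    h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    # b"abc" is the classic short test vector for the SHA family.
    print("sha1  :", hex_digest("sha1", b"abc"))
    print("sha256:", hex_digest("sha256", b"abc"))
```

Whether SHA extensions are actually used depends on the CPU and on the version of the underlying library, not on anything in this script.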

2019 Update:

OpenSSL does use H/W acceleration when present.

On Intel's side, the Goldmont µarch (Atom series) has SHA-NI support, as do desktop/mobile CPUs from Cannon Lake (10 nm) onwards; Cascade Lake server CPUs and older do not support it. Yes, support is non-linear on the timeline because Intel maintains parallel CPU/µarch lines.

In 2017 AMD released their Zen µarch, so all current server and desktop CPUs based on Zen fully support it.


My benchmark with openssl speed sha256 showed a 550% speed increase at a block size of 8 KiB.

For real 1 GB and 5 GB files loaded into RAM, hashing was roughly 3 times faster.

(Benchmarked on Ryzen 1700 @ 3.6 GHz, 2933CL16 RAM; OpenSSL: 1.0.1 no support vs 1.1.1 with support)
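To get a comparable number on your own machine without the openssl command line, a rough throughput measurement can be sketched as below. This is not the benchmark used above: it is a minimal approximation using Python's hashlib, the function name throughput_mib_s and its defaults are made up for illustration, and the per-iteration timer call adds a little overhead at small block sizes.

```python
import hashlib
import time


def throughput_mib_s(algorithm: str, block_size: int = 8 * 1024,
                     duration: float = 1.0) -> float:
    """Hash one in-memory block repeatedly for about `duration` seconds
    and return the observed throughput in MiB/s (hypothetical helper)."""
    block = bytes(block_size)       # zero-filled buffer; content does not matter
    h = hashlib.new(algorithm)
    hashed = 0
    start = time.perf_counter()
    while True:
        h.update(block)
        hashed += block_size
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            break
    return hashed / elapsed / (1024 * 1024)


if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha512"):
        print(f"{name:7s} {throughput_mib_s(name):10.1f} MiB/s")
```

Running it once on a build with SHA-extension support and once without (for example, an older library version) gives a before/after comparison similar in spirit to the one described above.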


Absolute values for comparison against other hash functions:

sha1   (1.55GHz):  721.1 MiB/s
sha256 (1.55GHz):  668.8 MiB/s
sha1   (3.8GHz) : 1977.9 MiB/s
sha256 (3.8GHz) : 1857.7 MiB/s
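To compare these figures across clock speeds, or against other hash functions, it can help to convert them to cycles per byte. The sketch below assumes the quoted GHz value is the clock the core actually sustained during the run and that the measurement was single-threaded; both are assumptions, and the function name cycles_per_byte is made up for illustration.

```python
def cycles_per_byte(clock_ghz: float, mib_per_s: float) -> float:
    """Convert a throughput in MiB/s at a given clock into cycles per byte."""
    cycles_per_second = clock_ghz * 1e9
    bytes_per_second = mib_per_s * 1024 * 1024
    return cycles_per_second / bytes_per_second


if __name__ == "__main__":
    # (label, clock in GHz, MiB/s) taken from the figures quoted above.
    figures = [
        ("sha1   @ 1.55 GHz", 1.55, 721.1),
        ("sha256 @ 1.55 GHz", 1.55, 668.8),
        ("sha1   @ 3.8 GHz", 3.8, 1977.9),
        ("sha256 @ 3.8 GHz", 3.8, 1857.7),
    ]
    for label, ghz, mib in figures:
        print(f"{label}: {cycles_per_byte(ghz, mib):.2f} cycles/byte")
```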

See this for details until there's a way to add tables on SO.


CPUID identification, page 298: with EAX = 07h (and ECX = 0), EBX bit 29 == 1 indicates SHA extension support.
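The bit test itself is simple and can be sketched as follows. Python cannot execute CPUID directly, so this sketch only shows the check on an EBX value you have obtained some other way, plus a Linux-only shortcut that looks for the sha_ni flag in /proc/cpuinfo (assuming a Linux system, where the kernel exposes that flag name). The function names are made up for illustration.

```python
SHA_BIT = 29  # CPUID leaf 07h, sub-leaf 0: EBX bit 29 signals SHA extensions


def ebx_reports_sha(ebx: int) -> bool:
    """Return True if bit 29 is set in an EBX value from CPUID leaf 07h."""
    return bool((ebx >> SHA_BIT) & 1)


def cpuinfo_reports_sha(cpuinfo_text: str) -> bool:
    """Return True if a Linux /proc/cpuinfo dump lists the sha_ni flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Only the first "flags" line is inspected (one per logical CPU).
            return "sha_ni" in value.split()
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("sha_ni present:", cpuinfo_reports_sha(f.read()))
    except OSError:
        print("/proc/cpuinfo not available on this platform")
```

In C or C++ the EBX value would come from a compiler-provided CPUID helper rather than from /proc/cpuinfo; the bit test is the same.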

Intel's Instruction Set Reference, page 1264ff.

Agner Fog's Instruction tables where he benchmarks instruction latency/µops etc. (currently Zen, Goldmont, Goldmont Plus available)

Code example, SIMD comparison: minio/sha256-simd

Try something open source such as OpenSSL. I have personally used their MD5 hashing functions and those worked pretty well. You might also want to take a look at hashlib2++.

As far as I know, Intel hasn't made a dedicated instruction set for SHA-1 or SHA-2. They may in upcoming architectures, as CodesInChaos indicated in a comment. The major component in most hashing algorithms is the XOR operation, which is already in the instruction set.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow