Monocypher

Speed Benchmarks

For those who care about speed, Monocypher comes with a couple of
benchmarks. Run them on your platform if you're not sure it is fast
enough. There are also benchmarks for Libsodium and TweetNaCl.
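
For a rough idea of what such a benchmark measures, here is a minimal sketch of a throughput measurement in C. It is not Monocypher's benchmark code: `toy_checksum`, `mb_per_second`, and `measure_throughput` are hypothetical names, and the workload is only a stand-in for a real primitive.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-in workload (hypothetical): a real benchmark would call the
   primitive under test here, such as a cipher or a hash. */
uint32_t toy_checksum(const uint8_t *buf, size_t size)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < size; i++) {
        sum = (sum << 1 | sum >> 31) ^ buf[i];  /* rotate left, then mix */
    }
    return sum;
}

/* Converts a byte count and a duration into megabytes per second.
   Assumption: 1 megabyte = 1,000,000 bytes. */
double mb_per_second(double bytes, double seconds)
{
    return bytes / seconds / 1e6;
}

/* Runs the workload `rounds` times over the buffer and returns the
   measured throughput. clock() reports CPU time, which is adequate
   for single threaded code. Returns 0 if the run was too short to time. */
double measure_throughput(const uint8_t *buf, size_t size, int rounds)
{
    volatile uint32_t sink = 0;  /* keeps the compiler from dropping the work */
    clock_t start = clock();
    for (int i = 0; i < rounds; i++) {
        sink ^= toy_checksum(buf, size);
    }
    clock_t end = clock();
    (void)sink;
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds > 0 ? mb_per_second((double)size * rounds, seconds) : 0.0;
}
```

Short runs are dominated by timer resolution, so a real measurement would process enough data to run for a noticeable fraction of a second and repeat the run a few times.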

This page reports the results on my Core i5 Skylake laptop (x86-64), and
on a Raspberry Pi model 3B (ARM core of a BCM2837). Everything is
single threaded, and compiled with GCC. The versions measured here are
Monocypher 2.0.0 and Libsodium 1.0.16.

To avoid a false sense of accuracy, most reported numbers are rounded to
two significant digits. Absolute numbers are expressed in
megabytes per second, exchanges per second, or signatures per second.
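
As a worked example of that rounding, here is a small sketch in C. The function name `round_2_digits` is hypothetical, and rounding halves upward is an assumption: the page does not say which tie-breaking rule was used.

```c
/* Rounds a non-negative integer to two significant digits,
   rounding halves up. Assumes n is well below ULONG_MAX. */
unsigned long round_2_digits(unsigned long n)
{
    unsigned long scale = 1;
    while (n / scale >= 100) {   /* find the scale that leaves two digits */
        scale *= 10;
    }
    return (n + scale / 2) / scale * scale;
}
```

With the raw data at the end of this page, `round_2_digits(1271)` gives 1300 (Monocypher's Poly1305) and `round_2_digits(7864)` gives 7900 (Monocypher's x25519).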

The --enable-opt and --disable-asm options are set during
Libsodium's configure step. The other options are set by overriding
the CFLAGS variable during the make step. Note that
--disable-asm doesn't disable 128-bit arithmetic.

Effect of compilation options

Not everyone can afford, or even trust, -O3 -march=native. Here we
measure the effect of compilation options. The baseline is -O3
-march=native. The others are compared to it. Libsodium has another
line, ASM, where it takes advantage of non-portable implementations
(typically compiler intrinsics).
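
Comparing a build to the baseline boils down to a ratio. A minimal sketch in C, where `relative_percent` is a hypothetical helper and the baseline is assumed to be non-zero:

```c
/* Speed of a build relative to the baseline, in percent,
   rounded to the nearest integer. Assumes baseline != 0. */
unsigned relative_percent(unsigned measured, unsigned baseline)
{
    return (measured * 100 + baseline / 2) / baseline;
}
```

For instance, Monocypher's Chacha20 runs at 390 megabytes per second with the baseline flags and 361 with -O2 (see the raw data), so `relative_percent(361, 390)` gives 93. With -Os it drops to 307, which gives 79.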

I did not have the courage to test all build options for the R-pi.
Compilation was too damn slow, and I didn't feel like setting up my
first cross compilation tool-chain.

Libsodium doesn't seem to have dedicated optimisations for the R-pi.

TweetNaCl

Note that TweetNaCl uses Salsa20 and SHA-512. Monocypher and Libsodium
use Chacha20 and Blake2b instead. While Salsa20 and Chacha20 are mostly
comparable, Blake2b is much faster than SHA-512. This puts TweetNaCl at
a disadvantage.

Libsodium is consistently fast on key exchanges and signatures because
of 128-bit arithmetic and precomputed tables (signatures only).
Monocypher doesn't use 128-bit arithmetic to stay portable, and it
doesn't use precomputed tables because they cost way too much code.
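
To give an idea of what 128-bit arithmetic buys, here is a sketch that computes the high half of a 64x64-bit product twice: once in portable C, and once with the `unsigned __int128` extension that GCC provides on 64-bit targets. This is an illustration only. Neither function is taken from Libsodium or Monocypher, and the names are hypothetical.

```c
#include <stdint.h>

/* High 64 bits of a 64x64 -> 128-bit product, in portable C.
   Splits both operands into 32-bit halves and sums the four
   partial products by hand. */
uint64_t mul_high_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffff, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffff, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of the terms that land in bits 32..63, plus their carry.
       At most 3 * (2^32 - 1), so it cannot overflow 64 bits. */
    uint64_t mid = (lo_lo >> 32) + (hi_lo & 0xffffffff) + (lo_hi & 0xffffffff);
    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

#ifdef __SIZEOF_INT128__
/* Same result with the compiler's 128-bit integer extension.
   Only available where the compiler defines __SIZEOF_INT128__. */
uint64_t mul_high_128(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
#endif
```

The portable version needs four multiplications and a handful of additions where the 128-bit version needs a single wide multiplication on hardware that supports it. The price of the second version is that it only compiles where the extension exists.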

More interesting is the performance of Blake2b and Argon2i. Monocypher
actually beats Libsodium's reference implementations. For Blake2b,
this is because Monocypher forcibly unrolls the inner loop, which
enables better constant propagation. This costs about 4KB of generated
code. For Argon2i, I'm not sure. I suspect the reference
implementation performs extraneous copies and allocations.
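
The effect of unrolling on constant propagation can be sketched with a toy example. The code below is not Blake2b and not Monocypher's code: `sigma`, `mix`, `compress_rolled`, and `compress_unrolled` are hypothetical, and the schedule is far smaller than the real one. It only shows the shape of the transformation.

```c
#include <stdint.h>

/* Toy message schedule (hypothetical, much smaller than Blake2b's). */
static const uint8_t sigma[2][4] = {
    { 0, 1, 2, 3 },
    { 2, 0, 3, 1 },
};

/* Toy mixing step: rotate the state and absorb one message word. */
static uint64_t mix(uint64_t v, uint64_t w)
{
    return ((v << 7) | (v >> 57)) + w;
}

/* Rolled: the schedule is looked up in the table at each iteration. */
uint64_t compress_rolled(const uint64_t m[4])
{
    uint64_t v = 0;
    for (int round = 0; round < 2; round++) {
        for (int i = 0; i < 4; i++) {
            v = mix(v, m[sigma[round][i]]);
        }
    }
    return v;
}

/* Unrolled by hand: every index is a literal, so the compiler sees
   fixed offsets and needs neither the table nor the loop counters. */
uint64_t compress_unrolled(const uint64_t m[4])
{
    uint64_t v = 0;
    v = mix(v, m[0]);  v = mix(v, m[1]);  v = mix(v, m[2]);  v = mix(v, m[3]);
    v = mix(v, m[2]);  v = mix(v, m[0]);  v = mix(v, m[3]);  v = mix(v, m[1]);
    return v;
}
```

Both functions return the same value for any input. A compiler may well unroll a loop this small on its own. With a larger schedule it may decline to, which is presumably where unrolling by hand starts to pay for its code size.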

No more 128-bit arithmetic on the R-pi, so key exchange performs the
same. Signatures still benefit from their pre-computed table.

Monocypher lost its edge for Blake2b. I suspect the unrolled inner loop
strains the instruction cache of the R-pi's smaller processor.

Libsodium's Poly1305 is much faster than Monocypher's. I have no idea
why: I haven't seen any 32-bit specific implementation. This gives it a
significant edge for authenticated encryption, though not nearly as
impressive as on x86-64.

TweetNaCl is much slower than Monocypher. This might seem strange,
considering both Monocypher and TweetNaCl restrict themselves to
portable C.

The main reason is that TweetNaCl sacrificed performance to shrink its
source code. Its modular multiplication (Poly1305 and Curve25519) is
very slow. Signatures perform a regular double-and-add ladder in
Twisted Edwards space, while Monocypher switches to Montgomery space and
back to compensate for the lack of a precomputed table.
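
For readers unfamiliar with double and add, here is a toy sketch of its structure. It does not work on an elliptic curve: the "group" is the integers modulo n under addition, chosen so the result is easy to check (scalar times p, modulo n). The names are hypothetical, and the branch on the scalar bit is there for readability only. Real cryptographic code must not branch on secret bits.

```c
#include <stdint.h>

/* Toy group operation: addition modulo n.
   Assumes a < n, b < n, and n < 2^63, so the sum cannot overflow. */
static uint64_t group_add(uint64_t a, uint64_t b, uint64_t n)
{
    return (a + b) % n;
}

/* Computes scalar * p in the toy group by scanning the scalar from
   its top bit: double at every bit, add p when the bit is set. */
uint64_t double_and_add(uint64_t scalar, uint64_t p, uint64_t n)
{
    uint64_t acc = 0;                        /* neutral element */
    for (int bit = 63; bit >= 0; bit--) {
        acc = group_add(acc, acc, n);        /* double */
        if ((scalar >> bit) & 1) {
            acc = group_add(acc, p, n);      /* add (not constant time) */
        }
    }
    return acc;
}
```

The cost is one doubling per scalar bit plus one addition per set bit. A precomputed table of multiples of a fixed base point trades memory and code size for fewer of those operations at run time.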

Something more devious is going on as well: encryption and hashing
aren't as slow as they look. With the maximum optimisation level, they
are actually as fast as Monocypher's. Alas, encryption performance is
offset by the slow Poly1305 authentication, and hashing performance
looks bad because Monocypher cheats by using a faster algorithm.

Conclusion

Monocypher is closer in performance to Libsodium, and closer in size to
TweetNaCl, even on a logarithmic scale. Not quite the best of both
worlds, but still a nice sweet spot.

Raw data

Monocypher (Core i5 Skylake, Ubuntu 16.04)

-O3 -march=native

Chacha20 : 390 megabytes per second
Poly1305 : 1271 megabytes per second
Auth'd encryption: 298 megabytes per second
Blake2b : 685 megabytes per second
SHA-512 : 287 megabytes per second
Argon2i, 3 passes: 484 megabytes per second
x25519 : 7864 exchanges per second
EdDSA(sign) : 6883 signatures per second
EdDSA(check) : 3579 checks per second

-O2

Chacha20 : 361 megabytes per second
Poly1305 : 1065 megabytes per second
Auth'd encryption: 270 megabytes per second
Blake2b : 579 megabytes per second
SHA-512 : 228 megabytes per second
Argon2i, 3 passes: 354 megabytes per second
x25519 : 7718 exchanges per second
EdDSA(sign) : 6757 signatures per second
EdDSA(check) : 3519 checks per second

-Os

Chacha20 : 307 megabytes per second
Poly1305 : 944 megabytes per second
Auth'd encryption: 231 megabytes per second
Blake2b : 457 megabytes per second
SHA-512 : 228 megabytes per second
Argon2i, 3 passes: 353 megabytes per second
x25519 : 7586 exchanges per second
EdDSA(sign) : 6607 signatures per second
EdDSA(check) : 3453 checks per second

Libsodium 1.0.16 (Core i5 Skylake, Ubuntu 16.04)

--enable-opt, default flags

Chacha20 : 2129 megabytes per second
Poly1305 : 2475 megabytes per second
Auth'd encryption: 1147 megabytes per second
Blake2b : 782 megabytes per second
SHA-512 : 347 megabytes per second
Argon2i, 3 passes: 731 megabytes per second
x25519 : 20618 exchanges per second
EdDSA(sign) : 36150 signatures per second
EdDSA(check) : 13207 checks per second

--disable-asm, default flags

Chacha20 : 403 megabytes per second
Poly1305 : 1161 megabytes per second
Auth'd encryption: 298 megabytes per second
Blake2b : 576 megabytes per second
SHA-512 : 294 megabytes per second
Argon2i, 3 passes: 358 megabytes per second
x25519 : 15465 exchanges per second
EdDSA(sign) : 32750 signatures per second
EdDSA(check) : 13211 checks per second

--disable-asm, -O3 -march=native

Chacha20 : 393 megabytes per second
Poly1305 : 1113 megabytes per second
Auth'd encryption: 292 megabytes per second
Blake2b : 604 megabytes per second
SHA-512 : 347 megabytes per second
Argon2i, 3 passes: 433 megabytes per second
x25519 : 15317 exchanges per second
EdDSA(sign) : 36949 signatures per second
EdDSA(check) : 13910 checks per second

--disable-asm, -O2

Chacha20 : 403 megabytes per second
Poly1305 : 1161 megabytes per second
Auth'd encryption: 300 megabytes per second
Blake2b : 569 megabytes per second
SHA-512 : 294 megabytes per second
Argon2i, 3 passes: 352 megabytes per second
x25519 : 15486 exchanges per second
EdDSA(sign) : 32709 signatures per second
EdDSA(check) : 13451 checks per second

--disable-asm, -Os

Chacha20 : 333 megabytes per second
Poly1305 : 1139 megabytes per second
Auth'd encryption: 258 megabytes per second
Blake2b : 530 megabytes per second
SHA-512 : 290 megabytes per second
Argon2i, 3 passes: 243 megabytes per second
x25519 : 15328 exchanges per second
EdDSA(sign) : 29652 signatures per second
EdDSA(check) : 13166 checks per second

TweetNaCl (Core i5 Skylake, Ubuntu 16.04)

-O3 -march=native

Salsa20 : 232 megabytes per second
Poly1305 : 82 megabytes per second
Auth'd encryption: 60 megabytes per second
SHA-512 : 213 megabytes per second
x25519 : 1739 exchanges per second
EdDSA(sign) : 646 signatures per second
EdDSA(check) : 323 checks per second

-O2

Salsa20 : 59 megabytes per second
Poly1305 : 40 megabytes per second
Auth'd encryption: 24 megabytes per second
SHA-512 : 86 megabytes per second
x25519 : 857 exchanges per second
EdDSA(sign) : 505 signatures per second
EdDSA(check) : 253 checks per second

-Os

Salsa20 : 60 megabytes per second
Poly1305 : 39 megabytes per second
Auth'd encryption: 23 megabytes per second
SHA-512 : 82 megabytes per second
x25519 : 843 exchanges per second
EdDSA(sign) : 497 signatures per second
EdDSA(check) : 249 checks per second

Monocypher (Raspberry Pi, model 3B)

-O3 -march=native

Chacha20 : 63 megabytes per second
Poly1305 : 67 megabytes per second
Auth'd encryption: 32 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 13 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 679 exchanges per second
EdDSA(sign) : 597 signatures per second
EdDSA(check) : 309 checks per second

-O2

Chacha20 : 59 megabytes per second
Poly1305 : 67 megabytes per second
Auth'd encryption: 31 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 13 megabytes per second
Argon2i, 3 passes: 20 megabytes per second
x25519 : 656 exchanges per second
EdDSA(sign) : 579 signatures per second
EdDSA(check) : 299 checks per second

-Os

Chacha20 : 57 megabytes per second
Poly1305 : 69 megabytes per second
Auth'd encryption: 31 megabytes per second
Blake2b : 32 megabytes per second
SHA-512 : 14 megabytes per second
Argon2i, 3 passes: 0 megabytes per second
x25519 : 776 exchanges per second
EdDSA(sign) : 667 signatures per second
EdDSA(check) : 354 checks per second

Libsodium (Raspberry Pi, model 3B)

--enable-opt

Chacha20 : 72 megabytes per second
Poly1305 : 166 megabytes per second
Auth'd encryption: 50 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 11 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 686 exchanges per second
EdDSA(sign) : 1702 signatures per second
EdDSA(check) : 618 checks per second

--disable-asm

Chacha20 : 73 megabytes per second
Poly1305 : 166 megabytes per second
Auth'd encryption: 50 megabytes per second
Blake2b : 26 megabytes per second
SHA-512 : 11 megabytes per second
Argon2i, 3 passes: 19 megabytes per second
x25519 : 677 exchanges per second
EdDSA(sign) : 1696 signatures per second
EdDSA(check) : 601 checks per second

TweetNaCl (Raspberry Pi, model 3B)

-O3 -march=native

Salsa20 : 64 megabytes per second
Poly1305 : 9 megabytes per second
Auth'd encryption: 7 megabytes per second
SHA-512 : 11 megabytes per second
x25519 : 78 exchanges per second
EdDSA(sign) : 44 signatures per second
EdDSA(check) : 22 checks per second

-O2

Salsa20 : 8 megabytes per second
Poly1305 : 4 megabytes per second
Auth'd encryption: 3 megabytes per second
SHA-512 : 7 megabytes per second
x25519 : 72 exchanges per second
EdDSA(sign) : 42 signatures per second
EdDSA(check) : 21 checks per second

-Os

Salsa20 : 8 megabytes per second
Poly1305 : 4 megabytes per second
Auth'd encryption: 3 megabytes per second
SHA-512 : 7 megabytes per second
x25519 : 58 exchanges per second
EdDSA(sign) : 34 signatures per second
EdDSA(check) : 17 checks per second