Go crypto: bridging the performance gap

by Vlad Krasnov.

It is no secret that we at CloudFlare love Go. We use it, and we use it a LOT. There are many things to love about Go, but what I personally find appealing is the ability to write assembly code!

CC BY 2.0 image by Jon Curnow

That is probably not the first thing that pops into your mind when you think of Go, but yes, it does allow you to write code "close to the metal" if you need the performance!

Another thing we do a lot at CloudFlare is... cryptography. To keep your data safe we encrypt everything. And everything at CloudFlare is a LOT.

Unfortunately the built-in cryptography libraries in Go do not perform nearly as well as state-of-the-art implementations such as OpenSSL. That is not acceptable at CloudFlare's scale. We therefore created assembly implementations of elliptic curves and AES-GCM for Go on the amd64 architecture, using the AES-NI and CLMUL instructions, to bring performance up to par with the OpenSSL implementation we use for Universal SSL.

We have been using those improved implementations for a while, and we are working to make them part of the official Go build for the good of the community. For now you can use our special fork of Go to enjoy the improved performance.

Both implementations are constant-time and side-channel protected. In addition, the fork includes small improvements to Go's RSA implementation.

The performance benefits of this fork over the standard Go 1.4.2 library on the tested Haswell CPU are:

                         CloudFlare          Go 1.4.2        Speedup
AES-128-GCM           2,138.4 MB/sec          91.4 MB/sec     23.4X

P256 operations:
Base Multiply           26,249 ns/op        737,363 ns/op     28.1X
Multiply/ECDH comp     110,003 ns/op      1,995,365 ns/op     18.1X
Generate Key/ECDH gen   31,824 ns/op        753,174 ns/op     23.7X
ECDSA Sign              48,741 ns/op      1,015,006 ns/op     20.8X
ECDSA Verify           146,991 ns/op      3,086,282 ns/op     21.0X

RSA operations:
Sign                 3,733,747 ns/op      7,979,705 ns/op      2.1X
Sign 3-prime         1,973,009 ns/op      5,035,561 ns/op      2.6X
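
If you want to see where your own Go toolchain stands, throughput figures of this kind can be measured with the standard testing package. The following is only a rough sketch and not the harness used for the numbers above: the helper names (benchmarkSeal, throughputMB) and the 8 KB message size are assumptions made for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"testing"
)

// benchmarkSeal times AES-128-GCM encryption of size-byte messages.
// (Hypothetical helper; the message size is chosen by the caller.)
func benchmarkSeal(size int) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		block, err := aes.NewCipher(make([]byte, 16)) // 16-byte key = AES-128
		if err != nil {
			b.Fatal(err)
		}
		aead, err := cipher.NewGCM(block)
		if err != nil {
			b.Fatal(err)
		}
		// Reusing one nonce is tolerable only because this is a benchmark
		// on dummy data; real traffic needs a unique nonce per message.
		nonce := make([]byte, aead.NonceSize())
		msg := make([]byte, size)
		out := make([]byte, 0, size+aead.Overhead())

		b.SetBytes(int64(size)) // lets the result carry bytes per iteration
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			out = aead.Seal(out[:0], nonce, nil, msg)
		}
	})
}

// throughputMB converts a benchmark result to MB/sec (1 MB = 1e6 bytes).
func throughputMB(r testing.BenchmarkResult) float64 {
	if r.T <= 0 {
		return 0
	}
	return float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
}

func main() {
	r := benchmarkSeal(8192) // 8 KB messages: an assumed size, not from the table
	fmt.Printf("AES-128-GCM Seal: %.1f MB/sec (%d ns/op)\n", throughputMB(r), r.NsPerOp())
}
```

The result depends heavily on the CPU, the Go version and the message size, so treat it as a way to compare builds on one machine rather than to reproduce the table.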

AES-GCM in brief

So what is AES-GCM and why do we care? Well, it is an AEAD - Authenticated Encryption with Associated Data. Specifically, an AEAD combines a cipher and a MAC (Message Authentication Code) algorithm into a single robust algorithm that uses a single key. This differs from the other way of performing authenticated encryption, "encrypt-then-MAC" (or, as TLS does it with CBC-SHAx, "MAC-then-encrypt"), where you can use any combination of cipher and MAC.

Using a dedicated AEAD reduces the dangers of bad combinations of ciphers and MACs, and other mistakes, such as using related keys for encryption and authentication.

Given the many vulnerabilities related to the use of AES-CBC with HMAC, and the weakness of RC4, AES-GCM is the de facto secure standard on the web right now, as the only IETF-approved AEAD to use with TLS at the moment.

Another AEAD you may have heard of is ChaCha20-Poly1305, which CloudFlare also supports, but it is not a standard just yet.

That is why we use AES-GCM as the preferred cipher for customer HTTPS, prioritizing ChaCha20-Poly1305 only for mobile browsers that support it. You can see it in our configuration. As such, today more than 60% of our client-facing traffic is encrypted with AES-GCM, and about 10% is encrypted with ChaCha20-Poly1305. This percentage grows every day, as browser support improves. We also use AES-GCM to encrypt all the traffic between our data centers.

CC BY 2.0 image by 3:19


As I mentioned, an AEAD is a special combination of a cipher and a MAC. In the case of AES-GCM the cipher is the AES block cipher in Counter Mode (AES-CTR). For the MAC it uses a universal hash called GHASH, whose output is encrypted with AES-CTR.

The inputs to the AES-GCM AEAD encryption are as follows:

  • The secret key (K), which may be 128, 192 or 256 bits long. In TLS, the key is usually valid for the entire connection.
  • A special unique value called the IV (initialization vector) - in TLS it is 96 bits. The IV is not secret, but the same IV must never be used for more than one message with the same key under any circumstance! To achieve that, usually part of the IV is generated as a nonce value, and the rest of it is incremented as a counter. In TLS the IV counter is also the record sequence number. Unlike the IV in CBC mode, the GCM IV does not have to be unpredictable, only unique. The disadvantage of using this type of IV is that, in order to avoid collisions, one must change the encryption key before the IV counter overflows.
  • The additional data (A) - this data is not secret, and therefore not encrypted, but it is authenticated by the GHASH. In TLS the additional data is 13 bytes, and includes data such as the record sequence number, type, length and the protocol version.
  • The plaintext (P) - this is the secret data; it is both encrypted and authenticated.

The operation outputs the ciphertext (C) and the authentication tag (T). The length of C is identical to that of P, and the length of T is 128 bits (although some applications allow for shorter tags). The tag T is computed over A and C, so if even a single bit of either of them is changed, the decryption process will detect the tampering attempt and refuse to use the data. In TLS, the tag T is attached at the end of the ciphertext C.

When decrypting the data, the function receives A, C and T and computes the authentication tag of the received A and C. It compares the resulting value to T, and only if the two are equal does it output the plaintext P.

By supporting the two state-of-the-art AEADs - AES-GCM and ChaCha20-Poly1305, together with ECDSA and ECDH algorithms, CloudFlare is able to provide the fastest, most flexible and most secure TLS experience possible to date on all platforms, be it a PC or a mobile phone.

Bottom line

Go is very easy to learn and fun to use, yet it is one of the most powerful languages for system programming. It allows us to deliver robust web-scale software in short time frames. With the performance improvements CloudFlare brings to its crypto stack, Go can now be used for high performance TLS servers as well!
