Not long ago we introduced support for TLS cipher suites based on the ChaCha20-Poly1305 AEAD, for all our customers. Back then those cipher suites were only supported by the Chrome browser and Google's websites, but were in the process of standardization. We introduced these cipher suites to give end users on mobile devices the best possible performance and security.
CC BY-ND 2.0 image by Edwin Lee
Today the standardization process is all but complete and implementations of the most recent specification of the cipher suites have begun to surface. Firefox and OpenSSL have both implemented the new cipher suites for upcoming versions, and Chrome updated its implementation as well.
We, as pioneers of ChaCha20-Poly1305 adoption on the web, also updated our open sourced patch for OpenSSL. It implements both the older "draft" version, to keep supporting millions of users of the existing versions of Chrome, and the newer "RFC" version that supports the upcoming browsers from day one.
In this blog entry I review the history of ChaCha20-Poly1305, its standardization process, as well as its importance for the future of the web. I will also take a peek at its performance, compared to the other standard AEAD.
What is an AEAD?
ChaCha20-Poly1305 is an AEAD, Authenticated Encryption with Additional Data cipher. AEADs support two operations: "seal" and "open". Another common AEAD in use for TLS connections is AES-GCM.
CC BY 2.0 image by Jeremy Keith
The "seal" operation receives the following as input:
The message to be encrypted - this is the plaintext.
A secret key.
A unique initialization value - aka the IV. It must be unique between invocations of the "seal" operation with the same key, otherwise the secrecy of the cipher is completely compromised.
Optionally some other, non-secret, additional data. This data will not be encrypted, but will be authenticated - this is the AD in AEAD.
The "seal" operation uses the key and the IV to encrypt the plaintext into a ciphertext of equal length, using the underlying cipher. For ChaCha20-Poly1305 the cipher is ChaCha20 and for AES-GCM (the other AEAD in common use) the cipher is AES in CounTeR mode (AES-CTR).
After the data is encrypted, "seal" uses the key (and optionally the IV) to generate a secondary key. The secondary key is used to generate a keyed hash of the AD, the ciphertext and the individual lengths of each. The hash used in ChaCha20-Poly1305, is Poly1305 and in AES-GCM the hash is GHASH.
The final step is to take the hash value and encrypt it too, generating the final MAC (Message Authentication Code) and appending it to the ciphertext.
CC BY-SA 2.0 image by Kai Schreiber
The "open" operation is the reverse of "seal". It takes the same key and IV and generates the MAC of the ciphertext and the AD, similarly to the way "seal" did. It then reads the MAC appended after the ciphertext, and compares the two. Any difference in the MAC values would mean the ciphertext or the AD was tampered with, and they should be discarded as unsafe. If the two match, however, the operation decrypts the ciphertext, returning the original plaintext.
What makes AEADs special?
AEADs are special in the sense that they combine two algorithms - cipher and MAC, into a single primitive, with provable security guarantees. Before AEADs, it was acceptable to take some cipher and some MAC, which were considered secure independently, and combine them into an insecure combination. For example some combinations were broken by reusing the same keys for encryption and MAC (AES-CBC with CBC-MAC), while others by performing MAC over plaintext instead of the ciphertext AES-CBC with HMAC in TLS.
The new kid in the block
Up until recently the only standard AEAD used was AES-GCM. The problem with that, is that if someone breaks the GCM mode, we are left with no alternative - you wouldn't want to jump out of a plane without a backup parachute, would you? ChaCha20-Poly1305 is this backup.
We can't really call either ChaCha20 or Poly1305 really new. Both are the brain children of Daniel J. Bernstein (DJB). ChaCha20 is based upon an earlier cipher developed by DJB called Salsa, that dates back to 2005, and was submitted to the eSTREAM competition. ChaCha20 itself was published in 2008. It slightly modifies the Salsa round, and the number 20 indicates that it repeats for 20 rounds in total. Similar to AES-CTR, ChaCha20 is a stream cipher. It generates a pseudo-random stream of bits from an incremented counter, the stream is then "XORed" with plaintext to encrypt it (or "XORed" with ciphertext to decrypt). Because you do not need to know the plaintext in advance to generate the stream, this approach allows both to be very efficient and parallelizable. ChaCha20 is a 256-bit cipher.
Poly1305 was published in 2004. Poly1305 is a MAC, and can be used with any encrypted or unencrypted message, to generate a keyed authentication token. The purpose of such tokens is to guarantee the integrity of a given message. Originally Poly1305 used AES as the underlying cipher (Poly1305-AES); now it uses ChaCha20. Again, similarly to GHASH, it is a polynomial evaluation hash. Unlike GHASH, its key changes for each new message, because it depends on the IV. When DJB developed this MAC, he made it especially suited for efficient execution on the floating point hardware present on the CPUs back then. Today it is much more efficient to execute using 64-bit integer instructions, and might have been designed slightly differently.
Both have received considerable scrutiny from the crypto community in the years since, and today are considered completely safe, albeit there is a concern about the monoculture that forms when one person is responsible for so many standards in the industry (DJB is also responsible for Curve25519 key exchange).
From zero to hero
The body that governs internet standards is the IETF - Internet Engineering Task Force. All the standards we use on the internet, including TLS, come from that organization. All standards that relate to encryption come from the TLS and CFRG workgroups of IETF. The standardization process is open to all, and the correspondence that relates to it is kept public in a special archive.
The first mention for ChaCha20-Poly1305 I found in the archive dates to 30 July 2013. It is still referred to as Salsa back then.
After some time and debate an initial draft was published by Adam Langley from Google in September 2013. The latest draft of the ChaCha20-Poly1305 for TLS including all the previous revisions can be found here. It is interesting to see the incremental process, and the gradual refinement. For example initially ChaCha20 was also supposed to work with HMAC-SHA1.
Another standard that defines the general usage of ChaCha20-Poly1305 is RFC7539. First published in January 2014, it was standardized in May 2015.
Differences
There are two key differences between the draft version we initially implemented and the current version of the cipher suites that make the two incompatible.
The first difference relates to how the cipher suite is used in TLS. The current version incorporated the TLS records sequence number into the IV generation, making it more resistant to dangerous IV reuse.
The second difference relates to how Poly1305 applies to the TLS record. Records are the equivalent of a TCP packet for TLS. When data is streamed over TLS, it is broken into many smaller records. Each record holds part of the data (encrypted), with the MAC calculated for that record. It also holds other information, such as the protocol version, record type and length. The maximum amount of data a single record can hold is 16KB.
The draft Poly1305 calculated the hash of the additional data, followed by the length of the additional data as an 8 byte string, followed by the ciphertext, followed by the length of the ciphertext as an 8 byte string. In the current iteration, the hash is generated over the additional data, padded with zeroes to 16 byte boundary, followed by ciphertext similarly padded with zeroes, followed by the length of the additional data as an 8 byte string, followed by the length of the ciphertext as an 8 byte string.
The older cipher suites can be identified by IDs {cc}{13}, {cc}{14} and {cc}{15}, while the newer cipher suites have IDs {cc}{a8} through {cc}{ae}.
Future of ChaCha20-Poly1305
Today we already see that almost 20% of all the request to sites using CloudFlare use ChaCha20-Poly1305. And that is with only one browser supporting it. In the coming months Firefox will join the party, potentially increasing this number.
More importantly, the IETF is currently finalizing another very important standard, TLS 1.3. Right now it looks like TLS 1.3 will allow AEADs only, leaving AES-GCM and ChaCha20-Poly1305 as the only two options (for now). This would definitely bring the usage of ChaCha20-Poly1305 up significantly.
Can you handle it?
Given the rising popularity of ChaCha20-Poly1305 suites, and TLS in general, it is important to have efficient implementations that does not hog too much of the servers' CPU time. ChaCha20-Poly1305 allows for highly efficient implementation using SIMD instructions. Most of our servers are based on Intel CPUs with 256-bit SIMD extensions called AVX2. We utilize those to get the maximal performance.
The main competition for ChaCha20-Poly1305 are the AES-GCM based cipher suites. The most widely used AES-GCM, uses AES with 128 bit key, however in terms of security AES-256 is more comparable to ChaCha20.
Usually cipher performance numbers are reported for large messages, to show asymptotic performance, but on our network we started using dynamic record sizing. In practice it means many connections will never reach the maximal size of a TLS record (16KB), but instead will use significantly smaller records (below 1400 bytes). The records dynamically grow as the connection progresses, scaling to about 4KB and eventually to 16KB. Most messages will also not fit precisely into a record, and all sizes are possible.
Below are two graphs, comparing the performance of our ChaCha20-Poly1305 to the implementation in OpenSSL 1.1.0 pre, and to AES-GCM. The performance is reported in CPU cycles per byte, for a plaintext of given length, when performing the "seal" operation on a given plaintext with 13 bytes of AD, similarly to TLS. The first graph covers sizes 64-1536, while the second covers the remaining sizes to 16KB.
CPU cycles per byte (Y-axis) vs. record size in bytes (X-axis)
CPU cycles per byte (Y-axis) vs. record size in bytes (X-axis)
We can see that our implementation significantly outperforms OpenSSL for short records, and is slightly faster for longer records. The average performance advantage is 7%. AES-128-GCM and AES-256-GCM both still beat ChaCha20-Poly1305 in pure performance for records larger than 320 bytes, but getting below 2 cycles/byte is a major performance achievement. Not many modes can beat this performance. It is also important to note that AES-GCM uses two dedicated CPU instructions (AESENC and CLMULQDQ), whereas both ChaCha and Poly only use the generic SIMD instructions.
Performance outlook
The current performance is outstanding. We measured the performance on a Haswell CPU. Broadwell and Skylake CPUs actually perform AES-GCM faster, but we don't use them in our servers yet.
In the future, processors with wider SIMD instructions are expected to bridge the performance gap. The AVX512 will provide instructions twice as wide, and potentially will improve the performance two fold as well, bringing it below 1 cycle/byte. Following AVX512, Intel is expected to release the AVX512IFMA extensions too, that will accelerate Poly1305 even further.
Conclusion
CloudFlare is constantly pushing the envelope in terms of TLS performance and availability of the most secure cipher suites and modes. We are actively involved in the development and specification of TLS 1.3 and are committed to open source by releasing our performance patches.