December 6, 2017

Make SSL boring again

8 minute read

It may (or may not!) come as surprise, but a few months ago we migrated Cloudflare’s edge SSL connection termination stack to use BoringSSL: Google's crypto and SSL implementation that started as a fork of OpenSSL.

We dedicated several months of work to make this happen without negative impact on customer traffic. We had a few bumps along the way, and had to overcome some challenges, but we ended up in a better place than we were in a few months ago.

TLS 1.3

We have already blogged extensively about TLS 1.3. Our original TLS 1.3 stack required our main SSL termination software (which was based on OpenSSL) to hand off TCP connections to a separate system based on our fork of Go's crypto/tls standard library, which was specifically developed to only handle TLS 1.3 connections. This proved handy as an experiment that we could roll out to our client base in relative safety.

However, over time, this separate system started to make our lives more complicated: most of our SSL-related business logic needed to be duplicated in the new system, which caused a few subtle bugs to pop up, and made it harder to roll-out new features such as Client Auth to all our clients.

As it happens, BoringSSL has supported TLS 1.3 for quite a long time (it was one the first open source SSL implementations to work on this feature), so now all of our edge SSL traffic (including TLS 1.3 connections) is handled by the same system, with no duplication, no added complexity, and no increased latency. Yay!

Fancy new crypto, part 1: X25519 for TLS 1.2 (and earlier)

When establishing an SSL connection, client and server will negotiate connection-specific secret keys that will then be used to encrypt the application traffic. There are a few different methods for doing this, the most popular one being ECDH (Elliptic Curve Diffie–Hellman). Long story short this depends on an elliptic curve being negotiated between client and server.

For the longest time the only widely supported curves available were the ones defined by NIST, until Daniel J. Bernstein proposed Curve25519 (X25519 is the mechanism used for ECDH based on Curve25519), which has quickly gained popularity and is now the default choice of many popular browsers (including Chrome).

This was already supported for TLS 1.3 connections, and with BoringSSL we are now able to support key negotiation based on X25519 at our edge for TLS 1.2 (and earlier) connections as well.

X25519 is now the second most popular elliptic curve algorithm that is being used on our network:

Fancy new crypto, part 2: RSA-PSS for TLS 1.2

Another one of the changes introduced by TLS 1.3 is the adoption of the PSS padding scheme for RSA signatures (RSASSA-PSS). This replaces the more fragile, and historically prone to security vulnerabilities, RSASSA-PKCS1-v1.5, for all TLS 1.3 connections.

RSA PKCS#v1.5 has been known to be vulnerable to known ciphertext attacks since Bleichenbacher’s CRYPTO 98 paper which showed SSL/TLS to be vulnerable to this kind of attacks as well.

The attacker exploits an “oracle”, in this case a TLS server that allows them to determine whether a given ciphertext has been correctly padded under the rules of PKCS1-v1.5 or not. For example, if the server returns a different error for correct padding vs. incorrect padding, that information can be used as an oracle (this is how Bleichenbacher broke SSLv3 in 1998). If incorrect padding causes the handshake to take a measurably different amount of time compared to correct padding, that’s called a timing oracle.

If an attacker has access to an oracle, it can take as little as 15,000 messages to gain enough information to perform an RSA secret-key operation without possessing the secret key. This is enough for the attacker to either decrypt a ciphertext encrypted with RSA, or to forge a signature. Forging a signature allows the attacker to hijack TLS connections, and decrypting a ciphertext allows the attacker to decrypt any connection that do not use forward secrecy.

Since then, SSL/TLS implementations have adopted mitigations to prevent these attacks, but they are tricky to get right, as the recently published F5 vulnerability shows.

With the switch to BoringSSL we made RSA PSS available to TLS 1.2 connections as well. This is already supported "in the wild", and is the preferred scheme by modern browsers like Chrome when dealing with RSA server certificates.

The dark side of the moon

Besides all these new exciting features that we are now offering to all our clients, BoringSSL also has a few internal features that end users won't notice, but that made our life so much easier.

Some of our SSL features required special patches that we maintained in our internal OpenSSL fork, however BoringSSL provides replacements for these features (and more!) out of the box.

Some examples include its private key callback support that we now use to implement Keyless SSL, its asynchronous session lookup callback that we use to support distributed session ID caches (for session resumption with clients that, for whatever reason, don't support session tickets), its equal-preference cipher grouping that allows us to offer ChaCha20-Poly1305 ciphers alongside AES GCM ones and let clients decide which they prefer, or its "select_certificate" callback that we use for inspecting and logging ClientHellos, and for dynamically enabling features depending on the user’s configuration (we were previously using the “cert_cb” callback for the latter, which is also supported by OpenSSL, but we ran into some limitations like the fact that you can’t dynamically change the supported protocol versions with it, or the fact that it is not executed during session resumption).

The case of the missing OCSP

Apart from adding new features, the BoringSSL developers have also been busy working on removing features that most people don't care about, to make the codebase lighter and easier to maintain. For the most part this worked out very well: a huge amount of code has been removed from BoringSSL without anyone noticing.

However one of the features that also got the axe was OCSP. We relied heavily on this feature at our edge to offer OCSP stapling to all clients automatically. So in order to avoid losing this functionality we spent a few weeks working on a replacement, and, surprise! we ended up with a far more reliable OCSP pipeline than when we started. You can read more about the work we did in this blog post.

ChaCha20-Poly1305 draft

Another feature that was removed was support for the legacy ChaCha20-Poly1305 ciphers (not to be confused with the ciphers standardized in RFC7905). These ciphers were deployed by some browsers before the standardization process finished and ended up being incompatible with the standard ciphers later ratified.

We looked at our metrics and realized that a significant percentage of clients still relied on this feature. These would be older mobile clients that don't have AES hardware offloading, and that didn't get software updated to get the newer ChaCha20 ciphers.

We decided to add support for these ciphers back to our own internal BoringSSL fork so that those older clients could still take advantage of them. We will keep monitoring our metrics and decide whether to remove them once the usage drops significantly.

Slow Base64: veni, vidi, vici

One somewhat annoying problem we noticed during a test deployment, was an increase in the startup time of our NGINX instances. Armed with perf and flamegraphs we looked into what was going on and realized the CPU was spending a ridiculous amount of time in BoringSSL’s base64 decoder.

It turns out that we were loading CA trusted certificates from disk (in PEM format, which uses base64) over and over and over in different parts of our NGINX configuration, and because of a change in BoringSSL that was intended to make the base64 decoder constant-time, but also made it several times slower than the decoder in OpenSSL, our startup times also suffered.

Of course the astute reader might ask, why were you loading those certificates from disk multiple times in the first place? And indeed there was no particular reason, other than the fact that the problem went unnoticed until it actually became a problem. So we fixed our configuration to only load the certificates from disk in the configuration sections where they are actually needed, and lived happily ever after.

Conclusion

Despite a few hiccups, this whole process turned out to be fairly smooth, also thanks to the rock-solid stability of the BoringSSL codebase, not to mention its extensive documentation. Not only we ended up with a much better and more easily maintainable system than we had before, but we also managed to contribute a little back to the open-source community.

As a final note we’d like to thank the BoringSSL developers for the great work they poured into the project and for the help they provided us along the way.