In 2014, a bug was found in OpenSSL, a popular encryption library used to secure the majority of servers on the Internet. This bug allowed attackers to abuse an obscure feature called TLS heartbeats to read memory from affected servers. Heartbleed was big news because it allowed attackers to extract the most important secret on a server: its TLS/SSL certificate private key. After confirming that the bug was easy to exploit, we revoked and reissued over 100,000 certificates, which highlighted some major issues with how the Internet is secured.
As much as Heartbleed and other key compromise events were painful for security and operations teams around the world, they also provided a learning opportunity for the industry. Over the past seven years, Cloudflare has taken the lessons of Heartbleed and applied them to improve the design of our systems and the resiliency of the Internet overall. Read on to learn how using Cloudflare reduces the risk of key compromise and reduces the cost of recovery if it happens.
Keeping keys safe
An important tenet of security system design is defense-in-depth. Important things should be protected with multiple layers of defense. This is why security-conscious people keep spare house keys in a secure lockbox rather than under the mat. For cryptographic systems that face the Internet, defense-in-depth means designing your systems so that keys are not one exploit away from being stolen. This wasn’t the case for OpenSSL and Heartbleed. Private keys were loaded into memory into a memory-unsafe Internet-facing process, so only a single memory disclosure bug was needed to exfiltrate them.
Keyless SSL: keeping the server separate from the key
An effective defense-in-depth strategy for protecting TLS/SSL private keys is splitting the process into two parts: the private key/authentication part, and the encryption/decryption part. This is exactly why we developed Keyless SSL, to allow customers to keep full control over their private keys while allowing Cloudflare to handle the rest of the connection details. Keyless SSL provides physical separation between where a key is being used and where it is stored. We support Keyless SSL for both software and hardware security modules (HSMs), and today we’re announcing that we now support multiple cloud-based HSMs.
Geo Key Manager: configurable key management by geography
Heartbleed showed us that this strategy could also be useful to add an extra layer of security for private keys that we manage for our customers. In 2017, we launched Geo Key Manager, a feature that allows customers to select which locations in the world they want their keys to be stored. Geo Key Manager protects keys against physical compromise of servers in different geographies. In 2019, we took this even further by implementing a strategy called Keyless Everywhere, moving all managed keys onto a system that provides logical separation between the Internet and the private keys.
Delegated Credentials: key separation with no latency
Keyless SSL is a great security solution, but depending on where keys are stored it can introduce some latency. Because of this, we worked with the IETF to develop a new standard called Delegated Credentials, which allows connections to use short-lived keys that are signed by the certificate instead of the certificate itself in TLS. This eliminates the additional latency introduced by Keyless SSL. We support Delegated Credentials for all Keyless SSL and Geo Key Manager customers; and Firefox 89 (May 2021) will enable Delegated Credential support by default.
Making revocation work
When Heartbleed happened and hundreds of thousands of certificates were deemed potentially compromised, the logical thing to do was to revoke and reissue these certificates. Doing so caused some unexpected consequences.
The first consequence of revocation was a massive surge of network traffic related to the revocation information. In 2014, there were three main mechanisms for certificate revocation:
Certificate Revocation Lists (CRLs)
A list of revoked serial numbers for a given certificate authority
Online Certificate Status Protocol (OCSP)
A protocol to query a certificate authority about the revocation status of a certificate
OCSP can be queried by the browser or responses can be included by the server at connection time “OCSP stapling”
CRLSets — Chrome’s custom revocation system
A meta-list of revoked serial numbers, limited to high-value certificates
Tens of thousands of certificates were revoked at once due to potential compromise from Heartbleed. After this, the CRL for a prominent CA, GlobalSign, increased from 22KB to 4.7MB in a day. This caused major disruptions in Cloudflare’s internal caching infrastructure and bandwidth spikes that affected the Internet at large as all clients that relied on CRLs to check for certificate validation (mostly Microsoft Windows) downloaded the file.
After this revocation event, it became clear that there were other reasons why revocation wasn’t a functional system. In Firefox, if a user makes a connection to a site and a stapled OCSP response is not provided, Firefox queries the OCSP server for a response. However, Firefox implements a fail-open strategy: if the OCSP response takes too long, the revocation check is bypassed and the page is rendered. An attacker with a privileged network position could use a compromised and revoked certificate to attack users by simply blocking the OCSP request and letting the browser skip the revocation check. OCSP stapling is a way for a server to get the OCSP response to the browser in a reliable way, but since stapling is not a requirement for clients, an attacker could just not include the staple and the browser would fail open, leaving the user vulnerable to attack. OCSP also doesn’t provide strong protection against key compromise in fail-open mode (plus, OCSP is a privacy leak, but that’s another issue).
The situation in Chrome is even worse from a security perspective. Since neither OCSP nor CRLs are checked for the majority of certificates (CRLSets only contain revoked “Extended Validation” certificates), most certificates are trusted without checking revocation status. The solution to revoking the set of Cloudflare-managed certificates in Chrome for Heartbleed was actually a short patch to the Chromium codebase. Clearly not a scalable solution!
Mass revocation of certificates back in 2014 plainly didn’t work. Some of the certificates at the time were valid for up to five years, so a compromised key was a problem for a long time.
In 2015, a new standard emerged that seemed to provide a reasonable solution to this problem: OCSP Must-staple. Certificates issued with the must-staple feature are only trusted if accompanied by a valid OCSP staple. If a must-staple certificate is compromised and revoked, then it can only be used to attack users for the lifetime of the last issued OCSP (usually 10 days or less). This is a big improvement and allows certificate owners to limit their overall risk.
Cloudflare has supported best-effort OCSP stapling since 2012. In 2017, we set out to improve the reliability of our OCSP stapling so that we could support OCSP must-staple certificates. The result of this work was High-reliability OCSP stapling and a research study published in IMC that demonstrated the feasibility of OCSP must-staple more broadly on the Internet. Cloudflare now supports OCSP must-staple certificates, providing an additional safety net in case of a future key compromise.
2014 vs. now
We’ve come a long way in seven years. Cloudflare is constantly innovating in the TLS/SSL key protection and security space. Here’s what changed over the past few years:
2014
Five year certificates
Opportunistic OCSP stapling
No OCSP must-staple
Keys in Internet-facing process
No Keyless SSL
No Delegated Credentials
2021
Configurable lifetime certificates (from one year down to two weeks with ACM)
100% OCSP stapling support
OCSP Must-staple support
Keyless Everywhere
Keyless SSL + Cloud HSM support! (new)
Geo Key Manager
Delegated Credential Support
These improvements and more are big reasons why Cloudflare is a leader in the security space and why the next Heartbleed won’t be as bad as the last.