On December 16, 2022, Cloudflare discovered a bug where, in limited circumstances, some users with revoked certificates may not have been blocked by Cloudflare firewall settings. Specifically, Cloudflare’s Firewall Rules solution did not block some users with revoked certificates from resuming a session via mutual transport layer security (mTLS), even if the customer had configured Firewall Rules to do so. This bug has been mitigated, and we have no evidence of this being exploited. We notified any customers that may have been impacted in an abundance of caution, so they can check their own logs to determine if an mTLS protected resource was accessed by entities holding a revoked certificate.
One of Cloudflare Firewall Rules’ features, introduced in March 2021, lets customers revoke or block a client certificate, preventing it from being used to authenticate and establish a session. For example, a customer may use Firewall Rules to protect a service by requiring clients to provide a client certificate through the mTLS authentication protocol. Customers could also revoke or disable a client certificate, after which it would no longer be able to be used to authenticate a party initiating an encrypted session via mTLS.
When Cloudflare receives traffic from an end user, a service at the edge is responsible for terminating the incoming TLS connection. From there, this service is a reverse proxy, and it is responsible for acting as a bridge between the end user and various upstreams. Upstreams might include other services within Cloudflare such as Workers or Caching, or may travel through Cloudflare to an external server such as an origin hosting content. Sometimes, you may want to restrict access to an endpoint, ensuring that only authorized actors can access it. Using client certificates is a common way of authenticating users. This is referred to as mutual TLS, because both the server and client provide a certificate. When mTLS is enabled for a specific hostname, this service at the edge is responsible for parsing the incoming client certificate and converting that into metadata that is attached to HTTP requests that are forwarded to upstreams. The upstreams can process this metadata and make the decision whether the client is authorized or not.
Customers can use the Cloudflare dashboard to revoke existing client certificates. Instead of immediately failing handshakes involving revoked client certificates, revocation is optionally enforced via Firewall Rules, which take effect at the HTTP request level. This leaves the decision to enforce revocation with the customer.
So how exactly does this service determine whether a client certificate is revoked?
When we see a client certificate presented as part of the TLS handshake, we store the entire certificate chain on the TLS connection. This means that for every HTTP request that is sent on the connection, the client certificate chain is available to the application. When we receive a request, we look at the following fields related to a client certificate chain:
- Leaf certificate Subject Key Identifier (SKI)
- Leaf certificate Serial Number (SN)
- Issuer certificate SKI
- Issuer certificate SN
Some of these values are used for upstream processing, but the issuer SKI and leaf certificate SN are used to query our internal data stores for revocation status. The data store indexes on an issuer SKI, and stores a collection of revoked leaf certificate serial numbers. If we find the leaf certificate in this collection, we set the relevant metadata for consumption in Firewall Rules.
But what does this have to do with TLS session resumption?
To explain this, let’s first discuss how session resumption works. At a high level, session resumption grants the ability for clients and servers to expedite the handshake process, saving both time and resources. The idea is that if a client and server successfully handshake, then future handshakes are more or less redundant, assuming nothing about the handshake needs to change at a fundamental level (e.g. cipher suite or TLS version).
Traditionally, there are two mechanisms for session resumption - session IDs and session tickets. In both cases, the TLS server will handle encrypting the context of the session, which is basically a snapshot of the acquired TLS state that is built up during the handshake process. Session IDs work in a stateful fashion, meaning that the server is responsible for saving this state, somewhere, and keying against the session ID. When a client provides a session ID in the client hello, the server checks to see if it has a corresponding session cached. If it does, then the handshake process is expedited and the cached session is restored. In contrast, session tickets work in a stateless fashion, meaning that the server has no need to store the encrypted session context. Instead, the server sends the client the encrypted session context (AKA a session ticket). In future handshakes, the client can send the session ticket in the client hello, which the server can decrypt in order to restore the session and expedite the handshake.
Recall that when a client presents a certificate, we store the certificate chain on the TLS connection. It was discovered that when sessions were resumed, the code to store the client certificate chain in application data did not run. As a result, we were left with an empty certificate chain, meaning we were unable to check the revocation status and pass this information to firewall rules for further processing.
To illustrate this, let's use an example where mTLS is used for api.example.com. Firewall Rules are configured to block revoked certificates, and all certificates are revoked. We can reconstruct the client certificate checking behavior using a two-step process. First we use OpenSSL's s_client to perform a handshake using the revoked certificate (recall that revocation has nothing to do with the success of the handshake - it only affects HTTP requests on the connection), and dump the session’s context into a "session.txt" file. We then issue an HTTP request on the connection, which fails with a 403 status code response because the certificate is revoked.
❯ echo -e "GET / HTTP/1.1\r\nHost:api.example.com\r\n\r\n" | openssl s_client -connect api.example.com:443 -cert cert2.pem -key key2.pem -ign_eof -sess_out session.txt | grep 'HTTP/1.1' depth=2 C=IE, O=Baltimore, OU=CyberTrust, CN=Baltimore CyberTrust Root verify return:1 depth=1 C=US, O=Cloudflare, Inc., CN=Cloudflare Inc ECC CA-3 verify return:1 depth=0 C=US, ST=California, L=San Francisco, O=Cloudflare, Inc., CN=sni.cloudflaressl.com verify return:1 HTTP/1.1 403 Forbidden ^C⏎
Now, if we reuse "session.txt" to perform session resumption and then issue an identical HTTP request, the request succeeds. This shouldn't happen. We should fail both requests because they both use the same revoked client certificate.
❯ echo -e "GET / HTTP/1.1\r\nHost:api.example.com\r\n\r\n" | openssl s_client -connect api.example.com:443 -cert cert2.pem -key key2.pem -ign_eof -sess_in session.txt | grep 'HTTP/1.1' HTTP/1.1 200 OK
How we addressed the problem
Upon realizing that session resumption led to the inability to properly check revocation status, our first reaction was to disable session resumption for all mTLS connections. This blocked the vulnerability immediately.
The next step was to figure out how to safely re-enable resumption for mTLS. To do so, we need to remove the requirement of depending on data stored within the TLS connection state. Instead, we can use an API call that will grant us access to the leaf certificate in both session resumption and non session resumption cases. Two pieces of information are necessary: the leaf certificate serial number and the issuer SKI. The issuer SKI is actually included in the leaf certificate, also known as the Authority Key Identifier (AKI). Similar to how one would obtain the SKI for a certificate,
X509_get0_subject_key_id, we can use
X509_get0_authority_key_id to get the AKI.
All timestamps are in UTC
In March 2021 we introduced a new feature in Firewall Rules that allows customers to block traffic from revoked mTLS certificates.
2022-12-16 21:53 - Cloudflare discovers that the vulnerability resulted from a bug whereby certificate revocation status was not checked for session resumptions. Cloudflare begins working on a fix to disable session resumption for all mTLS connections to the edge.
2022-12-17 02:20 - Cloudflare validates the fix and starts to roll out a fix globally.
2022-12-17 21:07 - Rollout is complete, mitigating the vulnerability.
2023-01-12 16:40 - Cloudflare starts to roll out a fix that supports both session resumption and revocation.
2023-01-18 14:07 - Rollout is complete.
In conclusion: once Cloudflare identified the vulnerability, a remediation was put into place quickly. A fix that correctly supports session resumption and revocation has been fully rolled out as of 2023-01-18. After reviewing the logs, Cloudflare has not seen any evidence that this vulnerability has been exploited in the wild.