This post is also available in Deutsch, Français.and Español
Today we announced Geo Key Manager, a feature that gives customers unprecedented control over where their private keys are stored when uploaded to Cloudflare. This feature builds on a previous Cloudflare innovation called Keyless SSL and a novel cryptographic access control mechanism based on both identity-based encryption and broadcast encryption. In this post we’ll explain the technical details of this feature, the first of its kind in the industry, and how Cloudflare leveraged its existing network and technologies to build it.
Keys in different area codes
Cloudflare launched Keyless SSL three years ago to wide acclaim. With Keyless SSL, customers are able to take advantage of the full benefits of Cloudflare’s network while keeping their HTTPS private keys inside their own infrastructure. Keyless SSL has been popular with customers in industries with regulations around the control of access to private keys, such as the financial industry. Keyless SSL adoption has been slower outside these regulated industries, partly because it requires customers to run custom software (the key server) inside their infrastructure.
One of the motivating use cases for Keyless SSL was the expectation that customers may not trust a third party like Cloudflare with their private keys. We found that this is actually very uncommon; most customers actually do trust Cloudflare with their private keys. But we have found that sometimes customers would like a way to reduce the risk associated with having their keys in some physical locations around the world.
This is where Geo Key Manager is useful: it lets customers limit the exposure of their private keys to certain locations. It’s similar Keyless SSL, but instead of having to run a key server inside your infrastructure, Cloudflare hosts key servers in the locations of your choosing. This reduces the complexity of deploying Keyless SSL and gives the control that people care about. Geo Key Manager “just works” with no software required.
A Keyless SSL Refresher
Keyless SSL was developed at Cloudflare to make HTTPS more secure. Content served over HTTPS is both encrypted and authenticated so that eavesdroppers or attackers can’t read or modify it. HTTPS makes use of a protocol called Transport Layer Security (TLS) to keep this data safe.
TLS has two phases: a handshake phase and a data exchange phase. In the handshake phase, cryptographic keys are exchanged and a shared secret is established. As part of this exchange, the server proves its identity to the client using a certificate and a private key. In the data exchange phase, shared keys from the handshake are used to encrypt and authenticate the data.
A TLS handshake can be naturally broken down into two components:
- The private key operation
- Everything else
The private key operation is critical to the TLS handshake: it allows the server to prove that it owns a given certificate. Without this private key operation, the client has no way of knowing whether or not the server is authentic. In Keyless SSL, the private key operation is separated from the rest of the handshake. In a Keyless SSL handshake, instead of performing the private key operation locally, the server makes a remote procedure call to another server (called the key server) that controls the private key.
Keyless SSL lets you logically separate concerns so that a compromise of the web server does not result in a compromise of the private key. This additional security comes at a cost. The remote procedure call from the server to the key server can add latency to the handshake, slowing down connection establishment. The additional latency cost corresponds to the round-trip time from the server to the key server, which can be as much as a second if the key server is on the other side of the world.
Luckily, this latency cost only applies to the first time you connect to a server. Once the handshake is complete, the key server is not involved. Furthermore, if you reconnect to a site you don’t have to pay the latency cost either because resuming a connection with TLS Session Resumption doesn’t require the private key.
Latency is only added for the initial connection.
For a deep-dive on this topic, read this post I wrote in 2014.
With Keyless SSL as a basic building block, we set out to design Geo Key Manager.
Geo Key Manager: designing the architecture
Cloudflare has a truly international customer base and we’ve learnt that customers around the world have different regulatory and statutory requirements, and different risk profiles, concerning the placement of their private keys. There’s no one size fits all solution across the planet. With that philosophy in mind, we set out to design a very flexible system for deciding where keys can be kept.
The first problem to solve was access control. How could we limit the number of locations that a private key is sent to? Cloudflare has a database that takes user settings and distributes them to all edge locations securely, and very fast. However, this system is optimized to synchronize an entire database worldwide; modifying it to selectively distribute keys to different locations was too big of an architectural change for Cloudflare. Instead, we decided to explore the idea of a cryptographic access control (CAC) system.
In a CAC system, data is encrypted and distributed everywhere. A piece of data can only be accessed if you have the decryption key. By only sending decryption keys to certain locations, we can effectively limit who has access to data. For example, we could encrypt customer private keys once—right after they’re uploaded—and send the encrypted keys to every location using the existing database replication system.
We’ve experimented with CAC systems before, most notably with the Red October project. With Red October, you can encrypt a piece of data so that multiple people are required to decrypt it. This is how PAL, our Docker Secrets solution works. However, Red October system is ill-suited to Geo Key Manager for a number of reasons:
- The more locations you encrypt to, the larger the encrypted key gets
- There is no way to encrypt a key to “everywhere except a given location” without having to re-encrypt when new locations are added (which we do frequently)
- There has to be a secure registry of each datacenter’s public key
For Geo Key Manager we wanted something that provides users with granular control and can scale with Cloudflare’s growing network. We came up with the following requirements:
- Users can select from a set of pre-defined regions they would like their keys to be in (E.U., U.S., Highest Security, etc.)
Users can add specific datacenter locations outside of the chosen regions (e.g. all U.S. locations plus Toronto)
- Users can choose selected datacenter locations inside their chosen regions to not send keys (e.g. all E.U. locations except London)
- It should be fast to decrypt a key and easy to store it, no matter how complicated the configuration
Building a system to satisfy these requirements gives customers the freedom to decide where their keys are kept and the scalability necessary to be useful with a growing network.
The cryptographic tool that allows us to satisfy these requirements is called Identity-based encryption.
Unlike traditional public key cryptography, where each party has a public key and a private key, in identity-based encryption, your identity is your public key. This is a very powerful concept, because it allows you to encrypt data to a person without having to obtain a their public key ahead of time. Instead of a large random-looking number, a public key can literally be any string, such as “[email protected]” or “beebob”. Identity-based encryption is useful for email, where you can imagine encrypting a message using person’s email address as the public key. Compare this to the complexity of using PGP, where you need to find someone’s public key and validate it before you send a message. With identity-based encryption, you can even encrypt data to someone before they have the private key associated with their identity (which is managed by a central key generation authority).
ID-based encryption was proposed by Shamir in the 80s, but it wasn’t fully practical until Boneh and Franklin’s proposal in 2001. Since then, a variety of interesting schemes been discovered and put to use. The underlying cryptographic primitive that makes efficient ID-based encryption possible is called Elliptic Curve Pairings, a topic that deserves its own blog post.
Our scheme is based the combination of two primitives:
Identity-Based Broadcast Encryption (IBBE)
Identity-Based Revocation (IBR)
Identity-Based Broadcast Encryption (IBBE) is like a allowlist. It lets you take a piece of data and encrypt it to a set of recipients. The specific construction we use is from a 2007 paper by Delerablee. Critically, the size of the ciphertext does not depend on the number of recipients. This means we can efficiently encrypt data without it getting larger no matter how many recipients there are (or PoPs in our case).
Identity-based Revocation (IBR) is like a blocklist. It lets you encrypt a piece of data so that all recipients can decrypt it except for a pre-defined set of recipients who are excluded. The implementation we used was from section 4 of a paper by Attrapadung et al. from 2011. Again, the ciphertext size does not depend on the number of excluded identities.
These two primitives can be combined to create a very flexible cryptographic access control scheme. To do this, create two sets of identities: an identity for each region, and an identity for each datacenter location. Once these sets have been decided, each server is provisioned the identity-based encryption private key for its region and its location.
With this in place, you can configure access to the key in terms of the following sets:
- Set of regions to encrypt to
- Set of locations inside the region to exclude
- Set of locations outside the region to include
Here’s how you can encrypt a customer key so that a given access control policy (regions, blocklist, allowlist) can be enforced:
- Create a key encryption key KEK
- Split it in two halves KEK1, KEK2
- Encrypt KEK1 with IBBE to include the set of regions (allowlist regions)
- Encrypt KEK2 with IBR to exclude locations in the regions defined in 3 (blocklist colo)
- Encrypt KEK with IBBE to include locations not in the regions defined in 3 (allowlist colo)
- Send all three encrypted keys (KEK1)region, (KEK2)exclude, (KEK)include to all locations
This can be visualized as follows with
- Regions: U.S. and E.U.
- Blocklist colos: Dallas and London
- Allowlist colos: Sydney and Tokyo
The locations inside the regions allowlist can decrypt half of the key (KEK1), and need to also be outside the blocklist to decrypt the other half of the key (KEK2). In the example, London and Dallas only have access to KEK1 because they can’t decrypt KEK2. These locations can’t reconstruct KEK and therefore can’t decrypt the private key. Every other location in the E.U. and U.S. can decrypt KEK1 and KEK2 so they can construct KEK to decrypt the private key. Tokyo and Sydney can decrypt the KEK from the allowlist and use it to decrypt the private key.
This will make the private TLS key available in all of the EU and US except for Dallas and London and it will additionally be available in Tokyo and Sydney. The result is the following map:
This design lets customers choose with per-datacenter granularity where their private keys can be accessed. If someone connects to a datacenter that can’t decrypt the private key, that SSL connection is handled using Keyless SSL, where the key server is located in a location with access to the key. The Geo Key code automatically finds the nearest data center that can access the relevant TLS private key and uses Keyless SSL to handle the TLS handshake with the client.
Creating a key server network
Any time you use Keyless SSL for a new connection, there’s going to be a latency cost for connecting to the key server. With Geo Key Manager, we wanted to reduce this latency cost as much as possible. In practical terms, this means we need to know which key server will respond fastest.
To solve this, we created an overlay network between all of our datacenters to measure latency. Every datacenter has an outbound connection to every other datacenter. Every few seconds, we send a “ping” (a message in the Keyless SSL protocol, not an ICMP message) and we measure how long it takes the server to send a corresponding “pong”.
When a client connects to a site behind Cloudflare in a datacenter that can’t decrypt the private key for that site, we use metadata to find out which datacenters have access to the key. We then choose the datacenter that has the lowest latency according to our measurements and use that datacenter’s key server for Keyless SSL. If the location with the lowest latency is overloaded, we may choose another location with higher latency but more capacity.
The data from these measurements was used to construct the following map, highlighting the additional latency added for visitors around the world for the US-only configuration.
Latency added when keys are in U.S. only. Green: no latency cost,
Red: > 100ms
We’re constantly innovating to provide our customers with powerful features that are simple to use. With Geo Key Manager, we are leveraging Keyless SSL and 21st century cryptography to improve private key security in an increasingly complicated geo-political climate.