
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 07 Apr 2026 23:46:36 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Inside Geo Key Manager v2: re-imagining access control for distributed systems]]></title>
            <link>https://blog.cloudflare.com/inside-geo-key-manager-v2/</link>
            <pubDate>Fri, 27 Jan 2023 14:00:00 GMT</pubDate>
            <description><![CDATA[ Using the story of Geo Key Manager v2 as an example, let’s re-imagine access control for distributed systems using a variant of public-key cryptography, called attribute-based encryption. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1LkGR7C3vqnJCAudl3sGpL/ecbf276116cffbcaac0c3548d955510d/image1-49.png" />
            
            </figure><p>In December 2022 we announced the closed beta of the new version of <a href="/configurable-and-scalable-geo-key-manager-closed-beta/">Geo Key Manager</a>. Geo Key Manager v2 (GeoV2) is the next step in our journey to provide customers with a secure and flexible way to control the distribution of their private keys by geographic location. Our original system, <a href="/introducing-cloudflare-geo-key-manager/">Geo Key Manager v1</a>, was launched as a research project in 2017, but as customer needs evolved and our scale increased, we realized that we needed to make significant improvements to provide a better user experience.</p><p>One of the principal challenges we faced with Geo Key Manager v1 (GeoV1) was the inflexibility of our access control policies. Customers required richer data localization, often spurred by regulatory concerns. Internally, events such as the conflict in Ukraine reinforced the need to be able to quickly restrict access to sensitive key material. Geo Key Manager v1’s underlying cryptography was a combination of identity-based broadcast encryption and identity-based revocation that simulated a subset of the functionality offered by Attribute-Based Encryption (ABE). Replacing this with an established ABE scheme addressed the inflexibility of our access control policies and provided a more secure foundation for our system.</p><p>Unlike our previous scheme, which limited future flexibility by freezing the set of participating data centers and policies at the outset, using ABE made the system easily adaptable for future needs. It allowed us to take advantage of performance gains from additional data centers added after instantiation and drastically simplified the process for handling changes to attributes and policies. Furthermore, GeoV1 struggled with some perplexing performance issues that contributed to high tail latency and a painfully manual key rotation process. 
GeoV2 is our answer to the challenges and limitations of GeoV1.</p><p>While this blog focuses on our solution for geographical key management, the lessons here can also be applied to other access control needs. Access control solutions are traditionally implemented using a highly available central authority to police access to resources. As we will see, ABE allows us to avoid this single point of failure. Since we are not aware of any large-scale ABE-based access control systems, we hope our discussion can help engineers consider ABE as an approach to access control with minimal reliance on a centralized authority. To facilitate this, we’ve included our implementation of ABE in <a href="https://pkg.go.dev/github.com/cloudflare/circl@v1.3.0/abe/cpabe/tkn20">CIRCL</a>, our open source cryptographic library.</p>
    <div>
      <h2>Unsatisfactory attempts at a solution</h2>
      <a href="#unsatisfactory-attempts-at-a-solution">
        
      </a>
    </div>
    <p>Before coming back to GeoV2, let’s take a little detour and examine the problem we’re trying to solve.</p><p>Consider this example: a large European bank wants to store their TLS private keys only within the EU. This bank is a customer of Cloudflare, which means we perform TLS handshakes on their behalf. The reason we need to terminate TLS for them is so that we can provide the best protection against DDoS attacks, improve performance by caching, support <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">web application firewalls</a>, etc.</p><p>In order to terminate TLS, we need to have access to their TLS private keys<sup>1</sup>. The control plane, which handles API traffic, encrypts the customer’s uploaded private key with a master public key shared amongst all machines globally. It then puts the key into a globally distributed KV store, <a href="/introducing-quicksilver-configuration-distribution-at-internet-scale/">Quicksilver</a>. This means every machine in every data center around the world has a local copy of this customer’s TLS private key. Consequently, every machine in each data center has a copy of every customer’s private key.</p>
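<p>As a sketch of this baseline, the upload flow amounts to sealing each customer key under one master key shared by every machine and writing the result to a globally replicated store. The Go below is a minimal illustration only: the names are hypothetical, and AES-GCM plus a map stand in for the real master-key encryption and Quicksilver.</p>

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// quicksilver stands in for the globally replicated KV store: every
// machine in every data center sees the same contents.
var quicksilver = map[string][]byte{}

// sealWithMasterKey encrypts a customer's TLS private key with the
// master key shared by all machines (AES-GCM here for illustration).
func sealWithMasterKey(masterKey, customerKey []byte) []byte {
	block, _ := aes.NewCipher(masterKey)
	gcm, _ := cipher.NewGCM(block)
	nonce := make([]byte, gcm.NonceSize())
	rand.Read(nonce)
	return gcm.Seal(nonce, nonce, customerKey, nil)
}

func openWithMasterKey(masterKey, sealed []byte) []byte {
	block, _ := aes.NewCipher(masterKey)
	gcm, _ := cipher.NewGCM(block)
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	pt, _ := gcm.Open(nil, nonce, ct, nil)
	return pt
}

func main() {
	masterKey := make([]byte, 32)
	rand.Read(masterKey)

	// Control plane: a customer's key is sealed and written to
	// Quicksilver, which replicates it to every machine globally.
	quicksilver["customer-123/tls-key"] =
		sealWithMasterKey(masterKey, []byte("-----BEGIN EC PRIVATE KEY-----..."))

	// Any machine, anywhere, can recover it -- exactly the property
	// the bank in this example does not want.
	fmt.Printf("%s\n", openWithMasterKey(masterKey, quicksilver["customer-123/tls-key"]))
}
```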
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/45KOue4Z7Pqt5PT5yUSVqw/1478d3b8a9bce8d50eab6f22b321bbbf/Customer-uploading-their-TLS-certificate-and-private-key-to-be-stored-in-all-datacenters.png" />
            
            </figure><p>Customer uploading their <a href="https://www.cloudflare.com/application-services/products/ssl/">TLS certificate</a> and private key to be stored in all data centers</p><p>This bank however, wants its key to be stored only in EU data centers. In order to allow this to happen, we have three options.</p><p>The first option is to ensure that only EU data centers can receive this key and terminate the handshake. All other machines proxy TLS requests to an EU server for processing. This would require giving each machine only a subset of the entire keyset stored in Quicksilver, which challenges core design decisions Cloudflare has made over the years that assume the entire dataset is replicated on every machine.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6se4wT6SdniH54NPfm18mS/81a94976157a104513106e2e3ff426f8/Restricting-customer-keys-to-EU-datacenters.png" />
            
            </figure><p>Restricting customer keys to EU data centers</p><p>Another option is to store the keys in the core data center instead of Quicksilver. This would allow us to enforce the proper access control policy every time, ensuring that only certain machines can access certain keys. However, this would defeat the purpose of having a global network in the first place: to reduce latency and avoid a single point of failure at the core.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2f1eETLrGxbeaXVif6AMeT/e254472211468e843c88ef4993d66477/Storing-keys-in-core-datacenter-where-complicated-business-logic-runs-to-enforce-policies.png" />
            
</figure><p>Storing keys in core data center where complicated business logic runs to enforce policies</p><p>A third option is to use public key cryptography. Instead of having a master key pair, every data center is issued its own key pair. The core encrypts the customer's private key with the keys of every data center allowed to use it. Only machines in the EU will be able to access the key in this example. Let’s assume there are 500 data centers, with 50 machines each. Of these 500 data centers, let’s say 200 are in the EU. Where 100 keys of 1 kB each consumed a total of 100 x 500 x 50 x 1 kB globally, they will now consume 200 times that, and in the worst case up to 500 times. This increases the space it takes to store the keys on each machine by a whole new factor: before, the storage space was purely a function of how many customer keys are registered; now it is that same function multiplied by the number of data centers covered by each policy.</p>
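<p>The arithmetic above is easy to check. The following Go sketch (with a hypothetical <code>storageKB</code> helper) works through the same numbers:</p>

```go
package main

import "fmt"

// storageKB returns the total storage, in kB, consumed across the
// network when each of numKeys 1 kB customer keys is stored as
// copiesPerKey wrapped copies on every machine (Quicksilver replicates
// all copies everywhere).
func storageKB(numKeys, copiesPerKey, dataCenters, machinesPerDC int) int {
	return numKeys * copiesPerKey * dataCenters * machinesPerDC
}

func main() {
	const keys, dcs, machines = 100, 500, 50

	// One shared master key: a single ciphertext per customer key.
	fmt.Println(storageKB(keys, 1, dcs, machines)) // 2,500,000 kB globally

	// Per-data-center keypairs with an EU-only policy: 200 wrapped
	// copies per key, i.e. 200 times the baseline.
	fmt.Println(storageKB(keys, 200, dcs, machines))

	// Worst case: a policy spanning all 500 data centers.
	fmt.Println(storageKB(keys, 500, dcs, machines))
}
```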
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3LjmrV6IchJV7XMwBrLh3P/f20b5c276ed32ba768c35a27fcfd42cc/Assigning-unique-keys-to-each-data-center.png" />
            
</figure><p>Assigning unique keys to each data center and wrapping customer key with EU data center keys</p><p>Unfortunately, all three of these options are undesirable in their own ways. They would either require changing fundamental assumptions we made about the architecture of Cloudflare, abandoning the advantages of using a highly distributed network, or quadratically increasing the storage this feature uses.</p><p>A deeper look at the third option suggests a refinement: why not create two key pairs instead of a unique one for each data center? One pair would be common to all EU data centers, and one to all non-EU data centers. This way, the core only needs to encrypt the customer’s key twice instead of once per EU data center. This is a good solution for the EU bank, but it doesn’t scale once we start adding additional policies. Consider a data center in New York City: it could have a key for the policy “<code>country: US</code>”, another for “<code>country: US or region: EU</code>”, another for “<code>not country: RU</code>”, and so on. You can already see this getting rather unwieldy. And every time a new data center is provisioned, all policies must be re-evaluated and the appropriate keys assigned.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ZU5iEwhfO1EJ16MHNTov8/cbf855da065b17fbd258fef678100116/A-key-for-a-each-policy-and-its-negation.png" />
            
            </figure><p>A key for each policy and its negation</p>
    <div>
      <h2>Geo Key Manager v1: identity-based encryption and broadcast encryption</h2>
      <a href="#geo-key-manager-v1-identity-based-encryption-and-broadcast-encryption">
        
      </a>
    </div>
    <p>The invention of RSA in 1978 kicked off the era of modern public key cryptography, but anyone who has used GPG or is involved with certificate authorities can attest to the difficulty of managing public key infrastructure that connects keys to user identities. In 1984, Shamir asked if it was possible to create a public-key encryption system where the public key could be any string. His motivation for this question was to simplify email management. Instead of encrypting an email to Bob using Bob’s public key, Alice could encrypt it to Bob’s identity <a><code>bob@institution.org</code></a>. Finally, in 2001, <a href="https://crypto.stanford.edu/~dabo/papers/bfibe.pdf">Boneh and Franklin</a> figured out how to make it work.</p><p>Broadcast encryption was first proposed in 1993 by <a href="https://www.wisdom.weizmann.ac.il/~naor/PAPERS/broad.pdf">Fiat and Naor</a>. It lets you send the same encrypted message to everyone, but only people with the right key can decrypt it. Looking back to our third option, instead of wrapping the customer’s key with the key of every EU data center, we could use broadcast encryption to create a singular encryption of the customer’s key that only EU-based data centers could decrypt. This would solve the storage problem.</p><p>Geo Key Manager v1 used a combination of identity-based broadcast encryption and identity-based revocation to implement access control. Briefly, a set of identities is designated for each region and each data center location. Then, each machine is issued an identity-based private key for its region and location. With this in place, access to the customer’s key can be controlled using three sets: the set of regions to encrypt to, the set of locations inside the region to exclude, and the set of locations outside the region to include. 
For example, the customer’s key could be encrypted so that it is available in all regions except for a few specific locations, and also available in a few locations outside those regions. An earlier blog post has all the <a href="/geo-key-manager-how-it-works/">nitty-gritty details</a> of this approach.</p><p>Unfortunately, this scheme was insufficiently responsive to customer needs; the parameters used during initial cryptographic setup, such as the list of regions, data centers, and their attributes, were baked into the system and could not be easily changed. Tough luck excluding the UK from the EU region post-Brexit, or supporting a new region based on a recent compliance standard that customers need. Using a predetermined static list of locations also made it difficult to quickly revoke machine access. Additionally, decryption keys could not be assigned to new data centers provisioned after setup, preventing them from speeding up requests. These limitations provided the impetus for integrating Attribute-Based Encryption (ABE) into Geo Key Manager.</p>
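<p>To make the three-set mechanism concrete, here is a Go sketch of the access decision it encodes. The <code>Policy</code> type and <code>CanDecrypt</code> method are illustrative reconstructions, not GeoV1’s actual code:</p>

```go
package main

import "fmt"

// Policy mirrors GeoV1's three-set access control: a set of regions a
// key is encrypted to, locations inside those regions to exclude, and
// locations outside those regions to include.
type Policy struct {
	Regions  map[string]bool // regions granted access
	Excluded map[string]bool // locations carved out of those regions
	Included map[string]bool // extra locations outside those regions
}

// CanDecrypt reports whether a machine in the given region and location
// would hold a usable decryption key under the policy.
func (p Policy) CanDecrypt(region, location string) bool {
	if p.Excluded[location] {
		return false
	}
	return p.Regions[region] || p.Included[location]
}

func main() {
	// "All of the EU except London, plus Singapore."
	p := Policy{
		Regions:  map[string]bool{"EU": true},
		Excluded: map[string]bool{"LHR": true},
		Included: map[string]bool{"SIN": true},
	}
	fmt.Println(p.CanDecrypt("EU", "CDG"))   // true
	fmt.Println(p.CanDecrypt("EU", "LHR"))   // false
	fmt.Println(p.CanDecrypt("APAC", "SIN")) // true
}
```

<p>Note that because the universe of regions and locations is fixed when the keys are issued, any change to the sets requires redoing the cryptographic setup, which is exactly the inflexibility described above.</p>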
    <div>
      <h2>Attribute-Based Encryption</h2>
      <a href="#attribute-based-encryption">
        
      </a>
    </div>
    <p>In 2004, Amit Sahai and Brent Waters proposed a new cryptosystem based on access policies, known as attribute-based encryption (ABE). Essentially, a message is encrypted under an access policy rather than an identity. Users are issued a private key based on their attributes, and they can only decrypt the message if their attributes satisfy the policy. This allows for more flexible and fine-grained access control than traditional methods of encryption.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18dR8oSxVPuawAc3180G0h/97bfa963a7638667b85d468e5dcc5ae6/Group-4899-1.png" />
            
</figure><p>Brief timeline of Public Key Encryption</p><p>The policy can be attached either to the key or to the ciphertext, leading to two variants of ABE: key-policy attribute-based encryption (KP-ABE) and ciphertext-policy attribute-based encryption (CP-ABE). There exist trade-offs between them, but they are functionally equivalent as they are duals of each other. Let’s focus on CP-ABE, as it aligns more closely with real-world access control. Imagine a hospital where a doctor has the attributes “<code>role: doctor</code>” and “<code>region: US</code>”, while a nurse has the attributes “<code>role: nurse</code>” and “<code>region: EU</code>”. A document encrypted under the policy “<code>role: doctor or region: EU</code>” can be decrypted by both the doctor and the nurse. In other words, ABE is like a magical lock that only opens for people who have the right attributes.</p>
<table>
<thead>
  <tr>
    <th><span>Policy</span></th>
    <th><span>Semantics</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>country: US or region: EU</span></td>
    <td><span>Decryption is possible either in the US or in the European Union</span></td>
  </tr>
  <tr>
    <td><span>not (country: RU or country: US)</span></td>
    <td><span>Decryption is not possible in either Russia or the US</span></td>
  </tr>
  <tr>
    <td><span>country: US and security: high</span></td>
    <td><span>Decryption is possible only in data centers within the US that have a high level of security (for some security definition established previously)</span></td>
  </tr>
</tbody>
</table><p>There are many different ABE schemes out there, with varying properties. The scheme we choose must satisfy a few requirements:</p><ol><li><p><b>Negation</b> We want to be able to support boolean formulas consisting of <b>AND</b>, <b>OR</b> and <b>NOT</b>, aka non-monotonic boolean formulas. While practically every scheme handles <b>AND</b> and <b>OR</b>, support for <b>NOT</b> is rarer. Negation makes blocklisting certain countries or machines easier.</p></li><li><p><b>Repeated Attributes</b> Consider the policy “<code>organization: executive or (organization: weapons and clearance: top-secret)</code>”. The attribute “<code>organization</code>” has been repeated twice in the policy. Schemes with support for repetition add significant expressibility and flexibility when composing policies.</p></li><li><p><b>Security against Chosen Ciphertext Attacks</b> Most schemes are presented in a form that is only secure against <a href="https://en.wikipedia.org/wiki/Chosen-plaintext_attack">chosen-plaintext attacks</a> (CPA), where the attacker cannot request decryptions of ciphertexts of its choosing. There are <a href="https://www.cs.umd.edu/~jkatz/papers/id-cca-mac.pdf">standard ways</a> to convert such a scheme into one that is secure even if the attacker manipulates ciphertexts (<a href="https://en.wikipedia.org/wiki/Ciphertext_indistinguishability#Indistinguishability_under_chosen_ciphertext_attack/adaptive_chosen_ciphertext_attack_(IND-CCA1,_IND-CCA2)">CCA</a>), but it isn’t automatic. We apply the well-known <a href="https://www.iacr.org/archive/pkc2011/65710074/65710074.pdf">Boneh-Katz transform</a> to our chosen scheme to make it secure against this class of attacks. We will present a proof of security for the end-to-end scheme in our forthcoming paper.</p></li></ol><p>Negation in particular deserves further comment. For an attribute to be satisfied when negated, the name must stay the same, but the value must differ.
It’s like the data center is saying, “I have a country, but it’s definitely not Japan”, instead of “I don’t have a country”. This might seem counterintuitive, but it enables decryption without needing to examine every attribute value. It also makes it safe to roll out attributes incrementally. Based on these criteria, we ended up choosing the scheme by <a href="https://eprint.iacr.org/2019/966">Tomida et al. (2021)</a>.</p><p>Implementing a complex cryptographic scheme such as this can be quite challenging. The discrete log assumption that underlies traditional public key cryptography is not sufficient to meet the security requirements of ABE. ABE schemes must secure both ciphertexts and the attribute-based secret keys, whereas traditional public key cryptography only imposes security constraints on the ciphertexts, while the secret key is merely an integer. To achieve this, most ABE schemes are constructed using a mathematical operation known as a bilinear pairing.</p><p>The speed at which we can perform pairing operations determines the baseline performance of our implementation. Their efficiency is particularly desirable during decryption, where they are used to combine the attribute-based secret key with the ciphertext in order to recover the plaintext. To this end, we rely on our highly optimized pairing implementations in our open source library of cryptographic suites, CIRCL, which we discuss at length in a <a href="/circl-pairings-update/">previous blog</a>. Additionally, the various keys, attributes and the ciphertext that embeds the access structure are expressed as matrices and vectors. We wrote linear algebra routines to handle the matrix operations, such as multiplication, transposition, and inversion, that are necessary to manipulate these structures. We also added serialization, extensive testing, and benchmarking.
Finally, we implemented our conversion to a <a href="https://en.wikipedia.org/wiki/Adaptive_chosen-ciphertext_attack">CCA2-secure</a> scheme.</p><p>In addition to the core cryptography, we had to decide how to express and represent policies. Ultimately we decided on using strings for our API. While perhaps less convenient for programs than structures would be, users of our scheme would have to implement a parser anyway. Having us do it for them seemed like a way to have a more stable interface. This means the frontend of our policy language is composed of boolean expressions as strings, such as “<code>country: JP or (not region: EU)</code>”, while the backend is a <i>monotonic</i> boolean circuit consisting of wires and gates. Monotonic boolean circuits only include AND and OR gates. In order to handle NOT gates, we assigned positive or negative values to the wires. Every NOT gate can be placed directly on a wire because of <a href="https://en.wikipedia.org/wiki/De_Morgan%27s_laws">De Morgan’s Law</a>, which allows the conversion of a formula like “<code>not (X and Y)</code>” into “<code>not X or not Y</code>”, and similarly for disjunction.</p><p>The following is a demonstration of the API. The central authority runs Setup to generate the master public key and master secret key. The master public key can be used by anyone to encrypt a message over an access policy. The master secret key, held by the central authority, is used to generate secret keys for users based on their attributes. Attributes themselves can be supplied out-of-band. In our case, we rely on the machine provisioning database to provide and validate attributes. These attribute-based secret keys are securely distributed to users, such as over TLS, and are used to decrypt ciphertexts. The API also includes helper functions to check decryption capabilities and extract policies from ciphertexts for improved usability.</p>
            <pre><code>publicKey, masterSecretKey := cpabe.Setup()

policy := cpabe.Policy{}
policy.FromString("country: US or region: EU")

ciphertext := publicKey.Encrypt(policy, []byte("secret message"))

attrsParisDC := cpabe.Attributes{}
attrsParisDC.FromMap(map[string]string{"country": "FR", "region": "EU"})

secretKeyParisDC := masterSecretKey.KeyGen(attrsParisDC)

plaintext := secretKeyParisDC.Decrypt(ciphertext)

assertEquals(plaintext, "secret message")</code></pre>
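<p>The policy backend described earlier, which pushes every NOT down onto a wire via De Morgan’s laws, can be sketched in Go as follows. The <code>Node</code> type is hypothetical; CIRCL’s internal circuit representation differs:</p>

```go
package main

import "fmt"

// Node is a boolean formula over attribute tests.
type Node struct {
	Op          string // "leaf", "not", "and", "or"
	Attr        string // set for leaves only
	Left, Right *Node
}

// pushNegations rewrites the formula so that every NOT sits directly on
// a leaf wire, using De Morgan's laws:
//   not (X and Y) == (not X) or (not Y)
//   not (X or Y)  == (not X) and (not Y)
// The result is a monotonic AND/OR circuit over signed wires.
func pushNegations(n *Node, negated bool) *Node {
	switch n.Op {
	case "leaf":
		if negated {
			return &Node{Op: "not", Left: n}
		}
		return n
	case "not":
		return pushNegations(n.Left, !negated)
	case "and", "or":
		op := n.Op
		if negated { // De Morgan: flip the gate, negate the children
			if op == "and" {
				op = "or"
			} else {
				op = "and"
			}
		}
		return &Node{Op: op,
			Left:  pushNegations(n.Left, negated),
			Right: pushNegations(n.Right, negated)}
	}
	return nil
}

func render(n *Node) string {
	switch n.Op {
	case "leaf":
		return n.Attr
	case "not":
		return "not " + render(n.Left)
	default:
		return "(" + render(n.Left) + " " + n.Op + " " + render(n.Right) + ")"
	}
}

func main() {
	// not (country: JP and region: EU)
	f := &Node{Op: "not", Left: &Node{Op: "and",
		Left:  &Node{Op: "leaf", Attr: "country: JP"},
		Right: &Node{Op: "leaf", Attr: "region: EU"}}}
	fmt.Println(render(pushNegations(f, false)))
	// (not country: JP or not region: EU)
}
```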
            <p>We now come back to our original example. This time, the central authority holds the master secret key. Each machine in every data center presents its set of attributes to the central authority, which, after some validation, generates a unique attribute-based secret key for that particular machine. Key issuance happens when a machine is first brought up, if keys must be rotated, or if an attribute has changed, but never in the critical path of a TLS handshake. This solution is also collusion resistant, which means two machines without the appropriate attributes cannot combine their keys to decrypt a secret that they individually could not decrypt. For example, take a machine with the attribute “<code>country: US</code>” and another with “<code>security: high</code>”: together they still cannot decrypt a resource encrypted under the policy “<code>country: US and security: high</code>”.</p><p>Crucially, this solution can seamlessly scale and respond to changes to machines. If a new machine is added, the central authority can simply issue it a secret key since the participants of the scheme don’t have to be predetermined at setup, unlike our previous identity-broadcast scheme.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2GTPTWKh3s1zmc4VdZqePQ/e64697fee21cbbf6271b2e2a6c0f0021/Key-Distribution.png" />
            
            </figure><p>Key Distribution</p><p>When a customer uploads their TLS certificate, they can specify a policy, and the central authority will encrypt their private key with the master public key under the specified policy. The encrypted customer key then gets written to Quicksilver, to be distributed to all data centers. In practice, there is a layer of indirection here that we will discuss in a later section.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4q8tdE0dcq932Wf6YOkkmq/67d3cc560610e82696d01aee83f67443/Encryption-using-Master-Public-Key.png" />
            
</figure><p>Encryption using Master Public Key</p><p>When a user visits the customer’s website, the TLS termination service at the data center that first receives the request fetches the customer’s encrypted private key from Quicksilver. If the service’s attributes do not satisfy the policy, decryption fails and the request is proxied to the closest data center that satisfies the policy. Whichever data center can successfully decrypt the key performs the signature to complete the TLS handshake.</p>
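<p>A rough Go sketch of this decrypt-or-proxy fallback, with the ABE decryption stubbed out and all names hypothetical:</p>

```go
package main

import (
	"errors"
	"fmt"
)

// DataCenter is a stand-in for the TLS termination service's view of a
// location; Satisfies abstracts whether its attribute-based secret key
// satisfies the policy a customer key was encrypted under.
type DataCenter struct {
	Name      string
	Satisfies bool
}

var errPolicyNotSatisfied = errors.New("attributes do not satisfy policy")

// decryptCustomerKey fails unless the machine's attributes satisfy the
// policy; real code would run ABE decryption here.
func decryptCustomerKey(dc DataCenter, wrapped []byte) ([]byte, error) {
	if !dc.Satisfies {
		return nil, errPolicyNotSatisfied
	}
	return wrapped, nil
}

// terminate tries to complete the handshake locally, and on policy
// failure proxies to the closest data center that satisfies the policy.
func terminate(local, closestSatisfying DataCenter, wrapped []byte) (string, error) {
	if _, err := decryptCustomerKey(local, wrapped); err == nil {
		return "signed in " + local.Name, nil
	}
	if _, err := decryptCustomerKey(closestSatisfying, wrapped); err == nil {
		return "proxied, signed in " + closestSatisfying.Name, nil
	}
	return "", errors.New("no data center can decrypt this key")
}

func main() {
	wrapped := []byte("encrypted TLS private key")
	newark := DataCenter{Name: "EWR", Satisfies: false} // policy: EU only
	paris := DataCenter{Name: "CDG", Satisfies: true}
	out, _ := terminate(newark, paris, wrapped)
	fmt.Println(out) // proxied, signed in CDG
}
```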
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7id4cGjSPw6lO8QJR0OJU1/012337eb76f55a9e1d39bcca4bcdc76f/Decryption-using-Attribute-based-Secret-Key.png" />
            
            </figure><p>Decryption using Attribute-based Secret Key (Simplified)</p><p>The following table summarizes the pros and cons of the various solutions we discussed:</p>
<table>
<thead>
  <tr>
    <th><span>Solution</span></th>
    <th><span>Flexible policies</span></th>
    <th><span>Fault Tolerant</span></th>
    <th><span>Efficient Space</span></th>
    <th><span>Low Latency</span></th>
    <th><span>Collusion-resistant</span></th>
    <th><span>Changes to machines</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Different copies of Quicksilver in data centers</span></td>
    <td><span>✅</span></td>
    <td></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
  </tr>
  <tr>
    <td><span>Complicated Business Logic in Core</span></td>
    <td><span>✅</span></td>
    <td></td>
    <td><span>✅</span></td>
    <td></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
  </tr>
  <tr>
    <td><span>Encrypt customer keys with each data center’s unique keypair</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td></td>
  </tr>
  <tr>
    <td><span>Encrypt customer keys with a policy-based keypair, where each data center has multiple policy-based keypairs</span></td>
    <td></td>
    <td><span>✅</span></td>
    <td></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td></td>
  </tr>
  <tr>
    <td><span>Identity-Based Broadcast Encryption + Identity-Based Negative Broadcast Encryption</span><span>(Geo Key Manager v1)</span></td>
    <td></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <td><span>Attribute-Based Encryption</span><span>(Geo Key Manager v2)</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
    <td><span>✅</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Performance characteristics</h3>
      <a href="#performance-characteristics">
        
      </a>
    </div>
    <p>We characterize our scheme’s performance on measures inspired by <a href="https://bench.cr.yp.to/results-encrypt.html">ECRYPT</a>. We set the <b>number of attributes to 50</b>, which is significantly higher than necessary for most applications, but serves as a worst-case scenario for benchmarking purposes. We conduct our measurements on a laptop with an Intel Core i7-10610U CPU @ 1.80 GHz and compare the results against 2048-bit RSA, X25519, and our previous scheme.</p>
<table>
<thead>
  <tr>
    <th><span>Scheme</span></th>
    <th><span>Secret key(bytes)</span></th>
    <th><span>Public key(bytes)</span></th>
    <th><span>Overhead of encrypting 23 bytes</span><br /><span>(ciphertext length - message length)</span></th>
    <th><span>Overhead of encrypting 10k bytes</span><br /><span>(ciphertext length - message length)</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>RSA-2048</span></td>
    <td><span>1190 (PKCS#1)</span></td>
    <td><span>256</span></td>
    <td><span>233</span></td>
    <td><span>3568</span></td>
  </tr>
  <tr>
    <td><span>X25519</span></td>
    <td><span>32</span></td>
    <td><span>32</span></td>
    <td><span>48</span></td>
    <td><span>48</span></td>
  </tr>
  <tr>
    <td><span>GeoV1 scheme</span></td>
    <td><span>4838</span></td>
    <td><span>4742</span></td>
    <td><span>169</span></td>
    <td><span>169</span></td>
  </tr>
  <tr>
    <td><span>GeoV2 ABE scheme</span></td>
    <td><span>33416</span></td>
    <td><span>3282</span></td>
    <td><span>19419</span></td>
    <td><span>19419</span></td>
  </tr>
</tbody>
</table><p>Different attribute based encryption schemes optimize for different performance profiles. Some may have fast key generation, while others may prioritize fast decryption. In our case, we only care about fast decryption because it is the only part of the process that lies in the critical path of a request. Everything else happens out-of-band where the extra overhead is acceptable.</p>
<table>
<thead>
  <tr>
    <th><span>Scheme</span></th>
    <th><span>Generating keypair</span></th>
    <th><span>Encrypting 23 bytes</span></th>
    <th><span>Decrypting 23 bytes</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>RSA-2048</span></td>
    <td><span>117 ms</span></td>
    <td><span>0.043 ms</span></td>
    <td><span>1.26 ms</span></td>
  </tr>
  <tr>
    <td><span>X25519</span></td>
    <td><span>0.045 ms</span></td>
    <td><span>0.093 ms</span></td>
    <td><span>0.046 ms</span></td>
  </tr>
  <tr>
    <td><span>GeoV1 scheme</span></td>
    <td><span>75 ms</span></td>
    <td><span>10.7 ms</span></td>
    <td><span>13.9 ms</span></td>
  </tr>
  <tr>
    <td><span>GeoV2 ABE scheme</span></td>
    <td><span>1796 ms</span></td>
    <td><span>704 ms</span></td>
    <td><span>62.4 ms</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>A Brief Note on Attribute-Based Access Control (ABAC)</h3>
      <a href="#a-brief-note-on-attribute-based-access-control-abac">
        
      </a>
    </div>
    <p>We have used Attribute-Based Encryption to implement what is commonly known as <a href="https://csrc.nist.gov/Projects/Attribute-Based-Access-Control">Attribute-Based Access Control (ABAC)</a>.</p><p>ABAC is an extension of the more familiar <a href="https://csrc.nist.gov/Projects/Role-Based-Access-Control">Role-Based Access Control (RBAC)</a>. To understand why ABAC is relevant, let’s briefly discuss its origins. In 1970, the United States Department of Defense introduced Discretionary Access Control (DAC). DAC is how Unix file systems are implemented. But DAC isn’t enough if you want to restrict resharing, because the owner of the resource can grant other users permission to access it in ways that the central administrator does not agree with. To address this, the Department of Defense introduced Mandatory Access Control (MAC). DRM is a good example of MAC: even though you have the file, you don’t have the right to share it with others.</p><p><a href="https://www.cloudflare.com/learning/access-management/role-based-access-control-rbac/">RBAC is an implementation</a> of certain aspects of MAC. ABAC is an extension of RBAC that was defined by NIST in 2017 to capture characteristics of users that go beyond their roles, such as time of day, user agent, and so on.</p><p>However, RBAC and ABAC are simply specifications. While they are traditionally implemented using a central authority that polices access to resources, they don’t have to be. Attribute-based encryption is an excellent mechanism to implement ABAC in distributed systems.</p>
    <div>
      <h2>Key rotation</h2>
      <a href="#key-rotation">
        
      </a>
    </div>
    <p>While it may be tempting to attribute all failures to DNS, changing keys is another strong contender in this race. Suffering through the rather manual and error-prone key rotation process of Geo Key Manager v1 taught us to make robust, simple key rotation with no impact on availability an explicit design goal for Geo Key Manager v2.</p><p>To facilitate key rotation and improve performance, we introduce a layer of indirection to the customer key wrapping (encryption) process. When a customer uploads their TLS private key, instead of encrypting it with the Master Public Key, we generate an X25519 keypair, called the <i>policy key</i>. The central authority then adds the public half of this newly minted policy keypair and its associated policy label to a database. It then encrypts the private half of the policy keypair with the Master Public Key, over the associated access policy. The customer’s private key is encrypted with the public policy key and saved into Quicksilver.</p><p>When a user accesses the customer’s website, the TLS termination service at the data center that receives the request fetches the encrypted policy key associated with the customer’s access policy. If the machine’s attributes don’t satisfy the policy, decryption fails and the request is forwarded to the closest satisfying data center. If decryption succeeds, the policy key is used to decrypt the customer’s private key and complete the handshake.</p>
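<p>The layering can be summarized in a short sketch. Everything below is illustrative: the cryptography is stubbed out as strings, and the master-key versioning is an assumption about how rotation could be tracked, not the actual implementation:</p>

```go
package main

import "fmt"

// abeEncrypt stands in for encryption under the ABE Master Public Key
// over an access policy; the version tags which master key was used.
func abeEncrypt(masterKeyVersion int, policy, plaintext string) string {
	return fmt.Sprintf("ABE[v%d](%s, %s)", masterKeyVersion, policy, plaintext)
}

// x25519Encrypt stands in for encrypting to the policy public key.
func x25519Encrypt(policyPub, plaintext string) string {
	return fmt.Sprintf("X25519(%s, %s)", policyPub, plaintext)
}

func main() {
	policy := "region: EU"
	policyPub, policyPriv := "policy-pub", "policy-priv"

	// Layer 1: the policy private key is ABE-encrypted over the
	// access policy and stored alongside the policy label.
	wrappedPolicyKey := abeEncrypt(1, policy, policyPriv)

	// Layer 2: each customer key under this policy is wrapped with
	// the policy public key, not with the ABE master key directly.
	wrappedCustomerKey := x25519Encrypt(policyPub, "customer TLS key")

	// Rotating the ABE master key now means re-encrypting only the
	// small set of policy private keys; the wrapped customer keys
	// are untouched.
	rotated := abeEncrypt(2, policy, policyPriv)

	fmt.Println(wrappedPolicyKey)
	fmt.Println(wrappedCustomerKey)
	fmt.Println(rotated)
}
```

<p>This is the payoff of the indirection for rotation: the expensive ABE layer wraps only the handful of policy keys, so a master key rotation never has to touch every customer key.</p>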
<table>
<thead>
  <tr>
    <th><span>Key</span></th>
    <th><span>Purpose</span></th>
    <th><span>CA in core</span></th>
    <th><span>Core</span></th>
    <th><span>Network</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Master Public Key</span></td>
    <td><span>Encrypts private policy keys over an access policy</span></td>
    <td><span>Generate</span></td>
    <td><span>Read</span></td>
    <td></td>
  </tr>
  <tr>
    <td><span>Master Secret Key</span></td>
    <td><span>Generates secret keys for machines based on their attributes</span></td>
    <td><span>Generate,Read</span></td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <td><span>Machine Secret Key / Attribute-Based Secret Key</span></td>
    <td><span>Decrypts private policy keys stored in global KV store, Quicksilver</span></td>
    <td><span>Generate</span></td>
    <td></td>
    <td><span>Read</span></td>
  </tr>
  <tr>
    <td><span>Customer TLS Private Key</span></td>
    <td><span>Performs digital signature necessary to complete TLS handshake to the customer’s website</span></td>
    <td></td>
    <td><span>Read (transiently on upload)</span></td>
    <td><span>Read</span></td>
  </tr>
  <tr>
    <td><span>Public Policy Key</span></td>
    <td><span>Encrypts customers’ TLS private keys</span></td>
    <td></td>
    <td><span>Generate,</span><br /><span>Read</span></td>
    <td></td>
  </tr>
  <tr>
    <td><span>Private Policy Key</span></td>
    <td><span>Decrypts customer’s TLS private keys</span></td>
    <td><span>Read (transiently during key rotation)</span></td>
    <td><span>Generate</span></td>
    <td><span>Read</span></td>
  </tr>
</tbody>
</table><p>However, policy keys are not generated for every certificate upload. As shown in the figure below, if a customer requests a policy that already exists in the system, and thus has an associated policy key, that policy key is reused. Since most customers use the same few policies, such as restricting to one country or restricting to the EU, the number of policy keys is orders of magnitude smaller than the number of customer keys.</p>
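<p>The reuse logic amounts to a lookup keyed by the policy label, generating a fresh policy keypair only on the first sighting. A minimal sketch (names are hypothetical; the real system persists policy keys in a database, not an in-memory map):</p>

```go
package main

import "fmt"

// policyKeys maps a policy label to its policy key; in-memory only for
// illustration.
var policyKeys = map[string]string{}

// policyKeyFor returns the policy key for a label, generating one only the
// first time that policy is seen.
func policyKeyFor(policy string) string {
	if k, ok := policyKeys[policy]; ok {
		return k // an earlier customer already requested this policy
	}
	k := fmt.Sprintf("policy-key-for(%s)", policy) // stand-in for X25519 keygen
	policyKeys[policy] = k
	return k
}

func main() {
	a := policyKeyFor("(country: US) or (region: EU)")
	b := policyKeyFor("(country: US) or (region: EU)") // second customer, same policy
	fmt.Println(a == b, len(policyKeys))               // one key serves both
}
```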
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5p1FE2hanxyCKNkkOAeJ1n/58831f6155a861c2fe004ea2a026f3b0/Policy-Keys.png" />
            
</figure><p>Policy Keys</p><p>This sharing of policy keys is tremendously useful for key rotation. When the master keys are rotated (and consequently the machine secret keys), only the handful of policy keys used to control access to the customers’ keys need to be re-encrypted, rather than every customer key. This reduces compute and bandwidth requirements. Additionally, caching policy keys at the TLS termination service improves performance by reducing the need for frequent decryptions in the critical path.</p><p>This is similar to hybrid encryption, where public key cryptography is used to establish a shared symmetric key, which then gets used to encrypt data. The difference here is that the policy keys are not symmetric keys but X25519 keypairs, an asymmetric scheme based on elliptic curves. While not as fast as symmetric schemes like AES, traditional elliptic curve cryptography is significantly faster than attribute-based encryption. The advantage is that the central service doesn’t need access to secret key material to encrypt customer keys.</p><p>The other component of robust key rotation involves maintaining multiple key versions. The latest key generation is used for encryption, but both the latest and previous versions can be used for decryption. We use a system of states to manage key transitions and the safe deletion of older keys, and we have extensive monitoring in place to alert us if any machines are not using the appropriate key generations.</p>
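<p>The latest-plus-previous versioning scheme can be sketched as a small keyring. This is an illustrative data structure only (the post describes the behavior, not the implementation): the latest generation encrypts, either retained generation decrypts, so a rotation never breaks reads made just before the cutover.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// generation pairs a version number with key material; in the real system
// these would be ABE master keys, not byte slices.
type generation struct {
	version int
	key     []byte
}

// keyring keeps the latest and previous generations: the latest is used to
// encrypt, while decryption accepts either.
type keyring struct {
	latest, previous *generation
}

// rotate installs a new latest generation; the old latest is retained for
// decryption until monitoring confirms nothing still depends on it.
func (r *keyring) rotate(g *generation) {
	r.previous = r.latest
	r.latest = g
}

// decryptWith finds the retained generation matching a ciphertext's version tag.
func (r *keyring) decryptWith(version int) (*generation, error) {
	for _, g := range []*generation{r.latest, r.previous} {
		if g != nil && g.version == version {
			return g, nil
		}
	}
	return nil, errors.New("key generation retired; re-encryption required")
}

func main() {
	r := &keyring{}
	r.rotate(&generation{version: 1, key: []byte("gen1")})
	r.rotate(&generation{version: 2, key: []byte("gen2")})
	fmt.Println(r.latest.version) // encrypt with generation 2
	_, err := r.decryptWith(1)    // generation 1 still decrypts
	fmt.Println(err == nil)
	r.rotate(&generation{version: 3, key: []byte("gen3")})
	_, err = r.decryptWith(1) // generation 1 is now retired
	fmt.Println(err != nil)
}
```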
    <div>
      <h2>The Tail At Scale</h2>
      <a href="#the-tail-at-scale">
        
      </a>
    </div>
    <p>Geo Key Manager suffered from high tail latency, which occasionally impacted availability. Jeff Dean’s paper, <a href="https://research.google/pubs/pub40801/">The Tail at Scale</a>, is an enlightening read on how even elevated p99 latency at Cloudflare scale can be damaging. Despite revamping the server and client components of our service, the p99 latency didn’t budge. These revamps, such as switching from worker pools to one goroutine per request, did simplify the service, removing thousands of lines of code. Distributed tracing was able to pin down the delays: they took place between the client sending a request and the server receiving it. But we could not dig in further. We even wrote a blog post last year describing our <a href="/scaling-geo-key-manager/">debugging endeavors</a>, but without a concrete solution.</p><p>Finally, we realized that there was a level of indirection between the client and the server. Our data centers around the world vary greatly in size. To avoid swamping smaller data centers with connections, larger data centers would task individual, intermediary machines with proxying requests to other data centers using the Go net/rpc library.</p><p>Once we included the forwarding function on the intermediary server in the trace, the problem became clear: there was a long delay between issuing a request and processing it. Yet the code was merely a call to a built-in library function. Why was it delaying the request?</p><p>Ultimately we found that a lock was held while the request was serialized. The net/rpc package does not support streams, but our packet-oriented custom application protocol, written before the advent of gRPC, does. To bridge this gap, we executed a request and waited for the response inside the serialization function. While expedient, this created a performance bottleneck: only one request could be forwarded at a time.</p><p>Our solution was to use channels for coordination, letting multiple requests execute while we waited for the responses to arrive. When we rolled it out, we saw dramatic decreases in tail latency.</p>
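<p>The channel-based fix can be sketched as request multiplexing: each in-flight request gets a sequence ID and a reply channel, so responses can return in any order while new requests keep flowing. This is an illustrative reconstruction of the pattern, not the production code; the type and function names are hypothetical.</p>

```go
package main

import (
	"fmt"
	"sync"
)

type packet struct {
	id      uint64
	payload string
}

// forwarder multiplexes many in-flight requests over one connection to a
// remote data center.
type forwarder struct {
	mu      sync.Mutex
	nextID  uint64
	pending map[uint64]chan string
	send    chan<- packet
}

// forward registers a reply channel, sends the request, and waits. Unlike
// the old code, the lock guards only the map, never the network round trip,
// so concurrent requests no longer serialize behind one another.
func (f *forwarder) forward(payload string) string {
	reply := make(chan string, 1)
	f.mu.Lock()
	f.nextID++
	id := f.nextID
	f.pending[id] = reply
	f.mu.Unlock()
	f.send <- packet{id: id, payload: payload}
	return <-reply
}

// deliver routes a response from the connection's read loop to its waiting caller.
func (f *forwarder) deliver(p packet) {
	f.mu.Lock()
	reply, ok := f.pending[p.id]
	delete(f.pending, p.id)
	f.mu.Unlock()
	if ok {
		reply <- p.payload
	}
}

func main() {
	wire := make(chan packet, 16)
	f := &forwarder{pending: make(map[uint64]chan string), send: wire}
	// A stand-in for the remote end: sign (here, echo) each request.
	go func() {
		for p := range wire {
			f.deliver(packet{id: p.id, payload: "signed:" + p.payload})
		}
	}()
	var wg sync.WaitGroup
	results := make([]string, 3)
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) { // multiple requests in flight at once
			defer wg.Done()
			results[i] = f.forward(fmt.Sprintf("req%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(results[0], results[1], results[2])
}
```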
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/z3qVC6TmrFhWrVyJDS2Am/19e394a04c4b0c30e8fc4d04cb34fa86/Untitled-4.png" />
            
</figure><p>The results of fixing RPC failures in remote colo in Australia</p><p>Unfortunately we cannot make the speed of light any faster (yet). Customers who want their keys kept only in the US while their website users are in the land down under will have to endure some delays as we make the trans-Pacific voyage. But thanks to session tickets, those delays only affect new connections.</p><div></div><p>Uptime was also significantly improved. Data centers provisioned after cryptographic initialization could now participate in the system, which means that data centers that did not satisfy a certain policy had a broader range of satisfying neighbors to which they could forward the signing request. This increased redundancy in the system, and particularly benefited data centers in regions without the best Internet connectivity. The graph below represents successful probes spanning every machine globally over a two-day period. For GeoV1, we see websites with policies for US and EU regions falling to under 98% at one point, while for GeoV2, uptime rarely drops below four nines of availability.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/69GPaZv0uid2bQK8sc9NPL/d55819c043a0cc35bd6596e7fcf5ab8d/ss.png" />
            
            </figure><p>Uptime by Key Profile across US and EU for GeoV1 and GeoV2, and IN for GeoV2</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Congratulations, dear reader, for making it this far. Just like you, applied cryptography has come a long way, but only limited slivers manage to penetrate the barrier between research and real-world adoption. Bridging this gap can enable novel capabilities for protecting sensitive data. Attribute-based encryption itself has become much more efficient and featureful over the past few years. We hope that this post encourages you to consider ABE for your own access control needs, particularly if you deal with distributed systems and don’t want to depend on a highly available central authority. We have open-sourced our implementation of CP-ABE in <a href="https://github.com/cloudflare/circl/tree/main/abe/cpabe/tkn20">CIRCL</a>, and plan on publishing a paper with additional details.</p><p>We look forward to the numerous product improvements to Geo Key Manager made possible by this new cryptographic foundation. We plan to use this ABE-based mechanism for storing not just private keys, but other types of data as well. We are also working on making it more user-friendly and generalizable for internal services to use.</p>
    <div>
      <h2>Acknowledgements</h2>
      <a href="#acknowledgements">
        
      </a>
    </div>
    <p>We’d like to thank Watson Ladd for his contributions to this project during his tenure at Cloudflare.</p><hr/><p><sup>1</sup>While true for most customers, we do offer <a href="https://www.cloudflare.com/ssl/keyless-ssl/">Keyless SSL</a>, which gives customers who can run their own keyservers the ability to store their private keys on-premises.</p> ]]></content:encoded>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Geo Key Manager]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">6GfvoTWvRo60gaXvQScL51</guid>
            <dc:creator>Tanya Verma</dc:creator>
        </item>
        <item>
            <title><![CDATA[A new, configurable and scalable version of Geo Key Manager, now available in Closed Beta]]></title>
            <link>https://blog.cloudflare.com/configurable-and-scalable-geo-key-manager-closed-beta/</link>
            <pubDate>Thu, 15 Dec 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ We’re excited to announce a new version of Geo Key Manager — one that allows customers to define boundaries by country, by a region, or by a standard, such as “only store my private keys in FIPS compliant data centers” — now available in Closed Beta. ]]></description>
            <content:encoded><![CDATA[ <p><i></i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5MaRNMB7y16ivTdqoGenT8/140190c6e91d3de30cd51f728b4d852a/image2-35.png" />
            
</figure><p>Today, traffic on the Internet stays encrypted through the use of public and private keys that encrypt the data as it's being transmitted. Cloudflare helps secure millions of websites by managing the encryption keys that keep this data protected. To provide lightning-fast services, Cloudflare stores these keys on our fleet of data centers that spans more than 150 countries. However, some compliance regulations require that private keys are only stored in specific geographic locations.</p><p>In 2017, we <a href="/introducing-cloudflare-geo-key-manager/">introduced</a> Geo Key Manager, a product that allows customers to store and manage the encryption keys for their domains in different geographic locations so that compliance regulations are met and data remains secure. We launched the product a few months before the General Data Protection Regulation (GDPR) went into effect and built it to support three regions: the US, the European Union (EU), and a set of our top-tier data centers that employ the highest security measures. Since then, GDPR-like laws have quickly expanded and now more than 15 countries have comparable data protection laws or regulations that include restrictions on data transfer across and/or data localization within a certain boundary.</p><p>At Cloudflare, we like to be prepared for the future. We want to give our customers tools that allow them to maintain compliance in this ever-changing environment. That’s why we’re excited to announce a new version of Geo Key Manager — one that allows customers to define boundaries by country (“only store my private keys in India”), by region (“only store my private keys in the European Union”), or by a standard, such as “only store my private keys in FIPS compliant data centers” — now available in Closed Beta. Sign up <a href="https://www.cloudflare.com/lp/geo-key-manager/">here</a>!</p>
    <div>
      <h3>Learnings from Geo Key Manager v1</h3>
      <a href="#learnings-from-geo-key-manager-v1">
        
      </a>
    </div>
    <p>Geo Key Manager has been around for a few years now, and we’ve used this time to gather feedback from our customers. As the demand for a more flexible system grew, we decided to go back to the drawing board and create a new version of Geo Key Manager that would better meet our customers’ needs.</p><p>We initially launched Geo Key Manager with support for US, EU, and Highest Security Data Centers. Those regions were sufficient at the time, but customers wrestling with data localization obligations in other jurisdictions need more flexibility when it comes to selecting countries and regions. Some customers want to be able to set restrictions to maintain their private keys in one country, some want the keys stored everywhere except in certain countries, and some may want to mix and match rules and say “store them in X and Y, but not in Z”. What we learned from our customers is that they need flexibility, something that will allow them to keep up with ever-changing rules and policies — and that’s what we set out to build.</p><p>The next issue we faced was scalability. When we built the initial regions, we included a hard-coded list of data centers that met our criteria for the US, EU, and “high security” data center regions. However, this list was static because the underlying cryptography did not support dynamic changes to our list of data centers. In order to distribute private keys to new data centers that met our criteria, we would have had to completely overhaul the system. In addition, our network significantly expands every year, with more than 100 new data centers since the initial launch. That means that any new potential locations that could be used to store private keys are currently not in use, degrading the performance and reliability of customers using this feature.</p><p>With our current scale, automation and expansion are must-haves. Our new system needs to dynamically scale every time we onboard or remove a data center from our network, without any human intervention or a large overhaul.</p><p>Finally, one of our biggest learnings was that customers make mistakes, such as defining a region that’s so small that availability becomes a concern. Our job is to prevent our customers from making changes that we know will negatively impact them.</p>
    <div>
      <h3>Define your own geo-restrictions with the new version of Geo Key Manager</h3>
      <a href="#define-your-own-geo-restrictions-with-the-new-version-of-geo-key-manager">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/32wIGpiYX9XBNE6IqIo4qW/53a0340bfbd4c175a1ced8177df8cdfe/image3-21.png" />
            
</figure><p>Cloudflare has significantly grown in the last few years and so has our international customer base. Customers need to keep their traffic regionalized. This region can be as broad as a continent — Asia, for example. Or, it can be a specific country, like Japan.</p><p>From our conversations with our customers, we’ve heard that they want to be able to define these regions themselves. This is why today we’re excited to announce that customers will be able to use Geo Key Manager to create what we call “policies”.</p><p>A policy can be a single country, defined by a two-letter (ISO 3166) country code. It can be a region, such as “EU” for the European Union or Oceania. It can be a mix and match of the two, such as “(country: US) or (region: EU)”.</p><p>Our new policy-based Geo Key Manager allows you to create allowlists or blocklists of countries and supported regions, giving you control over the boundary in which your private key will be stored. If you’d like to store your private keys globally and omit a few countries, you can do that.</p><p>If you would like to store your private keys in the EU and US, you would make the following <a href="https://api.cloudflare.com/#custom-ssl-for-a-zone-create-ssl-configuration">API</a> call:</p>
            <pre><code>curl -X POST "https://api.cloudflare.com/client/v4/zones/zone_id/custom_certificates" \
     -H "X-Auth-Email: user@example.com" \
     -H "X-Auth-Key: auth-key" \
     -H "Content-Type: application/json" \
     --data '{"certificate":"certificate","private_key":"private_key","policy":"(country: US) or (region: EU)", "type": "sni_custom"}'</code></pre>
            <p>If you would like to store your private keys in the EU, but not in France, here is how you can define that:</p>
            <pre><code>curl -X POST "https://api.cloudflare.com/client/v4/zones/zone_id/custom_certificates" \
     -H "X-Auth-Email: user@example.com" \
     -H "X-Auth-Key: auth-key" \
     -H "Content-Type: application/json" \
     --data '{"certificate":"certificate","private_key":"private_key","policy": "region: EU and (not country: FR)", "type": "sni_custom"}'</code></pre>
            <p>Geo Key Manager can now support more than 30 countries and regions. But that’s not all! The superpower of our Geo Key Manager technology is that it doesn’t actually have to be “geo” based, but instead, it’s attribute based. In the future, we’ll have a policy that will allow our customers to define where their private keys are stored based on a compliance standard like <a href="https://www.cloudflare.com/learning/privacy/what-is-fedramp/">FedRAMP</a> or ISO 27001.</p>
    <div>
      <h3>Reliability, resiliency, and redundancy</h3>
      <a href="#reliability-resiliency-and-redundancy">
        
      </a>
    </div>
    <p>By giving our customers the remote control for Geo Key Manager, we want to make sure they understand the impact of their changes on both redundancy and latency.</p><p>On the redundancy side, one of our biggest concerns is a customer choosing a region so small that removing a single data center, for maintenance for example, would drastically impact availability. To protect our customers, we’ve added redundancy restrictions. These prevent our customers from setting regions with too few data centers, ensuring that the data centers within a policy can together offer high availability and redundancy.</p><p>On top of that, in the last few years we’ve significantly improved the underlying networking that powers Geo Key Manager. For more information on how we did that, keep an eye out for a technical deep dive into Geo Key Manager.</p>
    <div>
      <h3>Performance matters</h3>
      <a href="#performance-matters">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/77nOyNlNPEyaWkHPZ1xfpM/aac3e0d99f380dbc97d244f99d89aeec/image1-39.png" />
            
</figure><p>With the original regions (US, EU, and Highest Security Data Centers), we learned that customers may overlook the latency impact of restricting their keys to a certain region. Imagine your keys are stored in the US: for your Asia-based users, requests will have to travel around the world, adding latency. Now that customers can define more granular regions, we want to make sure they see the impact of a change before making it.</p><p>If you’re an <a href="https://www.cloudflare.com/ecommerce/">E-Commerce platform</a>, then <a href="https://www.cloudflare.com/solutions/ecommerce/optimization/">performance</a> is always top of mind. One thing that we’re working on right now is performance metrics for Geo Key Manager policies, both from a regional point of view — “what’s the latency impact for Asia-based customers?” — and from a global point of view — “for anyone in the world, what is the average impact of this policy?”.</p><p>If you see that the latency impact is unacceptable, you may want to create a separate domain for your service that’s specific to the region it serves.</p>
    <div>
      <h3>Closed Beta, now available!</h3>
      <a href="#closed-beta-now-available">
        
      </a>
    </div>
    <p>Interested in trying out the latest version of Geo Key Manager? Fill out this <a href="https://www.cloudflare.com/lp/geo-key-manager/">form</a>.</p>
    <div>
      <h3>Coming soon!</h3>
      <a href="#coming-soon">
        
      </a>
    </div>
    <p>Geo Key Manager is only available via the API at the moment, but we are working on an easy-to-use UI for it, so that customers can easily manage their policies and regions. In addition, we’ll surface performance measurements and warnings whenever we see degraded performance or redundancy, to ensure that customers are mindful when setting policies.</p><p>We’re also excited to extend our Geo Key Manager product beyond custom uploaded certificates. In the future, certificates issued through Advanced Certificate Manager or <a href="https://www.cloudflare.com/application-services/products/ssl-for-saas-providers/">SSL for SaaS</a> will support policy-based restrictions on key storage.</p><p>Finally, we’re looking to add more default regions to make the selection process simple for our customers. If you have any regions that you’d like us to support, or just general feedback or feature requests related to Geo Key Manager, make a note of it on the <a href="https://www.cloudflare.com/lp/geo-key-manager/">form</a>. We love hearing from our customers!</p> ]]></content:encoded>
            <category><![CDATA[Impact Week]]></category>
            <category><![CDATA[SSL]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Regional Services]]></category>
            <category><![CDATA[Geo Key Manager]]></category>
            <guid isPermaLink="false">2OgNkneECLDxlFcB0j9my4</guid>
            <dc:creator>Dina Kozlov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Geo Key Manager: Setting up a service for scale]]></title>
            <link>https://blog.cloudflare.com/scaling-geo-key-manager/</link>
            <pubDate>Fri, 15 Oct 2021 13:00:57 GMT</pubDate>
            <description><![CDATA[ Diagnosing scaling issues in a service associated with TLS termination through a deep dive into some of the incidents it caused. ]]></description>
            <content:encoded><![CDATA[ <p>In 2017, we launched <a href="/introducing-cloudflare-geo-key-manager/">Geo Key Manager</a>, a service that allows Cloudflare customers to choose where they store their TLS certificate private keys. For example, if a US customer only wants its private keys stored in US data centers, we can make that happen. When a user from Tokyo makes a request to this website or API, it first hits the Tokyo data center. As the Tokyo data center lacks access to the private key, it contacts a data center in the US to terminate the TLS request. Once the TLS session is established, the Tokyo data center can serve future requests. For a detailed description of how this works, refer to <a href="/geo-key-manager-how-it-works/">this post</a> on Geo Key Manager.</p><p>This is a story about the evolution of systems in response to an increase in scale and scope. Geo Key Manager started off as a small research project and, as it got used more and more, wasn’t scaling as well as we wanted it to. This post describes the challenges Geo Key Manager is facing today, particularly from a networking standpoint, and some of the steps along its way to a truly scalable service.</p><p>Geo Key Manager started out as a research project that leveraged two key innovations: Keyless SSL, an early Cloudflare innovation; and identity-based encryption and broadcast encryption, relatively new cryptographic techniques that can be used to construct identity-based access management schemes — in this case, identities based on geography. 
<a href="/keyless-ssl-the-nitty-gritty-technical-details/">Keyless SSL</a> was originally designed as a keyserver that customers would host on their own infrastructure, allowing them to retain complete ownership of their own private keys while still reaping the benefits of Cloudflare.</p><p>Eventually we started using Keyless SSL for Geo Key Manager requests, and later, <a href="/going-keyless-everywhere/">all TLS terminations</a> at Cloudflare were switched to an internal version of Keyless SSL. We made several tweaks to make the transition go more smoothly, but this meant that we were using Keyless SSL in ways we hadn't originally intended.</p><p>With the increasing risks of balkanization of the Internet in response to geography-specific regulations like <a href="/keeping-your-gdpr-resolutions/">GDPR</a>, demand for products like Geo Key Manager, which enable users to retain geographical control of their information, has surged. A lot of the work we do on the Research team is exciting because we get to apply cutting-edge advancements in the field to Cloudflare-scale systems. It's also fascinating to see projects be used in new and unexpected ways. But inevitably, many of our systems use technology that has never before been used at this scale, which can trigger failures, as we shall see.</p>
    <div>
      <h3>A trans-pacific voyage</h3>
      <a href="#a-trans-pacific-voyage">
        
      </a>
    </div>
    <p>In late March, we started seeing an increase in TLS handshake failures in Melbourne, Australia. When we start observing failures in a specific data center, depending on the affected service, we have several options to mitigate impact. For critical systems like TLS termination, one approach is to reroute all traffic using anycast to neighboring data centers. But when we rerouted traffic away from the Melbourne data center, the same failures moved to the nearby data centers of Adelaide and Perth. As we were investigating the issue, we noticed a large spike in timeouts.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6VlV2pxdr4qXHfvrBWWUBd/beea304a119dcbc635558d90ee17506b/Screen-Shot-2021-10-14-at-2.46.04-PM.png" />
            
</figure><p>The service that sits first in line in the system that makes up TLS termination at Cloudflare performs all the parts of TLS termination that do not require access to customers’ private keys. Since it is an Internet-facing process, we try to <a href="/going-keyless-everywhere/">keep sensitive information out of this process as much as possible</a>, in case of memory disclosure bugs. So it forwards the key signing request to <b>keynotto</b>, a service written in Rust that performs RSA and <a href="https://www.cloudflare.com/learning/dns/dnssec/ecdsa-and-dnssec/">ECDSA</a> key signatures.</p><p>We continued our investigation. This service was timing out on the requests it sent to keynotto. Next, we looked into the requests themselves. We observed that there was an increase in total requests, but not by a very large factor. You can see a large drop in traffic in the Melbourne data center below. That indicates the time when we dropped traffic from Melbourne and rerouted it to other nearby data centers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6toOjSsopFl2OWYeXafHSi/778760240e8b82f2a44961720cb65cc3/Screen-Shot-2021-10-14-at-2.45.46-PM.png" />
            
</figure><p>We decided to track one of these timed-out requests using our distributed tracing infrastructure. The <a href="https://github.com/jaegertracing/jaeger-ui">Jaeger UI</a> allows you to chart traces that take the longest. Thanks to this, we quickly figured out that most of these failures were being caused by a single, new zone that was getting a large amount of API traffic. And interestingly, this zone also had Geo Key Manager enabled, with a policy set to US-only data centers.</p><p>Geo Key Manager routes requests to the closest data center that satisfies the policy. This happened to be a data center in San Francisco. That meant adding ~175ms to the actual time spent performing the key signing (median: 3 ms), to account for the trans-Pacific voyage. So, it made sense that the remote key signatures were relatively slow, but what was causing TLS timeouts and degradation for unrelated zones with nothing to do with Geo Key Manager?</p><p>Below are graphs depicting the increase in RSA key signature latencies in Melbourne, by quantile. Plotting them all on the same graph didn’t work well at this scale, so the p50 is shown separately from the p70 and p90.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44WdZpzJwaVTc2ak7QzSAF/fa0ab0504f4908167dee48a62a50e8f5/Screen-Shot-2021-10-14-at-2.45.26-PM.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ZMbHtiMTsGTxrbtwGqpFs/9770215564b1f8c6d0f5ae32b3561517/Screen-Shot-2021-10-14-at-2.45.35-PM.png" />
            
            </figure><p>To answer why unrelated zones had timeouts and performance degradation, we have to understand the architecture of the three services involved in terminating TLS.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6yRyhm74oiKeGjB3w9xVdJ/004b342516c02def6e04e0d4513e3e17/Screen-Shot-2021-10-14-at-2.32.02-PM.png" />
            
            </figure>
    <div>
      <h3>Life of a TLS request</h3>
      <a href="#life-of-a-tls-request">
        
      </a>
    </div>
    <p>TLS requests arrive at a data center routed by anycast. Each server runs the same stack of services, so each has its own instance of the initial service, keynotto, and <b>gokeyless</b> (discussed shortly). The first service has a worker pool of half the number of CPU cores. There are 96 cores on each server, so 48 workers. Each of these workers creates its own connection to keynotto, which it uses to send key-signing requests and receive their responses. keynotto, at that point in time, could multiplex across all 48 of these connections because it spawned a new thread to handle each connection. However, it processed all requests on a given connection sequentially. Given that there could be dozens of requests per second on the same connection, even a single slow request would cause head-of-line blocking of all the other requests enqueued after it. So long as most requests were short-lived, this bug went unnoticed. But when a lot of traffic needed to be processed via Geo Key Manager, the head-of-line blocking created problems. This type of flaw is usually only exposed under heavy load or during load testing, and it will make more sense after I introduce gokeyless and explain the history of keynotto.</p><p>gokeyless-internal, very imaginatively named, is the internal instance of our Keyless SSL keyserver, written in Go. I’ll abbreviate it to gokeyless for the sake of simplicity. Before we introduced keynotto as the designated key-signing process, the first service sent all key-signing requests directly to gokeyless. gokeyless created worker pools based on the type of operation, consisting of goroutines, the very lightweight threads managed by the Go runtime. The pools of interest are the RSA, ECDSA, and remote worker pools. RSA and ECDSA are fairly obvious: these goroutines performed RSA and ECDSA key signatures. Any requests involving Geo Key Manager were placed in the remote pool. 
This prevented the network-dependent remote requests from affecting local key signatures. Worker pools were an artifact of a previous generation of Keyless which didn’t need to account for remote requests. Benchmarks had shown that spinning up worker pools provided some marginal latency benefits. For local operations only, performance was optimal, as the computation was very fast and CPU bound. However, when we started adding remote operations that could block the workers needed for local operations, we decided to create a new worker pool just for remote ops.</p><p>When gokeyless was designed in 2014, Rust was not a mature language. That changed recently, and we decided to experiment with a minimal Rust proxy placed between the first service and gokeyless. It would handle RSA and ECDSA signatures, which were about 99% of all key-signing operations, while handing off the more esoteric operations like Geo Key Manager to gokeyless running locally on the same server. The hope was that we could eke out some performance gains from the tight runtime control afforded by Rust and the use of Rust’s cryptographic libraries. Performance is incredibly important to Cloudflare, since CPU is one of the main limiting factors for edge servers. Go’s RSA is notorious for being slow and using a lot of CPU, and given that one in three handshakes uses RSA, it is important to optimize it. <a href="https://github.com/golang/go/issues/21525">CGo seemed to create unnecessary overhead</a>, and there was no assembly-only RSA implementation we could use. We made some strides trying to speed up RSA without CGo using only assembly, but the result was still suboptimal. So keynotto was built to take advantage of the fast RSA implementation in BoringSSL.</p><p>The next section is a quick diversion into keynotto that isn’t strictly related to the story, but who doesn’t enjoy a hot take on Go vs Rust?</p>
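<p>As a rough illustration of this layout, here is a minimal Go sketch of per-operation worker pools with a dedicated remote pool. This is not gokeyless's actual code; the names (poolFor, startPool) and pool sizes are invented for the example:</p>

```go
package main

import "fmt"

// Op identifies the kind of key-signing operation.
type Op int

const (
	OpRSA Op = iota
	OpECDSA
	OpRemote // Geo Key Manager requests that leave the data center
)

type request struct {
	op   Op
	resp chan string
}

// poolFor names the pool an operation is routed to: remote ops get
// their own pool so network stalls can't starve local signing workers.
func poolFor(op Op) string {
	if op == OpRemote {
		return "remote"
	}
	return "local"
}

// startPool launches n goroutines that service requests from jobs.
func startPool(n int, jobs <-chan request, handle func(request) string) {
	for i := 0; i < n; i++ {
		go func() {
			for req := range jobs {
				req.resp <- handle(req)
			}
		}()
	}
}

func main() {
	local := make(chan request, 1024)
	remote := make(chan request, 1024)
	startPool(48, local, func(r request) string { return "signed locally" })
	startPool(200, remote, func(r request) string { return "signed remotely" })

	// Dispatch: Geo Key Manager requests go to the remote pool.
	r := request{op: OpRemote, resp: make(chan string, 1)}
	if poolFor(r.op) == "remote" {
		remote <- r
	} else {
		local <- r
	}
	fmt.Println(<-r.resp) // signed remotely
}
```

<p>The point of the separation is visible in the dispatch step: a blocked remote worker only delays other remote requests, never local RSA or ECDSA signatures.</p>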
    <div>
      <h3>Go vs Rust: The battle continues</h3>
      <a href="#go-vs-rust-the-battle-continues">
        
      </a>
    </div>
    <p>While keynotto was initially deployed because it objectively lowered CPU usage, we weren't sure what fraction of its benefits over gokeyless came from the different cryptography libraries used versus the choice of language. keynotto used BoringCrypto, the cryptography library that is part of BoringSSL, whereas gokeyless used Go's standard crypto library. And keynotto was implemented in Rust, while gokeyless was in Go. However, recently during the process of <a href="https://www.cloudflare.com/learning/privacy/what-is-fedramp/">FedRAMP certification</a>, we had to switch gokeyless to use the same cryptography library as keynotto. This was because while the Go standard library doesn’t use a FIPS-validated cryptography module, <a href="https://go.googlesource.com/go/+/dev.boringcrypto">this Google-maintained branch</a> of Go that uses BoringCrypto does. We then turned off keynotto in a few very large data centers for a couple of days and compared this to the control group with keynotto turned on. We found that moving gokeyless to BoringCrypto provided only a very marginal benefit, forcing us to revisit our stance on CGo. This result meant we could attribute the difference to using Rust over Go and to implementation differences between keynotto and gokeyless.</p><p>Turning off keynotto resulted in an average 26% increase in maximum memory consumed and a 71% increase in maximum CPU consumed. Across all quantiles, we observed a significant increase in latency for both RSA and ECDSA. ECDSA latency as measured across four different large data centers (each a different color) is shown:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/571DWzRlgGGwyPvsG9KQWd/1daf8751dfe7e04f77dceadf5c7200e1/Screen-Shot-2021-10-14-at-2.44.56-PM.png" />
            
            </figure>
    <div>
      <h3>Remote ops don’t play fair</h3>
      <a href="#remote-ops-dont-play-fair">
        
      </a>
    </div>
    <p>The first service, keynotto, and gokeyless communicate over TCP via the custom Keyless protocol. This protocol was developed in 2014 and was originally intended for communicating with external key servers, aka gokeyless. After this protocol started being used internally, by increasing numbers of clients in several different languages, challenges started appearing. In particular, each client had to implement the serialization code by hand and make sense of all the features the protocol provided. The custom protocol also makes tracing more challenging to do right. So while older clients like the first service have tuned their implementations, for newer clients like keynotto it can be difficult to account for every property, such as the right way to propagate traces.</p><p>One such property the Keyless protocol offers is multiplexing of requests within a connection. It does so by provisioning unique IDs for each request, which allows for out-of-order delivery of responses, similar to protocols like HTTP/2. While the first service and gokeyless leveraged this property to handle situations where the order of responses differs from the order of requests, keynotto, a much newer service, didn’t. This is why it processed requests on the same connection sequentially, which led to local key-signing requests — which took roughly 3 ms — being blocked on remote requests that took 60x that duration!</p><p>We now know why most TLS requests were degraded or dropped. But it’s also worth examining the remote requests themselves. The gokeyless remote worker pool had 200 workers in each server. One of the mitigations we applied was bumping that number to 2,000. Increasing the concurrency in a system when faced with problems that resemble resource constraints is an understandable thing to do; the reason not to is high resource utilization. Let’s get some context on why bumping up remote workers 10x wasn’t necessarily a problem. 
When gokeyless was created, it had separate pools with a configurable number of workers for RSA, ECDSA, and remote ops. RSA and ECDSA had a large fraction of the workers, and remote was relatively smaller. Post keynotto, the need for the RSA and ECDSA pools was obviated, since keynotto handled all local key-signing operations.</p><p>Let’s do some napkin math now to prove that bumping the remote pool by 10x could not have possibly helped. We were receiving five thousand requests per second in addition to the usual traffic in Melbourne, of which 200 per second were remote requests. Melbourne has 60 servers. Looking at a breakdown of remote requests by server, the <b>maximum remote requests</b> any one server handled was ten requests per second (rps). The graph below shows the percentage of workers in gokeyless actually performing operations (“other” is the remote workers). We can see that while Melbourne worker utilization peaked at 47%, that is still not 100%. And the utilization in San Francisco was much lower, so it couldn’t be SF that was slowing things down.</p>
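<p>This napkin math can be made concrete. The sketch below applies the standard worker-capacity arithmetic (workers divided by per-request service time); the helper name is invented, and the 200 ms service time is a deliberately pessimistic assumption:</p>

```go
package main

import "fmt"

// capacityRPS returns the throughput a worker pool can sustain:
// number of workers divided by per-request service time.
func capacityRPS(workers int, serviceTimeSec float64) float64 {
	return float64(workers) / serviceTimeSec
}

func main() {
	workersPerServer := 200
	maxRemoteRPSPerServer := 10.0 // observed peak in Melbourne
	pessimisticServiceTime := 0.2 // assume a slow 200 ms per remote op

	capacity := capacityRPS(workersPerServer, pessimisticServiceTime)
	fmt.Printf("pool capacity: %.0f rps, offered load: %.0f rps\n",
		capacity, maxRemoteRPSPerServer)
	// Capacity (1,000 rps) dwarfs the load (10 rps), so a 10x bump
	// in remote workers could not have been the fix.
}
```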
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5qxLCoDPpRoPaHar6H3p1D/cd1d0509138c565c79655722f1f17e40/Screen-Shot-2021-10-14-at-2.44.50-PM.png" />
            
            </figure><p>Given that we had 200 workers per server and at most ten requests per second, even if each request took 200 ms, gokeyless was not the bottleneck. We had no reason to increase the number of remote workers. The culprit here was keynotto’s serial queue.</p><p>The solution was to prevent remote signing ops from blocking local ops in keynotto. We did this by changing keynotto’s internal task-handling model to not just handle each connection concurrently, but also to process each request on a connection concurrently. This was done using a <a href="https://docs.rs/tokio/1.12.0/tokio/sync/mpsc/index.html">multi-producer, single-consumer queue</a>. When a request was read from a connection, it was handed off to a new thread for processing while the connection thread went back to reading requests. When the request finished processing, the response was written to a channel. A write thread for each connection was also created, which polled the read side of the channel and wrote responses back to the connection.</p><p>The next thing we added was carefully chosen timeout values in keynotto. Timeouts allow for faster failure and faster cleanup of the resources associated with a timed-out request. Choosing appropriate timeouts can be tricky, especially when composed across multiple services. Since keynotto is downstream of the first service but upstream of gokeyless, its timeout should be smaller than the former’s but larger than the latter’s, to allow upstream services to diagnose which process triggered the timeout. Timeouts are further complicated when network latency is involved, because trans-Pacific p99 latency is very different from p99 latency between two US states. Using the worst-case network latency can be a fair compromise. 
We chose keynotto timeouts to be an order of magnitude larger than the p99 request latency, since the purpose was to prevent continued stalls and resource consumption.</p><p>A more general solution to the two issues outlined above would be to use gRPC. gRPC was released in 2015, one year too late to be used in Keyless at the time. It has several advantages over custom protocol implementations, such as multi-language client libraries, improved tracing, easy timeouts, load balancing, and protobuf for serialization, but most directly relevant here is multiplexing. gRPC supports multiplexing out of the box, which makes it unnecessary to handle request/response multiplexing manually.</p>
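<p>keynotto's actual fix was written in Rust around a tokio mpsc channel, but the same pattern can be sketched in Go for illustration. The structure below (invented names, simplified request type) shows per-request concurrency with a single writer owning each connection's write side, and per-request IDs allowing out-of-order responses:</p>

```go
package main

import (
	"fmt"
	"sync"
)

// A unique ID per request lets responses be written back to the
// connection out of order, as in the Keyless protocol or HTTP/2.
type request struct {
	ID      uint32
	Payload string
}

type response struct {
	ID     uint32
	Result string
}

// serveConn processes every request on one connection concurrently.
// Per-request goroutines are the producers; a single writer goroutine
// consumes the shared channel (the multi-producer, single-consumer shape).
func serveConn(reqs <-chan request, write func(response)) {
	out := make(chan response, 64)
	done := make(chan struct{})

	// Single consumer: the write loop owns the connection's write side.
	go func() {
		for r := range out {
			write(r)
		}
		close(done)
	}()

	// Multiple producers: one goroutine per request, so one slow
	// remote signature no longer blocks the requests behind it.
	var wg sync.WaitGroup
	for req := range reqs {
		wg.Add(1)
		go func(req request) {
			defer wg.Done()
			out <- response{ID: req.ID, Result: "signed:" + req.Payload}
		}(req)
	}
	wg.Wait()
	close(out)
	<-done
}

func main() {
	reqs := make(chan request, 3)
	for i := uint32(1); i <= 3; i++ {
		reqs <- request{ID: i, Payload: fmt.Sprintf("req%d", i)}
	}
	close(reqs)

	got := map[uint32]string{}
	serveConn(reqs, func(r response) { got[r.ID] = r.Result })
	fmt.Println(len(got)) // 3 responses, possibly completed out of order
}
```

<p>The reader loop never waits on request processing, which is exactly what removes the head-of-line blocking described above.</p>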
    <div>
      <h3>Tracking requests across the Atlantic</h3>
      <a href="#tracking-requests-across-the-atlantic">
        
      </a>
    </div>
    <p>Then in early June, we started seeing TLS termination timeouts in a large data center in the Midwest. This time, there were timeouts across the first service, keynotto, and gokeyless. We had on our hands a zone with its Geo Key Manager policy set to the EU that had suddenly started receiving around 80 thousand remote rps, significantly more than the 200 remote rps in Australia. This time, keynotto was not responsible, since its <a href="https://en.wikipedia.org/wiki/Head-of-line_blocking">head-of-line</a> blocking issues had been fixed; it was timing out waiting for gokeyless to perform the remote key signatures. A quick glance at the maximum worker utilization of gokeyless for the entire data center revealed that it was maxed out. To counteract the issue, we increased the remote workers from 200 to 20,000, and then to 200,000. It wasn’t enough. Requests were still timing out and worker utilization was squarely at 100% until we rate limited the traffic to that zone.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2lmXmv2k1KoKH2P98S0WgP/3a1eb7f8ffcb5965eee5c855e1018bf5/Screen-Shot-2021-10-14-at-2.44.39-PM.png" />
            
            </figure><p>Why didn’t increasing the number of workers by a factor of 1,000 help?</p>
    <div>
      <h3>File descriptors don’t scale</h3>
      <a href="#file-descriptors-dont-scale">
        
      </a>
    </div>
    <p>gokeyless maintains a map of ping latencies to each data center as measured from the data center the instance is running in. It updates this map frequently and uses it to decide which data center to route remote requests to. A long time ago, every server in every data center maintained its own connections to every other data center. This quickly grew out of hand, and we started running out of file descriptors as the number of data centers grew. So we switched to a new system: servers are grouped into pods of 32, and the connections each data center needs to maintain to other data centers are divided among the servers in each pod. This was great for reducing the number of file descriptors used by each server.</p><p>An example will illustrate this better. In a data center with 100 servers, there would be four pods: three with 32 servers each and one with four servers. If there were a total of 65 data centers in the world, then each server in a pod of 32 would be responsible for maintaining connections to two data centers, and each server in the pod of four would handle 16 data centers. Remote requests are routed to the data center with the shortest latency. So in a pod of 32 servers, the server that maintains the connection to the closest data center is responsible for sending remote requests from every member of the pod to the target data center. In case of failure or timeout, the second-closest data center (and then the third-closest) is used. If all three closest data centers return failures, we give up and return a failure. This means that if a data center with 100 servers is receiving 100k remote rps, which is 1,000 rps/server, then approximately four servers are actually responsible for routing all remote traffic at any given time, and each will be handling ~20k rps. This was approximately the situation in the Midwest data center where we were seeing issues. 
Requests were being routed to a data center in Germany and there could be a maximum of four connections at the same time between the Midwest location and Germany.</p>
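<p>A small sketch makes the pod arithmetic explicit. This is an illustration of the scheme described above, not the production code, and both function names are invented:</p>

```go
package main

import "fmt"

const podSize = 32

// podsIn splits a data center's servers into pods of up to podSize.
func podsIn(servers int) []int {
	var pods []int
	for servers > 0 {
		n := servers
		if n > podSize {
			n = podSize
		}
		pods = append(pods, n)
		servers -= n
	}
	return pods
}

// connsPerServer is roughly how many remote data centers each server
// in a pod of n maintains connections to; a few servers take one
// extra when totalDCs doesn't divide evenly.
func connsPerServer(totalDCs, n int) int {
	return totalDCs / n
}

func main() {
	fmt.Println(podsIn(100))            // [32 32 32 4]
	fmt.Println(connsPerServer(65, 32)) // 2 data centers per server
	fmt.Println(connsPerServer(65, 4))  // 16 data centers per server
	// For any single destination, only one server per pod holds the
	// connection, so a 100-server site funnels all of its traffic to
	// that destination through at most len(podsIn(100)) == 4 servers.
}
```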
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6zFkNbouURVZZgmWl6dJZ9/50402b3d5c98ec00786c202505e08054/Screen-Shot-2021-10-14-at-2.33.05-PM.png" />
            
            </figure>
    <div>
      <h3>Little’s Law</h3>
      <a href="#littles-law">
        
      </a>
    </div>
    <p>At that time, gokeyless used a worker pool architecture with a queue for buffering additional tasks. The queue blocked after 1,024 requests were enqueued, to create backpressure on the client. This is a common technique to prevent overload: don’t accept more tasks than the server knows it can handle. p50 RSA latency was 4 ms. The latency from the Midwest to Germany was ~100 ms. After we bumped the remote workers on each Midwest server to 200,000, we were surely not constrained on the sending end, since we only had 80k rps for the whole data center and each request shouldn’t take more than 100 ms in the vast majority of cases. We know this because of <a href="https://en.wikipedia.org/wiki/Little%27s_law">Little’s Law</a>, which states that the average number of requests in flight in a system equals the arrival rate multiplied by the service time of each request. Let’s see how it applies here, and how it allowed us to prove that increasing concurrency or queue size could not help.</p><p>Consider the queue in the German data center. We hadn’t bumped the number of remote workers there, so only 200 remote workers were available. Assuming 20k rps arrived at one server in Germany and each request took ~5 ms to process, the necessary concurrency in the system should be 20,000 × 0.005 = 100 workers. We had 200. Even without a queue, the German data center should’ve easily been able to handle the 20k rps thrown at it. We investigated how many remote rps were actually processed per server:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5pCfNMetb3tjyyY8EqMjjb/bfe6b77cce4aac2b6d8f2dea90eebcad/Screen-Shot-2021-10-14-at-2.42.00-PM.png" />
            
            </figure><p>These numbers didn’t make any sense! The maximum at any time was a little over 700. We should’ve been able to process orders of magnitude more requests. By the time we investigated, the incident had ended. We had no visibility into the size of the service-level queue — or the size of the TCP queue — to understand what could have caused the timeouts. We speculated that while p50 for the population of all RSA key signatures might be 4 ms, perhaps for the specific population of Geo Key Manager RSA signatures, things took a whole lot longer. With 200 workers processing ~500 rps, each operation would have to take 400 ms. p99.9 for signatures can be around this much, so it is possible that is how long things took.</p><p>We recently ran a load test to establish an upper bound on remote rps, and discovered that one server in the US could process 7,000 remote rps to the EU — much lower than Little’s Law suggests. Through tracing, we identified several slow remote requests and noticed these usually corresponded to multiple tries against different remote hosts. The reason these retries are necessary to get through to an otherwise very functional data center is that gokeyless uses a hardcoded list of data centers and corresponding server hostnames to establish connections. Hostnames can change when servers are added or removed. If we attempt to connect to an invalid host and wait on a one-second timeout, we will certainly need to retry against another host, which can cause large increases in average processing time.</p><p>After the outage, we decided to remove the gokeyless worker pools and switch to a model of one goroutine per request. Goroutines are extremely lightweight, with benchmarks suggesting that a 4 GB machine can handle around one million goroutines. 
This was something that would have made sense to do after we extended gokeyless to support remote requests in 2017, but because of the complexity involved in changing the entire request-processing architecture, we held off on the change. Furthermore, when we launched Geo Key Manager, we didn’t have enough traffic to prompt an urgent rethink of this architecture. The new model removes complexity from the code, because tuning worker pool sizes and queue sizes can be complicated. We have also observed, on average, a 7x drop in memory consumption after switching to one goroutine per request, because we no longer have idle goroutines sitting active as was the case with worker pools. It also makes it easier to trace the life of an individual request, because you can follow every step of its execution, which helps when chasing down per-request slowdowns.</p>
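<p>The goroutine-per-request model is simple to express. The following minimal sketch (with a hypothetical handle function) shows the shape of it; there are no pool or queue sizes to tune:</p>

```go
package main

import (
	"fmt"
	"sync"
)

// handle signs one request. In the goroutine-per-request model it runs
// in its own goroutine: no pool sizes or queue lengths to tune, and no
// idle workers holding memory between traffic bursts.
func handle(req int) string {
	return fmt.Sprintf("signed %d", req)
}

func main() {
	var wg sync.WaitGroup
	results := make([]string, 10)
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) { // one goroutine per request
			defer wg.Done()
			results[i] = handle(i)
		}(i)
	}
	wg.Wait()
	fmt.Println(results[3]) // signed 3
}
```

<p>Because each goroutine lives exactly as long as its request, a trace of the goroutine is a trace of the request, which is what makes per-request slowdowns easier to chase down.</p>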
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>The complexity of distributed systems can quickly scale to unmanageable levels, making it important to have deep visibility into them. An extremely important tool here is <a href="https://www.jaegertracing.io/">distributed tracing</a>, which tracks a request through multiple services and provides information about the time spent in each part of the journey. We didn’t propagate gokeyless trace spans through parts of the remote operation, which prevented us from identifying why Hamburg only processed 700 rps. Having examples on hand for various error cases can also make diagnosis easier, especially in the midst of an outage. While we had load tested our TLS termination system in the common case, where key signatures happen locally, we hadn’t load tested the less common and lower-volume use case of remote operations. Using sources of truth that update dynamically to reflect the current state of the world, instead of static ones, is also important. In our case, the list of data centers that gokeyless connected to for performing remote operations was hardcoded a long time ago and never updated. So while we added more data centers, gokeyless was unable to make use of them, and in some cases may have been connecting to servers with invalid hostnames.</p><p>We’re now working on overhauling several pieces of Geo Key Manager to make it significantly more flexible and scalable, so think of this as setting the stage for a future blog post where we finally solve some of the issues outlined here, and stay tuned for updates!</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Geo Key Manager]]></category>
            <guid isPermaLink="false">6vXt2m19UGWCU9Gj9PR6Kr</guid>
            <dc:creator>Tanya Verma</dc:creator>
        </item>
        <item>
            <title><![CDATA[Technical reading from the Cloudflare blog for the holidays]]></title>
            <link>https://blog.cloudflare.com/2017-holiday-reading-from-the-cloudflare-blog/</link>
            <pubDate>Fri, 22 Dec 2017 14:17:57 GMT</pubDate>
            <description><![CDATA[ During 2017 Cloudflare published 172 blog posts (including this one). If you need a distraction from the holiday festivities at this time of year here are some highlights from the year. ]]></description>
            <content:encoded><![CDATA[ <p>During 2017 Cloudflare published 172 blog posts (including this one). If you need a distraction from the holiday festivities at this time of year here are some highlights from the year.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1FupNPeMEDTcUGuKJCQ2hX/69f106f4d37e6f7ce629f3845d88f8ed/33651510973_9bc38cc550_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/148114704@N05/33651510973/in/photolist-TgEMP4-5W4SKe-5QXuaQ-avwN4J-9kM47Q-5sZfg5-62HmQr-vRsPNX-9gu8zw-8tzfDA-7L9szU-2j3fkx-kdS5xh-dTvJ1k-bd2nWP-5eyMyX-cYKDeY-aha8Je-s9FApd-afsNQp-Rr9uMb-6w5kZp-e8k7Zc-7JV8KQ-Sbxdzt-emJeJJ-fvoSPx-7jDQjL-cNbEy7-Ht7oDe-6w5mqM-cDJ6PS-cDHREJ-2L3KsB-2rjJQY-9kxtQm-b31okB-2rfQ8c-bHhPX-dr6fiP-5sUUEp-DDzAGu-onQfBb-afsNzx-kdS4E5-fVkm7-okB223-7ZrhKH-9eLu3Y-pcsdc4">image</a> by <a href="https://perzonseo.com">perzon seo</a></p><p><a href="/the-wirex-botnet/">The WireX Botnet: How Industry Collaboration Disrupted a DDoS Attack</a></p><p>We worked closely with companies across the industry to track and take down the Android WireX Botnet. This blog post goes into detail about how that botnet operated, how it was distributed and how it was taken down.</p><p><a href="/randomness-101-lavarand-in-production/">Randomness 101: LavaRand in Production</a></p><p>The wall of Lava Lamps in the San Francisco office is used to feed entropy into random number generators across our network. This blog post explains how.</p><p><a href="/arm-takes-wing/">ARM Takes Wing: Qualcomm vs. Intel CPU comparison</a></p><p>Our <a href="https://www.cloudflare.com/network/">network</a> of data centers around the world all contain Intel-based servers, but we're interested in ARM-based servers because of the potential cost/power savings. This blog post took a look at the relative performance of Intel processors and Qualcomm's latest server offering.</p><p><a href="/how-to-monkey-patch-the-linux-kernel/">How to Monkey Patch the Linux Kernel</a></p><p>One engineer wanted to combine the Dvorak and QWERTY keyboard layouts and did so by patching the Linux kernel using <a href="https://sourceware.org/systemtap/">SystemTap</a>. This blog explains how and why. 
Where there's a will, there's a way.</p><p><a href="/introducing-cloudflare-workers/">Introducing Cloudflare Workers: Run JavaScript Service Workers at the Edge</a></p><p>Traditionally, the Cloudflare network has been <i>configurable</i> by our users, but not <i>programmable</i>. In September, we introduced <a href="https://www.cloudflare.com/products/cloudflare-workers/">Cloudflare Workers</a> which allows users to write JavaScript code that runs on our edge worldwide. This blog post explains why we chose JavaScript and how it works.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3rDQWcCe0YDvxU6a8enygs/e436188ab1b54b75b1a875143e9c08c5/5586120601_a7b1776371_b.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/werkman/5586120601/in/photolist-9vCk3a-21Bd6Xh-8nCHw7-ptePkD-RJQgiN-nWZ5u6-3xLVWE-rDFYG8-LH8sS-2peYb-mX5MQc-6JxAT-aAArZQ-rJfmkx-9HavML-TX5hUW-niuhtp-jHGzT5-5eRuc3-Gv67PP-fs2mGn-8mhB7f-8pDZbD-ZPoZDF-3xLxd3-6k15Ni-j1zK3-8mhB8b-7RjyKs-57C9rW-j1zBX-bTs5tP-8wuUBf-7r7fAq-8jPBD4-5bnWiq-e88EiF-ddTbY7-PVC9U-e88E6P-S4eP6f-8jPkh2-5bj6xn-NzVdo-7rQyZa-4Dm4gX-ZCuaMm-pQr1Hw-yf9rdC-21HJvkv">image</a> by <a href="https://www.flickr.com/photos/werkman/">Peter Werkman</a></p><p><a href="/geo-key-manager-how-it-works/">Geo Key Manager: How It Works</a></p><p>Our <a href="/introducing-cloudflare-geo-key-manager/">Geo Key Manager</a> gives customers granular control of the location of their private keys on the Cloudflare network. This blog post explains the mathematics that makes this possible.</p><p><a href="/sidh-go/">SIDH in Go for quantum-resistant TLS 1.3</a></p><p>Quantum-resistant cryptography isn't an academic fantasy. 
We implemented the SIDH scheme as part of our Go implementation of TLS 1.3 and open sourced it.</p><p><a href="/the-languages-which-almost-became-css/">The Languages Which Almost Became CSS</a></p><p>This blog post recounts the history of CSS and the languages that might have been CSS.</p><p><a href="/perfect-locality-and-three-epic-systemtap-scripts/">Perfect locality and three epic SystemTap scripts</a></p><p>In an ongoing effort to understand the performance of NGINX under heavy load on our machines (and wring out the greatest number of requests/core), we used SystemTap to experiment with different queuing models.</p><p><a href="/counting-things-a-lot-of-different-things/">How we built rate limiting capable of scaling to millions of domains</a></p><p>We rolled out a rate limiting feature that allows our customers to control the maximum number of HTTP requests per second/minute/hour that their servers receive. This blog post explains how we made that operate efficiently at our scale.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4cQPAfJHM6gES4uXsyRwd4/f871a9d7627a4ea4f405ea76524e9545/26797557806_18daa76ec2_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/waters2712/26797557806/in/photolist-GQ1ujJ-ovpzb8-5hGjhb-pkkHPH-7Ed3eQ-5TiuCW-6tknkf-7JGpz4-81Rc1J-qM8AwX-dQVifV-nWZ5u6-puAvLe-acEK6v-F5KyvG-4Ykyf1-bvH81M-FF6XnD-KLqgJ-4rLnJE-d8b1tS-dQVisV-7cTp1r-pkkHic-6oKTtx-9mKe1u-5vsfch-coUNp9-o9Txa7-9p7thZ-aWYjuc-SV2qEb-7LXSYz-9Fcnam-fkr4Fc-b6Dtmt-6r1QQ2-5ndv1D-fuUiKV-qDAQxe-cjZhVY-6Hn6G1-qPMScz-mJAvhc-8LVJNj-7Ed3cf-9wFgvw-9z5jt9-bGsg4R-72BBkc">image</a> by <a href="https://www.flickr.com/photos/waters2712/">Han Cheng Yeh</a></p><p><a href="/reflections-on-reflections/">Reflections on reflection (attacks)</a></p><p>We deal with a new DDoS attack every few minutes and in this blog post we took a close look at reflection attacks and revealed statistics on the types of reflection-based DDoS attacks we see.</p><p><a href="/on-the-dangers-of-intels-frequency-scaling/">On the dangers of Intel's frequency scaling</a></p><p>Intel processors contain special AVX-512 units that provide 512-bit wide SIMD instructions to speed up certain calculations. However, these instructions have a downside: when they are used, the CPU base frequency is scaled down, slowing down other instructions. This blog post explores that problem.</p><p><a href="/how-cloudflare-analyzes-1m-dns-queries-per-second/">How Cloudflare analyzes 1M DNS queries per second</a></p><p>This blog post details how we handle logging information for 1M DNS queries per second using a custom pipeline, <a href="https://clickhouse.yandex/">ClickHouse</a> and Grafana (via a connector we <a href="https://github.com/vavrusa/grafana-sqldb-datasource">open sourced</a>) to build real time dashboards.</p><p><a href="/aes-cbc-going-the-way-of-the-dodo/">AES-CBC is going the way of the dodo</a></p><p>CBC-mode cipher suites have been declining for some time because of padding oracle-based attacks. 
In this blog we demonstrate that AES-CBC has now largely been replaced by <a href="/it-takes-two-to-chacha-poly/">ChaCha20-Poly1305</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6CVkU8gCFVwghZzoP3x0Sd/c90d63d983ab453d7115daac2d16e632/3414054443_2bd47e12f7_b.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/spanginator/3414054443/in/photolist-6cFVrD-6cvz6R-6cKZLw-rn74T-7RoM6f-ATb57-aQHtja-7i1omC-xGoWE-8FsWpL-6DbHyD-9aFjZn-BxbDmF-23fTGw-4v5aS-81KR9D-ahC42F-7ibXNR-Hod7ZY-7hS2JP-fnyx1X-4Pjy9v-kNu6rR-FFmpx9-Gyx1By-GsEegj-7hVPHw-Gv6zDR-F5KyvG-FF6XnD-FZk6kU-9re3Rz-dRTft-btLnvB-o3PQ5p-nU34U-VCRL7q-YhCTjj-L44FTc-Ke5mko-L1vxdJ-KehBhx-7BjC9n-7xSGtH-pEfb5f-2pqSst-7Xhhmq-o8u7ja-pWNcy2-KSCBjW">image</a> by <a href="https://www.flickr.com/photos/spanginator/">Christine</a></p><p><a href="/how-we-made-our-dns-stack-3x-faster/">How we made our DNS stack 3x faster</a></p><p>We answer around 1 million authoritative DNS queries per second using a custom software stack. Responding to those queries as quickly as possible is why Cloudflare is the <a href="http://www.dnsperf.com/">fastest</a> authoritative DNS provider on the Internet. This blog post details how we made our stack even faster.</p><p><a href="/quantifying-the-impact-of-cloudbleed/">Quantifying the Impact of "Cloudbleed"</a></p><p>On February 18 a serious security bug was reported to Cloudflare. Five days <a href="/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/">later</a> we released details of the problem and six days after that we posted this analysis of the impact.</p><p><a href="/luajit-hacking-getting-next-out-of-the-nyi-list/">LuaJIT Hacking: Getting next() out of the NYI list</a></p><p>We make extensive use of <a href="http://luajit.org/">LuaJIT</a> when processing our customers' traffic and making it faster is a key goal. In the past, we've sponsored the project and everyone benefits from those contributions. 
This blog post examines getting one specific function JITted correctly for additional speed.</p><p><a href="/privacy-pass-the-math/">Privacy Pass: The Math</a></p><p>The <a href="https://privacypass.github.io/">Privacy Pass</a> project provides a zero knowledge way of proving your identity to a service like Cloudflare. This detailed blog post explains the mathematics behind authenticating a user without knowing their identity.</p><p><a href="/how-and-why-the-leap-second-affected-cloudflare-dns/">How and why the leap second affected Cloudflare DNS</a></p><p>The year started with a bang for some engineers at Cloudflare when we ran into a bug in our custom DNS server, <a href="/tag/rrdns/">RRDNS</a>, caused by the introduction of a <a href="https://en.wikipedia.org/wiki/Leap_second">leap second</a> at midnight UTC on January 1, 2017. This blog explains the error and why it happened.</p><p>There's <a href="https://datacenter.iers.org/web/guest/eop/-/somos/5Rgv/latest/16">no leap second</a> this year.</p> ]]></content:encoded>
            <category><![CDATA[Year in Review]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[LavaRand]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Geo Key Manager]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[LUA]]></category>
            <category><![CDATA[Privacy Pass]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">1PeYMnOt6XNE3dqjt2m9aq</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing the Cloudflare Geo Key Manager]]></title>
            <link>https://blog.cloudflare.com/introducing-cloudflare-geo-key-manager/</link>
            <pubDate>Tue, 26 Sep 2017 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s customers recognize that they need to protect the confidentiality and integrity of communications with their web visitors. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Cloudflare’s customers recognize that they need to protect the confidentiality and integrity of communications with their web visitors. The widely accepted solution to this problem is to use the SSL/TLS protocol to establish an encrypted HTTPS session, over which secure requests can then be sent. Eavesdropping is prevented because only those who have access to the “private key” can legitimately identify themselves to browsers and decrypt encrypted requests.</p><p>Today, <a href="https://www.wired.com/2017/01/half-web-now-encrypted-makes-everyone-safer/">more than half</a> of all traffic on the web uses HTTPS—but this was not always the case. In the early days of SSL, the protocol was viewed as slow because each encrypted request required two round trips between the user’s browser and web server. Companies like Cloudflare <a href="https://istlsfastyet.com/#cdn-paas">solved this problem</a> by putting web servers close to end users and utilizing <a href="/tls-session-resumption-full-speed-and-secure/">session resumption</a> to eliminate those round trips for all but the very first request.</p>
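<p>To make the savings concrete, here is a back-of-the-envelope sketch (illustrative only, not Cloudflare code): under TLS 1.2, a full handshake costs two round trips before any application data flows, while an abbreviated, resumed handshake costs one.</p>

```python
# Illustrative sketch of TLS 1.2 handshake cost; not Cloudflare's implementation.
# A full handshake takes 2 round trips; an abbreviated (resumed) one takes 1.

def handshake_overhead_ms(rtt_ms: float, connections: int, resumption: bool) -> float:
    """Total handshake latency paid across `connections` new TLS connections."""
    if not resumption:
        # every connection pays the full 2-RTT handshake
        return connections * 2 * rtt_ms
    # the first connection does a full handshake; later ones resume the session
    return 2 * rtt_ms + (connections - 1) * rtt_ms

# With a 40 ms round trip and 10 connections, resumption cuts total handshake
# overhead from 800 ms to 440 ms.
```

<p>The constants above (40 ms, 10 connections) are assumed figures chosen only to show the shape of the saving.</p>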
    <div>
      <h4>Expanding footprint meets geopolitical concerns</h4>
      <a href="#expanding-footprint-meets-geopolitical-concerns">
        
      </a>
    </div>
    <p>As Internet adoption grew around the world, with companies increasingly serving global and more remote audiences, providers like Cloudflare had to continue expanding their physical footprint to keep up with demand. As of the date this blog post was published, Cloudflare has data centers in over 55 countries, and we continue to accelerate the build out of <a href="https://www.cloudflare.com/network/">our global network</a>.</p><p>While this expansion makes our customers’ sites faster and better protected against DDoS wherever visitors (or attackers) are connecting from, it presents a few related security challenges.</p><p>First, as we put equipment (and SSL private keys) in more and more countries around the world, it’s likely that local governments will possess differing—and often evolving—views on privacy and the sanctity of encrypted communications. In the past, there have been reports that certain governments have attempted to compel organizations to turn over key material to them. (To be unambiguously clear, Cloudflare—as documented in our <a href="https://www.cloudflare.com/transparency/">Transparency Report</a>—has never turned over our SSL keys or our customers’ SSL keys to anyone.)</p><p>Even if local governments are to be trusted, organizations may have strongly held geopolitical opinions on security or mandates to adhere to certain regulatory frameworks. That, or they may simply understand there are only so many data centers in the world that can meet our most stringent physical security requirements and controls. As Cloudflare’s network grows, it’s inevitable that we will exhaust these facilities. We recognize that our customers may want to control distribution of their private key material based on an assessment of physical controls.</p>
    <div>
      <h4>Private key restrictions to match your security posture</h4>
      <a href="#private-key-restrictions-to-match-your-security-posture">
        
      </a>
    </div>
    <p>As Cloudflare serves larger, more multinational and geographically distributed customers, the frequency of requests we’ve heard to restrict private key distribution by physical location has increased.</p><p>We’ve listened carefully to those requests, and today we’re excited to introduce a new feature that hands this geography-based control over to you. We call it Geo Key Manager.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3CqhHsCbPxuspDKNr4oh4k/a64944f2237a441c39535097194b1146/Private_Key_Restriction.png" />
            
            </figure><p>On upload of a new <a href="https://support.cloudflare.com/hc/en-us/articles/200170466-How-do-I-upload-a-custom-SSL-certificate-Business-or-Enterprise-only-">Custom Certificate</a>, you now have the ability to indicate which subset of our data centers should have local access to your private key for the purpose of establishing HTTPS sessions.</p><p>Connections to data centers that you’ve told us aren’t allowed to hold your key will still be fully encrypted and accessible via HTTPS, using identical encryption algorithms, but will be done so using our <a href="https://www.cloudflare.com/ssl/keyless-ssl/">Keyless SSL</a> technology rather than locally accessing the key.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4txZSIc7uT2bpG6kQM0gz/c358509e38dcd258ee7ec546c98b7084/Keyless-SSL.png" />
            
            </figure>
    <div>
      <h4>Initial options available for use</h4>
      <a href="#initial-options-available-for-use">
        
      </a>
    </div>
    <p>When we thought about which options to include in the dropdown box pictured above, we reached out to human rights and civil liberties organizations, leading financial and banking institutions, and other groups that place a high premium on security. These discussions resulted in the options you see today, and over the coming weeks we’ll be having additional conversations. Our goal in these talks is identifying what other designations make sense for our customers, so we can provide easily usable defaults.</p>
    <div>
      <h4>Everywhere</h4>
      <a href="#everywhere">
        
      </a>
    </div>
    <p>By default, we won’t apply any restrictions; securely storing your keys in all of our data centers allows us to optimize performance for your visitors across the globe.</p>
    <div>
      <h4>U.S. Only or E.U. Only</h4>
      <a href="#u-s-only-or-e-u-only">
        
      </a>
    </div>
    <p>One common theme we heard from customers was the desire to restrict the distribution of private SSL keys to either the U.S. only—typically requested by U.S. government agencies or businesses that serve exclusively U.S. customers—or the E.U. only, for comparable reasons.</p><p>Selecting either of these options in the dropdown will ensure that your keys do not leave this geopolitical designation. As of the time this post was written, Cloudflare has 31 data centers in 25 U.S. cities and 30 data centers across 21 E.U. countries.</p>
    <div>
      <h4>Highest Security Data Centers</h4>
      <a href="#highest-security-data-centers">
        
      </a>
    </div>
    <p>Cloudflare has a checklist of security requirements that must be met before we locate our equipment somewhere. With every installation we ensure that our deployment is in compliance with internationally recognized security standards, including PCI DSS 3.2 and SSAE-18 SOC 2. This security review is re-evaluated quarterly by our internal security staff as well as annually by independent, third-party auditors.</p><p>As part of this assessment we group our data centers into security tiers, using this knowledge to help shape our deployment and operational decision making and strategies.</p>
    <div>
      <h4>Baseline Requirements</h4>
      <a href="#baseline-requirements">
        
      </a>
    </div>
    <p>At minimum, all of our data centers offer the following physical controls:</p><p><b>External and Internal Security</b></p><ul><li><p>The data center must be physically secure with perimeter controls that meet industry accepted minimums (e.g., locked doors, high walls, man traps, alarm systems, etc.).</p></li></ul><p><b>Restricted Facility Access</b></p><ul><li><p>Access is restricted to Cloudflare employees and named authorized individuals only. Every access request must be authorized and the requester’s access confirmed with government issued ID.</p></li><li><p>All access requests are logged and those logs are reviewed on a regular basis.</p></li><li><p>Any attempt to physically access, modify or remove our equipment generates an alert to security and operational teams.</p></li></ul><p><b>24x7 Onsite Security</b></p><ul><li><p>Security guards are onsite and present 24 hours a day, 7 days a week.</p></li></ul><p><b>CCTV</b></p><ul><li><p>Facilities are monitored by closed-circuit television cameras.</p></li></ul><p><b>Secured Server Cabinet</b></p><ul><li><p>Access to Cloudflare’s server cabinets is controlled through physical security locks and other mechanisms.</p></li><li><p>Physical tamper detection is present on all systems.</p></li><li><p>All hardware deployed within cabinets is equipped with physical tamper detection systems that have been enabled and configured to report any tampering attempts.</p></li></ul>
    <div>
      <h4>Additional "High Security" Requirements</h4>
      <a href="#additional-high-security-requirements">
        
      </a>
    </div>
    <p>While we ensure that every Cloudflare data center meets minimum industry security standards, our top tier of data centers is held to a much higher standard.</p><p>These all have the baseline security requirements plus:</p><p><b>Pre-Scheduled, Biometric Controlled Facility Access</b></p><ul><li><p>Employees of Cloudflare permitted to access the facility must have previously scheduled a visit before access will be granted.</p></li><li><p>Access to the entrance of the facility is controlled through the use of a biometric hand reader combined with an assigned access code.</p></li></ul><p><b>Private Cages with Biometric Readers</b></p><ul><li><p>All equipment is in private cages with physical access controlled via biometrics and recorded in audit logs.</p></li><li><p>Entrants have to pass through 5 separate readers before the cage can be accessed.</p></li></ul><p><b>Exterior Security Controls and Monitoring</b></p><ul><li><p>All points of ingress/egress are monitored by an intrusion detection system (IDS), with authorized users and access events archived for historical review.</p></li></ul><p><b>Interior Security Controls and Monitoring</b></p><ul><li><p>Interior points of ingress/egress are controlled by the access control subsystem, with entry routed through a mantrap. All areas are monitored and recorded with closed-circuit television, with data kept for a minimum of 30 days.</p></li><li><p>Exterior walls are airtight and may incorporate additional security measures such as reinforced concrete, Kevlar bullet board, vapor barriers, or bulletproof front doors.</p></li></ul><p>We designate facilities that meet the above our "Highest Security" data centers.</p>
    <div>
      <h4>What's the catch?</h4>
      <a href="#whats-the-catch">
        
      </a>
    </div>
    <p>As mentioned previously, the very first HTTPS request made to a data center that does not hold your private key requires a bit of overhead while a symmetric encryption key for the session is negotiated.</p><p>By “overhead” we mean that a bit of network latency is introduced while the remote facility is contacted for assistance establishing the secure channel. The further (physically) this key-holding data center is from where the HTTPS request is being established, the longer the wait.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/19SvmML3d3vSNkQLkVaLPl/cbb7bbd7872bad603870bdd144bd672e/EU-heatmap.png" />
            
            </figure><p>For example, the first request to Moscow for a resource that's protected by a key held only in our “Highest Security” facilities takes an additional three one-hundredths of a second (30ms) to go to Stockholm and back. Or four one-hundredths of a second (40ms) to go from Moscow to Frankfurt and back.</p><p>Cloudflare will pick the fastest route to a nearby data center for this first request that requires an extra hop. However, all subsequent requests after the initial one will be just as fast as if the key were local thanks to SSL/TLS session resumption.</p>
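<p>A quick sanity check on those figures (a hypothetical sketch, not how Cloudflare measures latency): a signal in fibre travels at roughly two-thirds of the speed of light, about 200 km per millisecond, so a round trip over a straight path of d km cannot take less than 2d/200 ms. Real routes are longer than great circles and include processing time, which is why observed numbers sit above this floor.</p>

```python
# Hypothetical lower-bound estimate for the extra-hop round trip described above.
# Assumes a signal speed in fibre of ~200 km/ms (about 0.67c); real paths are slower.

FIBRE_KM_PER_MS = 200.0  # common rule-of-thumb propagation speed in optical fibre

def min_round_trip_ms(distance_km: float) -> float:
    """Propagation-only lower bound on a round trip over `distance_km` of fibre."""
    return 2 * distance_km / FIBRE_KM_PER_MS

# Moscow to Stockholm is roughly 1,200 km great-circle (an assumed figure), so
# the physical floor is about 12 ms; the 30 ms quoted above also covers routing
# detours and the remote key operation itself.
```

<p>The 1,200 km figure is an illustrative approximation, not a measurement from this post.</p>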
    <div>
      <h4>This page brought to you by Geo Key Manager</h4>
      <a href="#this-page-brought-to-you-by-geo-key-manager">
        
      </a>
    </div>
    <p>To demonstrate that the performance impact is minimal, we set <a href="/">blog.cloudflare.com</a> to use only our Highest Security data centers. If you click around on the blog you can see that, while the pages are encrypted, there's no noticeable delay in surfing them.</p><p>We estimate that the average user, based on the geographic distribution of traffic to our blog, will see a ~65ms average increase in latency for their first request. After that, subsequent requests will be as fast as if keys were distributed to all our data centers.</p>
    <div>
      <h4>Looking forward</h4>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>On the product side, we plan to extend Geo Key Manager to work with Dedicated Certificates. Once this is available, we will communicate the news on the blog or in a knowledge-base article.</p><p>Additionally, we plan to soon allow Enterprise users to specify a precise list of which of our data centers they do or do not want their keys held in; for example, customers who have concerns about specific countries or regions will be able to use this feature to gain fine-grained control over their private key distribution. If you’re on an Enterprise plan and would like early access to this feature today, please contact your Customer Success Manager.</p><p>Beyond granular controls, we aim to fundamentally change the dynamics of geo key selection by allowing you to outsource the decision making and, potentially, the legal challenge process. Imagine, for instance, if you could specify that a civil society organization like the Electronic Frontier Foundation be the custodian of your keys. Different organizations that you trust could develop expertise in fighting overly broad legal requests that may someday come.</p><p>We’re just getting started having conversations with these types of organizations, so if you have one in mind, be sure to let us know in the comment section below!</p>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[SSL]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[HTTPS]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Geo Key Manager]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">4M0fHwBoLFHjTgilimZaRG</guid>
            <dc:creator>Patrick R. Donahue</dc:creator>
        </item>
    </channel>
</rss>