Privacy Pass v3: the new privacy bits

In November 2017, we released our implementation of a privacy preserving protocol to let users prove that they are humans without enabling tracking. When you install Privacy Pass’s browser extension, you get tokens when you solve a Cloudflare CAPTCHA which can be used to avoid needing to solve one again... The redeemed token is cryptographically unlinkable to the token originally provided by the server. That is why Privacy Pass is privacy preserving.

In October 2019, Privacy Pass reached another milestone. We released Privacy Pass Extension v2.0 that includes a new service provider (hCaptcha) which provides a way to redeem a token not only with CAPTCHAs in the Cloudflare challenge pages but also hCaptcha CAPTCHAs in any website. When you encounter any hCaptcha CAPTCHA in any website, including the ones not behind Cloudflare, you can redeem a token to pass the CAPTCHA.

We believe Privacy Pass solves an important problem — balancing privacy and security for bot mitigation— but we think there’s more to be done in terms of both the codebase and the protocol. We improved the codebase by redesigning how the service providers interact with the core extension. At the same time, we made progress on the standardization at IETF and improved the protocol by adding metadata which allows us to do more fabulous things with Privacy Pass.

Announcing Privacy Pass Extension v3.0

The current implementation of our extension is functional, but it is difficult to maintain two Privacy Pass service providers: Cloudflare and hCaptcha. So we decided to refactor the browser extension to improve its maintainability. We also used this opportunity to make following improvements:

Implement the extension using TypeScript instead of plain JavaScript.
Build the project using a module bundler instead of custom build scripts.
Refactor the code and define the API for the cryptographic primitive.
Treat provider-specific code as an encapsulated software module rather than a list of configuration properties.

As a result of the improvements listed above, the extension will be less error-prone and each service provider will have more flexibility and can be integrated seamlessly with other providers.

In the new extension we use TypeScript instead of plain JavaScript because its syntax is a kind of extension to JavaScript, and we already use TypeScript in Workers. One of the things that makes TypeScript special is that it has features that are only available in modern programming languages, like null safety.

Support for Future Service Providers

Another big improvement in v3.0 is that it is designed for modularity, meaning that it will be very easy to add a new potential service provider in the future. A new provider can use an API provided by us to implement their own request flow to use the Privacy Pass protocol and to handle the HTTP requests. By separating the provider-specific code from the core extension code using the API, the extension will be easier to update when there is a need for more service providers.

On a technical level, we allow each service provider to have its own WebRequest API event listeners instead of having central event listeners for all the providers. This allows providers to extend the browser extension's functionality and implement any request handling logic they want.

Another major change that enables us to do this is that we moved away from configuration to programmable modularization.

Configuration vs Modularization

As mentioned in 2019, it would be impossible to expect different service providers to all abide by the same exact request flow, so we decided to use a JSON configuration file in v2.0 to define the request flow. The configuration allows the service providers to easily modify the extension characteristics without dealing too much with the core extension code. However, recently we figured out that we can improve it without using a configuration file, and using modules instead.

Using a configuration file limits the flexibility of the provider by the number of possible configurations. In addition, when the logic of each provider evolves and deviates from one another, the size of configuration will grow larger and larger which makes it hard to document and keep track of. So we decided to refactor how we determine the request flow from using a configuration file to using a module file written specifically for each service provider instead.

Configuration to modularization refactoring.

By using a programmable module, the providers are not limited by the available fields in the configuration. In addition, the providers can use the available implementations of the necessary cryptographic primitives in any point of the request flow because we factored out the crypto bits into a separate module which can be used by any provider. In the future, if the cryptographic primitives ever change, the providers can update the code and use it any time.

Towards Standard Interoperability

The Privacy Pass protocol was first published at the PoPETS symposium in 2018. As explained in this previous post, the core of the Privacy Pass protocol is a secure way to generate tokens between server and client. To that end, the protocol requires evaluating a pseudorandom function that is oblivious and verifiable. The first property prevents the server from learning information about the client’s tokens, while the client learns nothing about the server’s private key. This is useful to protect the privacy of users. The token generation must also be verifiable in the sense that the client can attest to the fact that its token was minted using the server’s private key.

The original implementation of Privacy Pass has seen real-world use in our browser extension, helping to reduce CAPTCHAs for hundreds of thousands of people without compromising privacy. But to guarantee interoperability between services implementing Privacy Pass, what's required is an accurate specification of the protocol and its operations. With this motivation, the Privacy Pass protocol was proposed as an Internet draft at the Internet Engineering Task Force (IETF) — to know more about our participation at IETF look at the post.

In March 2020, the protocol was presented at IETF-107 for the first time. The session was a Birds-of-a-Feather, a place where the IETF community discusses the creation of new working groups that will write the actual standards. In the session, the working group’s charter is presented and proposes to develop a secure protocol for redeeming unforgeable tokens that attest to the validity of some attribute being held by a client. The charter was later approved, and three documents were integrated covering the protocol, the architecture, and an HTTP API for supporting Privacy Pass. The working group at IETF can be found at https://datatracker.ietf.org/wg/privacypass/.

Additionally, to its core functionality, the Privacy Pass protocol can be extended to improve its usability or to add new capabilities. For instance, adding a mechanism for public verifiability will allow a third party, someone who did not participate in the protocol, to verify the validity of tokens. Public verifiability can be implemented using a blind-signature scheme — this is a special type of digital signatures firstly proposed by David Chaum in which signers can produce signatures on messages without learning the content of the message. A diversity of algorithms to implement blind-signatures exist; however, there is still work to be done to define a good candidate for public verifiability.

Another extension for Privacy Pass is the support for including metadata in the tokens. As this is a feature with high impact on the protocol, we devote a larger section to explain the benefits of supporting metadata in the face of hoarding attacks.

Future work: metadata

What is research without new challenges that arise? What does development look like if there are no other problems to solve? During the design and development of Privacy Pass (both as a service, as an idea, and as a protocol), a potential vector for abuse was noted, which will be referred to as a “hoarding” or “farming” attack. This attack consists of individual users or groups of users that can gather tokens over a long period of time and redeem them all at once with the aim of, for example, overwhelming a website and making the service unavailable for other users. In a more complex scenario, an attacker can build up a stock of tokens that they could then redistribute amongst other clients. This redistribution ability is possible as tokens are not linked to specific clients, which is a property of the Privacy Pass protocol.

There have been several proposed solutions to this attack. One can, for example, make the verification of tokens procedure very efficient, so attackers will need to hoard an even larger amount of tokens in order to overwhelm a service. But the problem is not only about making verification times faster, and, therefore, this does not completely solve the problem. Note that in Privacy Pass, a successful token redemption could be exchanged for a single-origin cookie. These cookies allow clients to avoid future challenges for a particular domain without using more tokens. In the case of a hoarding attack, an attacker could trade in their hoarded number of tokens for a number of cookies. An attacker can, then, mount a layer 7 DDoS attack with the “hoarded” cookies, which would render the service unavailable.

In the next sections, we will explore other different solutions to this attack.

A simple solution and its limitations: key rotation

What does “key rotation” mean in the context of Privacy Pass? In Privacy Pass, each token is attested by keys held by the service. These keys are further used to verify the honesty of a token presented by a client when trying to access a challenge-protected service. “Key rotation” means updating these keys with regard to a chosen epoch (meaning, for example, that every two weeks — the epoch —, the keys will be rotated). Regular key rotation, then, implies that tokens belong to these epochs and cannot be used outside them, which prevents stocks of tokens from being useful for longer than the epoch they belong to.

Keys, however, should not be rotated frequently as:

Rotating a key can lead to security implications
Establishing trust in a frequently-rotating key service can be a challenging problem
The unlinkability of the client when using tokens can be diminished

Let’s explore these problems one by one now:

Rotating a key can lead to security implications, as past keys need to be deleted from secure storage locations and replaced with new ones. This process is prone to failure if done regularly, and can lead to potential key material leakage.

Establishing trust in a frequently-rotating key service can be a challenging problem, as keys will have to be verified by the needed parties each time they are regenerated. Keys need to be verified as it has to be attested that they belong to the entity one is trying to communicate with. If keys rotate too frequently, this verification procedure will have to happen frequently as well, so that an attacker will not be able to impersonate the honest entity with a “fake” public key.

The unlinkability of the client when using tokens can be diminished as a savvy attacker (a malicious server, for example) could link token generation and token future-use. In the case of a malicious server, it can, for example, rotate their keys too often to violate unlinkability or could pick a separate public key for each client issuance. In these cases, this attack can be solved by the usage of public mechanisms to record which server’s public keys are used; but this requires further infrastructure and coordination between actors. Other cases are not easily solvable by this “public verification”: if keys are rotated every minute, for example, and a client was the only one to visit a “privacy pass protected” site in that minute, then, it's not hard to infer (to “link”) that the token came only from this specific client.

A novel solution: Metadata

A novel solution to this “hoarding” problem that does not require key rotation or further optimization of verification times is the addition of metadata. This approach was introduced in the paper “A Fast and Simple Partially Oblivious PRF, with Applications”, and it is called the “POPRF with metadata” construction. The idea is to add a metadata field to the token generation procedure in such a way that tokens are cryptographically linked to this added metadata. The added metadata can be, for example, a number that signals which epoch this token belongs to. The service, when presented with this token on verification, promptly checks that it corresponds to its internal epoch number (this epoch number can correspond to a period of time, a threshold of number of tokens issued, etc.). If it does not correspond, this token is expired and cannot be further used. Metadata, then, can be used to expire tokens without performing key rotations, thereby avoiding some issues outlined above.

Other kinds of metadata can be added to the Partially Oblivious PRF (PO-PRF) construction as well. Geographic location can be added, which signals that tokens can only be used in a specific region.

The limits of metadata

Note, nevertheless, that the addition of this “metadata” should be carefully considered as adding, in the case of “time-metadata”, an explicit time bound signal will diminish the unlikability set of the tokens. If an explicit time-bound signal is added (for example, the specific time — year, month, day, hour, minute and seconds — in which this token was generated and the amount of time it is valid for), it will allow a malicious server to link generation and usage. The recommendation is to use “opaque metadata”: metadata that is public to both client and service but that only the service knows its precise meaning. A server, for example, can set a counter that gets increased after a period of time (for example, every two weeks). The server will add this counter as metadata rather than the period of time. The client, in this case, publicly knows what this counter is but does not know to which period it refers to.

Geographic location metadata should be coarse as well: it should refer to a large geographical area, such as a continent, or political and economic union rather than an explicit location.

Wrap up

The Privacy Pass protocol provides users with a secure way for redeeming tokens. At Cloudflare, we use the protocol to reduce the number of CAPTCHAs improving the user experience while browsing websites. A natural evolution of the protocol is expected, ranging from its standardization to innovating with new capabilities that help to prevent abuse of the service.

On the service side, we refactored the Privacy Pass browser extension aiming to improve the quality of the code, so bugs can be detected in earlier phases of the development. The code is available at the challenge-bypass-extension repository, and we invite you to try the release candidate version.

An appealing extension for Privacy Pass is the inclusion of metadata as it provides a non-cumbersome way to solve hoarding attacks, while preserving the anonymity (in general, the privacy) of the protocol itself. Our paper provides you more information about the technical details behind this idea.

The application of the Privacy Pass protocol in other use cases or to create other service providers requires a certain degree of compatibility. People wanting to implement Privacy Pass must be able to have a standard specification, so implementations can interoperate. The efforts along these lines are centered on the Privacy Pass working group at IETF, a space open for anyone to participate in delineating the future of the protocol. Feel free to be part of these efforts too.

We are continuously working on new ways of improving our services and helping the Internet be a better and a more secure place. You can join us on this effort and can reach us at research.cloudflare.com. See you next time.

The Cloudflare Blog