
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Fri, 10 Apr 2026 22:07:34 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Encrypted Client Hello - the last puzzle piece to privacy]]></title>
            <link>https://blog.cloudflare.com/announcing-encrypted-client-hello/</link>
            <pubDate>Fri, 29 Sep 2023 13:00:52 GMT</pubDate>
            <description><![CDATA[ We're excited to announce a contribution to improving privacy for everyone on the Internet. Encrypted Client Hello, a new standard that prevents networks from snooping on which websites a user is visiting, is now available on all Cloudflare plans.  ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4tab7Qbtn2MGjXuTbcAsux/c40ac1cfda644402d4022573129a588b/image2-29.png" />
            
            </figure><p>Today we are excited to announce a contribution to improving privacy for everyone on the Internet. Encrypted Client Hello, a <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-esni/">new proposed standard</a> that prevents networks from snooping on which websites a user is visiting, is now available on all Cloudflare plans.  </p><p>Encrypted Client Hello (ECH) is a successor to <a href="https://www.cloudflare.com/learning/ssl/what-is-encrypted-sni/">ESNI</a> and masks the Server Name Indication (SNI) that is used to negotiate a TLS handshake. This means that whenever a user visits a website on Cloudflare that has ECH enabled, no one except for the user, Cloudflare, and the website owner will be able to determine which website was visited. Cloudflare is a big proponent of privacy for everyone and is excited about the prospects of bringing this technology to life.</p>
    <div>
      <h3>Browsing the Internet and your privacy</h3>
      <a href="#browsing-the-internet-and-your-privacy">
        
      </a>
    </div>
    <p>Whenever you visit a website, your browser sends a request to a web server. The web server responds with content and the website starts loading in your browser. Way back in the early days of the Internet, this happened in 'plain text', meaning that your browser would just send bits across the network that everyone could read: the corporate network you may be browsing from, the Internet Service Provider that offers you Internet connectivity, and any network that the request traverses before it reaches the web server that hosts the website. Privacy advocates have long been concerned about how much information could be seen in "plain text": if any network between you and the web server can see your traffic, they can also see exactly what you are doing. If you are initiating a bank transfer, any intermediary can see the destination and the amount of the transfer.</p><p>So how to start making this data more private? To prevent eavesdropping, encryption was introduced in the form of <a href="https://www.cloudflare.com/learning/ssl/what-is-ssl/">SSL</a> and later <a href="https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/">TLS</a>. These are amazing protocols that safeguard not only your privacy but also ensure that no intermediary can tamper with any of the content you view or upload. But encryption only goes so far.</p><p>While the actual content (which particular page on a website you're visiting and any information you upload) is encrypted and shielded from intermediaries, there are still ways to determine what a user is doing. For example, the DNS request to determine the address (IP) of the website you're visiting and the <a href="https://www.cloudflare.com/learning/ssl/what-is-sni/">SNI</a> are both common ways for intermediaries to track usage.</p><p>Let's start with <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a>. 
Whenever you visit a website, your operating system needs to know which IP address to connect to. This is done through a DNS request. DNS is unencrypted by default, meaning anyone can see which website you're asking about. To help users shield these requests from intermediaries, Cloudflare introduced <a href="/dns-encryption-explained/">DNS over HTTPS</a> (DoH) in 2019. In 2020, we went one step further and introduced <a href="/oblivious-dns/">Oblivious DNS over HTTPS</a>, which prevents even Cloudflare from seeing which websites a user is asking about.</p><p>That leaves SNI as the last unencrypted bit that intermediaries can use to determine which website you're visiting. After performing a DNS query, one of the first things a browser will do is perform a <a href="https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/">TLS handshake</a>. The handshake consists of several steps, including agreeing on which cipher to use, which TLS version to speak, and which certificate will be used to verify the web server's identity. As part of this handshake, the browser indicates the name of the server (website) that it intends to visit: the Server Name Indication.</p><p>Because the session is not yet encrypted and the server doesn't know which certificate to use, the browser must transmit this information in plain text. Sending the SNI in plaintext means that any intermediary can view which website you’re visiting simply by checking the first packet of a connection:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/74acW9qWQyuJFltzJagc2S/5693f381e386d44283121cdd9dd65106/pasted-image-0--8--2.png" />
            
            </figure><p>This means that despite the amazing efforts of TLS and DoH, which websites you’re visiting on the Internet still isn't truly private. Today, we are adding the final missing piece of the puzzle with ECH. With ECH, the browser performs a TLS handshake with Cloudflare, but not a customer-specific hostname. This means that although intermediaries will be able to see that you are visiting <i>a</i> website on Cloudflare, they will never be able to determine which one.</p>
    <div>
      <h3>How does ECH work?</h3>
      <a href="#how-does-ech-work">
        
      </a>
    </div>
    <p>In order to explain how ECH works, it helps to first understand how TLS handshakes are performed. A TLS handshake starts with a <a href="https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/">ClientHello</a> message, in which the client indicates which ciphers it supports, which TLS versions it speaks and, most importantly, which server it's trying to visit (the SNI).</p><p>With ECH, the ClientHello message is split in two: an inner part and an outer part. The outer part contains the non-sensitive information, such as which ciphers to use and the TLS version. It also includes an "outer SNI". The inner part is encrypted and contains an "inner SNI".</p><p>The outer SNI is a common name that, in our case, indicates that a user is trying to visit an encrypted website on Cloudflare. We chose cloudflare-ech.com as the SNI that all ECH-enabled websites on Cloudflare will share. Because Cloudflare controls that domain, we have the appropriate certificates to negotiate a TLS handshake for that server name.</p><p>The inner SNI contains the actual server name that the user is trying to visit. It is encrypted using a public key and can only be read by Cloudflare. Once the handshake completes, the web page loads as normal, just like any other website loaded over TLS.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/MOw4dKZFQbisxTiD3PIrD/6b173801a10ece203e82d8bf3ed28c0b/pasted-image-0-8.png" />
            
            </figure><p>In practice, this means that any intermediary trying to establish which website you’re visiting will simply see normal TLS handshakes, with one caveat: any time you visit an ECH-enabled website on Cloudflare, the server name will look the same. Every TLS handshake appears identical: each one looks like an attempt to load cloudflare-ech.com, as opposed to the actual website. We've placed the last puzzle piece in preserving privacy for users who don't want intermediaries to see which websites they are visiting.</p><p>Visit our introductory blog for full details on the nitty-gritty of <a href="/encrypted-client-hello/">ECH technology</a>.</p>
    <div>
      <h3>The future of privacy</h3>
      <a href="#the-future-of-privacy">
        
      </a>
    </div>
    <p>We're excited about what this means for privacy on the Internet. Browsers like <a href="https://chromestatus.com/feature/6196703843581952">Google Chrome</a> and <a href="https://groups.google.com/a/mozilla.org/g/dev-platform/c/uv7PNrHUagA/m/BNA4G8fOAAAJ">Firefox</a> are already starting to ramp up support for ECH. If you run a website and want to prevent any intermediary from seeing that users are visiting it, enable ECH on Cloudflare today. We've already enabled ECH for all free zones. If you're an existing paying customer, just head on over to the Cloudflare dashboard and <a href="https://dash.cloudflare.com/?to=/:account/:zone/ssl-tls/edge-certificates">apply for the feature</a>. Over the coming weeks, we’ll be enabling this for everyone that signs up.</p><p>Over time, we hope others will follow in our footsteps, leading to a more private Internet for everyone. The more providers that offer ECH, the harder it becomes for anyone to listen in on what users are doing on the Internet. Heck, we might even solve privacy for good.</p><p>If you're looking for more information on ECH, how it works, and how to enable it, head on over to our <a href="https://developers.cloudflare.com/ssl/edge-certificates/ech/">developer documentation on ECH</a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Encrypted SNI]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">3qwZPNqFZ0Cj1noi8cSLG9</guid>
            <dc:creator>Achiel van der Mandele</dc:creator>
            <dc:creator>Alessandro Ghedini</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
            <dc:creator>Rushil Mehra</dc:creator>
        </item>
        <item>
            <title><![CDATA[Privacy Gateway: a privacy preserving proxy built on Internet standards]]></title>
            <link>https://blog.cloudflare.com/building-privacy-into-internet-standards-and-how-to-make-your-app-more-private-today/</link>
            <pubDate>Thu, 27 Oct 2022 13:01:00 GMT</pubDate>
            <description><![CDATA[ Privacy Gateway enables privacy-forward applications to use Cloudflare as a trusted Relay, limiting which identifying information, including IP addresses, is visible to their infrastructure ]]></description>
            <content:encoded><![CDATA[ <p></p><p>If you’re running a privacy-oriented application or service on the Internet, your options to provably protect users’ privacy are limited. You can minimize logs and data collection but even then, at a network level, every HTTP request needs to come from <i>somewhere.</i> Information generated by HTTP requests, like users’ IP addresses and TLS fingerprints, can be sensitive especially when combined with application data.</p><p>Meaningful improvements to your users’ privacy require a change in how HTTP requests are sent from client devices to the server that runs your application logic. This was the motivation for Privacy Gateway: a service that relays encrypted HTTP requests and responses between a client and application server. With Privacy Gateway, Cloudflare knows where the request is coming from, but not what it contains, and applications can see what the request contains, but not where it comes from. <b>Neither Cloudflare nor the application server has the full picture</b>, improving end-user privacy.</p><p>We recently deployed Privacy Gateway for <a href="https://flo.health/">Flo Health Inc</a>., a leading female health app, for the launch of their <a href="https://www.theverge.com/2022/9/14/23351957/flo-period-tracker-privacy-anonymous-mode">Anonymous Mode</a>. 
With Privacy Gateway in place, all request data for Anonymous Mode users is encrypted between the app user and Flo, which prevents Flo from seeing the IP addresses of those users and Cloudflare from seeing the contents of that request data.</p><p>With Privacy Gateway in place, several other privacy-critical applications are possible:</p><ul><li><p>Browser developers can collect user telemetry in a privacy-respecting manner (what extensions are installed, what defaults a user might have changed) while removing what is still a potentially personal identifier (the IP address) from that data.</p></li><li><p>Users can visit a healthcare site to report a Covid-19 exposure without worrying that the site is tracking their IP address and/or location.</p></li><li><p>DNS resolvers can serve DNS queries without linking who made the request with what website they’re visiting, a pattern we’ve implemented with <a href="/oblivious-dns/">Oblivious DNS</a>.</p></li></ul><p>Privacy Gateway is based on <a href="https://datatracker.ietf.org/doc/draft-ietf-ohai-ohttp/">Oblivious HTTP (OHTTP), an emerging IETF standard</a>, and is built upon standard <a href="https://datatracker.ietf.org/doc/html/rfc9180">hybrid public-key cryptography</a>.</p>
    <div>
      <h2>How does it work?</h2>
      <a href="#how-does-it-work">
        
      </a>
    </div>
    <p>The main innovation in the Oblivious HTTP standard – beyond a basic proxy service – is that these messages are encrypted <i>to the application’s server</i>, such that Privacy Gateway learns nothing of the application data beyond the source and destination of each message.</p><p>Privacy Gateway enables application developers and platforms, especially those with strong privacy requirements, to build something that closely resembles a “<a href="https://en.wikipedia.org/wiki/Mix_network">Mixnet</a>”: an approach to obfuscating the source and destination of a message across a network. To that end, Privacy Gateway consists of three main components:</p><ol><li><p><b>Client:</b> the user’s device, or any client that’s configured to forward requests to Privacy Gateway.</p></li><li><p><b>Privacy Gateway:</b> a service operated by Cloudflare and designed to relay requests between the Client and the Gateway, without being able to observe the contents within.</p></li><li><p><b>Application server</b>: the origin or application web server responsible for decrypting requests from clients, and encrypting responses back.</p></li></ol><p>If you were to imagine request data as the contents of the envelope (a letter) and the IP address and request metadata as the address on the outside, with Privacy Gateway, Cloudflare is able to see the envelope’s address and safely forward it to its destination without being able to see what’s inside.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2sYdEzCvuyKU7h3wIUKju0/b845a85835291ffe7c9751355b9c6b5f/image4-9.png" />
            
            </figure><p>An Oblivious HTTP transaction using Privacy Gateway</p><p>In slightly more detail, the data flow is as follows:</p><ol><li><p>Client <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.1">encapsulates an HTTP request</a> using the public key of the application server, and sends it to Privacy Gateway over an HTTPS connection.</p></li><li><p>Privacy Gateway forwards the request to the server over its own, separate HTTPS connection with the application server.</p></li><li><p>The application server  decapsulates the request, forwarding it to the target server which can produce the response.</p></li><li><p>The application server returns an <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.2">encapsulated response</a> to Privacy Gateway, which then forwards the result to the client.As specified in the protocol, requests from the client to the server are encrypted using HPKE, a state-of-the-art standard for public key encryption – which you can read more about <a href="/hybrid-public-key-encryption/">here</a>. We’ve taken additional measures to ensure that OHTTP’s use of HPKE is secure by conducting a <a href="/stronger-than-a-promise-proving-oblivious-http-privacy-properties/">formal analysis of the protocol</a>, and we expect to publish a deeper analysis in the coming weeks.</p></li></ol>
    <div>
      <h2>How Privacy Gateway improves end-user privacy</h2>
      <a href="#how-privacy-gateway-improves-end-user-privacy">
        
      </a>
    </div>
    <p>This interaction offers two types of privacy, which we informally refer to as <i>request privacy</i> and <i>client privacy</i>.</p><p>Request privacy means that the application server does not learn information that would otherwise be revealed by an HTTP request, such as IP address, geolocation, TLS and HTTPS fingerprints, and so on. Because Privacy Gateway uses a separate HTTPS connection between itself and the application server, all of this per-request information revealed to the application server represents that of Privacy Gateway, not of the client. However, developers need to take care to not send personally identifying information in the contents of requests. If the request, once decapsulated, includes information like users’ email, phone number, or credit card info, for example, Privacy Gateway will not meaningfully improve privacy.</p><p>Client privacy is a stronger notion. Because Cloudflare and the application server are not colluding to share individual user’s data, from the server’s perspective, each individual transaction came from some unknown client behind Privacy Gateway. In other words, a properly configured Privacy Gateway deployment means that applications cannot link any two requests to the same client. In particular, with Privacy Gateway, privacy loves company. If there is only one end-user making use of Privacy Gateway, then it only provides request privacy (since the client IP address remains hidden from the Gateway). It would not provide client privacy, since the server would know that each request corresponds to the same, single client. Client privacy requires that there be many users of the system, so the application server cannot make this determination.</p><p>To better understand request and client privacy, consider the following HTTP request between a client and server:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6tEmiR5nN2XWpa1EqATvBY/e74edca6ab779548c43aaa357fd88578/image3-7.png" />
            
            </figure><p>Normal HTTP configuration with a client anonymity set of size 1</p><p>If a client connects directly to the server (or “Gateway” in OHTTP terms), the server is likely to see information about the client, including the IP address, TLS cipher used, and a degree of location data based on that IP address:</p>
            <pre><code>- ipAddress: 192.0.2.33 # the client’s real IP address
- ASN: 7922
- AS Organization: Comcast Cable
- tlsCipher: AEAD-CHACHA20-POLY1305-SHA256 # potentially unique
- tlsVersion: TLSv1.3
- Country: US
- Region: California
- City: Campbell</code></pre>
            <p>There’s plenty of sensitive information here that might be unique to the end-user. In other words, the connection offers neither request nor client privacy.</p><p>With Privacy Gateway, clients do not connect directly to the application server itself. Instead, they connect to Privacy Gateway, which in turn connects to the server. This means that the server only observes connections from Privacy Gateway, not individual connections from clients, yielding a different view:</p>
            <pre><code>- ipAddress: 104.16.5.5 # a Cloudflare IP
- ASN: 13335
- AS Organization: Cloudflare
- tlsCipher: ECDHE-ECDSA-AES128-GCM-SHA256 # shared across several clients
- tlsVersion: TLSv1.3
- Country: US
- Region: California
- City: Los Angeles</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/34pNp8QRamMWLMqBpT2Q05/4b400b029df3a7ad8821e7da69dd30b9/image1-18.png" />
            
            </figure><p>Privacy Gateway configuration with a client anonymity set of size k</p><p>This is request privacy. All information about the client’s location and identity are hidden from the application server. And all details about the application data are hidden from Privacy Gateway. For sensitive applications and protocols like DNS, keeping this metadata separate from the application data is an important step towards improving end-user privacy.</p><p>Moreover, applications should take care to not reveal sensitive, per-client information in their individual requests. Privacy Gateway cannot guarantee that applications do not send identifying info – such as email addresses, full names, etc – in request bodies, since it cannot observe plaintext application data. Applications which reveal user identifying information in requests may violate client privacy, but not request privacy. This is why – unlike our full <a href="/privacy-edge-making-building-privacy-first-apps-easier/">application-level Privacy Proxy</a> product – Privacy Gateway is <i>not</i> meant to be used as a generic proxy-based protocol for arbitrary applications and traffic. It is meant to be a special purpose protocol for sensitive applications, including DNS (as is evidenced by <a href="/oblivious-dns/">Oblivious DNS-over-HTTPS</a>), telemetry data, or generic API requests as discussed above.</p>
    <div>
      <h2>Integrating Privacy Gateway into your application</h2>
      <a href="#integrating-privacy-gateway-into-your-application">
        
      </a>
    </div>
    <p>Integrating with Privacy Gateway requires applications to implement the client and server side of the OHTTP protocol. Let’s walk through what this entails.</p>
    <div>
      <h3>Server Integration</h3>
      <a href="#server-integration">
        
      </a>
    </div>
    <p>The server-side part of the protocol is responsible for two basic tasks:</p><ol><li><p>Publishing a public key for request encapsulation; and</p></li><li><p>Decrypting encapsulated client requests, processing the resulting request, and encrypting the corresponding response.</p></li></ol><p>A <a href="https://ietf-wg-ohai.github.io/oblivious-http/draft-ietf-ohai-ohttp.html#name-key-configuration">public encapsulation key</a>, called a key configuration, consists of a key identifier (so the server can support multiple keys at once for rotation purposes), cryptographic algorithm identifiers for encryption and decryption, and a public key:</p>
            <pre><code>HPKE Symmetric Algorithms {
  HPKE KDF ID (16),
  HPKE AEAD ID (16),
}

OHTTP Key Config {
  Key Identifier (8),
  HPKE KEM ID (16),
  HPKE Public Key (Npk * 8),
  HPKE Symmetric Algorithms Length (16),
  HPKE Symmetric Algorithms (32..262140),
}</code></pre>
            <p>Clients need this public key to create their request, and there are lots of ways to do this. Servers could fix a public key and then bake it into their application, but this would require a software update to rotate the key. Alternatively, clients could discover the public key some other way. Many discovery mechanisms exist and vary based on your threat model – see <a href="https://datatracker.ietf.org/doc/html/draft-wood-key-consistency">this document</a> for more details. To start, a simple approach is to have clients fetch the public key directly from the server over some API. Below is a snippet of the API that our <a href="https://github.com/cloudflare/app-relay-gateway-go/blob/main/gateway.go#L116-L134">open source OHTTP server provides</a>:</p>
            <pre><code>func (s *GatewayResource) configHandler(w http.ResponseWriter, r *http.Request) {
	config, err := s.Gateway.Config(s.keyID)
	if err != nil {
		http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
		return
	}
	w.Write(config.Marshal())
}</code></pre>
            <p>Once public key generation and distribution is solved, the server then needs to handle encapsulated requests from clients. For each request, the server needs to decrypt the request, translate the plaintext to a corresponding HTTP request that can be resolved, and then encrypt the resulting response back to the client.</p><p>Open source OHTTP libraries typically offer functions for <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L455">request decryption</a> and <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L502-L541">response encryption</a>, whereas plaintext translation from <a href="https://datatracker.ietf.org/doc/html/rfc9292">binary HTTP</a> to an HTTP request is handled separately. For example, our open source server delegates this translation to a different library that is specific to how Go HTTP requests are represented in memory. In particular, the function to translate from a plaintext request to a <a href="https://pkg.go.dev/net/http#Request">Go HTTP request</a> is done with a function that has the following signature:</p>
            <pre><code>func UnmarshalBinaryRequest(data []byte) (*http.Request, error) {
	...
}</code></pre>
            <p>Conversely, translating a <a href="https://pkg.go.dev/net/http#Response">Go HTTP response</a> to a plaintext binary HTTP response message is done with a function that has the following signature:</p>
            <pre><code>type BinaryResponse http.Response

func (r *BinaryResponse) Marshal() ([]byte, error) {
	...
}</code></pre>
            <p>While there exist several open source libraries that one can use to implement OHTTP server support, we’ve packaged all of it up in our open source server implementation <a href="https://github.com/cloudflare/app-relay-gateway-go">available here</a>. It includes instructions for building, testing, and deploying to make it easy to get started.</p>
    <div>
      <h3>Client integration</h3>
      <a href="#client-integration">
        
      </a>
    </div>
    <p>Naturally, the client-side behavior of OHTTP mirrors that of the server. In particular, the client must:</p><ol><li><p>Discover or obtain the server public key; and</p></li><li><p>Encode and encrypt HTTP requests, send them to Privacy Gateway, and decrypt and decode the HTTP responses.</p></li></ol><p>Discovery of the server public key depends on the server’s chosen deployment model. For example, if the public key is available over an API, clients can simply fetch it directly:</p>
            <pre><code>$ curl https://server.example/ohttp-configs &gt; config.bin</code></pre>
            <p><a href="https://github.com/chris-wood/ohttp-go/blob/main/bhttp.go#L66">Encoding</a>, <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L321">encrypting</a>, <a href="https://github.com/chris-wood/ohttp-go/blob/main/ohttp.go#L373">decrypting</a>, and decoding are again best handled by OHTTP libraries when available. With these functions available, building client support is rather straightforward. A trivial example Go client using the library functions linked above is as follows:</p>
            <pre><code>configEnc := ... // encoded public key
config, err := ohttp.UnmarshalPublicConfig(configEnc)
if err != nil {
	return err
}

request, err := http.NewRequest(http.MethodGet, "https://test.example/index.html", nil)
if err != nil {
	return err
}

binaryRequest := ohttp.BinaryRequest(*request)
encodedRequest, err := binaryRequest.Marshal()
if err != nil {
	return err
}

ohttpClient := ohttp.NewDefaultClient(config)
encapsulatedReq, reqContext, err := ohttpClient.EncapsulateRequest(encodedRequest)
if err != nil {
	return err
}

relayRequest, err := http.NewRequest(http.MethodPost, "https://relay.example", bytes.NewReader(encapsulatedReq.Marshal()))
if err != nil {
	return err
}
relayRequest.Header.Set("Content-Type", "message/ohttp-req")

client := http.Client{}
relayResponse, err := client.Do(relayRequest)
if err != nil {
	return err
}
bodyBytes, err := ioutil.ReadAll(relayResponse.Body)
if err != nil {
	return err
}
encapsulatedResp, err := ohttp.UnmarshalEncapsulatedResponse(bodyBytes)
if err != nil {
	return err
}

receivedResp, err := reqContext.DecapsulateResponse(encapsulatedResp)
if err != nil {
	return err
}

response, err := ohttp.UnmarshalBinaryResponse(receivedResp)
if err != nil {
	return err
}

fmt.Println(response)</code></pre>
            <p>A standalone client like this isn’t likely to be very useful if you have an existing application. To help you integrate with your existing application, we created a <a href="https://github.com/cloudflare/app-relay-client-library">sample OHTTP client library</a> that’s compatible with iOS and macOS applications. Additionally, if there’s language or platform support you would like to see to help ease integration on either the client or server side, please let us know!</p>
    <div>
      <h2>Interested?</h2>
      <a href="#interested">
        
      </a>
    </div>
    <p>Privacy Gateway is currently in early access – available to select privacy-oriented companies and partners. If you’re interested, <a href="https://www.cloudflare.com/lp/privacy-edge/">please get in touch</a>.</p> ]]></content:encoded>
            <category><![CDATA[Privacy]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">4KXLduwaLrRcnGTcApXBpH</guid>
            <dc:creator>Mari Galicer</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[Stronger than a promise: proving Oblivious HTTP privacy properties]]></title>
            <link>https://blog.cloudflare.com/stronger-than-a-promise-proving-oblivious-http-privacy-properties/</link>
            <pubDate>Thu, 27 Oct 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ In this blog post, we describe a formal, computer-aided security analysis of Oblivious HTTP, an emerging IETF standard that applications can use to improve user privacy ]]></description>
            <content:encoded><![CDATA[ <p></p><p>We recently announced <a href="/building-privacy-into-internet-standards-and-how-to-make-your-app-more-private-today/">Privacy Gateway</a>, a fully managed, scalable, and performant Oblivious HTTP (OHTTP) relay. Conceptually, OHTTP is a simple protocol: end-to-end encrypted requests and responses are forwarded between client and server through a relay, decoupling <i>who</i> from <i>what</i> was sent. This is a common pattern, as evidenced by deployed technologies like <a href="/oblivious-dns/">Oblivious DoH</a> and <a href="/icloud-private-relay/">Apple Private Relay</a>. Nevertheless, OHTTP is still new, and as a new protocol it’s imperative that we analyze the protocol carefully.</p><p>To that end, we conducted a formal, computer-aided security analysis to complement the ongoing standardization process and deployment of this protocol. In this post, we describe this analysis in more depth, digging deeper into the cryptographic details of the protocol and the model we developed to analyze it. If you’re already familiar with the OHTTP protocol, feel free to skip ahead to the analysis to dive right in. Otherwise, let’s first review what OHTTP sets out to achieve and how the protocol is designed to meet those goals.</p>
    <div>
      <h3>Decoupling who from what was sent</h3>
      <a href="#decoupling-who-from-what-was-sent">
        
      </a>
    </div>
    <p>OHTTP is a protocol that combines public key encryption with a proxy to separate the contents of an HTTP request (and response) from the sender of an HTTP request. In OHTTP, clients generate encrypted requests and send them to a relay, the relay forwards them to a gateway server, and then finally the gateway decrypts the message to handle the request. The relay only ever sees ciphertext and the client and gateway identities, and the gateway only ever sees the relay identity and plaintext.</p><p>In this way, OHTTP is a lightweight application-layer proxy protocol. This means that it proxies application messages rather than network-layer connections. This distinction is important, so let’s make sure we understand the differences. Proxying connections involves a whole other suite of protocols typically built on <a href="/a-primer-on-proxies/">HTTP CONNECT</a>. (Technologies like VPNs and WireGuard, including Cloudflare WARP, can also be used, but let’s focus on HTTP CONNECT for comparison.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7G2GpukW91K4h4HWG4LwLP/085f4dd9a95840ca0e596d882804db9b/image2-16.png" />
            
            </figure><p>Connection-oriented proxy depiction</p><p>Since the entire TCP connection itself is proxied, connection-oriented proxies are compatible with any application that uses TCP. In effect, they are general purpose proxy protocols that support any type of application traffic. In contrast, proxying application messages is compatible with application use cases that require transferring entire objects (messages) between a client and server.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4tmYhfoorqr1dlClq57lNr/3201b3615bd8512329ef4eebbd56bf34/image5-3.png" />
            
            </figure><p>Message-oriented proxy depiction</p><p>Examples include <a href="/oblivious-dns/">DNS requests and responses</a>, or, in the case of OHTTP, HTTP requests and responses. In other words, OHTTP is not a general purpose proxy protocol: it’s fit for purpose, aimed at transactional interactions between clients and servers (such as app-level APIs). As a result, it is much simpler in comparison.</p><p>Applications use OHTTP to ensure that requests are not linked to either of the following:</p><ol><li><p>Client identifying information, including the IP address, TLS fingerprint, and so on. As a proxy protocol, this is a fundamental requirement.</p></li><li><p>Future requests from the same client. This is necessary for applications that do not carry state across requests.</p></li></ol><p>These two properties make OHTTP a perfect fit for applications that wish to provide privacy to their users without compromising basic functionality. It’s served as the foundation for a widespread deployment of Oblivious DoH for over a year now, and as of recently, serves as the foundation for <a href="https://www.theverge.com/2022/9/14/23351957/flo-period-tracker-privacy-anonymous-mode">Flo Health Inc.’s Anonymous Mode feature</a>.</p><p>It’s worth noting that both of these properties could be achieved with a connection-oriented protocol, but at the cost of a new end-to-end TLS connection for each message that clients wish to transmit. This can be prohibitively expensive for all entities that participate in the protocol.</p><p>So how exactly does OHTTP achieve these goals? Let’s dig deeper into OHTTP to find out.</p>
    <div>
      <h3>Oblivious HTTP protocol design</h3>
      <a href="#oblivious-http-protocol-design">
        
      </a>
    </div>
    <p>A single transaction in OHTTP involves the following steps:</p><ol><li><p>A client <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.1">encapsulates an HTTP request</a> using the public key of the gateway server, and sends it to the relay over a client&lt;&gt;relay HTTPS connection.</p></li><li><p>The relay forwards the request to the server over its own relay&lt;&gt;gateway HTTPS connection.</p></li><li><p>The gateway decapsulates the request, forwarding it to the target server which can produce the resource.</p></li><li><p>The gateway returns an <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.2">encapsulated response</a> to the relay, which then forwards the result to the client.</p></li></ol><p>Observe that in this transaction the relay only ever sees the client and gateway identities (the client IP address and the gateway URL, respectively), but does not see any application data. Conversely, the gateway sees the application data and the relay IP address, but does not see the client IP address. <b>Neither party has the full picture, and unless the relay and gateway collude, it stays that way.</b></p><p>The HTTP details for forwarding requests and responses in the transaction above are not technically interesting – a message is sent from sender to receiver over HTTPS using a POST – so we’ll skip over them. The fascinating bits are in the request and response encapsulation, which build upon <a href="https://www.rfc-editor.org/rfc/rfc9180.html">HPKE</a>, a <a href="/hybrid-public-key-encryption/">recently ratified standard for hybrid public key encryption</a>.</p><p>Let’s begin with request encapsulation, which is <a href="/hybrid-public-key-encryption/">hybrid public key encryption</a>. Clients first transform their HTTP request into a binary format, called Binary HTTP, as specified by <a href="https://datatracker.ietf.org/doc/rfc9292/">RFC9292</a>. 
Binary HTTP is, as the name suggests, a binary format for encoding HTTP messages. This representation lets clients encode HTTP requests into binary values and lets the gateway reverse the process, recovering the HTTP request from its binary encoding. Binary encoding is necessary because the public key encryption layer expects binary-encoded inputs.</p><p>Once the HTTP request is encoded in binary format, it is then fed into HPKE to produce an encrypted message, which clients then send to the relay to be forwarded to the gateway. The gateway decrypts this message, transforms the binary-encoded request back to its equivalent HTTP request, and then forwards it to the target server for processing.</p>
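<p>To make this flow concrete, here is a deliberately simplified Python sketch of the encapsulation round trip. Real OHTTP uses HPKE (RFC 9180) with standardized KEM, KDF, and AEAD algorithms; the toy Diffie-Hellman group, HMAC-based KDF, and XOR-keystream “AEAD” below are insecure stand-ins chosen only so the example runs with the standard library.</p>

```python
# Structural sketch of OHTTP request/response encapsulation.
# WARNING: toy crypto for illustration only -- real OHTTP uses HPKE (RFC 9180).
import hashlib
import hmac
import secrets

P = (1 << 127) - 1   # toy prime modulus (NOT a real DH group)
G = 5

def kdf(ikm: bytes, info: bytes) -> bytes:
    """Derive a 32-byte key from shared-secret material (HKDF stand-in)."""
    return hmac.new(b"toy-ohttp", ikm + info, hashlib.sha256).digest()

def seal(key: bytes, pt: bytes) -> bytes:
    """Toy AEAD: XOR keystream plus an HMAC tag."""
    stream, ctr = b"", 0
    while len(stream) < len(pt):
        stream += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    ct = bytes(a ^ b for a, b in zip(pt, stream))
    return ct + hmac.new(key, ct, hashlib.sha256).digest()

def open_(key: bytes, sealed: bytes) -> bytes:
    ct, tag = sealed[:-32], sealed[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, ct, hashlib.sha256).digest()):
        raise ValueError("authentication failure")
    return seal(key, ct)[:-32]   # XOR is its own inverse

# Gateway long-term key pair, with the public half known to clients.
gw_priv = secrets.randbelow(P - 2) + 1
gw_pub = pow(G, gw_priv, P)

def encapsulate_request(binary_http_request: bytes):
    """Client side: fresh ephemeral key share, shared secret, encrypted request."""
    eph_priv = secrets.randbelow(P - 2) + 1
    enc = pow(G, eph_priv, P)   # key share sent alongside the ciphertext
    shared = pow(gw_pub, eph_priv, P).to_bytes(16, "big")
    return enc, seal(kdf(shared, b"request"), binary_http_request), kdf(shared, b"response")

def decapsulate_request(enc: int, sealed: bytes):
    """Gateway side: recompute the shared secret from `enc`, then decrypt."""
    shared = pow(enc, gw_priv, P).to_bytes(16, "big")
    return open_(kdf(shared, b"request"), sealed), kdf(shared, b"response")

enc, sealed, client_key = encapsulate_request(b"GET /index.html")
request, gateway_key = decapsulate_request(enc, sealed)
assert request == b"GET /index.html" and client_key == gateway_key
# The response travels back under the symmetric key both sides derived.
assert open_(client_key, seal(gateway_key, b"200 OK")) == b"200 OK"
```

<p>The structural points survive the simplification: the client sends a fresh key share alongside the ciphertext, and both sides derive the same symmetric key for the response direction.</p>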
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4NYRu14pWAPynS3nqMocnu/34be010611eaf64921467df3e4fa82b8/image10-1.png" />
            
            </figure><p>Responses from the gateway are encapsulated back to the client in a very similar fashion. The gateway first encodes the response in an equivalent binary HTTP message, encrypts it using a symmetric key known only to the client and gateway, and then returns it to the relay to be forwarded to the client. The client decrypts and transforms this message to recover the result.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JuJ9CoqPX4v8HQuqjjoeh/f9ca1d53bdaf4a9a997f9e13b0e0e790/image4-10.png" />
            
            </figure>
    <div>
      <h3>Simplified model and security goals</h3>
      <a href="#simplified-model-and-security-goals">
        
      </a>
    </div>
    <p>In our formal analysis, we set out to make sure that OHTTP’s use of encryption and proxying achieves the desired privacy goals described above.</p><p>To motivate the analysis, consider the following simplified model where there exist two clients C1 and C2, one relay R, and one gateway G. OHTTP assumes an attacker that can observe all network activity and can adaptively compromise either R or G, but not C1 or C2. OHTTP assumes that R and G do not collude, and so we assume only one of R and G is compromised. Once compromised, the attacker has access to all session information and private key material for the compromised party. The attacker is prohibited from sending client-identifying information, such as IP addresses, to the gateway. (This would allow the attacker to trivially link a query to the corresponding client.)</p><p>In this model, both C1 and C2 send OHTTP requests Q1 and Q2, respectively, through R to G, and G provides answers A1 and A2. The attacker aims to link C1 to (Q1, A1) and C2 to (Q2, A2), respectively. The attacker succeeds if this linkability is possible without any additional interaction. OHTTP prevents such linkability. Informally, this means:</p><ol><li><p>Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.</p></li><li><p>The gateway cannot distinguish between two identical requests generated from the same client, and two identical requests generated from different clients, in the absence of unique per-client keys.</p></li></ol><p>And informally it might seem clear that OHTTP achieves these properties. But we want to prove this formally, which means that the design, if implemented perfectly, would have these properties. This type of formal analysis is distinct from formal verification, where you take a protocol design and prove that some code implements it correctly. 
While both are useful, they are different processes, and in this blog post we’ll be talking about the former. But first, let’s give some background on formal analysis.</p>
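<p>The decoupling in this model can be rendered as a tiny sketch of what each party observes (the names C1, C2, R, and G follow the model above; the opaque random token stands in for OHTTP’s actual encapsulation):</p>

```python
# Toy rendering of the simplified model: what a compromised relay versus a
# compromised gateway would each observe. The random token stands in for the
# encrypted request; names match the model above.
import secrets

relay_view = []     # relay: client identity + ciphertext only
gateway_view = []   # gateway: relay identity + plaintext only

for client, query in [("C1", b"Q1"), ("C2", b"Q2")]:
    ciphertext = secrets.token_bytes(16)
    relay_view.append((client, ciphertext))
    gateway_view.append(("R", query))

# Neither view alone links a client identity to a query's contents:
assert all(ct not in (b"Q1", b"Q2") for _, ct in relay_view)
assert all(sender == "R" for sender, _ in gateway_view)
```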
    <div>
      <h3>Formal analysis programming model</h3>
      <a href="#formal-analysis-programming-model">
        
      </a>
    </div>
    <p>In our setting, a formal analysis involves producing an algebraic description of the protocol and then using math to prove that the algebraic description has the properties we want. The end result is a proof that our idealized algebraic version of the protocol is “secure”, i.e. has the desired properties, with respect to an attacker we want to defend against. In our case, we chose to model our idealized algebraic version of OHTTP using a tool called <a href="https://tamarin-prover.github.io/">Tamarin</a>, a security-focused theorem prover and model checker. Tamarin is an intimidating tool to use, but makes intuitive sense once you get familiar with it. We’ll break down the various parts of a Tamarin model in the context of our OHTTP model below.</p>
    <div>
      <h3>Modeling the Protocol Behavior</h3>
      <a href="#modeling-the-protocol-behavior">
        
      </a>
    </div>
    <p>Tamarin uses a technique known as <a href="https://resources.mpi-inf.mpg.de/departments/rg1/conferences/vtsa18/slides/basin-lecture2.pdf">multiset rewriting</a> to describe protocols. A protocol description is formed of a series of “rules” that can “fire” when certain requirements are met. Each rule represents a discrete step in the protocol, and when a rule fires, that step has been taken. For example, we have a rule representing the gateway generating its long-term public encapsulation key, and rules for the different parties in the protocol establishing secure TLS connections. These rules can be triggered at pretty much any time, as they have no requirements.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Dx29NTs610pWriwpovIQS/4d5b1466c872a9d3e98d314a0b8763d8/image9-2.png" />
            
            </figure><p>Basic rule for OHTTP gateway key generation</p><p>Tamarin represents these requirements as “facts”. A rule can be triggered when the right facts are available. Tamarin stores all the available facts in a “bag” or multiset. A multiset is similar to an ordinary set, in that it stores a collection of objects in an unordered fashion, but unlike an ordinary set, duplicate objects are allowed. This is the “multiset” part of “multiset rewriting”.</p><p>The rewriting part refers to the output of our rules. When a rule triggers it takes some available facts out of the bag and, when finished, inserts some new facts into the bag. These new facts might fulfill the requirements of some other rule, which can then be triggered, producing even more new facts, and so on<sup>1</sup>. In this way we can represent progress through the protocol. Using input and output facts, we can describe our rule for generating long-term public encapsulation keys, which has no requirements and produces a long-term key as output, as follows.</p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/10/image7-3.png" /><p>A rule requirement is satisfied if there exist output facts that match the rule’s input facts. As an example, in OHTTP, one requirement for the client rule for generating a request is that the long-term public encapsulation key exists. This matching is shown below.</p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/10/image8-3.png" /><p>Let’s put some of these pieces together to show a very small but concrete part of OHTTP as an example: the client generating its encapsulated request and sending it to the relay. This step should produce a message for the relay, as well as any corresponding state needed to process the eventual response from the relay. As a precondition, the client requires (1) the gateway public key and (2) a TLS connection to the relay. 
As mentioned earlier, generating the public key and establishing the TLS connection require no inputs, so they can be done at any time.</p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/10/image1-21.png" />
<p></p>
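<p>The multiset-rewriting loop itself can be sketched in a few lines of Python. This is an illustrative toy, not Tamarin’s actual syntax or semantics (for one thing, it treats the gateway key as a consumable fact rather than a persistent one), and the fact and rule names are made up for the example:</p>

```python
# Minimal multiset-rewriting sketch: a bag of facts, rules that consume and
# produce facts, and a trace of recorded action facts. Fact and rule names
# are illustrative; this is not Tamarin syntax.
from collections import Counter

facts = Counter()   # the "bag" (multiset) of available facts
trace = []          # action facts, recorded each time a rule fires

def fire(rule_name: str, consumes, produces, actions=()) -> bool:
    """Fire a rule if every required input fact is available in the bag."""
    need = Counter(consumes)
    if any(facts[f] < n for f, n in need.items()):
        return False
    facts.subtract(need)      # rewrite: remove the consumed facts...
    facts.update(produces)    # ...and insert the produced ones
    trace.extend(actions)
    return True

# Key generation and TLS establishment have no requirements, so they can
# fire at any time.
fire("GenGatewayKey", [], ["GatewayPubKey"], ["KeyGenerated"])
fire("ClientRelayTLS", [], ["ClientRelayChannel"], ["TLSEstablished"])

# The client request rule matches the facts produced by the rules above.
assert fire("ClientRequest",
            ["GatewayPubKey", "ClientRelayChannel"],
            ["MsgForRelay", "ClientState"],
            ["RequestSent"])
assert facts["MsgForRelay"] == 1
assert trace == ["KeyGenerated", "TLSEstablished", "RequestSent"]
```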
    <div>
      <h3>Modeling events in time</h3>
      <a href="#modeling-events-in-time">
        
      </a>
    </div>
    <p>Beyond consuming and producing new facts, each Tamarin rule can also create side effects, called “action facts.” Tamarin records the action facts each time a rule is triggered. An action fact might be something like “a client message containing the contents m was sent at time t.” Sometimes rules can only be triggered in a strict sequence, and we can therefore put their action facts in a fixed time order. At other times multiple rules might have their prerequisites met at the same time, and therefore we can’t put their action facts into a strict time sequence. We can represent this pattern of partially ordered implications as a directed acyclic graph, or DAG for short.</p><p>Altogether, multiset rewriting rules describe the steps of a protocol, and the resulting DAG records the actions associated with the protocol description. We refer to the DAG of actions as the action graph. If we’ve done our job well it’s possible to follow these rules and produce every possible combination of messages or actions allowed by the protocol, and their corresponding action graph.</p><p>As an example of the action graph, let’s consider what happens when the client successfully finishes the protocol. When the requirements for this rule are satisfied, the rule triggers, marking that the client is done and that the response was valid. Since the protocol is done at this point, there are no output facts produced.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6k53NoxUnAriRnQpZxTDYM/a357d7ca7e0131453b794f8719dfa374/image3-8.png" />
            
            </figure><p>Action graph for terminal client response handler rule</p>
    <div>
      <h3>Modeling the attacker</h3>
      <a href="#modeling-the-attacker">
        
      </a>
    </div>
    <p>The action graph is core to reasoning about the protocol’s security properties. We can check a graph for various properties, e.g. “does the first action taken by the relay happen after the first action taken by the client?”. Our rules allow for multiple runs of the protocol to happen at the same time. This is very powerful. We can look at a graph and ask “did something bad happen here that might break the protocol’s security properties?”</p><p>In particular, we can prove (security and correctness) properties by querying this graph, or by asserting various properties about it. For example, we might say “for all runs of the protocol, if the client finishes the protocol and can decrypt the response from the gateway, then the response must have been generated and encrypted by an entity which has the corresponding shared secret.”</p><p>This is a useful statement, but it doesn’t say much about security. What happens if the gateway private key is compromised, for example? In order to prove security properties, we need to define our threat model, which includes the adversary and their capabilities. In Tamarin, we encode the threat model as part of the protocol model. For example, when we define messages being passed from the client to the relay, we can add a special rule that allows the attacker to read it as it goes past. This gives us the ability to describe properties such as “for all runs of the protocol in our language the attacker never learns the secret key.”</p><p>For security protocols, we typically give the attacker the ability to read, modify, drop, and replay any message. This is sometimes described as “the attacker controls the network”, or a <a href="https://www.cs.huji.ac.il/~dolev/pubs/dolev-yao-ieee-01056650.pdf">Dolev-Yao</a> attacker. However, the attacker can also sometimes compromise different entities in a protocol, learning state associated with that entity. 
This is sometimes called an extended Dolev-Yao attacker, and it is precisely the attacker we consider in our model.</p><p>Going back to our model, we give the attacker the ability to compromise long-term key pairs and TLS sessions as needed through different rules. These set action facts that record that the compromise took place.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/28YufQ6WgtS1RkyeH9O2ZI/ad31cbef4844b7dd3cd4c4598f9ad873/image11-2.png" />
            
            </figure><p>Action graph for key compromise rule</p><p>Putting everything together, we have a way to model the protocol behavior, attacker capabilities, and security properties. Let’s now dive into how we applied these to prove OHTTP secure.</p>
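<p>A security property in this style is essentially a predicate that must hold over every action trace the rules can generate. Tamarin proves such statements for all reachable traces; the sketch below, with made-up fact names, merely evaluates one such predicate on hand-written examples:</p>

```python
# A lemma as a trace predicate: the attacker learns the secret key only in
# runs where the corresponding compromise rule fired. Fact names are
# illustrative; Tamarin proves such predicates over *all* reachable traces,
# while this sketch just evaluates a few hand-written examples.

def key_secrecy(trace: set) -> bool:
    return "AttackerKnowsKey" not in trace or "CompromisedGateway" in trace

assert key_secrecy({"RequestSent", "ResponseReceived"})         # honest run
assert key_secrecy({"CompromisedGateway", "AttackerKnowsKey"})  # permitted leak
assert not key_secrecy({"AttackerKnowsKey"})                    # a violation
```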
    <div>
      <h3>OHTTP Tamarin model</h3>
      <a href="#ohttp-tamarin-model">
        
      </a>
    </div>
    <p>In our model, we give the attacker the ability to compromise the server’s long-term keys and the key between the client and the relay. Against this attacker, we aim to prove the two informal statements given above:</p><ol><li><p>Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.</p></li><li><p>The gateway cannot distinguish between two requests generated from the same client, and two requests generated from different clients, in the absence of unique per-client keys.</p></li></ol><p>To prove these formally, we express them somewhat differently. First, we assert that the protocol actually completes. This is an important step, because if your model has a bug in it where the protocol can’t even run as intended, then Tamarin is likely to say it’s “secure” because nothing bad ever happens.</p><p>For the core security properties, we translate the desired goals into questions we ask about the model. In this way, formal analysis only provides us proof (or disproof!) of the questions we ask, not the questions we should have asked, and so this translation relies on experience and expertise. We break down this translation for each of the questions we want to ask below, starting with gateway authentication.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L273-L283"><b>Gateway authentication</b></a> Unless the attacker has compromised the gateway’s long-term keys, if the client completes the protocol and is able to decrypt the gateway’s response, then it knows that: the responder was the gateway it intended to use, the gateway derived the same keys, the gateway saw the request the client sent, and the response the client received is the one the gateway sent.</p><p>This tells us that the protocol actually worked, and that the messages sent and received were as they were supposed to be. 
One aspect of authentication can be that the participants agree on some data, so although this property seems to be a bit of a grab bag, its clauses are all part of one authentication property.</p><p>Next, we need to prove that the request and response remain secret. There are several ways in which secrecy may be violated, e.g., if encryption or decryption keys are compromised. We do so by proving the following properties.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L232-L248"><b>Request and response secrecy</b></a> The request and response are both secret, i.e., the attacker never learns them, unless the attacker has compromised the gateway’s long-term keys.</p><p>In a sense, request and response secrecy covers the case where the gateway is malicious, because if the gateway is malicious then the “attacker” knows the gateway’s long-term keys.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L251-L257"><b>Relay connection security</b></a> The contents of the connection between the client and relay are secret unless the attacker has compromised the relay.</p><p>We don’t have to worry about the secrecy of the connection if the client is compromised because in that scenario the attacker knows the query before it’s even been sent, and can learn the response by making an honest query itself. If your client is compromised then it’s game over.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L285-L292"><b>AEAD nonce reuse resistance</b></a> If the gateway sends a message to the client, and the attacker finds a different message encrypted with the same key and nonce, then either the attacker has already compromised the gateway, or they already knew the query.</p><p>In translation, this property means that the response encryption is correct and not vulnerable to attack, such as through AEAD nonce reuse. 
This would obviously be a disaster for OHTTP, so we were careful to check that this situation never arises, especially as <a href="https://files.research.cloudflare.com/publication/Singanamalla2021.pdf#subsection.4.5">we’d already detected this issue</a> in <a href="/oblivious-dns/">ODoH</a>.</p><p>Finally, and perhaps most importantly, we want to prove that an attacker can’t link a particular query to a client. We prove a slightly different property, which effectively argues that, unless the relay and gateway collude, the attacker cannot link an encrypted query to its decrypted counterpart. In particular, we prove the following:</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L260-L271"><b>Client unlinkability</b></a> If an attacker knows the query and the contents of the connection sent to the relay (i.e. the encrypted query), then it must have compromised both the gateway and the relay.</p><p>This doesn’t in general prove indistinguishability. There are two techniques an attacker can use to link two queries: direct inference and statistical analysis. Because of the anonymity trilemma we know that we cannot defend against statistical analysis, so we have to declare it out of scope and move on. To prevent direct inference we need to make sure that the attacker doesn’t compromise either the client, or both the relay and the gateway together, which would let it directly link the queries. So is there anything we can protect against? Thankfully there is one thing. We can make sure that a malicious gateway can’t identify that a single client sent two messages. We prove that by not keeping any state between connections. If a returning client acts in exactly the same way as a new client, and doesn’t carry any state between requests, there’s nothing for the malicious gateway to analyze.</p><p>And that’s it! 
If you want to have a go at proving some of these properties yourself our models and proofs <a href="https://github.com/cloudflare/ohttp-analysis">are available on our GitHub</a>, as are our <a href="https://github.com/cloudflare/odoh-analysis">ODoH models and proofs</a>. The <a href="https://tamarin-prover.github.io/">Tamarin prover is freely available too</a>, so you can double-check all our work. Hopefully this post has given you a flavor of what we mean when we say that we’ve proven a protocol secure, and inspired you to have a go yourself. If you want to work on great projects like this check out our <a href="https://www.cloudflare.com/careers/">careers</a> page.</p><hr /><p><sup>1</sup>Depending on the model, this process can lead to an exponential blow-up in search space, making it impossible to prove anything automatically. Moreover, if the new output facts do not fulfill the requirements of any remaining rule(s) then the process hangs.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">3EzLcg5NRcyrIWGBr5KdGX</guid>
            <dc:creator>Christopher Wood</dc:creator>
            <dc:creator>Jonathan Hoyland</dc:creator>
        </item>
        <item>
            <title><![CDATA[Unlocking QUIC’s proxying potential with MASQUE]]></title>
            <link>https://blog.cloudflare.com/unlocking-quic-proxying-potential/</link>
            <pubDate>Sun, 20 Mar 2022 16:58:37 GMT</pubDate>
            <description><![CDATA[ We continue our technical deep dive into traditional TCP proxying over HTTP ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In the last post on <a href="/a-primer-on-proxies/">proxying TCP-based applications</a>, we discussed how HTTP CONNECT can be used to proxy TCP-based applications, including DNS-over-HTTPS and generic HTTPS traffic, between a client and target server. This provides significant benefits for those applications, but it doesn’t lend itself to non-TCP applications. And if you’re wondering whether or not we care about these, the answer is an affirmative yes!</p><p>For instance, <a href="https://www.cloudflare.com/learning/performance/what-is-http3/">HTTP/3</a> is based on QUIC, which runs on top of UDP. What if we wanted to speak HTTP/3 to a target server? That requires two things: (1) the means to encapsulate a UDP payload between client and proxy (which the proxy decapsulates and forwards to the target in an actual UDP datagram), and (2) a way to instruct the proxy to open a UDP association to a target so that it knows where to forward the decapsulated payload. In this post, we’ll discuss answers to these two questions, starting with encapsulation.</p>
    <div>
      <h3>Encapsulating datagrams</h3>
      <a href="#encapsulating-datagrams">
        
      </a>
    </div>
    <p>While TCP provides a reliable and ordered byte stream for applications to use, UDP instead provides unreliable messages called datagrams. Datagrams sent or received on a connection are only loosely associated; each one is independent from a transport perspective. Applications that are built on top of UDP can leverage the unreliability for good. For example, low-latency media streaming often does so to avoid lost packets getting retransmitted. This makes sense: on a live teleconference it is better to receive the most recent audio or video than to start lagging behind while you're waiting for stale data.</p><p>QUIC is designed to run on top of an unreliable protocol such as UDP. QUIC provides its own layer of security, packet loss detection, methods of data recovery, and congestion control. If the layer underneath QUIC duplicates those features, they can cause wasted work or, worse, create destructive interference. For instance, QUIC <a href="https://www.rfc-editor.org/rfc/rfc9002.html#section-7">congestion control</a> defines a number of signals that provide input to sender-side algorithms. If layers underneath QUIC affect its packet flows (loss, timing, pacing, etc.), they also affect the algorithm output. Input and output run in a feedback loop, so perturbation of signals can get amplified. All of this can cause congestion control algorithms to be more conservative in the data rates they use.</p><p>If we could speak HTTP/3 to a proxy, and leverage a reliable QUIC stream to carry encapsulated datagram payloads, then everything <i>can</i> work. However, the reliable stream interferes with these expectations; the most likely outcome is slower end-to-end UDP throughput than we could achieve without tunneling. 
Stream reliability runs counter to our goals.</p><p>Fortunately, QUIC's <a href="https://datatracker.ietf.org/doc/draft-ietf-quic-datagram/">unreliable datagram extension</a> adds a new <a href="https://datatracker.ietf.org/doc/html/draft-ietf-quic-datagram-07#section-4">DATAGRAM frame</a> that, as its name plainly says, is unreliable. It has several uses; the one we care about is that it provides a building block for performant UDP tunneling. In particular, this extension has the following properties:</p><ul><li><p>DATAGRAM frames are individual messages, unlike a long QUIC stream.</p></li><li><p>DATAGRAM frames do not contain a multiplexing identifier, unlike QUIC's stream IDs.</p></li><li><p>Like all QUIC frames, DATAGRAM frames must fit completely inside a QUIC packet.</p></li><li><p>DATAGRAM frames are subject to congestion control, helping senders to avoid overloading the network.</p></li><li><p>DATAGRAM frames are acknowledged by the receiver but, importantly, if the sender detects a loss, QUIC does not retransmit the lost data.</p></li></ul><p>The "Unreliable Datagram Extension to QUIC" specification will be published as an RFC soon. Cloudflare's <a href="https://github.com/cloudflare/quiche">quiche</a> library has supported it since October 2020.</p><p>Now that QUIC has primitives that support sending unreliable messages, we have a standard way to effectively tunnel UDP inside it. QUIC provides the STREAM and DATAGRAM transport primitives that support our proxying goals. Now it is the application layer's responsibility to describe <b>how</b> to use them for proxying. Enter MASQUE.</p>
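<p>Concretely, the wire layout of a DATAGRAM frame is simple. The sketch below follows the frame types defined by the extension (0x30 without a Length field, 0x31 with one) and QUIC’s variable-length integer encoding from RFC 9000, in which the top two bits of the first byte select a 1-, 2-, 4-, or 8-byte encoding:</p>

```python
# Wire layout sketch for the QUIC DATAGRAM frame from the extension draft.
# Frame type 0x31 carries an explicit Length; 0x30 omits it, in which case
# the datagram payload extends to the end of the QUIC packet.

def encode_varint(v: int) -> bytes:
    # QUIC variable-length integer (RFC 9000, section 16).
    if v < 1 << 6:
        return v.to_bytes(1, "big")
    if v < 1 << 14:
        return ((1 << 14) | v).to_bytes(2, "big")
    if v < 1 << 30:
        return ((2 << 30) | v).to_bytes(4, "big")
    if v < 1 << 62:
        return ((3 << 62) | v).to_bytes(8, "big")
    raise ValueError("value too large for a QUIC varint")

def datagram_frame(payload: bytes, with_length: bool = True) -> bytes:
    if with_length:
        return b"\x31" + encode_varint(len(payload)) + payload
    return b"\x30" + payload

frame = datagram_frame(b"tunnelled UDP payload")
assert frame[0] == 0x31 and frame[1] == len(b"tunnelled UDP payload")
```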
    <div>
      <h3>MASQUE: Unlocking QUIC’s potential for proxying</h3>
      <a href="#masque-unlocking-quics-potential-for-proxying">
        
      </a>
    </div>
    <p>Now that we’ve described how encapsulation works, let’s now turn our attention to the second question listed at the start of this post: How does an application initialize an end-to-end tunnel, informing a proxy server where to send UDP datagrams to, and where to receive them from? This is the focus of the <a href="https://datatracker.ietf.org/wg/masque/about/">MASQUE Working Group</a>, which was formed in June 2020 and has been designing answers since. Many people across the Internet ecosystem have been contributing to the standardization activity. At Cloudflare, that includes Chris (as co-chair), Lucas (as co-editor of one WG document) and several other colleagues.</p><p>MASQUE started solving the UDP tunneling problem with a pair of specifications: a definition for how <a href="https://datatracker.ietf.org/doc/draft-ietf-masque-h3-datagram/">QUIC datagrams are used with HTTP/3</a>, and <a href="https://datatracker.ietf.org/doc/draft-ietf-masque-connect-udp/">a new kind of HTTP request</a> that initiates a UDP socket to a target server. These have built on the concept of extended CONNECT, which was first introduced for HTTP/2 in <a href="https://datatracker.ietf.org/doc/html/rfc8441">RFC 8441</a> and has now been <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-h3-websockets/">ported to HTTP/3</a>. Extended CONNECT defines the :protocol pseudo-header that can be used by clients to indicate the intention of the request. The initial use case was WebSockets, but we can repurpose it for UDP and it looks like this:</p>
            <pre><code>:method = CONNECT
:protocol = connect-udp
:scheme = https
:path = /target.example.com/443/
:authority = proxy.example.com</code></pre>
            <p>A client sends an extended CONNECT request to a proxy server, which identifies a target server in the :path. If the proxy succeeds in opening a UDP socket, it responds with a 2xx (Successful) status code. After this, an end-to-end flow of unreliable messages between the client and target is possible; the client and proxy exchange QUIC DATAGRAM frames with an encapsulated payload, and the proxy and target exchange UDP datagrams bearing that payload.</p>
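<p>As an illustration, a client implementation might assemble that request as an ordered list of header fields before handing it to its HTTP/3 stack for QPACK encoding. This is a hypothetical helper of our own, not part of any real library; the :path format follows the example shown above, though draft versions of connect-udp varied here:</p>
            <pre><code>def build_connect_udp_headers(proxy_authority: str,
                              target_host: str,
                              target_port: int):
    # Pseudo-headers must precede any regular header fields in HTTP/3.
    return [
        (b":method", b"CONNECT"),
        (b":protocol", b"connect-udp"),
        (b":scheme", b"https"),
        (b":path", f"/{target_host}/{target_port}/".encode()),
        (b":authority", proxy_authority.encode()),
    ]

headers = build_connect_udp_headers(
    "proxy.example.com", "target.example.com", 443)</code></pre>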
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3BXgtvSVPvNMa3CkKqqioK/1530306e4007e30b6d7b995c2b01f823/image3-34.png" />
            
            </figure>
    <div>
      <h3>Anatomy of Encapsulation</h3>
      <a href="#anatomy-of-encapsulation">
        
      </a>
    </div>
    <p>UDP tunneling has a constraint that TCP tunneling does not – namely, the size of messages and how that relates to path MTU (Maximum Transmission Unit; for more background see our <a href="https://www.cloudflare.com/learning/network-layer/what-is-mtu/">Learning Center article</a>). The path MTU is the maximum size that is allowed on the path between client and server. The actual maximum is the smallest maximum across all elements at every hop and at every layer, from the network up to application. A single component with a small MTU reduces the MTU of the entire path. On the Internet, <a href="https://www.cloudflare.com/learning/network-layer/what-is-mtu/">1,500 bytes</a> is a common practical MTU. When considering tunneling using QUIC, we need to appreciate the anatomy of QUIC packets and frames in order to understand how they add bytes of overhead, which subtracts from our theoretical maximum.</p><p>We've been talking in terms of HTTP/3, which normally has its own frames (HEADERS, DATA, etc.) that have a common type and length overhead. However, there is no HTTP/3 framing when it comes to DATAGRAM; instead, the bytes are placed directly into the QUIC frame. This frame is composed of two fields. The first field is a variable number of bytes, called the <a href="https://datatracker.ietf.org/doc/html/draft-ietf-masque-h3-datagram-05#section-3">Quarter Stream ID</a> field, which is an encoded identifier that supports independent multiplexed DATAGRAM flows. It does so by binding each DATAGRAM to the HTTP request stream ID. In QUIC, stream IDs use two bits to encode four types of stream. Since request streams are always of one type (client-initiated bidirectional, to be exact), we can divide their ID by four to save space on the wire. Hence the name Quarter Stream ID. The second field contains the end-to-end message payload. Here's how it might look on the wire.</p>
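<p>As a sketch (ours, not code from a real stack), the payload of such a datagram could be built like this, with the Quarter Stream ID derived from the request stream ID:</p>
            <pre><code>def encode_varint(value: int) -> bytes:
    # QUIC variable-length integer (RFC 9000, Section 16).
    if value < 2**6:
        return value.to_bytes(1, "big")
    if value < 2**14:
        return (value | (0x1 << 14)).to_bytes(2, "big")
    if value < 2**30:
        return (value | (0x2 << 30)).to_bytes(4, "big")
    return (value | (0x3 << 62)).to_bytes(8, "big")

def h3_datagram_payload(request_stream_id: int, message: bytes) -> bytes:
    # Request streams are client-initiated and bidirectional, so their
    # IDs always have the two low bits clear and divide evenly by four.
    if request_stream_id % 4 != 0:
        raise ValueError("not a client-initiated bidirectional stream")
    return encode_varint(request_stream_id // 4) + message

payload = h3_datagram_payload(0, b"\x01\x02")
# Quarter Stream ID 0 (one byte), then the end-to-end message bytes.</code></pre>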
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fzMrcBgH540C9SizGskfh/2107ffe432c189ec089e5663615bc814/image2-75.png" />
            
            </figure><p>If you recall our lesson from the <a href="/a-primer-on-proxies/">last post</a>, DATAGRAM frames (like all frames) must fit completely inside a QUIC packet. Moreover, since QUIC requires that <a href="https://www.rfc-editor.org/rfc/rfc9000.html#section-14-7">fragmentation is disabled</a>, QUIC packets must fit completely inside a UDP datagram. This all combines to limit the maximum size of things that we can actually send: the path MTU determines the size of the UDP datagram, then we need to subtract the overheads of the UDP datagram header, QUIC packet header, and QUIC DATAGRAM frame header. For a better understanding of QUIC's wire image and overheads, see <a href="https://www.rfc-editor.org/rfc/rfc8999.html#section-5">Section 5 of RFC 8999</a> and <a href="https://www.rfc-editor.org/rfc/rfc9000.html#section-12.4">Section 12.4 of RFC 9000</a>.</p><p>If a sender has a message that is too big to fit inside the tunnel, there are only two options: discard the message or fragment it. Neither of these are good options. Clients create the UDP tunnel and are more likely to accurately calculate the real size of encapsulated UDP datagram payload, thus avoiding the problem. However, a target server is most likely unaware that a client is behind a proxy, so it cannot accommodate the tunneling overhead. It might send a UDP datagram payload that is too big for the proxy to encapsulate. This conundrum is common to all proxy protocols! There's an art in picking the right MTU size for UDP-based traffic in the face of tunneling overheads. While approaches like path MTU discovery can help, they are <a href="/path-mtu-discovery-in-practice/">not a silver bullet</a>. Choosing conservative maximum sizes can reduce the chances of tunnel-related problems. However, this needs to be weighed against being too restrictive. 
Given a theoretical path MTU of 1,500, once we consider QUIC encapsulation overheads, tunneled messages limited to between 1,200 and 1,300 bytes can be effective. This is especially important when we think about tunneling QUIC itself. <a href="https://datatracker.ietf.org/doc/html/rfc9000#section-8.1">RFC 9000 Section 8.1</a> details how clients that initiate new QUIC connections must send UDP datagrams of at least 1,200 bytes. If a proxy can't support that, then QUIC will not work in a tunnel.</p>
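<p>A back-of-the-envelope budget shows where the bytes go. The header sizes below are illustrative assumptions of ours (the real QUIC header length varies with connection ID and packet number sizes):</p>
            <pre><code>PATH_MTU = 1500
IPV4_HEADER = 20               # 40 bytes for IPv6
UDP_HEADER = 8
QUIC_SHORT_HEADER = 1 + 8 + 2  # flags + example connection ID + packet number
AEAD_TAG = 16                  # authentication tag added by packet protection
DATAGRAM_FRAME_HEADER = 1 + 2  # frame type plus example varint fields

available = (PATH_MTU - IPV4_HEADER - UDP_HEADER
             - QUIC_SHORT_HEADER - AEAD_TAG - DATAGRAM_FRAME_HEADER)
# Roughly 1,442 bytes of tunneled payload remain in this example;
# deployments budget more conservatively, hence limits of 1,200-1,300.</code></pre>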
    <div>
      <h3>Nested tunneling for Improved Privacy Proxying</h3>
      <a href="#nested-tunneling-for-improved-privacy-proxying">
        
      </a>
    </div>
    <p>MASQUE gives us the application layer building blocks to support efficient tunneling of TCP or UDP traffic. What's cool about this is that we can combine these blocks into different deployment architectures for different scenarios or different needs.</p><p>One example of this case is nested tunneling via multiple proxies, which can minimize the connection metadata available to each individual proxy or server (one example of this type of deployment is described in our recent post on <a href="/icloud-private-relay/">iCloud Private Relay)</a>. In this kind of setup, a client might manage at least three logical connections. First, a QUIC connection between Client and Proxy 1. Second, a QUIC connection between Client and Proxy 2, which runs via a CONNECT tunnel in the first connection. Third, an end-to-end byte stream between Client and Server, which runs via a CONNECT tunnel in the second connection. A real TCP connection only exists between Proxy 2 and Server. If additional Client to Server logical connections are needed, they can be created inside the existing pair of QUIC connections.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3zIR70pmMrWlgjGongXqkH/9409a30dfba8a603d9a917b20e8a3d3a/image4-16.png" />
            
            </figure>
    <div>
      <h3>Towards a full tunnel with IP tunneling</h3>
      <a href="#towards-a-full-tunnel-with-ip-tunneling">
        
      </a>
    </div>
    <p>Proxy support for UDP and TCP already unblocks a huge assortment of use cases, including TLS, QUIC, HTTP, DNS, and so on. But it doesn’t help traffic that uses other <a href="https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml">IP protocols</a>, like <a href="https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol">ICMP</a> or IPsec <a href="https://en.wikipedia.org/wiki/IPsec#Encapsulating_Security_Payload">Encapsulating Security Payload</a> (ESP). Fortunately, the MASQUE Working Group has also been working on IP tunneling. This is a lot more complex than UDP tunneling, so they first spent some time defining a common set of <a href="https://datatracker.ietf.org/doc/draft-ietf-masque-ip-proxy-reqs/">requirements</a>. The group has recently adopted a new specification to support <a href="https://datatracker.ietf.org/doc/draft-ietf-masque-connect-ip/">IP proxying over HTTP</a>. This behaves similarly to the other CONNECT designs we've discussed but with a few differences. Indeed, IP proxying support using HTTP as a substrate would unlock many applications that existing protocols like IPsec and WireGuard enable.</p><p>At this point, it would be reasonable to ask: “A complete HTTP/3 stack is a bit excessive when all I need is a simple end-to-end tunnel, right?” Our answer is, it depends! CONNECT-based IP proxies use TLS and rely on well-established PKIs for creating secure channels between endpoints, whereas protocols like WireGuard use a simpler cryptographic protocol for key establishment and defer authentication to the application. WireGuard does not support proxying over TCP but <a href="https://www.wireguard.com/known-limitations/">can be adapted to work over TCP</a> transports, if necessary. In contrast, CONNECT-based proxies do support TCP and UDP transports, depending on what version of HTTP is used. Despite these differences, these protocols do share similarities. 
In particular, the actual framing used by both protocols – be it the TLS record layer or QUIC packet protection for CONNECT-based proxies, or WireGuard encapsulation – is not interoperable but differs only slightly in wire format. Thus, from a performance perspective, there’s not really much difference.</p><p>In general, comparing these protocols is like comparing apples and oranges – they’re fit for different purposes, have different implementation requirements, and assume different ecosystem participants and threat models. At the end of the day, CONNECT-based proxies are better suited to an ecosystem and environment that is already heavily invested in TLS and the existing WebPKI, so we expect CONNECT-based solutions for IP tunnels to become the norm in the future. Nevertheless, it's early days, so be sure to watch this space if you’re interested in learning more!</p>
    <div>
      <h3>Looking ahead</h3>
      <a href="#looking-ahead">
        
      </a>
    </div>
    <p>The IETF has chartered the MASQUE Working Group to help design an HTTP-based solution for UDP and IP that complements the existing CONNECT method for TCP tunneling. Using HTTP semantics allows us to use features like request methods, response statuses, and header fields to enhance tunnel initialization, for example by reusing existing authentication mechanisms or the <a href="https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-proxy-status">Proxy-Status</a> field. By using HTTP/3, UDP and IP tunneling can benefit from QUIC's secure transport, native unreliable datagram support, and other features. Through a flexible design, older versions of HTTP can also be supported, which helps widen the potential deployment scenarios. Collectively, this work brings proxy protocols to the masses.</p><p>While the design details of MASQUE specifications continue to be iterated upon, several implementations have already been developed, some of which have been interoperability tested during IETF hackathons. This running code helps inform the continued development of the specifications. Details are likely to continue changing before the end of the process, but we should expect the overarching approach to remain similar. Join us during the MASQUE WG meeting at <a href="https://www.ietf.org/how/meetings/113/">IETF 113</a> to learn more!</p> ]]></content:encoded>
            <category><![CDATA[Proxying]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[QUIC]]></category>
            <guid isPermaLink="false">7uf0jVn0IMKFbqLxWODDNM</guid>
            <dc:creator>Lucas Pardue</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[A Primer on Proxies]]></title>
            <link>https://blog.cloudflare.com/a-primer-on-proxies/</link>
            <pubDate>Sat, 19 Mar 2022 17:01:15 GMT</pubDate>
            <description><![CDATA[ A technical dive into traditional TCP proxying over HTTP ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4icLjzJ8inC97t9zh3LiWw/e0e75625752de444e0fd2a32f627112e/image2-73.png" />
            
            </figure><p>Traffic proxying, the act of encapsulating one flow of data inside another, is a valuable privacy tool for establishing boundaries on the Internet. Encapsulation has an overhead, and Cloudflare and our Internet peers strive to avoid turning it into a performance cost. MASQUE is the latest collaborative effort to design efficient proxy protocols based on IETF standards. We're already running these at scale in production; see our recent blog post about Cloudflare's role in <a href="/icloud-private-relay/">iCloud Private Relay</a> for an example.</p><p>In this blog post series, we’ll dive into proxy protocols.</p><p>To begin, let’s start with a simple question: what is proxying? In this case, we are focused on <b>forward</b> proxying — a client establishes an end-to-end tunnel to a target server via a proxy server. This contrasts with the Cloudflare CDN, which operates as a <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/"><b>reverse</b></a> <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/">proxy</a> that terminates client connections and then takes responsibility for actions such as caching, security (including <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">WAF</a>), load balancing, etc. With forward proxying, the details about the tunnel, such as how it is established and used, whether it provides confidentiality via authenticated encryption, and so on, vary by proxy protocol. Before going into specifics, let’s start with one of the most common tunnels used on the Internet: TCP.</p>
    <div>
      <h3>Transport basics: TCP provides a reliable byte stream</h3>
      <a href="#transport-basics-tcp-provides-a-reliable-byte-stream">
        
      </a>
    </div>
    <p>The TCP transport protocol is a rich topic. For the purposes of this post, we will focus on one aspect: TCP provides a readable and writable, reliable, and ordered byte stream. Some protocols, like HTTP and TLS, require a reliable transport underneath them, and TCP's single byte stream is an ideal fit. The application layer reads or writes to this byte stream, but the details about how TCP sends this data "on the wire" are typically abstracted away.</p><p>Large application objects are written into a stream, split into many small packets, and sent in order to the network. At the receiver, packets are read from the network and combined back into an identical stream. Networks are not perfect and packets can be lost or reordered. TCP is clever at dealing with this without worrying the application with the details. It just works. A way to visualize this is to imagine a magic paper shredder that can both shred documents and convert shredded papers back to whole documents. Then imagine you and your friend bought a pair of these and decided that it would be fun to send each other shreds.</p><p>The one problem with TCP is that when a lost packet is detected at a receiver, the sender needs to retransmit it. This takes time and can delay reconstruction of the byte stream. This is known as TCP head-of-line blocking. Applications regularly use TCP via a socket API that abstracts away protocol details; they often can't tell if there are delays because the other end is slow at sending or if the network is slowing things down via packet loss.</p>
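<p>The "byte stream without message boundaries" property is easy to demonstrate with a connected stream-socket pair, which stands in here for a real TCP connection:</p>
            <pre><code>import socket

a, b = socket.socketpair()  # a connected stream-socket pair
a.sendall(b"hello ")
a.sendall(b"world")         # two separate writes...
a.close()

received = bytearray()
while chunk := b.recv(4096):
    received.extend(chunk)
b.close()
# ...arrive as one continuous run of bytes: the write boundaries are gone.</code></pre>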
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xnUbBQnb4droA0xmJMJ69/c07b46a941cdc3a7cbb75dc696050f38/image1-84.png" />
            
            </figure>
    <div>
      <h3>Proxy Protocols</h3>
      <a href="#proxy-protocols">
        
      </a>
    </div>
    <p>Proxying TCP is immensely useful for many applications, including, though certainly not limited to, HTTPS, <a href="https://www.cloudflare.com/learning/access-management/what-is-ssh/">SSH</a>, and RDP. In fact, <a href="/oblivious-dns/">Oblivious DoH</a>, which is a proxy protocol for DNS messages, could very well be implemented using a TCP proxy, though there are reasons <a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-11#appendix-A">why this may not be desirable</a>. Today, there are a number of different options for proxying TCP end-to-end, including:</p><ul><li><p>SOCKS, which runs in cleartext and requires an expensive connection establishment step.</p></li><li><p>Transparent TCP proxies, commonly referred to as performance enhancing proxies (PEPs), which must be on path, offer no additional transport security, and, definitionally, are limited to TCP protocols.</p></li><li><p>Layer 4 proxies such as Cloudflare <a href="https://developers.cloudflare.com/spectrum/">Spectrum</a>, which might rely on side carriage of metadata via something like the <a href="https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt">PROXY protocol</a>.</p></li><li><p>HTTP CONNECT, which transforms HTTPS connections into opaque byte streams.</p></li></ul><p>While SOCKS and PEPs are viable options for some use cases, when choosing which proxy protocol to build future systems upon, it made the most sense to choose a reusable and general-purpose protocol that provides well-defined and standard abstractions. As such, the IETF chose to focus on using HTTP as a substrate via the CONNECT method.</p><p>The concept of using HTTP as a substrate for proxying is not new. Indeed, HTTP/1.1 and HTTP/2 have supported proxying TCP-based protocols for a long time. 
In the following sections of this post, we’ll explain in detail how CONNECT works across different versions of HTTP, including HTTP/1.1, HTTP/2, and the <a href="https://www.cloudflare.com/learning/performance/what-is-http3/">recently standardized HTTP/3</a>.</p>
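<p>As an aside, the PROXY protocol listed above is a good example of how simple side carriage of metadata can be. Its version 1 header, per the HAProxy specification, is a single text line prepended to the proxied TCP connection; here is a sketch with made-up addresses:</p>
            <pre><code>def proxy_v1_header(src_ip: str, dst_ip: str,
                    src_port: int, dst_port: int) -> bytes:
    # "PROXY TCP4 &lt;src&gt; &lt;dst&gt; &lt;sport&gt; &lt;dport&gt;\r\n" tells the
    # receiving server who the original client was.
    return f"PROXY TCP4 {src_ip} {dst_ip} {src_port} {dst_port}\r\n".encode()

header = proxy_v1_header("198.51.100.1", "203.0.113.7", 56324, 443)</code></pre>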
    <div>
      <h3>HTTP/1.1 and CONNECT</h3>
      <a href="#http-1-1-and-connect">
        
      </a>
    </div>
    <p>In HTTP/1.1, the <a href="https://www.rfc-editor.org/rfc/rfc7231#section-4.3.6">CONNECT method</a> can be used to establish an end-to-end TCP tunnel to a target server via a proxy server. This is commonly applied to use cases where there is a benefit of protecting the traffic between the client and the proxy, or where the proxy can provide access control at network boundaries. For example, a Web browser can be configured to issue all of its HTTP requests via an HTTP proxy.</p><p>A client sends a CONNECT request to the proxy server, which requests that it opens a TCP connection to the target server and desired port. It looks something like this:</p>
            <pre><code>CONNECT target.example.com:80 HTTP/1.1
Host: target.example.com</code></pre>
            <p>If the proxy succeeds in opening a TCP connection to the target, it responds with a 2xx range status code. If there is some kind of problem, an error status in the 5xx range can be returned. Once a tunnel is established there are two independent TCP connections; one on either side of the proxy. If a flow needs to stop, you can simply terminate these connections.</p><p>HTTP CONNECT proxies forward data between the client and the target server. The TCP packets themselves are not tunneled, only the data on the logical byte stream. Although the proxy is supposed to forward data and not process it, if the data is plaintext nothing would stop it from doing so. In practice, CONNECT is often used to create an end-to-end TLS connection where only the client and target server have access to the protected content; the proxy sees only TLS records and can't read their content because it doesn't have access to the keys.</p>
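<p>A minimal sketch of the client's side of this exchange, with a hypothetical target and no real networking:</p>
            <pre><code>def build_connect(target_host: str, target_port: int) -> bytes:
    # The CONNECT request shown above, serialized as HTTP/1.1 bytes.
    return (f"CONNECT {target_host}:{target_port} HTTP/1.1\r\n"
            f"Host: {target_host}\r\n\r\n").encode()

def tunnel_established(response: bytes) -> bool:
    # Any 2xx status line from the proxy means the TCP tunnel is open.
    status = int(response.split(b" ", 2)[1])
    return 200 <= status < 300

request = build_connect("target.example.com", 80)
ok = tunnel_established(b"HTTP/1.1 200 Connection Established\r\n\r\n")</code></pre>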
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ParPDOxCyJFT2m3UYsLtR/d76fbce62a99c53fa68bc86773c231bd/image8-1.png" />
            
            </figure><p>Finally, it's worth noting that after a successful CONNECT request, the HTTP connection (and the TCP connection underpinning it) has been converted into a tunnel. It is no longer possible to issue other HTTP messages to the proxy itself on this connection.</p>
    <div>
      <h3>HTTP/2 and CONNECT</h3>
      <a href="#http-2-and-connect">
        
      </a>
    </div>
    <p><a href="https://www.rfc-editor.org/rfc/rfc7540.html">HTTP/2</a> adds logical streams above the TCP layer in order to support concurrent requests and responses on a single connection. Streams are also reliable and ordered byte streams, operating on top of TCP. Returning to our magic shredder analogy: imagine you wanted to send a book. Shredding each page one after another and rebuilding the book one page at a time is slow, but handling multiple pages at the same time might be faster. HTTP/2 streams allow us to do that. But, as we all know, trying to put too much into a shredder can sometimes cause it to jam.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/37F1LF53K2Plk0vYbcdmgg/d7d57e19b13ebce0aabb910568f4d9f6/image3-33.png" />
            
            </figure><p>In HTTP/2, each request and response is sent on a different stream. To support this, HTTP/2 defines frames that contain the stream identifier that they are associated with. Requests and responses are composed of HEADERS and DATA frames, which contain HTTP header fields and HTTP content, respectively. Frames can be large. When they are sent on the wire they might span multiple TLS records or TCP segments. Side note: the HTTP WG has been working on a new revision of the document that defines HTTP semantics that are common to all HTTP versions. The terms message, header fields, and content all come from <a href="https://www.ietf.org/archive/id/draft-ietf-httpbis-semantics-19.html#name-message-abstraction">this description</a>.</p><p>HTTP/2 concurrency allows applications to read and write multiple objects at different rates, which can improve HTTP application performance, such as web browsing. HTTP/1.1 traditionally dealt with this concurrency by opening multiple TCP connections in parallel and striping requests across these connections. In contrast, HTTP/2 multiplexes frames belonging to different streams onto the single byte stream provided by one TCP connection. Reusing a single connection has benefits, but it still leaves HTTP/2 at risk of TCP head-of-line blocking. For more details, refer to <a href="https://calendar.perfplanet.com/2020/head-of-line-blocking-in-quic-and-http-3-the-details/">this Perf Planet blog post</a>.</p><p><a href="https://datatracker.ietf.org/doc/html/rfc7540#section-8.3">HTTP/2 also supports the CONNECT method</a>. In contrast to HTTP/1.1, CONNECT requests do not take over an entire HTTP/2 connection. Instead, they convert a single stream into an end-to-end tunnel. It looks something like this:</p>
            <pre><code>:method = CONNECT
:authority = target.example.com:443</code></pre>
            <p>If the proxy succeeds in opening a TCP connection, it responds with a 2xx (Successful) status code. After this, the client sends DATA frames to the proxy, and the content of these frames is put into TCP packets sent to the target. In the return direction, the proxy reads from the TCP byte stream and populates DATA frames. If a tunnel needs to stop, you can simply terminate the stream; there is no need to terminate the HTTP/2 connection.</p><p>By using HTTP/2, a client can create multiple CONNECT tunnels in a single connection. This can help reduce resource usage (reducing the global count of TCP connections) and allows related tunnels to be logically grouped together, ensuring that they "share fate" when either client or proxy need to gracefully close. On the proxy-to-server side there are still multiple independent TCP connections.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2zNstK2cYrqJKofOIiOeII/2c4bda54b181ce6c171b264589511421/image7.png" />
            
            </figure><p>One challenge of multiplexing tunnels on concurrent streams is how to effectively prioritize them. We've talked in the past about <a href="/better-http-2-prioritization-for-a-faster-web/">prioritization for web pages</a>, but the story is a bit different for CONNECT. We've been thinking about this and captured <a href="https://httpwg.org/http-extensions/draft-ietf-httpbis-priority.html#name-scheduling-and-the-connect-">considerations</a> in the new <a href="/adopting-a-new-approach-to-http-prioritization/">Extensible Priorities</a> draft.</p>
    <div>
      <h3>QUIC, HTTP/3 and CONNECT</h3>
      <a href="#quic-http-3-and-connect">
        
      </a>
    </div>
    <p>QUIC is a new secure and multiplexed transport protocol from the IETF. QUIC version 1 was published as <a href="https://www.rfc-editor.org/rfc/rfc9000.html">RFC 9000</a> in May 2021 and, <a href="/quic-version-1-is-live-on-cloudflare/">the next day</a>, we enabled it for all Cloudflare customers.</p><p>QUIC is composed of several foundational features. You can think of these like individual puzzle pieces that interlink to form a transport service. This service needs one more piece, an application mapping, to bring it all together.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7gGK0LOkLSU64zmxkd9gVd/d02d3a3f1f7e13d7c2819342f61f6a4a/image4-15.png" />
            
            </figure><p>Similar to HTTP/2, QUIC version 1 provides reliable and ordered streams. But QUIC streams live at the transport layer and they are the only type of QUIC primitive that can carry application data. QUIC has no opinion on how streams get used. Applications that wish to use QUIC must define that themselves.</p><p>QUIC streams can be long (up to 2^62 - 1 bytes). Stream data is sent on the wire in the form of <a href="https://www.rfc-editor.org/rfc/rfc9000.html#name-stream-frames">STREAM frames</a>. All QUIC frames must fit completely inside a QUIC packet. QUIC packets must fit entirely in a UDP datagram; fragmentation is prohibited. These requirements mean that a long stream is serialized to a series of QUIC packets sized roughly to the path <a href="https://en.wikipedia.org/wiki/Maximum_transmission_unit">MTU</a> (Maximum Transmission Unit). STREAM frames provide reliability via QUIC loss detection and recovery. Frames are acknowledged by the receiver and if the sender detects a loss (via missing acknowledgments), QUIC will retransmit the lost data. In contrast, TCP retransmits packets. This difference is an important feature of QUIC, letting implementations decide how to repacketize and reschedule lost data.</p><p>When multiplexing streams, different packets can contain <a href="https://www.rfc-editor.org/rfc/rfc9000.html#name-stream-frames">STREAM frames</a> belonging to different stream identifiers. This creates independence between streams and helps avoid the head-of-line blocking caused by packet loss that we see in TCP. If a UDP packet containing data for one stream is lost, other streams can continue to make progress without being blocked by retransmission of the lost stream.</p><p>To use our magic shredder analogy one more time: we're sending a book again, but this time we parallelise our task by using independent shredders. 
We need to logically associate them together so that the receiver knows the pages and shreds are all for the same book, but otherwise they can progress with less chance of jamming.</p>
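<p>The repacketization freedom described above can be sketched as chopping a stream's bytes into offset-tagged pieces; on loss, a sender is free to cut replacement pieces at different boundaries than the originals:</p>
            <pre><code>def packetize(data: bytes, max_payload: int):
    # Split stream data into (offset, chunk) pieces that each fit in one
    # packet; the offsets let the receiver reassemble the bytes in order.
    return [(off, data[off:off + max_payload])
            for off in range(0, len(data), max_payload)]

frames = packetize(b"x" * 3000, 1200)
# Pieces at offsets 0, 1200 and 2400; the last holds the final 600 bytes.</code></pre>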
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KPS4I6E6zvfPnrY4zNlxB/912d09cf787fa46d6dc77140919ba607/image6-5.png" />
            
            </figure><p><a href="https://datatracker.ietf.org/doc/draft-ietf-quic-http/">HTTP/3</a> is an example of an application mapping that describes how streams are used to exchange HTTP settings, <a href="https://datatracker.ietf.org/doc/html/draft-ietf-quic-qpack-21">QPACK</a> state, and request and response messages. HTTP/3 still defines its own frames like HEADERS and DATA, but it is overall simpler than HTTP/2 because QUIC deals with the hard stuff. Since HTTP/3 just sees a logical byte stream, its frames can be arbitrarily sized. The QUIC layer handles segmenting HTTP/3 frames over STREAM frames for sending in packets. HTTP/3 <a href="https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34#section-4.2">also supports the CONNECT method</a>. It functions identically to CONNECT in HTTP/2, with each request stream converting to an end-to-end tunnel.</p>
    <div>
      <h3>HTTP packetization comparison</h3>
      <a href="#http-packetization-comparison">
        
      </a>
    </div>
    <p>We've talked about HTTP/1.1, HTTP/2 and HTTP/3. The diagram below is a convenient way to summarize how HTTP requests and responses get serialized for transmission over a secure transport. The main difference is that with TLS, protected records are split across several TCP segments, while with QUIC there is no record layer; each packet has its own protection.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Ixri8ytJ113sldUVNplLm/47ffbc388a4a45ae481a78e39b2986a0/image5-18.png" />
            
            </figure>
    <div>
      <h3>Limitations and looking ahead</h3>
      <a href="#limitations-and-looking-ahead">
        
      </a>
    </div>
    <p>HTTP CONNECT is a simple and elegant protocol that has a tremendous number of application use cases, especially for privacy-enhancing technology. In particular, applications can use it to proxy <a href="https://www.cloudflare.com/learning/dns/dns-over-tls/">DNS-over-HTTPS</a>, similar to what’s been done for Oblivious DoH, or more generic HTTPS traffic (based on HTTP/1.1 or HTTP/2), among many other uses.</p><p>However, what about non-TCP traffic? Recall that HTTP/3 is an application mapping for QUIC, and therefore runs over UDP as well. What if we wanted to proxy QUIC? What if we wanted to proxy entire IP datagrams, similar to VPN technologies like IPsec or WireGuard? This is where <a href="/unlocking-quic-proxying-potential/">MASQUE</a> comes in. In the next post, we’ll discuss how the <a href="https://datatracker.ietf.org/wg/masque/about/">MASQUE Working Group</a> is standardizing technologies to enable proxying for datagram-based protocols like UDP and IP.</p>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[TCP]]></category>
            <category><![CDATA[Proxying]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[iCloud Private Relay]]></category>
            <guid isPermaLink="false">2YU980GMLipuAmzDuDgrTc</guid>
            <dc:creator>Lucas Pardue</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing experimental DDR in 1.1.1.1]]></title>
            <link>https://blog.cloudflare.com/announcing-ddr-support/</link>
            <pubDate>Tue, 08 Mar 2022 20:55:00 GMT</pubDate>
            <description><![CDATA[ The majority of DNS queries on the Internet today are unencrypted. This post describes a new protocol, called Discovery of Designated Resolvers (DDR), that allows clients to upgrade from unencrypted DNS to encrypted DNS when only the IP address of a resolver is known.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p><a href="https://1.1.1.1/">1.1.1.1</a> sees approximately 600 billion queries per day. However, proportionally, most queries sent to this resolver are in cleartext: 89% over UDP and TCP combined, while the remaining 11% are encrypted. We care about end-user privacy and would prefer to see all of these queries sent to us over an encrypted transport using DNS-over-TLS or DNS-over-HTTPS. Having a mechanism by which clients could discover support for encrypted protocols such as DoH or DoT will help drive this number up and lead to more name encryption on the Internet. That’s where <a href="https://datatracker.ietf.org/doc/html/draft-ietf-add-ddr-03">DDR</a> – or Discovery of Designated Resolvers – comes into play. As of today, 1.1.1.1 supports the latest version of <a href="https://datatracker.ietf.org/doc/html/draft-ietf-add-ddr-03">DDR</a> so clients can automatically upgrade non-secure UDP and TCP connections to secure connections. In this post, we’ll describe the motivations for DDR, how the mechanism works, and, importantly, how you can test it out as a client.</p>
    <div>
      <h3>DNS transports and public resolvers</h3>
      <a href="#dns-transports-and-public-resolvers">
        
      </a>
    </div>
    <p>We initially launched our public <a href="https://www.cloudflare.com/learning/dns/dns-over-tls/">recursive</a> resolver service 1.1.1.1 over three years ago, and have since seen its usage steadily grow. Today, it is one of the fastest public recursive resolvers available to end-users, supporting the latest security and privacy DNS transports such as HTTP/3 for <a href="https://datatracker.ietf.org/doc/html/rfc8484">DNS-over-HTTPS (DoH)</a>, as well as <a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-11">Oblivious DoH</a>.</p><p>As a public resolver, all clients, regardless of type, are typically <a href="https://developers.cloudflare.com/1.1.1.1/encrypted-dns/dns-over-https/encrypted-dns-browsers">manually configured</a> based on a user’s desired performance, security, and privacy requirements. This choice reflects answers to two separate but related types of questions:</p><ol><li><p>What recursive resolver should be used to answer my DNS queries? Does the resolver perform well? Does the recursive resolver respect my privacy?</p></li><li><p>What protocol should be used to speak to this particular recursive resolver? How can I keep my DNS data safe from eavesdroppers that should otherwise not have access to it?</p></li></ol><p>The second question primarily concerns technical matters. In particular, whether or not a recursive resolver supports DoH is simple enough to answer. Either the recursive resolver does or does not support it!</p><p>In contrast, the first question is primarily a matter of policy. For example, consider the question of choosing between a local network-provided DNS recursive resolver and a public recursive resolver. How do resolver features (including DoH support, for example) influence this decision? How does the resolver’s privacy policy regarding data use and retention influence this decision? 
More generally, what information about recursive resolver capabilities is available to clients in making this decision and how is this information delivered to clients?</p><p>These policy questions have been the topic of substantial debate in the Internet Engineering Task Force (IETF), the standards body where DoH was standardized, and are one facet of the work of the <a href="https://datatracker.ietf.org/wg/add/about/">Adaptive DNS Discovery</a> (ADD) Working Group, which is chartered to work on the following items (among others):</p><blockquote><ul><li><p>Define a mechanism that allows clients to discover DNS resolvers that support encryption and that are available to the client either on the public Internet or on private or local networks.</p></li><li><p>Define a mechanism that allows communication of DNS resolver information to clients for use in selection decisions. This could be part of the mechanism used for discovery, above.</p></li></ul></blockquote><p>In other words, the ADD Working Group aims to specify mechanisms by which clients can obtain the information they need to answer question (1). Critically, one of those pieces of information is what encrypted transport protocols the recursive resolver supports, which would answer question (2).</p><p>As the answer to question (2) is purely technical and not a matter of policy, the ADD Working Group was able to specify a workable solution that we’ve implemented and tested with existing clients. Before getting into the details of how it works, let’s dig into the problem statement here and see what’s required to address it.</p>
    <div>
      <h3>Threat model and problem statement</h3>
      <a href="#threat-model-and-problem-statement">
        
      </a>
    </div>
    <p>The DDR problem is relatively straightforward: given the IP address of a DNS recursive resolver, how can one discover parameters necessary for speaking to the same resolver using an encrypted transport? (As above, discovering parameters for a <i>different</i> resolver is a distinctly different problem that pertains to policy and is therefore out of scope.)</p><p>This question is only meaningful insofar as using encryption helps protect against some attacker. Otherwise, if the network was trusted, encryption would add no value! A direct consequence is that this question assumes the network – for some definition of “the network” – is untrusted and encryption helps protect against this network.</p><p>But what exactly is the network here? In practice, the topology typically looks like the following:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3qHMZ4CvT0Euxi1wDGFkHu/d9f623761c5fadfb8a68c70af71fcda2/pasted-image-0-1.png" />
            
            </figure><p><b><i>Typical DNS configuration from DHCP</i></b></p><p>Again, for DNS discovery to have any meaning, we assume that either the ISP or home network – or both – is untrusted and malicious. The setting here depends on the client and the network they are attached to, but it’s generally simplest to assume the ISP network is untrusted.</p><p>This question also makes one important assumption: clients know the <i>desired</i> recursive resolver address. Why is this important? Typically, the IP address of a DNS recursive resolver is provided via Dynamic Host Configuration Protocol (DHCP). When a client joins a network, it uses DHCP to learn information about the network, including the default DNS recursive resolver. However, DHCP is a famously unauthenticated protocol, which means that any active attacker on the network can spoof the information, as shown below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BuMpxFkvg4F0gu2aLOuLj/3a4830c3edbcc3544933572e53a67397/pasted-image-0--1-.png" />
            
</figure><p><b><i>Unauthenticated DHCP discovery</i></b></p><p>One obvious attack vector would be for the attacker to redirect DNS traffic from the network’s desired recursive resolver to an attacker-controlled recursive resolver. This has important implications for the threat model for discovery.</p><p>First, there is currently no known mechanism for encrypted DNS discovery in the presence of an active attacker that can influence the client’s view of the recursive resolver’s address. In other words, to make any meaningful improvement, DNS discovery assumes the client’s view of the DNS recursive resolver address is correct (and obtained through some secure mechanism). A second implication is that the attacker can simply block any client discovery attempt, preventing upgrade to encrypted transports. This seems true of any interactive discovery mechanism. As a result, DNS discovery must relax this attacker’s capabilities somewhat: rather than add, drop, or modify packets, the attacker can only add or modify packets.</p><p>Altogether, this threat model lets us sharpen the DNS discovery problem statement: given the IP address of a DNS recursive resolver, how can one securely discover parameters necessary for speaking to the same resolver using an encrypted transport <i>in the presence of an active attacker that can add or modify packets</i>? It should be infeasible, for example, for the attacker to redirect the client from the resolver that it knows at the outset to one the attacker controls.</p><p>So how does this work, exactly?</p>
    <div>
      <h3>DDR mechanics</h3>
      <a href="#ddr-mechanics">
        
      </a>
    </div>
    <p>DDR depends on two mechanisms:</p><ol><li><p>Certificate-based authentication of encrypted DNS resolvers.</p></li><li><p><a href="https://datatracker.ietf.org/doc/draft-ietf-dnsop-svcb-https/">SVCB</a> records for encoding and communicating DNS parameters.</p></li></ol><p>Certificates allow resolvers to prove authority for IP addresses. For example, if you view the certificate for one.one.one.one, you’ll see several IP addresses listed under the SubjectAlternativeName extension, including 1.1.1.1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1iHNu6ZzbrIbR3B7BYHPKw/2f4e46032588fe6c2cd306fe16314b41/Screen-Shot-2022-03-01-at-12.35.13-PM.png" />
            
</figure><p><b><i>SubjectAltName list of the one.one.one.one certificate</i></b></p><p>SVCB records are extensible key-value stores that can be used for conveying information about services to clients. Example information includes the <a href="https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-svcb-https-08#section-7.1">supported application protocols</a>, including HTTP/3, as well as <a href="https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-svcb-https-08#section-7.3">keying material</a> like that used for <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-esni/">TLS Encrypted Client Hello</a>.</p><p>How does DDR combine these two to solve the discovery problem above? In three simple steps:</p><ol><li><p>Clients query the expected DNS resolver for its designations and their parameters with a special-purpose SVCB record.</p></li><li><p>Clients open a secure connection to the designated resolver, for example, one.one.one.one, authenticating the resolver against the one.one.one.one name.</p></li><li><p>Clients check that the designated resolver is additionally authenticated for the IP address of the original resolver. That is, the certificate for one.one.one.one, the designated resolver, must include the IP address 1.1.1.1, the original designating resolver.</p></li></ol><p>If this validation completes, clients can then use the secure connection to the designated resolver. In pictures, this is as follows:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4NRBefhy1aJpJqkr98mSFZ/3fdbeff3c5af631c8e278d1639e54166/pasted-image-0--2-.png" />
            
            </figure><p><b><i>DDR discovery process</i></b></p><p>This demonstrates that the encrypted DNS resolver is authoritative for the client’s original DNS resolver. Or, in other words, that the original resolver and the encrypted resolver are effectively  “the same.” An encrypted resolver that does not include the originally requested resolver IP address on its certificate would fail the validation, and clients are not expected to follow the designated upgrade path. This entire process is referred to as “<a href="https://datatracker.ietf.org/doc/html/draft-ietf-add-ddr-05#section-4.2">Verified Discovery</a>” in the DDR specification.</p>
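<p>The validation above boils down to a set-membership check over the certificate’s SubjectAlternativeName entries. Here is a minimal, hypothetical sketch of that check; the function name is our own, and the SAN tuples mimic the shape returned by Python’s <code>ssl.SSLSocket.getpeercert()</code>, not anything mandated by the DDR specification:</p>

```python
# Hypothetical sketch of DDR's "Verified Discovery" check: after connecting
# to the designated resolver (e.g. one.one.one.one) over TLS, verify that
# its certificate also covers the IP address of the original, unencrypted
# resolver (e.g. 1.1.1.1). SAN entries mimic ssl.SSLSocket.getpeercert().

def verify_designation(original_resolver_ip, designated_name, san_entries):
    """True if the certificate covers both the designated name and the
    original resolver's IP address."""
    names = {value for (kind, value) in san_entries if kind == "DNS"}
    ips = {value for (kind, value) in san_entries if kind == "IP Address"}
    return designated_name in names and original_resolver_ip in ips

# Example SAN list resembling the one.one.one.one certificate:
san = [
    ("DNS", "one.one.one.one"),
    ("IP Address", "1.1.1.1"),
    ("IP Address", "1.0.0.1"),
    ("IP Address", "2606:4700:4700::1111"),
]

assert verify_designation("1.1.1.1", "one.one.one.one", san)
assert not verify_designation("9.9.9.9", "one.one.one.one", san)
```

<p>A resolver whose certificate omits the original IP address fails this check, so clients would not follow its designation.</p>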
    <div>
      <h3>Experimental deployment and next steps</h3>
      <a href="#experimental-deployment-and-next-steps">
        
      </a>
    </div>
    <p>To enable more encrypted DNS on the Internet and help the standardization process, 1.1.1.1 now has experimental support for DDR. You can query it directly to find out:</p>
            <pre><code>$ dig @1.1.1.1 _dns.resolver.arpa type64

QUESTION SECTION
_dns.resolver.arpa.               IN SVCB 

ANSWER SECTION
_dns.resolver.arpa.                           300    IN SVCB  1 one.one.one.one. alpn="h2,h3" port="443" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001" key7="/dns-query{?dns}"
_dns.resolver.arpa.                           300    IN SVCB  2 one.one.one.one. alpn="dot" port="853" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001"

ADDITIONAL SECTION
one.one.one.one.                              300    IN AAAA  2606:4700:4700::1111
one.one.one.one.                              300    IN AAAA  2606:4700:4700::1001
one.one.one.one.                              300    IN A     1.1.1.1
one.one.one.one.                              300    IN A     1.0.0.1</code></pre>
            <p>This command sends an SVCB query (type 64) for the reserved name _dns.resolver.arpa to 1.1.1.1. The output lists the designation parameters for both DoH and DoT. Let’s walk through the DoH record:</p>
            <pre><code>_dns.resolver.arpa.                           300    IN SVCB  1 one.one.one.one. alpn="h2,h3" port="443" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001" key7="/dns-query{?dns}"</code></pre>
            <p>This says that the DoH target one.one.one.one is accessible over port 443 (port="443") using either HTTP/2 or HTTP/3 (alpn="h2,h3"), and the DoH path (key7) for queries is "/dns-query{?dns}".</p>
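<p>To illustrate how a client might use these parameters, here is a hedged sketch of building an RFC 8484 DoH GET URL from the SVCB target, port, and dohpath (key7). The dohpath is a URI template whose {?dns} variable carries the base64url-encoded, unpadded DNS query in wire format; the helper name is hypothetical, and the query bytes below are illustrative rather than a real DNS message:</p>

```python
# Sketch (not a real client): expand the dohpath URI template from the SVCB
# record into an RFC 8484 DoH GET URL. The dns variable is the base64url
# encoding, without padding, of the DNS query in wire format.
import base64

def doh_url(target, port, dohpath, dns_query_wire):
    b64 = base64.urlsafe_b64encode(dns_query_wire).rstrip(b"=").decode()
    path = dohpath.replace("{?dns}", "?dns=" + b64)
    return f"https://{target}:{port}{path}"

# Illustrative bytes, not a real DNS message:
url = doh_url("one.one.one.one", 443, "/dns-query{?dns}", b"\x00\x01")
# url == "https://one.one.one.one:443/dns-query?dns=AAE"
```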
    <div>
      <h3>Moving forward</h3>
      <a href="#moving-forward">
        
      </a>
    </div>
    <p>DDR is a simple mechanism that lets clients automatically upgrade to encrypted transport protocols for DNS queries without any manual configuration. At the end of the day, users running compatible clients will enjoy a more private Internet experience. Happily, both <a href="https://techcommunity.microsoft.com/t5/networking-blog/making-doh-discoverable-introducing-ddr/ba-p/2887289">Microsoft</a> and <a href="https://mailarchive.ietf.org/arch/msg/add/rMJOhpvh1zBpnjBMtT8tN4NQFtk/">Apple</a> recently announced experimental support for this emerging standard, and we’re pleased to help them and other clients test support. Going forward, we hope to help add support for DDR to open source DNS resolver software such as dnscrypt-proxy and BIND. If you’re interested in helping us continue to drive adoption of encrypted DNS and related protocols to help build a better Internet, we’re <a href="https://www.cloudflare.com/careers/">hiring</a>!</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Privacy]]></category>
            <guid isPermaLink="false">5Oqs1wNu3oeSen2uhFN8ws</guid>
            <dc:creator>Christopher Wood</dc:creator>
            <dc:creator>Anbang Wen</dc:creator>
        </item>
        <item>
            <title><![CDATA[HPKE: Standardizing public-key encryption (finally!)]]></title>
            <link>https://blog.cloudflare.com/hybrid-public-key-encryption/</link>
            <pubDate>Thu, 24 Feb 2022 23:12:36 GMT</pubDate>
            <description><![CDATA[ HPKE (RFC 9180) was made to be simple, reusable, and future-proof by building upon knowledge from prior PKE schemes and software implementations. This article provides an overview of this new standard, going back to discuss its motivation, design goals, and development process ]]></description>
            <content:encoded><![CDATA[ <p>For the last three years, the <a href="https://irtf.org/cfrg">Crypto Forum Research Group</a> of the <a href="https://irtf.org/">Internet Research Task Force (IRTF)</a> has been working on specifying the next generation of (hybrid) public-key encryption (PKE) for Internet protocols and applications. The result is Hybrid Public Key Encryption (HPKE), published today as <a href="https://www.rfc-editor.org/rfc/rfc9180.html">RFC 9180</a>.</p><p>HPKE was made to be simple, reusable, and future-proof by building upon knowledge from prior PKE schemes and software implementations. It is already in use in a large assortment of emerging Internet standards, including TLS <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-esni/">Encrypted Client Hello</a> and <a href="https://datatracker.ietf.org/doc/draft-pauly-dprive-oblivious-doh/">Oblivious DNS-over-HTTPS</a>, and has a large assortment of interoperable implementations, including one in <a href="https://github.com/cloudflare/circl/tree/master/hpke">CIRCL</a>. This article provides an overview of this new standard, going back to discuss its motivation, design goals, and development process.</p>
    <div>
      <h3>A primer on public-key encryption</h3>
      <a href="#a-primer-on-public-key-encryption">
        
      </a>
    </div>
    <p>Public-key cryptography is decades old, with its roots going back to the seminal work of Diffie and Hellman in 1976, entitled “<a href="https://ee.stanford.edu/~hellman/publications/24.pdf">New Directions in Cryptography</a>.” Their proposal – today called Diffie-Hellman key exchange – was a breakthrough. It allowed one to transform small secrets into big secrets for cryptographic applications and protocols. For example, one can bootstrap a secure channel for exchanging messages with confidentiality and integrity using a key exchange protocol.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/49zWFnGoaVQPUcKjDT52uO/09610fee57bd627cb23faae14bc05ca4/Screen-Shot-2022-02-24-at-11.09.26-AM.png" />
            
</figure><p>Unauthenticated Diffie-Hellman key exchange</p><p>In this example, Sender and Receiver exchange freshly generated public keys with each other, and then combine their own secret key with their peer’s public key. Algebraically, this yields the same value \(g^{xy} = (g^x)^y = (g^y)^x\). Both parties can then use this as a <i>shared secret</i> for performing other tasks, such as encrypting messages to and from one another.</p><p>The <a href="https://datatracker.ietf.org/doc/html/rfc8446">Transport Layer Security</a> (TLS) protocol is one such application of this concept. Shortly after Diffie-Hellman was unveiled to the world, RSA came into the fold. The RSA cryptosystem is another public key algorithm that has been used to build digital signature schemes, PKE algorithms, and key transport protocols. A key transport protocol is similar to a key exchange algorithm in that the sender generates a random symmetric key and then encrypts it under the receiver’s public key. Upon successful decryption, both parties then share this secret key. (This fundamental technique, known as static RSA, was used pervasively in the context of TLS. See <a href="/rfc-8446-aka-tls-1-3/">this post</a> for details about this old technique in TLS 1.2 and prior versions.)</p><p>At a high level, PKE between a sender and receiver is a protocol for encrypting messages under the receiver’s public key. One way to do this is via a so-called non-interactive key exchange protocol.</p><p>To illustrate how this might work, let \(g^y\) be the receiver’s public key, and let \(m\) be a message that one wants to send to this receiver.
The flow looks like this:</p><ol><li><p>The sender generates a fresh private and public key pair, \((x, g^x)\).</p></li><li><p>The sender computes \(g^{xy} = (g^y)^x\), which can be done without involvement from the receiver, that is, non-interactively.</p></li><li><p>The sender then uses this shared secret to derive an encryption key, and uses this key to encrypt m.</p></li><li><p>The sender packages up \(g^x\) and the encryption of \(m\), and sends both to the receiver.</p></li></ol><p>The general paradigm here is called "hybrid public-key encryption" because it combines a non-interactive key exchange based on public-key cryptography for establishing a shared secret, and a symmetric encryption scheme for the actual encryption. To decrypt \(m\), the receiver computes the same shared secret \(g^{xy} = (g^x)^y\), derives the same encryption key, and then decrypts the ciphertext.</p><p>Conceptually, PKE of this form is quite simple. General designs of this form date back for many years and include the <a href="https://www.cs.ucdavis.edu/~rogaway/papers/dhies.pdf">Diffie-Hellman Integrated Encryption System</a> (DHIES) and ElGamal encryption. However, despite this apparent simplicity, there are numerous subtle design decisions one has to make in designing this type of protocol, including:</p><ul><li><p>What type of key exchange protocol should be used for computing the shared secret? Should this protocol be based on modern elliptic curve groups like Curve25519? Should it support future post-quantum algorithms?</p></li><li><p>How should encryption keys be derived? Are there other keys that should be derived? How should additional application information be included in the encryption key derivation, if at all?</p></li><li><p>What type of encryption algorithm should be used? 
What types of messages should be encrypted?</p></li><li><p>How should sender and receiver encode and exchange public keys?</p></li></ul><p>These and other questions are important for a protocol, since agreed-upon answers to them are required for interoperability. That is, senders and receivers should be able to communicate without having to use the same source code.</p><p>There have been a number of efforts in the past to standardize PKE, most of which focus on elliptic curve cryptography. Some examples of past standards include: ANSI X9.63 (ECIES), IEEE 1363a, ISO/IEC 18033-2, and SECG SEC 1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/85SKzkEqGOEuyw4brVcak/fe2a769b6c8210d6851f265e46dec90a/Screen-Shot-2022-02-24-at-11.09.53-AM.png" />
            
</figure><p>Timeline of related standards and software</p><p>A paper by <a href="https://ieeexplore.ieee.org/abstract/document/5604194/">Martinez et al.</a> provides a thorough and technical comparison of these different standards. The key points are that all these existing schemes have shortcomings. They either rely on outdated or not-commonly-used primitives such as <a href="https://en.wikipedia.org/wiki/RIPEMD">RIPEMD</a> and CMAC-AES, lack accommodations for moving to modern primitives (e.g., <a href="https://datatracker.ietf.org/doc/html/rfc5116">AEAD</a> algorithms), lack proofs of <a href="https://link.springer.com/chapter/10.1007/BFb0055718">IND-CCA2</a> security, or, importantly, fail to provide test vectors and interoperable implementations.</p><p>The lack of a single standard for public-key encryption has led to inconsistent and often non-interoperable support across libraries. In particular, hybrid PKE implementation support is fractured across the community, ranging from the hugely popular and simple-to-use <a href="https://nacl.cr.yp.to/box.html">NaCl box</a> and <a href="https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes">libsodium box seal</a> implementations based on modern algorithm variants like XSalsa20-Poly1305 for authenticated encryption, to <a href="https://www.bouncycastle.org/specifications.html">BouncyCastle</a> implementations based on “classical” algorithms like AES and elliptic curves.</p><p>Despite the lack of a single standard, this hasn’t stopped the adoption of ECIES instantiations for widespread and critical applications.
For example, the Apple and Google <a href="https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ENPA_White_Paper.pdf">Exposure Notification Privacy-preserving Analytics</a> (ENPA) platform uses ECIES for public-key encryption.</p><p>When designing protocols and applications that need a simple, reusable, and agile abstraction for public-key encryption, existing standards are not fit for purpose. That’s where HPKE comes into play.</p>
    <div>
      <h3>Construction and design goals</h3>
      <a href="#construction-and-design-goals">
        
      </a>
    </div>
    <p>HPKE is a public-key encryption construction that is designed from the outset to be simple, reusable, and future-proof. It lets a sender encrypt arbitrary-length messages under a receiver’s public key, as shown below. You can try this out in the browser at <a href="https://www.franziskuskiefer.de/p/tldr-hybrid-public-key-encryption/">Franziskus Kiefer’s blog post on HPKE</a>!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2QjgXQ7cSPcN5kLUpdpYEO/ba5071877cd0b2b6168d2c4e9493ea52/image5-2.png" />
            
            </figure><p>HPKE overview</p><p>HPKE is built in stages. It starts with a Key Encapsulation Mechanism (KEM), which is similar to the key transport protocol described earlier and, in fact, can be constructed from the Diffie-Hellman key agreement protocol. A KEM has two algorithms: Encapsulation and Decapsulation, or Encap and Decap for short. The Encap algorithm creates a symmetric secret and wraps it for a public key such that only the holder of the corresponding private key can unwrap it. An attacker knowing this encapsulated key cannot recover even a single bit of the shared secret. Decap takes the encapsulated key and the private key associated with the public key, and computes the original shared secret. From this shared secret, HPKE computes a series of derived keys that are then used to encrypt and authenticate plaintext messages between sender and receiver.</p><p>This simple construction was driven by several high-level design goals and principles. We will discuss these goals and how they were met below.</p>
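<p>To make the Encap/Decap interface concrete, here is a toy sketch of a Diffie-Hellman-based KEM in the spirit of HPKE’s DHKEM. The tiny Mersenne-prime group and the function names are purely illustrative; this is not secure and is not the construction specified in RFC 9180:</p>

```python
# Toy KEM sketch: Encap creates a shared secret and wraps it for a public
# key; Decap unwraps it with the private key. Uses classic Diffie-Hellman
# in a toy-sized group -- for illustration only, NOT secure.
import hashlib
import secrets

P = (1 << 127) - 1   # a Mersenne prime; far too small for real use
G = 3

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def encap(pk_receiver):
    """Fresh ephemeral key pair; shared secret derived by hashing g^xy."""
    esk, epk = keygen()
    dh = pow(pk_receiver, esk, P)
    shared = hashlib.sha256(dh.to_bytes(16, "big")).digest()
    return shared, epk          # (shared secret, encapsulated key)

def decap(enc, sk_receiver):
    """Recompute the same shared secret from the encapsulated key."""
    dh = pow(enc, sk_receiver, P)
    return hashlib.sha256(dh.to_bytes(16, "big")).digest()

sk, pk = keygen()
shared, enc = encap(pk)
assert decap(enc, sk) == shared
```

<p>Real HPKE then feeds this shared secret into its key schedule to derive encryption keys, rather than using it directly.</p>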
    <div>
      <h3>Algorithm agility</h3>
      <a href="#algorithm-agility">
        
      </a>
    </div>
    <p>Different applications, protocols, and deployments have different constraints, and locking any single use case into a specific (set of) algorithm(s) would be overly restrictive. For example, some applications may wish to use post-quantum algorithms when available, whereas others may wish to use different authenticated encryption algorithms for symmetric-key encryption. To accomplish this goal, HPKE is designed as a composition of a Key Encapsulation Mechanism (KEM), Key Derivation Function (KDF), and Authenticated Encryption Algorithm (AEAD). Any combination of the three algorithms yields a valid instantiation of HPKE, subject to certain security constraints about the choice of algorithm.</p><p>One important point worth noting here is that HPKE is not a <i>protocol</i>, and therefore does nothing to ensure that sender and receiver agree on the HPKE ciphersuite or shared context information. Applications and protocols that use HPKE are responsible for choosing or negotiating a specific HPKE ciphersuite that fits their purpose. This allows applications to be opinionated about their choice of algorithms to simplify implementation and analysis, as is common with protocols like <a href="https://www.wireguard.com/">WireGuard</a>, or be flexible enough to support choice and agility, as is the approach taken with TLS.</p>
    <div>
      <h3>Authentication modes</h3>
      <a href="#authentication-modes">
        
      </a>
    </div>
    <p>At a high level, public-key encryption ensures that only the holder of the private key can decrypt messages encrypted for the corresponding public key (being able to decrypt the message is an implicit authentication of the receiver). However, there are other ways in which applications may wish to authenticate messages from sender to receiver. For example, if both parties have a pre-shared key, they may wish to ensure that both can demonstrate possession of this pre-shared key as well. It may also be desirable for senders to demonstrate knowledge of their own private key in order for recipients to decrypt the message (this is functionally similar to signing an encrypted message, but has some <a href="https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-12#section-9.1.1">subtle and important differences</a>).</p><p>To support these various use cases, HPKE admits different modes of authentication, allowing various combinations of pre-shared key and sender private key authentication. The additional private key contributes to the shared secret between the sender and receiver, and the pre-shared key contributes to the derivation of the application data encryption secrets. This process is referred to as the “key schedule”, and a simplified version of it is shown below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3kF11CKLHDthJSduTAD10i/d47cc6df3c62d35970674fcd6e25df9e/image2-11.png" />
            
</figure><p>Simplified HPKE key schedule</p><p>These modes come at a price, however: not all KEM algorithms will work with all authentication modes. For example, no private-key authentication variant is known for most post-quantum KEM algorithms.</p>
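<p>The shape of the simplified key schedule can be sketched with HKDF-style extract and expand steps: the KEM shared secret is mixed with an optional pre-shared key, and the result is expanded into the AEAD key and base nonce. The labels and output lengths below are illustrative placeholders, not the exact values from RFC 9180:</p>

```python
# Sketch of an HKDF-style key schedule: mix the KEM shared secret with an
# optional pre-shared key, then expand into AEAD key material. Labels and
# lengths are illustrative, not the RFC 9180 strings.
import hashlib
import hmac

def extract(salt, ikm):
    """HKDF-Extract: HMAC the input keying material under a salt."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def expand(prk, info, length):
    """HKDF-Expand: stretch a pseudorandom key to the requested length."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

def key_schedule(shared_secret, psk=b""):
    # The pre-shared key (if any) salts the extraction of the main secret.
    secret = extract(psk or b"\x00" * 32, shared_secret)
    key = expand(secret, b"key", 16)          # AEAD key
    base_nonce = expand(secret, b"base_nonce", 12)  # AEAD base nonce
    return key, base_nonce

key, nonce = key_schedule(b"\x01" * 32)
```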
    <div>
      <h3>Reusability</h3>
      <a href="#reusability">
        
      </a>
    </div>
    <p>The core of HPKE’s construction is its key schedule. It allows secrets produced and shared with KEMs and pre-shared keys to be mixed together to produce additional shared secrets between sender and receiver for performing authenticated encryption and decryption. HPKE allows applications to build on this key schedule without using the corresponding AEAD functionality, for example, by <a href="https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-12#section-5.3">exporting a shared application-specific secret</a>. Using HPKE in an “export-only” fashion allows applications to use other, non-standard AEAD algorithms for encryption, should that be desired. It also allows applications to use a KEM different from those specified in the standard, as is done in the proposed <a href="https://claucece.github.io/draft-celi-wiggers-tls-authkem/draft-celi-wiggers-tls-authkem.html">TLS AuthKEM draft</a>.</p>
    <div>
      <h3>Interface simplicity</h3>
      <a href="#interface-simplicity">
        
      </a>
    </div>
    <p>HPKE hides the complexity of message encryption from callers. Encrypting a message with additional authenticated data from sender to receiver for their public key is as simple as the following two calls:</p>
            <pre><code>// Create an HPKE context to send messages to the receiver
encapsulatedKey, senderContext = SetupBaseS(receiverPublicKey, ”shared application info”)

// AEAD encrypt the message using the context
ciphertext = senderContext.Seal(aad, message)</code></pre>
            <p>In fact, many implementations are likely to offer a simplified <a href="https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-12#section-6">“single-shot” interface</a> that does context creation and message encryption with one function call.</p><p>Notice that this interface does not expose anything like nonce ("number used once") or sequence numbers to the callers. The HPKE context manages nonce and sequence numbers internally, which means the application is responsible for message ordering and delivery. This was an important design decision made to hedge against key and nonce reuse, <a href="https://fahrplan.events.ccc.de/congress/2010/Fahrplan/events/4087.en.html">which</a> <a href="https://link.springer.com/chapter/10.1007/978-3-662-45611-8_14">can</a> <a href="https://eprint.iacr.org/2014/161">be</a> <a href="https://eprint.iacr.org/2019/023.pdf">catastrophic</a> for <a href="https://eprint.iacr.org/2020/615">security</a>.</p><p>Consider what would be necessary if HPKE delegated nonce management to the application. The sending application using HPKE would need to communicate the nonce along with each ciphertext value for the receiver to successfully decrypt the message. If this nonce were ever reused, then security of the <a href="https://eprint.iacr.org/2016/475">AEAD may fall apart</a>. Thus, a sending application would necessarily need some way to ensure that nonces were never reused. Moreover, by sending the nonce to the receiver, the application is effectively implementing a message sequencer. The application could just as easily implement and use this sequencer to ensure in-order message delivery and processing. Thus, at the end of the day, exposing the nonce seemed both harmful and, ultimately, redundant.</p>
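<p>Internally, HPKE computes each AEAD nonce by XORing an incrementing sequence number into a per-context base nonce, which is what makes hiding nonces from the caller safe. A minimal sketch of that bookkeeping (the class and method names here are our own, not from the specification):</p>

```python
# Sketch of internal nonce management: each AEAD nonce is the base nonce
# XORed with an incrementing sequence number, so callers never see or
# supply nonces and no nonce repeats within a context.
class SenderContext:
    def __init__(self, base_nonce: bytes):
        self.base_nonce = base_nonce
        self.seq = 0

    def next_nonce(self) -> bytes:
        seq_bytes = self.seq.to_bytes(len(self.base_nonce), "big")
        nonce = bytes(a ^ b for a, b in zip(self.base_nonce, seq_bytes))
        self.seq += 1  # monotonically increasing; never reused
        return nonce

ctx = SenderContext(b"\x00" * 12)
n0, n1 = ctx.next_nonce(), ctx.next_nonce()
assert n0 != n1
```

<p>Because the sequence number is part of the context rather than the wire format, in-order delivery is the application’s job, exactly as the text above describes.</p>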
    <div>
      <h3>Wire format</h3>
      <a href="#wire-format">
        
      </a>
    </div>
    <p>Another hallmark of HPKE is that all messages that do not contain application data are fixed length. This means that serializing and deserializing HPKE messages is trivial and there is no room for application choice. In contrast, some implementations of hybrid PKE deferred choice of wire format details, such as whether to use elliptic curve point compression, to applications. HPKE handles this under the KEM abstraction.</p>
    <div>
      <h3>Development process</h3>
      <a href="#development-process">
        
      </a>
    </div>
    <p>HPKE is the result of a three-year development cycle between industry practitioners, protocol designers, and academic cryptographers. In particular, HPKE built upon prior art relating to public-key encryption and iterated on its design in a tight specification, implementation, experimentation, and analysis loop, with the ultimate goal of real-world use.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/11GwbhP4mtylFViu3VGoqP/580f50018acc54ba7d8fef892f339bf4/image3-15.png" />
            
            </figure><p>HPKE development process</p><p>This process isn’t new. TLS 1.3 and QUIC famously demonstrated this as an effective way of producing high quality technical specifications that are maximally useful for their consumers.</p><p>One particular point worth highlighting in this process is the value of interoperability and analysis. From the very first draft, interop between multiple, independent implementations was a goal. And since then, every revision was carefully checked by multiple library maintainers for soundness and correctness. This helped catch a number of mistakes and improved overall clarity of the technical specification.</p><p>From a formal analysis perspective, HPKE brought novel work to the community. Unlike protocol design efforts like those around TLS and QUIC, HPKE was simpler, but still came with plenty of sharp edges. As a new cryptographic construction, analysis was needed to ensure that it was sound and, importantly, to understand its limits. This analysis led to a number of important contributions to the community, including a <a href="https://eprint.iacr.org/2020/1499.pdf">formal analysis of HPKE</a>, new understanding of the <a href="https://dl.acm.org/doi/abs/10.1145/3460120.3484814">limits of ChaChaPoly1305 in a multi-user security setting</a>, as well as a new CFRG specification documenting <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-aead-limits/">limits for AEAD algorithms</a>. For more information about the analysis effort that went into HPKE, check out this <a href="https://www.benjaminlipp.de/p/hpke-cryptographic-standard/">companion blog</a> by Benjamin Lipp, an HPKE co-author.</p>
    <div>
      <h3>HPKE’s future</h3>
      <a href="#hpkes-future">
        
      </a>
    </div>
    <p>While HPKE may be a new standard, it has already seen a tremendous amount of adoption in the industry. As mentioned earlier, it’s an essential part of the TLS Encrypted Client Hello and Oblivious DoH standards, both of which are deployed protocols on the Internet today. Looking ahead, it’s also been integrated as part of the emerging <a href="https://datatracker.ietf.org/doc/charter-ietf-ohai/">Oblivious HTTP</a>, <a href="https://datatracker.ietf.org/wg/mls/about/">Message Layer Security</a>, and <a href="https://www.ietf.org/id/draft-gpew-priv-ppm-00.html">Privacy Preserving Measurement</a> standards. HPKE’s hallmark is its generic construction that lets it adapt to a wide variety of application requirements. If an application needs public-key encryption with a <a href="https://eprint.iacr.org/2020/1153.pdf">key-committing AEAD</a>, one can simply instantiate HPKE using a key-committing AEAD.</p><p>Moreover, there exists a huge assortment of interoperable implementations built on popular cryptographic libraries, including <a href="https://github.com/cisco/mlspp/tree/main/lib/hpke">OpenSSL</a>, <a href="https://boringssl.googlesource.com/boringssl/+/refs/heads/master/include/openssl/hpke.h">BoringSSL</a>, <a href="https://hg.mozilla.org/projects/nss/file/tip/lib/pk11wrap">NSS</a>, and <a href="https://github.com/cloudflare/circl/tree/master/hpke">CIRCL</a>. There are also formally verified implementations in <a href="https://www.franziskuskiefer.de/p/an-executable-hpke-specification/">hacspec and F*</a>; check out this <a href="https://tech.cryspen.com/hpke-spec">blog post</a> for more details. The complete set of known implementations is tracked <a href="https://github.com/cfrg/draft-irtf-cfrg-hpke#existing-hpke-implementations">here</a>. More implementations will undoubtedly follow in their footsteps.</p><p>HPKE is ready for prime time. I look forward to seeing how it simplifies protocol design and development in the future. 
Welcome, <a href="https://www.rfc-editor.org/rfc/rfc9180.html">RFC 9180</a>.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">2y7fDoXJoE5tJvjDMO7kap</guid>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[Privacy-Preserving Compromised Credential Checking]]></title>
            <link>https://blog.cloudflare.com/privacy-preserving-compromised-credential-checking/</link>
            <pubDate>Thu, 14 Oct 2021 12:59:53 GMT</pubDate>
            <description><![CDATA[ Announcing a public demo and open-sourced implementation of a privacy-preserving compromised credential checking service ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today we’re announcing a <a href="https://migp.cloudflare.com">public demo</a> and an <a href="https://github.com/cloudflare/migp-go">open-sourced Go implementation</a> of a next-generation, privacy-preserving compromised credential checking protocol called MIGP (“Might I Get Pwned”, a nod to Troy Hunt’s “<a href="https://haveibeenpwned.com/About">Have I Been Pwned</a>”). Compromised credential checking services are used to alert users when their credentials might have been exposed in data breaches. Critically, the ‘privacy-preserving’ property of the MIGP protocol means that clients can check for leaked credentials without leaking <i>any</i> information to the service about the queried password, and only a small amount of information about the queried username. Thus, not only can the service inform you when one of your usernames and passwords may have become compromised, but it does so without exposing any unnecessary information, keeping credential checking from becoming a vulnerability itself. The ‘next-generation’ property comes from the fact that MIGP advances upon the current state of the art in credential checking services by allowing clients to not only check if their <i>exact</i> password is present in a data breach, but to check if <i>similar</i> passwords have been exposed as well.</p><p>For example, suppose your password last year was amazon20$, and you change your password each year (so your current password is amazon21$). If last year’s password got leaked, MIGP could tell you that your current password is weak and guessable as it is a simple variant of the leaked password.</p><p>The MIGP protocol was designed by researchers at Cornell Tech and the University of Wisconsin-Madison, and we encourage you to <a href="https://arxiv.org/pdf/2109.14490.pdf">read the paper</a> for more details. 
In this blog post, we provide motivation for why compromised credential checking is important for security hygiene, and how the MIGP protocol improves upon the current generation of credential checking services. We then describe our implementation and the deployment of MIGP within Cloudflare’s infrastructure.</p><p>Our MIGP demo and public API are not meant to replace existing credential checking services today, but rather demonstrate what is possible in the space. We aim to push the envelope in terms of privacy and are excited to employ some cutting-edge cryptographic primitives along the way.</p>
    <div>
      <h2>The threat of data breaches</h2>
      <a href="#the-threat-of-data-breaches">
        
      </a>
    </div>
    <p>Data breaches are rampant. The <a href="https://lmddgtfy.net/?q=million%20customer%20records">regularity of news articles</a> detailing how tens or hundreds of millions of customer records have been compromised has made us almost numb to the details. Perhaps we all hope to stay safe just by being a small fish in the middle of a very large school of similar fish that is being preyed upon. But we can do better than just hope that our particular authentication credentials are safe. We can actually check those credentials against known databases of the very same compromised user information we learn about from the news.</p><p>Many of the security breaches we read about involve leaked databases containing user details. In the worst cases, user data entered during account registration on a particular website is made available (often offered for sale) after a data breach. Think of the addresses, password hints, credit card numbers, and other private details you have submitted via an online form. We rely on the care taken by the online services in question to protect those details. On top of this, consider that the same (or quite similar) usernames and passwords are commonly used on more than one site. Our information across all of those sites may be as vulnerable as the site with the weakest security practices. Attackers take advantage of this fact to actively compromise accounts and exploit users every day.</p><p><a href="https://www.cloudflare.com/learning/bots/what-is-credential-stuffing/">Credential stuffing</a> is an attack in which malicious parties use leaked credentials from an account on one service to attempt to log in to a variety of <i>other</i> services. These attacks are effective because of the prevalence of reused credentials across services and domains. After all, who hasn’t at some point had a favorite password they used for everything? 
(Quick plug: please use a password manager like LastPass to generate unique and complex passwords for each service you use.)</p><p>Website operators have (or should have) a vested interest in making sure that users of their service are using secure and non-compromised credentials. Given the sophistication of techniques employed by malevolent actors, the standard requirement to “include uppercase, lowercase, digit, and special characters” really is not enough (and can be actively harmful according to <a href="https://pages.nist.gov/800-63-3/sp800-63b.html#appA">NIST’s latest guidance</a>). We need to offer better options to users that keep them safe and preserve the privacy of vulnerable information. Dealing with account compromise and recovery is an expensive process for all parties involved.</p><p>Users and organizations need a way to know if their credentials have been compromised, but how can they find out? One approach is to scour dark web forums for data breach torrent links, download and parse gigabytes or terabytes of archives, and then search the dataset to see if their credentials have been exposed. This approach is not workable for the majority of Internet users and website operators, but fortunately there’s a better way — have someone with terabytes to spare do it for them!</p>
    <div>
      <h2>Making compromise checking fast and easy</h2>
      <a href="#making-compromise-checking-fast-and-easy">
        
      </a>
    </div>
    <p>This is exactly what compromised credential checking services do: they aggregate breach datasets and make it possible for a client to determine whether a username and password are present in the breached data. <a href="https://haveibeenpwned.com/">Have I Been Pwned</a> (HIBP), launched by Troy Hunt in 2013, was the first major public breach alerting site. It provides a service, Pwned Passwords, where users can <a href="https://www.troyhunt.com/i-wanna-go-fast-why-searching-through-500m-pwned-passwords-is-so-quick/">efficiently check</a> if their passwords have been compromised. The initial version of Pwned Passwords required users to send the full password hash to the service to check if it appears in a data breach. In a <a href="/validating-leaked-passwords-with-k-anonymity/">2018 collaboration</a> with Cloudflare, the service was upgraded to allow users to run range queries over the password dataset, leaking only the salted hash prefix rather than the entire hash. Cloudflare <a href="https://haveibeenpwned.com/Passwords">continues to support</a> the HIBP project by providing CDN and security support for organizations to download the raw Pwned Password datasets.</p><p>The HIBP approach was replicated by <a href="https://www.usenix.org/system/files/sec19-thomas.pdf">Google Password Checkup</a> (GPC) in 2019, with the primary difference that GPC alerts are based on username-password pairs instead of passwords alone, which limits the rate of false positives. <a href="https://www.enzoic.com/">Enzoic</a> and <a href="https://www.microsoft.com/en-us/research/blog/password-monitor-safeguarding-passwords-in-microsoft-edge/">Microsoft Password Monitor</a> are two other similar services. 
This year, Cloudflare also released <a href="https://developers.cloudflare.com/waf/exposed-credentials-check">Exposed Credential Checks</a> as part of our <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">Web Application Firewall (WAF)</a> to help inform opted-in website owners when login attempts to their sites use compromised credentials. In fact, we use MIGP on the backend for this service to ensure that plaintext credentials <a href="/account-takeover-protection/">never leave the edge server</a> on which they are being processed.</p><p>Most standalone credential checking services work by having a user submit a query containing their password's or username-password pair’s hash prefix. However, this leaks some information to the service, which could be problematic if the service turns out to be malicious or is compromised. In a collaboration with researchers at Cornell Tech published at <a href="https://dl.acm.org/doi/pdf/10.1145/3319535.3354229">CCS’19</a>, we showed just how damaging this leaked information can be. Malevolent actors with access to the data shared with most credential checking services can drastically improve the effectiveness of password-guessing attacks. This left open the question: how can you do compromised credential checking without sharing (leaking!) vulnerable credentials to the service provider itself?</p>
    <div>
      <h3>What does a privacy-preserving credential checking service look like?</h3>
      <a href="#what-does-a-privacy-preserving-credential-checking-service-look-like">
        
      </a>
    </div>
    <p>In the aforementioned <a href="https://dl.acm.org/doi/pdf/10.1145/3319535.3354229">CCS'19 paper</a>, we proposed an alternative system in which only the hash prefix of the <i>username</i> is exposed to the MIGP server (<a href="https://www.usenix.org/system/files/sec19-thomas.pdf">independent work out of Google and Stanford</a> proposed a similar system). No information about the password leaves the user device, alleviating the risk of password-guessing attacks. These credential checking services help to preserve password secrecy, but still have a limitation: they can only alert users if the <i>exact</i> queried password appears in the breach.</p><p>The present evolution of this work, <a href="https://arxiv.org/pdf/2109.14490.pdf">Might I Get Pwned (MIGP)</a>, proposes a next-generation <i>similarity-aware</i> compromised credential checking service that supports checking if a password <i>similar</i> to the one queried has been exposed in the data breach. This approach supports the detection of <i>credential tweaking</i> attacks, an advanced version of credential stuffing.</p><p>Credential tweaking takes advantage of the fact that many users, when forced to change their password, use simple variants of their original password. Rather than just attempting to log in using an exact leaked password, say ‘password123’, a credential tweaking attacker might also attempt to log in with easily-predictable variants of the password such as ‘password124’ and ‘password123!’.</p><p>There are two main mechanisms described in the MIGP paper to add password variant support: client-side generation and server-side precomputation. With client-side generation, the client simply applies a series of transform rules to the password to derive the set of variants (e.g., truncating the last letter or adding a ‘!’ at the end), and runs multiple queries to the MIGP service with each username and password variant pair. 
The second approach is server-side precomputation, where the server applies the transform rules to generate the password variants when encrypting the dataset, essentially treating the password variants as additional entries in the breach dataset. The MIGP paper describes tradeoffs between the two approaches and techniques for generating variants in more detail. Our demo service includes variant support via server-side precomputation.</p>
    <div>
      <h3>Breach extraction attacks and countermeasures</h3>
      <a href="#breach-extraction-attacks-and-countermeasures">
        
      </a>
    </div>
    <p>One challenge for credential checking services is the threat of <i>breach extraction</i> attacks, in which an adversary attempts to learn username-password pairs that are present in the breach dataset (which might not be publicly available) so that they can attempt to use them in future credential stuffing or tweaking attacks. Similarity-aware credential checking services like MIGP can make these attacks more effective, since adversaries can potentially check for more breached credentials per API query. Fortunately, additional measures can be incorporated into the protocol to help counteract these attacks. For example, if it is problematic to leak the number of ciphertexts in a given bucket, dummy entries and padding can be employed, or an alternative length-hiding bucket format can be used. <a href="https://arxiv.org/pdf/2109.14490.pdf">Slow hashing and API rate limiting</a> are other common countermeasures that credential checking services can deploy to slow down breach extraction attacks. For instance, our demo service applies the memory-hard slow hash algorithm scrypt to credentials as part of the key derivation function to make these attacks more costly.</p><p>Let’s now get into the nitty-gritty of how the MIGP protocol works. For readers not interested in the cryptographic details, feel free to skip to the demo below!</p>
    <div>
      <h2>MIGP protocol</h2>
      <a href="#migp-protocol">
        
      </a>
    </div>
    <p>There are two parties involved in the MIGP protocol: the client and the server. The server has access to a dataset of plaintext breach entries (username-password pairs), and a secret key used for both the precomputation and the online portions of the protocol. In brief, the client performs some computation over the username and password and sends the result to the server; the server then returns a response that allows the client to determine if their password (or a similar password) is present in the breach dataset.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7wWJeFPjrWKw67UctL5CPa/86bd7f838d10f71329227ba85daac96f/image8-11.png" />
            
            </figure><p>Full protocol description from the <a href="https://arxiv.org/pdf/2109.14490.pdf">MIGP paper</a>: clients learn if their credentials are in the breach dataset, leaking only the hash prefix of the queried username to the server</p>
    <div>
      <h3>Precomputation</h3>
      <a href="#precomputation">
        
      </a>
    </div>
    <p>At a high level, the MIGP server partitions the breach dataset into <i>buckets</i> based on the hash prefix of the username (the <i>bucket identifier</i>), which is usually 16-20 bits in length.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2t6QnE6TyphfMbfBOAPMnc/def2d8eb92120a6af7ee0745663d19da/unnamed--1--2.png" />
            
            </figure><p>During the precomputation phase of the MIGP protocol, the server derives password variants, encrypts entries, and stores them in buckets based on the hash prefix of the username</p><p>We use server-side precomputation as the variant generation mechanism in our implementation. The server derives one ciphertext for each exact username-password pair in the dataset, and an additional ciphertext per password variant. A bucket consists of the set of ciphertexts for all breach entries and variants with the same username hash prefix. For instance, suppose there are n breach entries assigned to a particular bucket. If we compute m variants per entry, counting the original entry as one of the variants, there will be n*m ciphertexts stored in the bucket. This introduces a large expansion in the size of the processed dataset, so in practice it is necessary to limit the number of variants computed per entry. Our demo server stores 10 ciphertexts per breach entry in the input: the exact entry, eight variants (see <a href="https://arxiv.org/pdf/2109.14490.pdf">Appendix A of the MIGP paper</a>), and a special variant for allowing username-only checks.</p><p>Each ciphertext is the encryption of a username-password (or password variant) pair along with some associated metadata. The metadata describes whether the entry corresponds to an exact password appearing in the breach, or a variant of a breached password. The server derives a per-entry secret key pad using a key derivation function (KDF) with the username-password pair and server secret as inputs, and uses XOR encryption to derive the entry ciphertext. The bucket format also supports storing optional encrypted metadata, such as the date the breach was discovered.</p>
            <pre><code>Input:
  Secret sk       // Server secret key
  String u        // Username
  String w        // Password (or password variant)
  Byte mdFlag     // Metadata flag
  String mdString // Optional metadata string

Output:
  String C        // Ciphertext

function Encrypt(sk, u, w, mdFlag, mdString):
  padHdr = KDF1(u, w, sk)
  padBody = KDF2(u, w, sk)
  zeros = [0] * KEY_CHECK_LEN
  C = XOR(padHdr, zeros || mdFlag) || mdString.length || XOR(padBody, mdString)
  return C</code></pre>
            <p>The precomputation phase only needs to be done rarely, such as when the MIGP parameters are changed (in which case the entire dataset must be re-processed), or when new breach datasets are added (in which case the new data can be appended to the existing buckets).</p>
    <div>
      <h3>Online phase</h3>
      <a href="#online-phase">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38tgoXWJDA2F7AXTWK63y6/606db165b48b09a953044b2e216af88a/image1-39.png" />
            
            </figure><p>During the online phase of the MIGP protocol, the client requests a bucket of encrypted breach entries corresponding to the queried username, and with the server’s help derives a key that allows it to decrypt an entry corresponding to the queried credentials</p><p>The online phase of the MIGP protocol allows a client to check if a username-password pair (or variant) appears in the server’s breach dataset, while only leaking the hash prefix of the username to the server. The client and server engage in an <a href="https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-voprf">OPRF</a> protocol message exchange to allow the client to derive the per-entry decryption key, without leaking the username and password to the server, or the server’s secret key to the client. The client then computes the bucket identifier from the queried username and downloads the corresponding bucket of entries from the server. Using the decryption key derived in the previous step, the client scans through the entries in the bucket attempting to decrypt each one. If the decryption succeeds, this signals to the client that their queried credentials (or a variant thereof) are in the server’s dataset. The decrypted metadata flag indicates whether the entry corresponds to the exact password or a password variant.</p><p>The MIGP protocol solves many of the shortcomings of existing credential checking services with its solution that avoids leaking <i>any</i> information about the client’s queried password to the server, while also providing a mechanism for checking for similar password compromise. Read on to see the protocol in action!</p>
    <div>
      <h2>MIGP demo</h2>
      <a href="#migp-demo">
        
      </a>
    </div>
    <p>As the state of the art in attack methodologies evolves with new techniques such as credential tweaking, so must the defenses. To that end, we’ve collaborated with the designers of the MIGP protocol to prototype and deploy it within Cloudflare’s infrastructure.</p><p>Our MIGP demo server is deployed at <a href="https://migp.cloudflare.com">migp.cloudflare.com</a>, and runs entirely on top of <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>. We use <a href="https://www.cloudflare.com/products/workers-kv/">Workers KV</a> for efficient storage and retrieval of buckets of encrypted breach entries, capping out each bucket size at the current <a href="https://developers.cloudflare.com/workers/platform/limits#kv">KV value limit</a> of 25MB. In our instantiation, we set the username hash prefix length to 20 bits, so that there are a total of 2^20 (or just over 1 million) buckets.</p><p>There are currently two ways to interact with the demo MIGP service: via the browser client at <a href="https://migp.cloudflare.com">migp.cloudflare.com</a>, or via the Go client included in our <a href="https://github.com/cloudflare/migp-go">open-sourced MIGP library</a>. As shown in the screenshots below, the browser client displays the request from your device and the response from the MIGP service. Take care not to input sensitive credentials into a third-party service (feel free to use the test credentials <a>username1@example.com</a> and password1 for the demo).</p><p>Keep in mind that “absence of evidence is not evidence of absence”, especially in the context of data breaches. We intend to periodically update the breach datasets used by the service as new public breaches become available, but no breach alerting service will be able to provide 100% accuracy in assuring that your credentials are safe.</p><p>See the MIGP demo in action in the attached screenshots. 
Note that in all cases, the username (<a>username1@example.com</a>) and corresponding username prefix hash (000f90f4) remain the same, so the client retrieves the exact same bucket contents from the server each time. However, the blindElement parameter in the client request differs per request, allowing the client to decrypt different bucket elements depending on the queried credentials.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/14k0Waq8epRl1bJ2ixvalS/2da5c4cb0bd12ff46190a3e7e3fb502c/image7-10.png" />
            
            </figure><p>Example query in which the credentials are exposed in the breach dataset</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39Efa4f8dTFJvEyh1nviV7/500bbd1a954cbb5ec7f96d2d3b1886ea/image4-23.png" />
            
            </figure><p>Example query in which similar credentials were exposed in the breach dataset</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7xtaESNMejE9ZJrEZMYZZB/30d679b03c5e027413ab7eac02e4db22/image2-25.png" />
            
            </figure><p>Example query in which the username is present in the breach dataset</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/IpsaGiNzjKSmZcjJ9fao9/7110906f0c3cff82af25d09a324dd0ec/image3-23.png" />
            
            </figure><p>Example query in which the credentials are not found in the dataset</p>
    <div>
      <h2>Open-sourced MIGP library</h2>
      <a href="#open-sourced-migp-library">
        
      </a>
    </div>
    <p>We are open-sourcing our implementation of the MIGP library under the BSD-3 License. The code is written in Go and is available at <a href="https://github.com/cloudflare/migp-go">https://github.com/cloudflare/migp-go</a>. Under the hood, we use Cloudflare’s <a href="https://github.com/cloudflare/circl">CIRCL library</a> for OPRF support and Go’s supplementary cryptography library for <a href="https://pkg.go.dev/golang.org/x/crypto/scrypt">scrypt</a> support. Check out the repository for instructions on setting up the MIGP client to connect to Cloudflare’s demo MIGP service. Community contributions and feedback are welcome!</p>
    <div>
      <h2>Future directions</h2>
      <a href="#future-directions">
        
      </a>
    </div>
    <p>In this post, we announced our open-sourced implementation and demo deployment of MIGP, a next-generation breach alerting service. Our deployment is intended to lead the way for other credential compromise checking services to migrate to a more privacy-friendly model, but is not itself currently meant for production use. However, we identify several concrete steps that can be taken to improve our service in the future:</p><ul><li><p>Add more breach datasets to the database of precomputed entries</p></li><li><p>Increase the number of variants in server-side precomputation</p></li><li><p>Add library support in more programming languages to reach a broader developer base</p></li><li><p>Hide the number of ciphertexts per bucket by padding with dummy entries</p></li><li><p>Add support for efficient client-side variant checking by batching API calls to the server</p></li></ul><p>For exciting future research directions that we are investigating — including one proposal to remove the transmission of plaintext passwords from client to server entirely — take a look at <a href="/research-directions-in-password-security">https://blog.cloudflare.com/research-directions-in-password-security</a>.</p><p>We are excited to share and build upon these ideas with the wider Internet community, and hope that our efforts effect positive change in the password security ecosystem. We are particularly interested in collaborating with stakeholders in the space to develop, test, and deploy next-generation protocols to improve user security and privacy. You can reach us with questions, comments, and research ideas at <a>ask-research@cloudflare.com</a>. For those interested in joining our team, please visit our <a href="https://www.cloudflare.com/careers/jobs/?department=Technology%20Research&amp;location=default">Careers Page</a>.</p>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">4eEeycGCnihnf3zSNxUYLY</guid>
            <dc:creator>Luke Valenta</dc:creator>
            <dc:creator>Cefan Daniel Rubin</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare and the IETF]]></title>
            <link>https://blog.cloudflare.com/cloudflare-and-the-ietf/</link>
            <pubDate>Wed, 13 Oct 2021 12:59:37 GMT</pubDate>
            <description><![CDATA[ Cloudflare helps build a better Internet through collaboration on open and interoperable standards. This post will describe how Cloudflare contributes to the standardization process to enable incremental innovation and drive long-term architectural change. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet, far from being just a series of tubes, is a huge, incredibly complex, decentralized system. Every action and interaction in the system is enabled by a complicated mass of protocols woven together to accomplish their task, each handing off to the next like trapeze artists high above a virtual circus ring. Stop to think about details, and it is a marvel.</p><p>Consider one of the simplest tasks enabled by the Internet: Sending a message from sender to receiver.</p><p>The location (address) of a receiver is discovered using <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a>, a connection between sender and receiver is established using a transport protocol like TCP, and (hopefully!) secured with a protocol like TLS. The sender's message is encoded in a format that the receiver can recognize and parse, like HTTP, because the two disparate parties need a common language to communicate. Then, ultimately, the message is sent and carried in an IP datagram that is forwarded from sender to receiver based on routes established with BGP.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Z79TEfHR8kGEqa8qMWBCQ/eecb98d60c7bbcbf5baae72ee10d8357/image1-35.png" />
            
            </figure><p>Even an explanation this dense is laughably oversimplified. For example, the four protocols listed are just the start, and ignore many others with acronyms of their own. The truth is that things are complicated. And because things are complicated, how these protocols and systems interact and influence the user experience on the Internet is complicated. Extra round trips to establish a secure connection increase the amount of time before useful work is done, harming user performance. The use of unauthenticated or unencrypted protocols reveals potentially sensitive information to the network or, worse, to malicious entities, which harms user security and privacy. And finally, consolidation and centralization — seemingly a prerequisite for reducing costs and protecting against attacks — makes it challenging to provide high availability even for essential services. (What happens when that one system goes down or is otherwise unavailable, or to extend our earlier metaphor, when a trapeze isn’t there to catch?)</p><p>These four properties — performance, security, privacy, and availability — are crucial to the Internet. At Cloudflare, and especially in the Cloudflare Research team, where we use all these various protocols, we're committed to improving them at every layer in the stack. 
We work on problems as diverse as strengthening <a href="https://www.cloudflare.com/network-security/">network security</a> and privacy with <a href="https://datatracker.ietf.org/doc/html/rfc8446">TLS 1.3</a> and <a href="https://datatracker.ietf.org/doc/html/rfc9000">QUIC</a>, improving DNS privacy via <a href="/oblivious-dns/">Oblivious DNS-over-HTTPS</a>, reducing end-user CAPTCHA annoyances with Privacy Pass and <a href="/introducing-cryptographic-attestation-of-personhood/">Cryptographic Attestation of Personhood (CAP)</a>, performing Internet-wide measurements to understand how things work in the real world, and much, much more.</p><p>Above all else, these projects are meant to do one thing: focus beyond the horizon to help build a better Internet. We do that by developing, advocating, and advancing open standards for the many protocols in use on the Internet, all backed by implementation, experimentation, and analysis.</p>
    <div>
      <h3>Standards</h3>
      <a href="#standards">
        
      </a>
    </div>
    <p>The Internet is a network of interconnected autonomous networks. Computers attached to these networks have to be able to route messages to each other. However, even if we can send messages back and forth across the Internet, much like the storied Tower of Babel, to achieve anything those computers have to use a common language, a lingua franca, so to speak. And for the Internet, standards are that common language.</p><p>Many of the parts of the Internet that Cloudflare is interested in are standardized by the IETF, which is a standards development organization responsible for producing technical specifications for the Internet's most important protocols, including IP, BGP, DNS, TCP, TLS, QUIC, HTTP, and so on. The <a href="https://www.ietf.org/about/mission/">IETF's mission</a> is:</p><blockquote><p>to make the Internet work better by producing high-quality, relevant technical documents that influence the way people design, use, and manage the Internet.</p></blockquote><p>Our individual contributions to the IETF help further this mission, especially given our role on the Internet. We can only do so much on our own to improve the end-user experience. So, through standards, we engage with those who use, manage, and operate the Internet to achieve three simple goals that lead to a better Internet:</p><ol><li><p>Incrementally improve existing and deployed protocols with innovative solutions;</p></li><li><p>Provide holistic solutions to long-standing architectural problems and enable new use cases; and</p></li><li><p>Identify key problems and help specify reusable, extensible, easy-to-implement abstractions for solving them.</p></li></ol><p>Below, we’ll give an example of how we helped achieve each goal, touching on a number of important technical specifications produced in recent years, including DNS-over-HTTPS, QUIC, and (the still work-in-progress) TLS Encrypted Client Hello.</p>
    <div>
      <h3>Incremental innovation: metadata privacy with DoH and ECH</h3>
      <a href="#incremental-innovation-metadata-privacy-with-doh-and-ech">
        
      </a>
    </div>
    <p>The Internet is not only complicated — it is leaky. Metadata seeps like toxic waste from nearly every protocol in use, from DNS to TLS, and even to HTTP at the application layer.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1t1ZVnKH9ZGQnKCgx6I8Pr/ada911c196fb971b19b8a4a3f7767362/image6-14.png" />
            
            </figure><p>One critically important piece of metadata that still leaks today is the name of the server that clients connect to. When a client opens a connection to a server, it reveals the name and identity of that server in many places, including DNS, TLS, and even sometimes at the IP layer (if the destination IP address is unique to that server). Linking client identity (IP address) to target server names enables third parties to build a profile of per-user behavior without end-user consent. The result is a set of protocols that does not respect end-user privacy.</p><p>Fortunately, it’s possible to incrementally address this problem without regressing security. For years, Cloudflare has been working with the standards community to plug all of these individual leaks through separate specialized protocols:</p><ul><li><p><a href="https://datatracker.ietf.org/doc/html/rfc8484">DNS-over-HTTPS</a> encrypts DNS queries between clients and recursive resolvers, ensuring only clients and trusted recursive resolvers see plaintext DNS traffic.</p></li><li><p><a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13">TLS Encrypted Client Hello</a> encrypts metadata in the TLS handshake, ensuring only the client and authoritative TLS server see sensitive TLS information.</p></li></ul><p>These protocols impose a barrier between the client and server and everyone else. However, neither of them prevents the server from building per-user profiles. Servers can track users via one critically important piece of information: the client IP address. Fortunately, for the overwhelming majority of cases, the IP address is not essential for providing a service. For example, DNS recursive resolvers do not need the full client IP address to provide accurate answers, as is evidenced by the <a href="https://datatracker.ietf.org/doc/html/rfc7871">EDNS(0) Client Subnet</a> extension. 
To further reduce information exposure on the web, we helped push two more incremental improvements:</p><ul><li><p><a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-07">Oblivious DNS-over-HTTPS</a> (ODoH) uses cryptography and network proxies to break linkability between client identity (IP address) and DNS traffic, ensuring that recursive resolvers have only the minimal amount of information needed to provide DNS answers: the queries themselves, without any per-client information.</p></li><li><p><a href="https://datatracker.ietf.org/doc/html/draft-ietf-masque-h3-datagram-04">MASQUE</a> is standardizing techniques for proxying UDP and IP protocols over QUIC connections, similar to the existing <a href="https://www.rfc-editor.org/rfc/rfc7231.html#section-4.3.6">HTTP CONNECT</a> method for TCP-based protocols. Generally, the CONNECT method allows clients to use services without revealing any client identity (IP address).</p></li></ul><p>While each of these protocols may seem only an incremental improvement over what we have today, together, they raise many possibilities for the future of the Internet. Are DoH and ECH sufficient for end-user privacy, or are technologies like ODoH and MASQUE necessary? How do proxy technologies like MASQUE complement or even subsume protocols like ODoH and ECH? These are questions the Cloudflare Research team strives to answer through experimentation, analysis, and deployment together with other stakeholders on the Internet through the IETF. And we could not ask the questions without first laying the groundwork.</p>
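To make the DNS-over-HTTPS leg concrete, here is a sketch of how a client forms an RFC 8484 GET request: serialize the DNS question in ordinary wire format, then base64url-encode it (without padding) into the `dns` query parameter. The resolver hostname below is illustrative; the encoding is what the RFC specifies.

```python
import base64
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Serialize a minimal DNS query in RFC 1035 wire format.

    The ID is fixed to 0, which RFC 8484 recommends so that DoH
    responses are cache-friendly.
    """
    header = struct.pack(">HHHHHH", 0, 0x0100, 1, 0, 0, 0)  # ID=0, RD=1, QDCOUNT=1
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)  # QTYPE, QCLASS=IN

def doh_get_url(resolver: str, name: str, qtype: int = 1) -> str:
    """Encode the query for the RFC 8484 GET method: base64url, no padding."""
    wire = build_dns_query(name, qtype)
    b64 = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"https://{resolver}/dns-query?dns={b64}"

print(doh_get_url("cloudflare-dns.com", "example.com"))
```

Fetching that URL over HTTPS (with an `Accept: application/dns-message` header) returns the answer in the same wire format, so no plaintext DNS ever crosses the network.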
    <div>
      <h3>Architectural advancement: QUIC and HTTP/3</h3>
      <a href="#architectural-advancement-quic-and-http-3">
        
      </a>
    </div>
    <p><a href="https://quicwg.org">QUIC</a> and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34">HTTP/3</a> are transformative technologies. Whilst the TLS handshake forms the heart of QUIC’s security model, QUIC improves on TLS over TCP in many respects, including more encryption (privacy), better protection against active attacks and ossification at the network layer, fewer round trips to establish a secure connection, and generally better security properties. QUIC and HTTP/3 give us a clean slate for future innovation.</p><p>Perhaps one of QUIC’s most important contributions is that it challenges and even breaks many established conventions and norms used on the Internet. For example, the antiquated socket API for networking, which treats the network connection as an in-order bit pipe, is no longer appropriate for modern applications and developers. Modern networking APIs such as Apple’s <a href="https://developer.apple.com/documentation/network">Network.framework</a> provide high-level interfaces that take advantage of the new transport features provided by QUIC. Applications using this or even higher-level HTTP abstractions can take advantage of the many security, privacy, and performance improvements of QUIC and HTTP/3 today with minimal code changes, and without being constrained by sockets and their inherent limitations.</p><p>Another salient feature of QUIC is its wire format. Nearly every bit of every QUIC packet is encrypted and authenticated between sender and receiver. And within a QUIC packet, individual frames can be rearranged, repackaged, and otherwise transformed by the sender.</p>
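The frame flexibility described above can be illustrated with a toy encoding. This is not QUIC's real wire format (which uses variable-length integers and encrypts everything); the point is only that frames are self-describing units, so a sender can coalesce and reorder them freely within a packet and the receiver still recovers the same data.

```python
import struct

def encode_frame(stream_id: int, offset: int, data: bytes) -> bytes:
    # Toy STREAM-like frame: fixed-width fields stand in for QUIC's varints.
    return struct.pack(">BHHH", 0x08, stream_id, offset, len(data)) + data

def decode_frames(payload: bytes) -> list[tuple[int, int, bytes]]:
    frames, i = [], 0
    while i < len(payload):
        _, sid, off, length = struct.unpack_from(">BHHH", payload, i)
        i += 7
        frames.append((sid, off, payload[i:i + length]))
        i += length
    return frames

# The sender may pack frames for different streams into one packet payload,
# in any order; the receiver sees the same set of frames either way.
a = encode_frame(0, 0, b"hello ")
b = encode_frame(4, 0, b"world")
assert sorted(decode_frames(a + b)) == sorted(decode_frames(b + a))
```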
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4qpdpgnX8A8M6iHvf0ECWP/aae602a63abed5400ffa431b4ce3cdce/image2-22.png" />
            
            </figure><p>Together, these are powerful tools to help mitigate future network ossification and enable continued extensibility. (TLS’s wire format ultimately led to the <a href="https://datatracker.ietf.org/doc/html/rfc8446#appendix-D.4">middlebox compatibility mode</a> for TLS 1.3 due to the many middlebox ossification problems that were encountered during early deployment tests.)</p><p>Exercising these features of QUIC is important for the <a href="https://datatracker.ietf.org/doc/html/draft-iab-use-it-or-lose-it-03">long-term health</a> of the protocol and applications built on top of it. Indeed, this sort of extensibility is what enables innovation.</p><p>In fact, we've already seen a flurry of new work based on QUIC: extensions to enable multipath QUIC, different congestion control approaches, and ways to carry data unreliably in the DATAGRAM frame.</p><p>Beyond functional extensions, we’ve also seen a number of new use cases emerge as a result of QUIC. DNS-over-QUIC is an upcoming proposal that complements DNS-over-TLS for recursive to authoritative DNS query protection. As mentioned above, MASQUE is a working group focused on standardizing methods for proxying arbitrary UDP and IP protocols over QUIC connections, enabling a number of fascinating solutions and unlocking the future of proxy and VPN technologies. In the context of the web, the WebTransport working group is standardizing methods to use QUIC as a “supercharged WebSocket” for transporting data efficiently between client and server while also depending on the WebPKI for security.</p><p>By definition, these extensions are nowhere near complete. The future of the Internet with QUIC is sure to be a fascinating adventure.</p>
    <div>
      <h3>Specifying abstractions: Cryptographic algorithms and protocol design</h3>
      <a href="#specifying-abstractions-cryptographic-algorithms-and-protocol-design">
        
      </a>
    </div>
    <p>Standards allow us to build abstractions. An ideal standard is one that is usable in many contexts and contains all the information a sufficiently skilled engineer needs to build a compliant implementation that successfully interoperates with other independent implementations. Writing a new standard is sort of like creating a new Lego brick. Creating a new Lego brick allows us to build things that we couldn’t have built before. For example, one new “brick” that’s nearly finished (as of this writing) is <a href="https://www.ietf.org/archive/id/draft-irtf-cfrg-hpke-12.html">Hybrid Public Key Encryption (HPKE)</a>. HPKE allows us to efficiently encrypt arbitrary plaintexts under the recipient’s public key.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5eWRfVLYtCcnUohsI2X8SE/48ccbddb899b98e65baea220bd7c06f6/image4-21.png" />
            
            </figure><p>Mixing asymmetric and symmetric cryptography for efficiency is a common technique that has been used for many years in all sorts of protocols, from TLS to <a href="https://en.wikipedia.org/wiki/Pretty_Good_Privacy">PGP</a>. However, each of these applications has come up with its own design, each with its own security properties. HPKE is intended to be a single, standard, interoperable version of this technique that turns this complex and technical corner of protocol design into an easy-to-use black box. The standard has undergone extensive analysis by cryptographers throughout its development and has numerous implementations available. The end result is a simple abstraction that protocol designers can include without having to consider how it works under the hood. In fact, HPKE is already a dependency for a number of other draft protocols in the IETF, such as <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13">TLS Encrypted Client Hello</a>, <a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-07">Oblivious DNS-over-HTTPS</a>, and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-mls-architecture-07.html">Message Layer Security</a>.</p>
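The KEM/KDF/AEAD composition that HPKE standardizes can be sketched with standard-library pieces. This is a deliberately toy construction (a small Diffie-Hellman group, an HMAC-based keystream in place of a real AEAD, no integrity tag) meant only to show the shape of hybrid encryption; real HPKE uses X25519 or NIST curves with HKDF and AES-GCM or ChaCha20-Poly1305.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group (2**127 - 1 is prime). NOT cryptographically sound.
P = 2**127 - 1
G = 3

def keygen() -> tuple[int, int]:
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def kdf(shared: int, info: bytes) -> bytes:
    # HKDF-like extract-then-expand built from HMAC-SHA256.
    prk = hmac.new(b"toy-hpke-salt", shared.to_bytes(16, "big"), hashlib.sha256).digest()
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # HMAC in counter mode as a stand-in for a real AEAD (no integrity tag!).
    stream = b"".join(
        hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32)
    )
    return bytes(x ^ y for x, y in zip(data, stream))

def seal(pk_recipient: int, plaintext: bytes) -> tuple[int, bytes]:
    # Encapsulate: an ephemeral DH share (the KEM) plus symmetric encryption.
    esk, epk = keygen()
    key = kdf(pow(pk_recipient, esk, P), b"toy-hpke")
    return epk, keystream_xor(key, plaintext)

def open_(sk_recipient: int, epk: int, ciphertext: bytes) -> bytes:
    key = kdf(pow(epk, sk_recipient, P), b"toy-hpke")
    return keystream_xor(key, ciphertext)

sk, pk = keygen()
enc, ct = seal(pk, b"attack at dawn")
assert open_(sk, enc, ct) == b"attack at dawn"
```

The asymmetric step runs once per message to agree on a key; the cheap symmetric step then protects arbitrarily long plaintexts, which is exactly the efficiency trade the paragraph above describes.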
    <div>
      <h3>Modes of Interaction</h3>
      <a href="#modes-of-interaction">
        
      </a>
    </div>
    <p>We engage with the IETF in the specification, implementation, experimentation, and analysis phases of a standard to help achieve our three goals of incremental innovation, architectural advancement, and production of simple abstractions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0tbHRFLSsWV7qBNiKi4WN/34d3b7742fe21500bcaa4970729bd4e6/image3-20.png" />
            
            </figure><p>Our participation in the standards process hits all four phases. Individuals in Cloudflare bring a diversity of knowledge and domain expertise to each phase, especially in the production of technical specifications. Today, we also published <a href="/exported-authenticators-the-long-road-to-rfc/">a blog post</a> about an upcoming standard that we’ve been working on for a number of years and will be sharing details about how we used formal analysis to make sure that we ruled out as many security issues in the design as possible. We work in close collaboration with people from all around the world as an investment in the future of the Internet. Open standards mean that everyone can take advantage of the latest and greatest in protocol design, whether they use Cloudflare or not.</p><p>Cloudflare’s scale and perspective on the Internet are essential to the standards process. We have experience rapidly implementing, deploying, and experimenting with emerging technologies to gain confidence in their maturity. We also have a proven track record of publishing the results of these experiments to help inform the standards process. Moreover, we open source as much of the code we use for these experiments as possible to enable reproducibility and transparency. Our unique collection of engineering expertise and wide perspective allows us to help build standards that work in a wide variety of use cases. By investing time in developing standards that everyone can benefit from, we can make a clear contribution to building a better Internet.</p><p>One final contribution we make to the IETF is more procedural and based around building consensus in the community. A challenge to any open process is gathering consensus to make forward progress and avoiding deadlock. We help build consensus through the production of running code, leadership on technical documents such as QUIC and ECH, and even logistically by chairing working groups. 
(Working groups at the IETF are chaired by volunteers, and Cloudflare numbers a few working group chairs amongst its employees, covering a broad spectrum of the IETF (and its related research-oriented group, the <a href="https://irtf.org/">IRTF</a>) from security and privacy to transport and applications.) Collaboration is a cornerstone of the standards process and a hallmark of Cloudflare Research, and nowhere do we apply it more prominently than in standards work.</p><p>If you too want to help build a better Internet, check out some IETF Working Groups and mailing lists. All you need to start contributing is an Internet connection and an email address, so why not give it a go? And if you want to join us on our mission to help build a better Internet through open and interoperable standards, check out our <a href="https://www.cloudflare.com/careers/jobs/?department=Technology%20Research&amp;location=default">open</a> <a href="https://boards.greenhouse.io/cloudflare/jobs/3271134?gh_jid=3271134">positions</a>, <a href="/visiting-researcher-program/">visiting researcher program</a>, and <a href="https://www.cloudflare.com/careers/jobs/?department=University&amp;location=default">many internship opportunities</a>!</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">72sMlOH9eqnKfiCmxSGHnU</guid>
            <dc:creator>Jonathan Hoyland</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[Handshake Encryption: Endgame (an ECH update)]]></title>
            <link>https://blog.cloudflare.com/handshake-encryption-endgame-an-ech-update/</link>
            <pubDate>Tue, 12 Oct 2021 12:59:22 GMT</pubDate>
            <description><![CDATA[ In this post, we’ll dig into ECH details and describe what this protocol does to move the needle to help build a better Internet. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Privacy and security are fundamental to Cloudflare, and we believe in and champion the use of cryptography to help provide these fundamentals for customers, end-users, and the Internet at large. In the past, we helped <a href="/rfc-8446-aka-tls-1-3/">specify, implement, and ship TLS 1.3</a>, the latest version of the transport security protocol underlying the web, to all of our users. TLS 1.3 vastly improved upon prior versions of the protocol with respect to security, privacy, and performance: simpler cryptographic algorithms, more handshake encryption, and fewer round trips are just a few of the many great features of this protocol.</p><p>TLS 1.3 was a tremendous improvement over TLS 1.2, but there is still room for improvement. Sensitive metadata relating to application or user intent is still visible in plaintext on the wire. In particular, all client parameters, including the name of the target server the client is connecting to, are visible in plaintext. For obvious reasons, this is problematic from a privacy perspective: Even if your application traffic to crypto.cloudflare.com is encrypted, the fact you’re visiting crypto.cloudflare.com can be quite revealing.</p><p>And so, in collaboration with other participants in the standardization community and members of industry, <a href="/esni/">we embarked towards a solution</a> for encrypting all sensitive TLS metadata in transit. The result: <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13">TLS Encrypted ClientHello (ECH)</a>, an extension to protect this sensitive metadata during connection establishment.</p><p>Last year, <a href="/encrypted-client-hello/">we described the current status of this standard</a> and its relation to the TLS 1.3 standardization effort, as well as ECH's predecessor, Encrypted SNI (ESNI). The protocol has come a long way since then, but when will we know when it's ready? 
There are many ways by which one can measure a protocol. Is it implementable? Is it easy to enable? Does it seamlessly integrate with existing protocols or applications? In order to answer these questions and see if the Internet is ready for ECH, the community needs deployment experience. Hence, for the past year, we’ve been focused on making the protocol stable, interoperable, and, ultimately, deployable. And today, we’re pleased to announce that we’ve begun our initial deployment of TLS ECH.</p><p>What does ECH mean for connection security and privacy on the network? How does it relate to similar technologies and concepts such as domain fronting? In this post, we’ll dig into ECH details and describe what this protocol does to move the needle to help build a better Internet.</p>
    <div>
      <h3>Connection privacy</h3>
      <a href="#connection-privacy">
        
      </a>
    </div>
    <p>For most Internet users, connections are made to perform some type of task, such as loading a web page, sending a message to a friend, purchasing some items online, or accessing bank account information. Each of these connections reveals some limited information about user behavior. For example, a connection to a messaging platform reveals that one might be trying to send or receive a message. Similarly, a connection to a bank or financial institution reveals when the user typically makes financial transactions. Individually, this metadata might seem harmless. But consider what happens when it accumulates: does the set of websites you visit on a regular basis uniquely identify you as a user? The safe answer is: yes.</p><p>This type of metadata is privacy-sensitive, and ultimately something that should only be known by two entities: the user who initiates the connection, and the service which accepts the connection. However, the reality today is that this metadata is known to more than those two entities.</p><p>Making this information private is no easy feat. The nature or intent of a connection, i.e., the name of the service such as crypto.cloudflare.com, is revealed in multiple places during the course of connection establishment: during DNS resolution, wherein clients map service names to IP addresses; and during connection establishment, wherein clients indicate the service name to the target server. (Note: there are other small leaks, though DNS and TLS are the primary problems on the Internet today.)</p><p>As is common in recent years, the solution to this problem is encryption. DNS-over-HTTPS (DoH) is a protocol for encrypting DNS queries and responses to hide this information from on-path observers. 
Encrypted Client Hello (ECH) is the complementary protocol for TLS.</p><p>The TLS handshake begins when the client sends a ClientHello message to the server over a TCP connection (or, in the context of <a href="https://datatracker.ietf.org/doc/html/rfc9000">QUIC</a>, over UDP) with relevant parameters, including those that are sensitive. The server responds with a ServerHello, encrypted parameters, and all that’s needed to finish the handshake.</p>
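To see why the ClientHello is sensitive, consider a simplified one. Real ClientHellos carry more fields and extensions, but the server_name extension below follows the RFC 6066 byte layout, and the target hostname travels in the clear for any on-path observer to read:

```python
import os
import struct

def sni_extension(hostname: str) -> bytes:
    # server_name extension (RFC 6066); sent unencrypted in the ClientHello.
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack(">H", len(name)) + name  # host_name entry
    lst = struct.pack(">H", len(entry)) + entry            # ServerNameList
    return struct.pack(">HH", 0x0000, len(lst)) + lst      # type + length

def minimal_client_hello(hostname: str) -> bytes:
    # Drastically simplified ClientHello body: version, random, empty session
    # ID, one cipher suite, null compression, then the extensions block.
    ext = sni_extension(hostname)
    return (
        b"\x03\x03" + os.urandom(32) + b"\x00"  # legacy_version, random, session_id
        + b"\x00\x02\x13\x01"                   # cipher suite: TLS_AES_128_GCM_SHA256
        + b"\x01\x00"                           # compression: null only
        + struct.pack(">H", len(ext)) + ext
    )

hello = minimal_client_hello("crypto.cloudflare.com")
# The name sits in plaintext, ready for any on-path observer:
assert b"crypto.cloudflare.com" in hello
```

ECH's job is to make that substring search fail: the sensitive inner ClientHello is encrypted, and only an innocuous outer one remains visible.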
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6B6iYtQ5u2IZkww1qmL2SP/5d8eccfa2749e6907d2c4851bd741a9d/image1-28.png" />
            
            </figure><p>The goal of ECH is as simple as its name suggests: to encrypt the ClientHello so that privacy-sensitive parameters, such as the service name, are unintelligible to anyone listening on the network. The client encrypts this message using a public key it learns by making a DNS query for a special record known as the <a href="https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-svcb-https-07">HTTPS resource record</a>. This record advertises the server's various TLS and HTTPS capabilities, including ECH support. The server decrypts the encrypted ClientHello using the corresponding secret key.</p><p>Conceptually, DoH and ECH are somewhat similar. With DoH, clients establish an encrypted connection (HTTPS) to a DNS recursive resolver such as 1.1.1.1 and, within that connection, perform DNS transactions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6uBrhlpXqeiYY1TVOxo6Wb/d1c4048e7fabb347af1e0e18e862ab06/image3-18.png" />
            
            </figure><p>With ECH, clients establish an encrypted connection to a TLS-terminating server such as crypto.cloudflare.com, and within that connection, request resources for an authorized domain such as cloudflareresearch.com.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5EIiZ1q7pBHrnNXSMHy5NJ/273f38bb845b3c6546bf71a6f8b6b80f/image5-14.png" />
            
            </figure><p>There is one very important difference between DoH and ECH that is worth highlighting. Whereas a DoH recursive resolver is specifically designed to allow queries for any domain, a TLS server is configured to allow connections for a select set of authorized domains. Typically, the set of authorized domains for a TLS server is those that appear on its certificate, as these constitute the set of names for which the server is authorized to terminate a connection.</p><p>Basically, this means the DNS resolver is <i>open</i>, whereas the ECH client-facing server is <i>closed</i>. And this closed set of authorized domains is informally referred to as the anonymity set. (This will become important later on in this post.) Moreover, the anonymity set is assumed to be public information. Anyone can query DNS to discover what domains map to the same client-facing server.</p><p>Why is this distinction important? It means that one cannot use ECH for the purposes of <i>connecting</i> to an authorized domain and then <i>interacting</i> with a different domain, a practice commonly referred to as <i>domain fronting</i>. When a client connects to a server using an authorized domain but then tries to interact with a different domain <i>within</i> that connection, e.g., by sending HTTP requests for an origin that does not match the domain of the connection, the request will fail.</p><p>From a high level, encrypting names in DNS and TLS may seem like a simple feat. However, as we’ll show, ECH demands a different look at security and an updated threat model.</p>
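Because the anonymity set is public, anyone can query DNS for the HTTPS records that advertise a client-facing server's capabilities, including its ECH configuration. Here is a sketch of pulling SvcParams out of an HTTPS record's RDATA; the record bytes are hand-built and the `ech` value is a placeholder rather than a real ECHConfigList, but the layout follows the SVCB/HTTPS record format.

```python
import struct

ECH_KEY = 5  # SvcParamKey "ech" in SVCB/HTTPS records (draft-ietf-dnsop-svcb-https)

def parse_svc_params(rdata: bytes) -> dict[int, bytes]:
    """Parse SvcParams from an HTTPS record's RDATA (after priority + target)."""
    i = 2                      # skip SvcPriority
    while rdata[i] != 0:       # skip TargetName labels
        i += 1 + rdata[i]
    i += 1                     # root label terminator
    params = {}
    while i < len(rdata):
        key, length = struct.unpack_from(">HH", rdata, i)
        i += 4
        params[key] = rdata[i:i + length]
        i += length
    return params

# Hand-built RDATA: priority 1, target ".", alpn ["h2"], and a placeholder
# ech blob standing in for the real base64-decoded ECHConfigList.
rdata = (
    struct.pack(">H", 1) + b"\x00"
    + struct.pack(">HH", 1, 3) + b"\x02h2"
    + struct.pack(">HH", ECH_KEY, 4) + b"\xde\xad\xbe\xef"
)
print(parse_svc_params(rdata)[ECH_KEY].hex())  # → deadbeef
```

A client performing this lookup (ideally over DoH, so the lookup itself is private) obtains everything it needs to encrypt its ClientHello to the client-facing server.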
    <div>
      <h3>A changing threat model and design confidence</h3>
      <a href="#a-changing-threat-model-and-design-confidence">
        
      </a>
    </div>
    <p>The typical threat model for TLS is known as the <a href="https://en.wikipedia.org/wiki/Dolev%E2%80%93Yao_model">Dolev-Yao</a> model, in which an active network attacker can read, write, and delete packets from the network. This attacker’s goal is to derive the shared session key. There has been <a href="https://ieeexplore.ieee.org/document/7958594">a</a> <a href="https://eprint.iacr.org/2020/1044.pdf">tremendous</a> <a href="https://dl.acm.org/doi/10.1145/3133956.3134063">amount</a> <a href="https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/452409/main.pdf?sequence=8">of</a> <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.738.9908&amp;rep=rep1&amp;type=pdf">research</a> <a href="https://hal.inria.fr/hal-01674096/file/record.pdf">analyzing</a> <a href="https://link.springer.com/chapter/10.1007/978-3-662-53018-4_10">the</a> <a href="https://ieeexplore.ieee.org/abstract/document/7546517">security</a> of TLS to gain confidence that the protocol achieves this goal.</p><p>The threat model for ECH is somewhat stronger than considered in previous work. Not only should it be hard to derive the session key, it should also be hard for the attacker to determine the identity of the server from a <i>known anonymity set</i>. That is, ideally, it should have no more advantage in identifying the server than if it simply guessed from the set of servers in the anonymity set. And recall that the attacker is free to read, write, and modify any packet as part of the TLS connection. This means, for example, that an attacker can replay a ClientHello and observe the server’s response. It can also extract pieces of the ClientHello — including the ECH extension — and use them in its own modified ClientHello.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UDOt4nL1dZujWFI6uVzLI/c17e28d83342592427b17ad55e208d61/image2-20.png" />
            
            </figure><p>The design of ECH makes this sort of attack virtually impossible by guaranteeing that the server certificate can only be decrypted by either the client or the client-facing server.</p><p>Something else an attacker might try is to masquerade as the server and actively interfere with the client to observe its behavior. If the client reacted differently based on whether the server-provided certificate was correct, this would allow the attacker to test whether a given connection using ECH was for a particular name.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4BSHqa7Vz8fRJqffZTtKht/efeb05714d785273db9571c73160b1a4/image4-18.png" />
            
            </figure><p>ECH also defends against this attack by ensuring that an attacker without access to the private ECH key material cannot actively inject anything into the connection.</p><p>The attacker can also be entirely passive and try to infer encrypted information from other visible metadata, such as packet sizes and timing. (Indeed, traffic analysis is an open problem for ECH and in general for TLS and related protocols.) Passive attackers simply sit and listen to TLS connections, and use what they see and, importantly, what they know to make determinations about the connection contents. For example, if a passive attacker knows that the name of the client-facing server is crypto.cloudflare.com, and it sees a ClientHello with ECH to crypto.cloudflare.com, it can conclude, with reasonable certainty, that the connection is to some domain in the anonymity set of crypto.cloudflare.com.</p><p>The number of potential attack vectors is astonishing, and something that the TLS working group has tripped over in prior iterations of the ECH design. Before any sort of real world deployment and experiment, we needed confidence in the design of this protocol. To that end, we are working closely with external researchers on a formal analysis of the ECH design which captures the following security goals:</p><ol><li><p>Use of ECH does not weaken the security properties of TLS without ECH.</p></li><li><p>TLS connection establishment to a host in the client-facing server’s anonymity set is indistinguishable from a connection to any other host in that anonymity set.</p></li></ol><p>We’ll write more about the model and analysis when they’re ready. Stay tuned!</p><p>There are plenty of other subtle security properties we desire for ECH, and some of these drill right into the most important question for a privacy-enhancing technology: Is this deployable?</p>
    <div>
      <h3>Focusing on deployability</h3>
      <a href="#focusing-on-deployability">
        
      </a>
    </div>
    <p>With confidence in the security and privacy properties of the protocol, we then turned our attention towards deployability. In the past, significant protocol changes to fundamental Internet protocols such as <a href="https://conferences.sigcomm.org/imc/2011/docs/p181.pdf">TCP</a> or <a href="https://datatracker.ietf.org/doc/html/rfc8446#appendix-D.4">TLS</a> have been complicated by some form of benign interference. Network software, like any software, is prone to bugs, and sometimes these bugs manifest in ways that we only detect when there’s a change elsewhere in the protocol. For example, TLS 1.3 unveiled middlebox ossification bugs that ultimately led to the <a href="https://datatracker.ietf.org/doc/html/rfc8446#appendix-D.4">middlebox compatibility mode</a> for TLS 1.3.</p><p>While ECH is itself just an extension, the risk that it exposes (or introduces!) similar bugs is real. To combat this problem, ECH supports a variant of <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13#section-6.2">GREASE</a> whose goal is to ensure that all ECH-capable clients produce <i>syntactically equivalent</i> ClientHello messages. In particular, if a client supports ECH but does not have the corresponding ECH configuration, it uses GREASE. Otherwise, it produces a ClientHello with real ECH support. In both cases, the syntax of the ClientHello messages is equivalent.</p><p>This hopefully avoids network bugs that would otherwise trigger upon real or fake ECH. Or, in other words, it helps ensure that all ECH-capable client connections are treated similarly in the presence of benign network bugs or otherwise passive attackers. Interestingly, active attackers can easily distinguish, with some probability, between real and fake ECH. Using GREASE, the ClientHello carries an ECH extension, though its contents are effectively randomized, whereas a real ClientHello using ECH has information that will match what is contained in DNS. This means an active attacker can simply compare the ClientHello against what’s in the DNS. Indeed, anyone can query DNS and use it to determine if a ClientHello is real or fake:</p>
            <pre><code>$ dig +short crypto.cloudflare.com TYPE65
\# 134 0001000001000302683200040008A29F874FA29F884F000500480046 FE0D0042D500200020E3541EC94A36DCBF823454BA591D815C240815 77FD00CAC9DC16C884DF80565F0004000100010013636C6F7564666C 6172652D65736E692E636F6D00000006002026064700000700000000 0000A29F874F260647000007000000000000A29F884F</code></pre>
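<p>For the curious, the SvcParams in that RDATA can be decoded by hand. The following is a minimal sketch of the SVCB/HTTPS wire format (a two-byte key and two-byte length precede each value), hardcoding the hex blob from the dig output above; it is not a general-purpose parser:</p>

```python
# Decode the HTTPS (TYPE65) RDATA shown above and pull out the SvcParams,
# including the ech parameter (key 5) that carries the ECHConfigList.
import struct

RDATA_HEX = (
    "0001000001000302683200040008A29F874FA29F884F000500480046"
    "FE0D0042D500200020E3541EC94A36DCBF823454BA591D815C240815"
    "77FD00CAC9DC16C884DF80565F0004000100010013636C6F7564666C"
    "6172652D65736E692E636F6D00000006002026064700000700000000"
    "0000A29F874F260647000007000000000000A29F884F"
)

def parse_https_rdata(hexstr):
    data = bytes.fromhex(hexstr)
    priority = struct.unpack(">H", data[:2])[0]
    off = 3  # skip the 2-byte priority and the single 0x00 root target name
    params = {}
    while off < len(data):
        key, length = struct.unpack(">HH", data[off:off + 4])
        params[key] = data[off + 4:off + 4 + length]
        off += 4 + length
    return priority, params

priority, params = parse_https_rdata(RDATA_HEX)
print(priority)        # 1
print(sorted(params))  # [1, 4, 5, 6] -> alpn, ipv4hint, ech, ipv6hint
# The ECHConfig's public_name sits just before the empty extensions field
# at the end of the ech value:
print(params[5][-21:-2].decode())  # cloudflare-esni.com
```

<p>An active attacker doing the comparison described above would fetch this record, extract key 5, and check whether the ECH extension in an observed ClientHello is consistent with it.</p>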
            <p>Despite this obvious distinguisher, the end result isn’t that interesting. If a server is capable of ECH and a client is capable of ECH, then the connection most likely used ECH, and whether clients and servers are capable of ECH is assumed public information. Thus, GREASE is primarily intended to ease deployment against benign network bugs and otherwise passive attackers.</p><p>Note, importantly, that GREASE (or fake) ECH ClientHello messages are semantically different from real ECH ClientHello messages. This presents a real problem for networks such as enterprise settings or school environments that otherwise use plaintext TLS information for the purposes of implementing various features like filtering or parental controls. (Encrypted DNS protocols like DoH also encountered similar obstacles in their deployment.) Fundamentally, this problem reduces to the following: How can networks securely disable features like DoH and ECH? Fortunately, there are a number of approaches that might work, with the more promising one centered around DNS discovery. In particular, if clients could securely discover encrypted recursive resolvers that can perform filtering in lieu of it being done at the TLS layer, then TLS-layer filtering might be wholly unnecessary. (Other approaches, such as the use of <a href="https://support.mozilla.org/en-US/kb/configuring-networks-disable-dns-over-https">canary domains</a> to give networks an opportunity to signal that certain features are not permitted, may work, though it’s not clear if these could or would be abused to disable ECH.)</p><p>We are eager to collaborate with browser vendors, network operators, and other stakeholders to find a feasible deployment model that works well for users without ultimately stifling connection privacy for everyone else.</p>
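<p>To make the canary-domain idea concrete, here is a sketch of the kind of decision logic Firefox applies for DoH with its canary domain, use-application-dns.net: an NXDOMAIN (or empty) answer from the network&#8217;s resolver is treated as an opt-out signal. The function and variable names below are ours, and a corresponding ECH canary is hypothetical; only the canary domain itself is real:</p>

```python
# Sketch of canary-domain opt-out logic, modeled on Mozilla's DoH canary.
# `lookup` maps a name to a list of addresses and raises KeyError to model
# NXDOMAIN; all names here are illustrative only.
CANARY = "use-application-dns.net"

def should_enable_doh(lookup) -> bool:
    try:
        answers = lookup(CANARY)
    except KeyError:      # NXDOMAIN: the network asks clients to disable DoH
        return False
    return bool(answers)  # an empty answer also counts as an opt-out

# A network that blackholes the canary disables DoH; a normal one does not.
opt_out_network = {}                        # canary name absent -> NXDOMAIN
normal_network = {CANARY: ["203.0.113.1"]}  # RFC 5737 example address
assert should_enable_doh(opt_out_network.__getitem__) is False
assert should_enable_doh(normal_network.__getitem__) is True
```

<p>The open question noted above is whether a signal this easy for any network to emit could be abused to disable ECH wholesale, which is why secure resolver discovery is the more promising direction.</p>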
    <div>
      <h3>Next steps</h3>
      <a href="#next-steps">
        
      </a>
    </div>
    <p>ECH is rolling out for some FREE zones on our network in select geographic regions. We will slowly expand the set of zones and regions that support ECH, monitoring for failures in the process. Ultimately, the goal is to work with the rest of the TLS working group and IETF towards updating the specification based on this experiment in hopes of making it safe, secure, usable, and deployable for the Internet.</p><p>ECH is one part of the connection privacy story. As with a leaky boat, it’s important to look for and plug all the gaps before taking on lots of passengers! Cloudflare Research is committed to these narrow technical problems and their long-term solutions. Stay tuned for more updates on this and related protocols.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Privacy]]></category>
            <category><![CDATA[TLS]]></category>
            <guid isPermaLink="false">796U3yTtzV0G33HROsW9Nu</guid>
            <dc:creator>Christopher Wood</dc:creator>
            <dc:creator>Christopher Patton</dc:creator>
        </item>
    </channel>
</rss>