
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Thu, 09 Apr 2026 05:49:50 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections]]></title>
            <link>https://blog.cloudflare.com/connection-coalescing-with-origin-frames-fewer-dns-queries-fewer-connections/</link>
            <pubDate>Mon, 04 Sep 2023 13:00:51 GMT</pubDate>
            <description><![CDATA[ In this blog we’re going to take a closer look at “connection coalescing”, with a specific focus on managing it at a large scale ]]></description>
            <content:encoded><![CDATA[ <p><i>This blog reports and summarizes the contents of a Cloudflare </i><a href="https://research.cloudflare.com/publications/Singanamalla2022/"><i>research paper</i></a><i> that appeared at the ACM </i><a href="https://conferences.sigcomm.org/imc/2022/program/"><i>Internet Measurement Conference</i></a><i> and measures and prototypes connection coalescing with ORIGIN Frames.</i></p><p>Some readers might be surprised to hear that a single visit to a web page can cause a browser to make tens, sometimes even hundreds, of web connections. Take this very blog as an example. If it is your first visit to the Cloudflare blog, or it has been a while since your last visit, your browser will make multiple connections to render the page. The browser will make DNS queries to find IP addresses corresponding to blog.cloudflare.com and then subsequent requests to retrieve any necessary subresources on the web page needed to successfully render the complete page. How many? Looking below, at the time of writing, there are 32 different hostnames used to load the Cloudflare Blog. That means 32 DNS queries and <i>at least</i> 32 TCP (or QUIC) connections, unless the client is able to reuse (or coalesce) some of those connections.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5iVEUq8ZQ8HsPbg1FAP5jq/de5899ea338a7628e732558caf2f8710/Screenshot-2023-09-03-at-18.34.41.png" />
            
            </figure><p>Each new web connection not only introduces additional load on a server's processing capabilities – potentially leading to scalability challenges during peak usage hours – but also exposes client metadata to the network, such as the plaintext hostnames being accessed by an individual. Such meta information can potentially reveal a user’s online activities and browsing behaviors to on-path network adversaries and eavesdroppers!</p><p>In this blog we’re going to take a closer look at “connection coalescing”. Since our initial look at <a href="/connection-coalescing-experiments/">IP-based coalescing in 2021</a>, we have done further large-scale measurements and modeling across the Internet, to understand and predict if and where coalescing would work best. Since IP coalescing is difficult to manage at large scale, last year we implemented and experimented with a promising standard called the <a href="https://datatracker.ietf.org/doc/rfc8336/">HTTP/2 ORIGIN Frame extension</a> that we leveraged to coalesce connections to our edge without worrying about managing IP addresses.</p><p>All told, there are opportunities being missed by many large providers. We hope that this blog (and our <a href="https://research.cloudflare.com/publications/Singanamalla2022/">publication</a> at ACM IMC 2022 with full details) offers a first step that helps servers and clients take advantage of the ORIGIN Frame standard.</p>
    <div>
      <h3>Setting the stage</h3>
      <a href="#setting-the-stage">
        
      </a>
    </div>
    <p>At a high level, as a user navigates the web, the browser renders web pages by retrieving dependent subresources to construct the complete web page. This process bears a striking resemblance to the way physical products are assembled in a factory. In this sense, a modern web page can be considered like an assembly plant. It relies on a ‘supply chain’ of resources that are needed to produce the final product.</p><p>An assembly plant in the physical world can place a single order for different parts and get a single shipment from the supplier (similar to the <a href="https://www.sciencedirect.com/science/article/abs/pii/092552739290109K">kitting process</a> for maximizing value and minimizing response time); no matter the manufacturer of those parts or where they are made -- one ‘connection’ to the supplier is all that is needed. Any single truck from a supplier to an assembly plant can be filled with parts from multiple manufacturers.</p><p>The design of the web typically causes browsers to do the opposite. To retrieve the images, JavaScript, and other resources on a web page (the parts), web clients (assembly plants) have to make <i>at least</i> one connection to every hostname (the manufacturers) defined in the HTML that is returned by the server (the supplier). It makes no difference if the connections to those hostnames go to the same server or not, for example they could go to a <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/">reverse proxy</a> like Cloudflare. For each manufacturer a ‘new’ truck would be needed to transfer the materials to the assembly plant from the same supplier, or more formally, a new connection would need to be made to request a subresource from a hostname on the same web page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3jdJ2PTbkKJKMFxbWD6RgD/40b366c9315d9c55e2409e1654bc0a3d/pasted-image-0--4-.png" />
            
            </figure><p>Without connection coalescing</p><p>The number of connections used to load a web page can be surprisingly high. It is also common for the subresources to need yet other sub-subresources, and so new connections emerge as a result of earlier ones. Remember, too, that HTTP connections to hostnames are often preceded by DNS queries! Connection coalescing allows us to use fewer connections<i>, or ‘reuse’ the same set of trucks to carry parts from multiple manufacturers from a single supplier.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/jJx6aGfQCngN6w6D4jpeE/1dcfe2b5e3f8128d8837512ec1a959c9/pasted-image-0--5-.png" />
            
            </figure><p>With connection coalescing </p>
    <div>
      <h3>Connection coalescing in principle</h3>
      <a href="#connection-coalescing-in-principle">
        
      </a>
    </div>
    <p>Connection coalescing was <a href="https://datatracker.ietf.org/doc/html/rfc7540">introduced in HTTP/2</a>, and carried over into <a href="https://www.rfc-editor.org/rfc/rfc9114.html#name-connection-reuse">HTTP/3</a>. We’ve blogged about connection coalescing <a href="/connection-coalescing-experiments/">previously</a> (for a detailed primer we encourage going over that blog). While the idea is simple, implementing it can present a number of engineering challenges. For example, recall from above that there are 32 hostnames (at the time of writing) to load the web page you are reading right now. Among the 32 hostnames are 16 unique domains (defined as “Effective <a href="https://www.cloudflare.com/learning/dns/top-level-domain/">TLD+1</a>”). Can we create fewer connections or ‘coalesce’ existing connections for each unique domain? The answer is ‘<i>Yes, but it depends</i>’.</p><p>The exact number of connections to load the blog page is not at all obvious, and hard to know. There may be 32 hostnames attached to 16 domains but, counter-intuitively, this does not mean the answer to “how many unique connections?” is 16. The true answer could be as few as <i>one</i> connection if all the hostnames are reachable at a single server; or as many as 32 independent connections if a different and distinct server is needed to access each individual hostname.</p><p>Connection reuse comes in many forms, so it’s important to define “connection coalescing” in the HTTP space. For example, the reuse of an existing <a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/">TCP</a> or TLS connection to a hostname to make multiple requests for subresources from that <b><i>same</i></b> hostname is connection reuse, but not coalescing.</p><p>Coalescing occurs when an existing TLS channel for some hostname can be repurposed or used for connecting to a <b><i>different</i></b> hostname. 
For example, upon visiting blog.cloudflare.com, the HTML points to subresources at cdnjs.cloudflare.com. To reuse the same TLS connection for the subresources, it is necessary for both hostnames to appear together in the TLS certificate's <a href="https://en.wikipedia.org/wiki/Subject_Alternative_Name">“Subject Alternative Name (SAN)”</a> list, but this step alone is not sufficient to convince browsers to coalesce. After all, the cdnjs.cloudflare.com service may or may not be reachable at the same server as blog.cloudflare.com, despite being on the same certificate. So how can the browser know? Coalescing only works if servers set up the right conditions, but clients have to decide whether to coalesce or not – thus, browsers require a signal to coalesce beyond the SANs list on the certificate. Revisiting our analogy, the assembly plant may order a part from a manufacturer directly, not knowing that the supplier already has the same part in its warehouse.</p><p>There are two explicit signals a browser can use to decide whether connections can be coalesced: one is IP-based, the other ORIGIN Frame-based. The former requires server operators to tightly bind DNS records to the HTTP resources available on the server. This is difficult to manage and deploy, and actually creates a risky dependency, because you have to place all the resources behind a specific set of IP addresses, or a single one. The way IP addresses influence coalescing decisions <a href="https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/">varies among browsers</a>, with some choosing to be more conservative and others more permissive. 
Alternatively, the HTTP ORIGIN Frame is an easier signal for servers to orchestrate; it’s also flexible and fails gracefully, with no interruption to service (for a specification-compliant implementation).</p><p>A foundational difference between these two coalescing signals: IP-based signals are implicit, even accidental, and force clients to infer coalescing possibilities that may or may not exist. None of this is surprising, since IP addresses are designed to <a href="/addressing-agility/">have no real relationship with names!</a> In contrast, ORIGIN Frame is an explicit signal from servers to clients that coalescing is available, no matter what DNS says for any particular hostname.</p><p>We have experimented with <a href="/connection-coalescing-experiments/">IP-based coalescing previously</a>; for the purpose of this blog we will take a deeper look at ORIGIN Frame-based coalescing.</p>
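<p>To make the two signals concrete, the decision a client faces before reusing a connection for a new hostname can be sketched as a simple predicate. This is an illustration of the logic described above, not any browser's actual implementation; the connection record and the <code>resolve</code> helper are hypothetical stand-ins for the client's connection state and DNS lookup:</p>

```python
def can_coalesce(hostname, conn):
    """Decide whether an existing connection can be reused for `hostname`.

    `conn` is a hypothetical record of an established connection: the
    certificate SAN list, the remote IP, the origin-set advertised via
    ORIGIN frames (empty if none were sent), and a DNS lookup helper.
    """
    if not san_matches(hostname, conn["sans"]):
        return False  # the certificate must cover the hostname in all cases
    # Explicit signal: the server advertised the hostname in an ORIGIN frame.
    if hostname in conn["origin_set"]:
        return True
    # Implicit signal: DNS for the hostname includes the connection's IP.
    return conn["ip"] in conn["resolve"](hostname)

def san_matches(hostname, sans):
    """Exact match, or a single-label wildcard like *.example.com."""
    for san in sans:
        if san == hostname:
            return True
        if san.startswith("*.") and "." in hostname \
                and hostname.split(".", 1)[-1] == san[2:]:
            return True
    return False
```

<p>Note how the ORIGIN Frame branch never consults DNS at all: once the certificate and origin-set agree, no additional query or connection is needed.</p>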
    <div>
      <h3>What is the ORIGIN Frame standard?</h3>
      <a href="#what-is-the-origin-frame-standard">
        
      </a>
    </div>
    <p>The ORIGIN Frame is an extension to the <a href="https://www.rfc-editor.org/rfc/rfc8336">HTTP/2</a> and <a href="https://www.rfc-editor.org/rfc/rfc9412">HTTP/3</a> specifications: a special frame sent on stream 0 in HTTP/2, or on the control stream in HTTP/3. The frame allows a server to send an ‘origin-set’ to clients on an <i>existing</i> established TLS connection, listing hostnames the server is authoritative for; requests to those hostnames will not incur <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/421">HTTP 421 errors</a>. Hostnames in the origin-set MUST also appear in the certificate SAN list for the server, even if those hostnames are announced on different IP addresses via DNS.</p><p>Specifically, two different steps are required:</p><ol><li><p>Web servers must send a list enumerating the Origin Set (the hostnames that a given connection might be used for) in the ORIGIN Frame extension.</p></li><li><p>The TLS certificate returned by the web server must cover the additional hostnames returned in the ORIGIN Frame in its SAN entries.</p></li></ol><p>At a high level, ORIGIN Frames are a supplement to the TLS certificate that operators can attach to say, “Psst! Hey, client, here are the names in the SANs that are available on this connection -- you can coalesce!” Since the ORIGIN Frame is not part of the certificate itself, its contents can change independently. No new certificate is required. There is also no dependency on IP addresses. For a coalesceable hostname, existing TCP/QUIC+TLS connections can be reused without requiring new connections or DNS queries.</p><p><a href="https://w3techs.com/technologies/overview/proxy">Many websites today</a> rely on content served by CDNs, like Cloudflare’s CDN service. 
The practice of using external CDN services offers websites speed and reliability, and reduces the load on their <a href="https://www.cloudflare.com/learning/cdn/glossary/origin-server/">origin servers</a>. When both the website and its resources are served by the same CDN, despite being different hostnames owned by different entities, some very interesting opportunities open up for CDN operators to allow connections to be reused and coalesced, since they control both certificate management and connection handling, and can send ORIGIN frames on behalf of the real origin server.</p><p>Unfortunately, there has been no way to turn the possibilities enabled by ORIGIN Frame into practice. To the best of our knowledge, until today, there has been no server implementation that supports ORIGIN Frames. Among browsers, only Firefox supports ORIGIN Frames. Since IP coalescing is challenging and ORIGIN Frame has no deployed support, is the engineering time and energy to better support coalescing worth the investment? We decided to find out with a large-scale Internet-wide measurement to understand the opportunities and predict the possibilities, and then implemented the ORIGIN Frame to experiment on production traffic.</p>
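<p>For readers curious about the wire format, RFC 8336 assigns the ORIGIN frame type 0x0C and defines its payload as a sequence of Origin-Entry fields: a 16-bit length followed by the ASCII serialization of an origin. A minimal illustrative sketch of that encoding in Python (not a production implementation):</p>

```python
import struct

ORIGIN_FRAME_TYPE = 0x0C  # frame type assigned by RFC 8336

def encode_origin_frame(origins):
    """Build an HTTP/2 ORIGIN frame carrying the given origin-set.

    The payload is a sequence of Origin-Entry fields (16-bit length +
    ASCII origin). The frame is sent on stream 0 with no flags defined.
    """
    payload = b"".join(
        struct.pack("!H", len(o)) + o.encode("ascii") for o in origins
    )
    length = len(payload)
    # 9-byte HTTP/2 frame header: 24-bit length, type, flags, stream id 0.
    header = struct.pack("!BHBBL", (length >> 16) & 0xFF, length & 0xFFFF,
                         ORIGIN_FRAME_TYPE, 0, 0)
    return header + payload

def decode_origin_frame(frame):
    """Parse the origin-set back out of a serialized ORIGIN frame."""
    hi, lo, ftype, flags, stream = struct.unpack("!BHBBL", frame[:9])
    assert ftype == ORIGIN_FRAME_TYPE and stream == 0
    payload, origins = frame[9:9 + ((hi << 16) | lo)], []
    while payload:
        (n,) = struct.unpack("!H", payload[:2])
        origins.append(payload[2:2 + n].decode("ascii"))
        payload = payload[2 + n:]
    return origins
```

<p>Because the frame is just a list of strings on an already-established connection, an operator can change the advertised origin-set at any time without touching certificates or DNS.</p>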
    <div>
      <h3>Experiment #1: What is the scale of required changes?</h3>
      <a href="#experiment-1-what-is-the-scale-of-required-changes">
        
      </a>
    </div>
    <p>In February 2021, <a href="/connection-coalescing-experiments/">we collected data</a> for 500K of the <a href="https://radar.cloudflare.com/domains">most popular websites</a> on the Internet, using a modified <a href="https://github.com/WPO-Foundation/webpagetest">Web Page Test</a> on 100 virtual machines. An automated Chrome (v88) browser instance was launched for every visit to a web page to eliminate caching effects (because we wanted to understand coalescing, not caching). On successful completion of each session, Chrome developer tools were used to retrieve and write the page load data as an HTTP Archive format (HAR) file with a full timeline of events, as well as additional information about certificates and their validation. Additionally, we parsed the certificate chains for the root web page and new TLS connections triggered by subresource requests to (i) identify certificate issuers for the hostnames, (ii) inspect the presence of the Subject Alternative Name (SAN) extension, and (iii) validate that DNS names resolve to the IP address used. Further details about our methodology and results can be found in the technical <a href="https://research.cloudflare.com/publications/Singanamalla2022/">paper</a>.</p><p>The first step was to understand what resources are requested by web pages to successfully render the page contents, and where these resources were present on the Internet. Connection coalescing becomes possible when subresource domains are co-located. We approximated the location of a domain by finding its corresponding autonomous system (AS). For example, the domain attached to <a href="https://cdnjs.cloudflare.com/">cdnjs</a> is reachable via AS 13335 in the BGP routing table, and that AS number belongs to Cloudflare. The figure below describes the percentage of web pages and the number of unique ASes needed to fully load a web page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3H188I6GyYxiqZEYxBPMPp/4bde6b40a523d0e66207c87cee40755c/Screenshot-2023-08-31-at-1.39.16-PM.png" />
            
            </figure><p>Around 14% of the web pages need two ASes to fully load, i.e. they depend on one additional AS for subresources. More than 50% of the web pages need to contact no more than six ASes to obtain all the necessary subresources. This finding, as shown in the plot above, implies that a relatively small number of operators serve the subresource content necessary for a majority (~50%) of the websites, and any usage of ORIGIN Frames would need only a few changes to have its intended impact. The potential for connection coalescing can therefore be optimistically approximated by the number of unique ASes needed to retrieve all subresources in a web page. In practice, however, this may be superseded by operational factors such as SLAs, or helped by flexible mappings between sockets, names, and IP addresses, which we worked on <a href="https://research.cloudflare.com/publications/Fayed2021/">previously at Cloudflare</a>.</p><p>We then tried to understand the impact of coalescing on connection metrics. The measured and ideal numbers of DNS queries and TLS connections needed to load a web page are summarized by their CDFs in the figure below.</p>
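<p>The per-page AS counts plotted above boil down to mapping each subresource hostname to its origin AS and counting distinct ASes per page. A simplified sketch of that aggregation; the hostname-to-ASN table here is hypothetical, whereas in the study it was derived from BGP routing tables:</p>

```python
from collections import Counter

def ases_needed(page_hostnames, hostname_to_asn):
    """Return the set of ASes a page depends on for its subresources."""
    return {hostname_to_asn[h] for h in page_hostnames if h in hostname_to_asn}

def as_count_distribution(pages, hostname_to_asn):
    """Histogram of 'unique ASes needed' across a corpus of pages.

    `pages` maps a page identifier to the list of hostnames its
    subresources were fetched from.
    """
    return Counter(
        len(ases_needed(hosts, hostname_to_asn)) for hosts in pages.values()
    )
```

<p>Summing the histogram buckets from the low end then gives the cumulative percentages quoted above, such as the share of pages needing no more than six ASes.</p>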
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7aCaLsYdjNqeuQAuVSfQ6c/beae2aeaf3e96830f9562c09aeb0a2cd/Screenshot-2023-08-31-at-1.39.02-PM.png" />
            
            </figure><p>Through modeling and extensive analysis, we identify that connection coalescing through ORIGIN Frames could reduce the number of DNS queries and TLS connections made by browsers by over 60% at the median. We performed this modeling by identifying the number of times clients requested DNS records, and combining them with the ideal ORIGIN Frames to serve.</p><p>Many multi-origin servers, such as those operated by <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">CDNs</a>, tend to reuse certificates and serve the same certificate with multiple DNS SAN entries. This allows the operators to manage fewer certificates through their creation and renewal cycles. While theoretically one can have millions of names in a certificate, creating such certificates is unreasonable and a challenge to manage effectively. By continuing to rely on existing certificates, our modeling brings to light the scale of changes required to enable perfect coalescing, as highlighted in the figure below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5rIcRVeD6UEGNCW9dj3MYM/187f0bc53e9afedf3baca072410eb4db/Screenshot-2023-08-31-at-1.38.35-PM.png" />
            
            </figure><p>We identify that over 60% of the certificates served by websites do not need any modifications and could benefit from ORIGIN Frames, while with no more than 10 additions to the DNS SAN names in certificates we’re able to successfully coalesce connections to over 92% of the websites in our measurement. The most effective changes could be made by CDN providers by adding three or four of their most popular requested hostnames into each certificate.</p>
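<p>The observation that adding each certificate's few most-requested hostnames goes a long way suggests a simple greedy heuristic. A hedged sketch of that planning step, with a hypothetical data shape (per-certificate demand counts would come from measurement data like ours):</p>

```python
def plan_san_additions(cert_to_demand, max_additions):
    """For each certificate, pick the most-requested hostnames not yet
    covered, up to `max_additions` new SAN entries per certificate.

    `cert_to_demand` maps a certificate id to a pair: its current SAN
    set, and a dict of {third-party hostname: request count} observed
    on pages that certificate serves. Returns {cert id: additions}.
    """
    plan = {}
    for cert_id, (sans, demand) in cert_to_demand.items():
        candidates = sorted(
            (h for h in demand if h not in sans),
            key=lambda h: demand[h],
            reverse=True,  # most-requested hostnames first
        )
        plan[cert_id] = candidates[:max_additions]
    return plan
```

<p>Capping additions per certificate matters because each extra SAN entry grows the certificate served on every handshake, so the goal is the largest coalescing win for the smallest certificate growth.</p>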
    <div>
      <h3>Experiment #2: ORIGIN Frames in action</h3>
      <a href="#experiment-2-origin-frames-in-action">
        
      </a>
    </div>
    <p>In order to validate our modeling expectations, we then took a more active approach in early 2022. Our next experiment focused on 5,000 websites that make extensive use of <i>cdnjs.cloudflare.com</i> as a subresource. By modifying our experimental TLS termination endpoint, we deployed HTTP/2 ORIGIN Frame support as defined in the <a href="https://datatracker.ietf.org/doc/rfc8336/">RFC standard</a>. This involved changing our internal forks of Go’s <i>net</i> and <i>http</i> dependency modules, which we have open sourced (<a href="https://github.com/cloudflare/go-originframe">see here</a> and <a href="https://github.com/cloudflare/net-originframe">here</a>).</p><p>During the experiments, connecting to a website in the experiment set would return <i>cdnjs.cloudflare.com</i> in the ORIGIN frame, while the control set returned an arbitrary (unused) hostname. All existing edge certificates for the 5,000 websites were also modified. For the experimental group, the corresponding certificates were renewed with <i>cdnjs.cloudflare.com</i> added to the SAN. To ensure parity between the control and experimental sets, certificates for control-group domains were also renewed, adding a valid third-party domain of identical length that is used by none of the control domains. This keeps the relative change in certificate size constant, avoiding potential bias from differing certificate sizes. Our results were striking!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2jGJXHliykd9CGh3lNujpq/bc1b9faf871d7d4c6ecf7041966078cb/Screenshot-2023-08-31-at-1.38.47-PM.png" />
            
            </figure><p>Sampling 1% of the requests we received from Firefox to the websites in the experiment, we identified an over <b>50% reduction in new TLS connections per second</b>, indicating fewer cryptographic verification operations on the client and reduced compute overhead on the server. As expected, there were no differences in the control set, confirming the effectiveness of connection reuse as seen by the CDN or server operators.</p>
    <div>
      <h3>Discussion and insights</h3>
      <a href="#discussion-and-insights">
        
      </a>
    </div>
    <p>While our modeling measurements indicated that we could anticipate some performance improvements, in practice performance was not significantly better, suggesting that ‘no-worse’ is the appropriate mental model. The subtle interplay between resource object sizes, competing connections, and congestion control is subject to network conditions. Bottleneck-share capacity, for example, diminishes as fewer connections compete for bottleneck resources on network links. It would be interesting to revisit these measurements as more operators deploy support for ORIGIN Frames on their servers.</p><p>Apart from performance, one major benefit of ORIGIN frames is privacy. How? Well, each coalesced connection hides client metadata that is otherwise leaked from non-coalesced connections. Certain resources on a web page are loaded depending on how one is interacting with the website. This means that for every new connection retrieving some resource from the server, TLS plaintext metadata like the <a href="https://www.cloudflare.com/learning/ssl/what-is-sni/">SNI</a> (in the absence of <a href="/encrypted-client-hello/">Encrypted Client Hello</a>) and at least one plaintext DNS query, if transmitted over UDP or TCP on port 53, are exposed to the network. Coalescing connections removes the need for browsers to open new TLS connections and to make extra DNS queries. This prevents metadata leakage to anyone listening on the network. 
ORIGIN Frames help minimize those signals from the network path, improving privacy by reducing the amount of cleartext information leaked to on-path eavesdroppers.</p><p>While browsers benefit from reduced cryptographic computation to verify multiple certificates, a major advantage is that coalescing opens up very interesting future opportunities for resource scheduling at the endpoints (the browsers and the origin servers), such as <a href="/better-http-3-prioritization-for-a-faster-web/">prioritization</a>, or recent proposals like <a href="/early-hints/">HTTP early hints</a>, to provide client experiences where connections are not overloaded or competing for resources. When coupled with the <a href="https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-http2-secondary-certs-06#section-3.4">CERTIFICATE Frames</a> IETF draft, we can further eliminate the need for manual certificate modifications, as a server can prove its authority over hostnames after connection establishment without any additional SAN entries on the website’s TLS certificate.</p>
    <div>
      <h3>Conclusion and call to action</h3>
      <a href="#conclusion-and-call-to-action">
        
      </a>
    </div>
    <p>In summary, the current Internet ecosystem has many opportunities for connection coalescing with only a few changes to certificates and server infrastructure. Servers can reduce the number of TLS handshakes by roughly 50%, while reducing the number of render-blocking DNS queries by over 60%. Clients additionally reap privacy benefits by reducing cleartext DNS exposure to network onlookers.</p><p>To help make this a reality, we are currently planning to add support for both HTTP/2 and HTTP/3 ORIGIN Frames for our customers. We also encourage other operators that manage third-party resources to adopt ORIGIN Frame support to improve the Internet ecosystem. Our paper was accepted to the ACM Internet Measurement Conference 2022 and is <a href="https://research.cloudflare.com/publications/Singanamalla2022/">available for download</a>. If you’d like to work on projects like this, where you get to see the rubber meet the road for new standards, visit our <a href="https://www.cloudflare.com/careers/">careers page</a>!</p> ]]></content:encoded>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[HTTP2]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">QjYiQB1Bf6uRL71yURBMi</guid>
            <dc:creator>Suleman Ahmad</dc:creator>
            <dc:creator>Jonathan Hoyland</dc:creator>
            <dc:creator>Sudheesh Singanamalla</dc:creator>
        </item>
        <item>
            <title><![CDATA[Stronger than a promise: proving Oblivious HTTP privacy properties]]></title>
            <link>https://blog.cloudflare.com/stronger-than-a-promise-proving-oblivious-http-privacy-properties/</link>
            <pubDate>Thu, 27 Oct 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ In this blog post, we describe a formal, computer-aided security analysis of Oblivious HTTP, an emerging IETF standard that applications can use to improve user privacy ]]></description>
            <content:encoded><![CDATA[ <p></p><p>We recently announced <a href="/building-privacy-into-internet-standards-and-how-to-make-your-app-more-private-today/">Privacy Gateway</a>, a fully managed, scalable, and performant Oblivious HTTP (OHTTP) relay. Conceptually, OHTTP is a simple protocol: end-to-end encrypted requests and responses are forwarded between client and server through a relay, decoupling <i>who</i> from <i>what</i> was sent. This is a common pattern, as evidenced by deployed technologies like <a href="/oblivious-dns/">Oblivious DoH</a> and <a href="/icloud-private-relay/">Apple Private Relay</a>. Nevertheless, OHTTP is still new, and as a new protocol it’s imperative that we analyze the protocol carefully.</p><p>To that end, we conducted a formal, computer-aided security analysis to complement the ongoing standardization process and deployment of this protocol. In this post, we describe this analysis in more depth, digging deeper into the cryptographic details of the protocol and the model we developed to analyze it. If you’re already familiar with the OHTTP protocol, feel free to skip ahead to the analysis to dive right in. Otherwise, let’s first review what OHTTP sets out to achieve and how the protocol is designed to meet those goals.</p>
    <div>
      <h3>Decoupling who from what was sent</h3>
      <a href="#decoupling-who-from-what-was-sent">
        
      </a>
    </div>
    <p>OHTTP is a protocol that combines public key encryption with a proxy to separate the contents of an HTTP request (and response) from the sender of an HTTP request. In OHTTP, clients generate encrypted requests and send them to a relay, the relay forwards them to a gateway server, and then finally the gateway decrypts the message to handle the request. The relay only ever sees ciphertext and the client and gateway identities, and the gateway only ever sees the relay identity and plaintext.</p><p>In this way, OHTTP is a lightweight application-layer proxy protocol. This means that it proxies application messages rather than network-layer connections. This distinction is important, so let’s make sure we understand the differences. Proxying connections involves a whole other suite of protocols typically built on <a href="/a-primer-on-proxies/">HTTP CONNECT</a>. (Technologies like VPNs and WireGuard, including Cloudflare WARP, can also be used, but let’s focus on HTTP CONNECT for comparison.)</p>
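<p>The relay's forwarding step described above is deliberately minimal: it re-posts the opaque encapsulated body to the gateway while withholding anything that identifies the client. A hedged sketch of that step as a pure function (the gateway URL and the stripped-header list are illustrative assumptions, not an exhaustive or normative set; the OHTTP media type <code>message/ohttp-req</code> is from the specification):</p>

```python
# Headers that would tie the outbound request to the client; a relay
# must not forward these (an illustrative list, not exhaustive).
CLIENT_IDENTIFYING = {"cookie", "authorization", "x-forwarded-for", "user-agent"}

def build_gateway_request(gateway_url, inbound_headers, encapsulated_body):
    """Produce the (method, url, headers, body) the relay sends onward.

    The body is opaque ciphertext: the relay copies it verbatim and
    labels it with the OHTTP request media type, but strips any header
    that could identify the client.
    """
    headers = {
        k: v for k, v in inbound_headers.items()
        if k.lower() not in CLIENT_IDENTIFYING
    }
    headers["Content-Type"] = "message/ohttp-req"  # OHTTP request media type
    return ("POST", gateway_url, headers, encapsulated_body)
```

<p>Because the relay never holds a decryption key, a bug here can leak metadata but never plaintext; the separation of <i>who</i> from <i>what</i> is enforced by the encryption, not by relay policy alone.</p>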
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7G2GpukW91K4h4HWG4LwLP/085f4dd9a95840ca0e596d882804db9b/image2-16.png" />
            
            </figure><p>Connection-oriented proxy depiction</p><p>Since the entire TCP connection itself is proxied, connection-oriented proxies are compatible with any application that uses TCP. In effect, they are general purpose proxy protocols that support any type of application traffic. In contrast, proxying application messages is compatible with application use cases that require transferring entire objects (messages) between a client and server.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4tmYhfoorqr1dlClq57lNr/3201b3615bd8512329ef4eebbd56bf34/image5-3.png" />
            
            </figure><p>Message-oriented proxy depiction</p><p>Examples include <a href="/oblivious-dns/">DNS requests and responses</a>, or, in the case of OHTTP, HTTP requests and responses. In other words, OHTTP is not a general purpose proxy protocol: it’s fit for purpose, aimed at transactional interactions between clients and servers (such as app-level APIs). As a result, it is much simpler in comparison.</p><p>Applications use OHTTP to ensure that requests are not linked to either of the following:</p><ol><li><p>Client identifying information, including the IP address, TLS fingerprint, and so on. As a proxy protocol, this is a fundamental requirement.</p></li><li><p>Future requests from the same client. This is necessary for applications that do not carry state across requests.</p></li></ol><p>These two properties make OHTTP a perfect fit for applications that wish to provide privacy to their users without compromising basic functionality. It’s served as the foundation for a widespread deployment of Oblivious DoH for over a year now, and as of recently, serves as the foundation for <a href="https://www.theverge.com/2022/9/14/23351957/flo-period-tracker-privacy-anonymous-mode">Flo Health Inc.’s Anonymous Mode feature</a>.</p><p>It’s worth noting that both of these properties could be achieved with a connection-oriented protocol, but at the cost of a new end-to-end TLS connection for each message that clients wish to transmit. This can be prohibitively expensive for all entities that participate in the protocol.</p><p>So how exactly does OHTTP achieve these goals? Let’s dig deeper into OHTTP to find out.</p>
    <div>
      <h3>Oblivious HTTP protocol design</h3>
      <a href="#oblivious-http-protocol-design">
        
      </a>
    </div>
    <p>A single transaction in OHTTP involves the following steps:</p><ol><li><p>A client <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.1">encapsulates an HTTP request</a> using the public key of the gateway server, and sends it to the relay over a client&lt;&gt;relay HTTPS connection.</p></li><li><p>The relay forwards the request to the server over its own relay&lt;&gt;gateway HTTPS connection.</p></li><li><p>The gateway decapsulates the request, forwarding it to the target server which can produce the resource.</p></li><li><p>The gateway returns an <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.2">encapsulated response</a> to the relay, which then forwards the result to the client.</p></li></ol><p>Observe that in this transaction the relay only ever sees the client and gateway identities (the client IP address and the gateway URL, respectively), but does not see any application data. Conversely, the gateway sees the application data and the relay IP address, but does not see the client IP address. <b>Neither party has the full picture, and unless the relay and gateway collude, it stays that way.</b></p><p>The HTTP details for forwarding requests and responses in the transaction above are not technically interesting – a message is sent from sender to receiver over HTTPS using a POST – so we’ll skip over them. The fascinating bits are in the request and response encapsulation, which build upon <a href="https://www.rfc-editor.org/rfc/rfc9180.html">HPKE</a>, a <a href="/hybrid-public-key-encryption/">recently ratified standard for hybrid public key encryption</a>.</p><p>Let’s begin with request encapsulation, which is <a href="/hybrid-public-key-encryption/">hybrid public key encryption</a>. Clients first transform their HTTP request into a binary format, called Binary HTTP, as specified by <a href="https://datatracker.ietf.org/doc/rfc9292/">RFC9292</a>. 
Binary HTTP is, as the name suggests, a binary format for encoding HTTP messages. This representation lets clients encode HTTP requests to binary-encoded values and for the gateway to reverse this process, recovering an HTTP request from a binary-encoded value. Binary encoding is necessary because the public key encryption layer expects binary-encoded inputs.</p><p>Once the HTTP request is encoded in binary format, it is then fed into HPKE to produce an encrypted message, which clients then send to the relay to be forwarded to the gateway. The gateway decrypts this message, transforms the binary-encoded request back to its equivalent HTTP request, and then forwards it to the target server for processing.</p>
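<p>The request path above can be sketched in a few lines of Python. This is a toy illustration of the dataflow only, not real OHTTP: RFC 9180 HPKE uses a proper KEM, KDF, and AEAD, whereas this sketch substitutes a classic Diffie-Hellman exchange, an HMAC-based key derivation, and a hash-based XOR keystream. All names and parameters are illustrative.</p>

```python
# Toy sketch of OHTTP request encapsulation: client -> relay -> gateway.
# NOT real HPKE (RFC 9180); the group, KDF, and "cipher" are stand-ins.
import hashlib, hmac, os

P = 2**255 - 19  # toy Diffie-Hellman modulus (illustrative only)
G = 5

def keygen():
    sk = int.from_bytes(os.urandom(32), "big") % P
    return sk, pow(G, sk, P)

def kdf(shared: int, label: bytes) -> bytes:
    return hmac.new(label, shared.to_bytes(32, "big"), hashlib.sha256).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Expand the key into a keystream and XOR it with the data (toy cipher).
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def client_encapsulate(gateway_pk: int, binary_request: bytes):
    esk, epk = keygen()                   # ephemeral keypair for this request
    shared = pow(gateway_pk, esk, P)      # "encapsulated" shared secret
    key = kdf(shared, b"ohttp request")
    return (epk, xor_stream(key, binary_request)), shared

def gateway_decapsulate(gateway_sk: int, capsule):
    epk, ct = capsule
    shared = pow(epk, gateway_sk, P)      # gateway recovers the same secret
    key = kdf(shared, b"ohttp request")
    return xor_stream(key, ct)

gw_sk, gw_pk = keygen()
request = b"GET /resource HTTP/1.1"       # stand-in for a BHTTP-encoded request
capsule, _ = client_encapsulate(gw_pk, request)
# The relay forwards `capsule` verbatim; it cannot read the request inside.
assert gateway_decapsulate(gw_sk, capsule) == request
```

<p>The point of the sketch is the shape of the flow: only the gateway, holding the private key, can undo the encapsulation, so the relay forwards an opaque blob.</p>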
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4NYRu14pWAPynS3nqMocnu/34be010611eaf64921467df3e4fa82b8/image10-1.png" />
            
            </figure><p>Responses from the gateway are encapsulated back to the client in a very similar fashion. The gateway first encodes the response in an equivalent binary HTTP message, encrypts it using a symmetric key known only to the client and gateway, and then returns it to the relay to be forwarded to the client. The client decrypts and transforms this message to recover the result.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JuJ9CoqPX4v8HQuqjjoeh/f9ca1d53bdaf4a9a997f9e13b0e0e790/image4-10.png" />
            
            </figure>
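<p>The response direction can be sketched the same way. In real OHTTP the symmetric response key is derived from the request's HPKE context; in this toy we simply assume both sides already hold a <code>shared_secret</code>, and the "AEAD" is an XOR stream plus HMAC tag, both stand-ins.</p>

```python
# Toy sketch of OHTTP response encapsulation: the gateway encrypts the
# binary HTTP response under a symmetric key known only to it and the
# client. `shared_secret` is an assumption standing in for the key
# material derived from the request's HPKE context.
import hashlib, hmac

shared_secret = b"derived-from-request-encapsulation"

def response_key(secret: bytes) -> bytes:
    return hmac.new(b"ohttp response", secret, hashlib.sha256).digest()

def seal(key: bytes, msg: bytes) -> bytes:
    # Toy encrypt-then-MAC; a real AEAD also takes a nonce, and a fresh
    # key/nonce pair must be used per message (see nonce reuse, below).
    stream = hashlib.sha256(key).digest() * (len(msg) // 32 + 1)
    body = bytes(a ^ b for a, b in zip(msg, stream))
    tag = hmac.new(key, body, hashlib.sha256).digest()[:16]
    return tag + body

def open_(key: bytes, sealed: bytes) -> bytes:
    tag, body = sealed[:16], sealed[16:]
    if not hmac.compare_digest(tag, hmac.new(key, body, hashlib.sha256).digest()[:16]):
        raise ValueError("response tampered with in transit")
    stream = hashlib.sha256(key).digest() * (len(body) // 32 + 1)
    return bytes(a ^ b for a, b in zip(body, stream))

k = response_key(shared_secret)
sealed = seal(k, b"HTTP/1.1 200 OK")           # gateway -> relay -> client
assert open_(k, sealed) == b"HTTP/1.1 200 OK"  # only the client can open it
```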
    <div>
      <h3>Simplified model and security goals</h3>
      <a href="#simplified-model-and-security-goals">
        
      </a>
    </div>
    <p>In our formal analysis, we set out to make sure that OHTTP’s use of encryption and proxying achieves the desired privacy goals described above.</p><p>To motivate the analysis, consider the following simplified model where there exist two clients C1 and C2, one relay R, and one gateway G. OHTTP assumes an attacker that can observe all network activity and can adaptively compromise either R or G, but not C1 or C2. OHTTP assumes that R and G do not collude, and so we assume only one of R and G is compromised. Once compromised, the attacker has access to all session information and private key material for the compromised party. The attacker is prohibited from sending client-identifying information, such as IP addresses, to the gateway. (This would allow the attacker to trivially link a query to the corresponding client.)</p><p>In this model, both C1 and C2 send OHTTP requests Q1 and Q2, respectively, through R to G, and G provides answers A1 and A2. The attacker aims to link C1 to (Q1, A1) and C2 to (Q2, A2), respectively. The attacker succeeds if this linkability is possible without any additional interaction. OHTTP prevents such linkability. Informally, this means:</p><ol><li><p>Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.</p></li><li><p>The gateway cannot distinguish between two identical requests generated from the same client, and two identical requests generated from different clients, in the absence of unique per-client keys.</p></li></ol><p>Informally it might seem clear that OHTTP achieves these properties. But we want to prove this formally, which means that the design, if implemented perfectly, would have these properties. This type of formal analysis is distinct from formal verification, where you take a protocol design and prove that some code implements it correctly. 
Whilst both are useful, they are different processes, and in this blog post we’ll be talking about the former. But first, let’s give some background on formal analysis.</p>
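<p>The split of knowledge in this simplified model can be illustrated with a small simulation. This is not part of our formal model, just a sketch: the party names, log structure, and placeholder ciphertexts are all illustrative.</p>

```python
# Toy simulation of the two-client model: the relay R sees client
# identities but only encrypted queries, while the gateway G sees
# plaintext queries but only the relay's identity.
def run_transaction(client_ip, encrypted_query, query, relay_log, gateway_log):
    relay_log.append((client_ip, encrypted_query))  # R's view of the transaction
    gateway_log.append(("relay", query))            # G's view: relay + plaintext

relay_log, gateway_log = [], []
run_transaction("198.51.100.7", "ct1", "Q1", relay_log, gateway_log)
run_transaction("203.0.113.9",  "ct2", "Q2", relay_log, gateway_log)

# Compromising one party yields one log; neither log alone links a
# client IP to its query.
assert all(q not in view for view in relay_log for q in ("Q1", "Q2"))
assert all(ip not in view for view in gateway_log
           for ip in ("198.51.100.7", "203.0.113.9"))
```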
    <div>
      <h3>Formal analysis programming model</h3>
      <a href="#formal-analysis-programming-model">
        
      </a>
    </div>
    <p>In our setting, a formal analysis involves producing an algebraic description of the protocol and then using math to prove that the algebraic description has the properties we want. The end result is proof that shows that our idealized algebraic version of the protocol is “secure”, i.e. has the desired properties, with respect to an attacker we want to defend against. In our case, we chose to model our idealized algebraic version of OHTTP using a tool called <a href="https://tamarin-prover.github.io/">Tamarin</a>, a security-focused theorem prover and model checker. Tamarin is an intimidating tool to use, but makes intuitive sense once you get familiar with it. We’ll break down the various parts of a Tamarin model in the context of our OHTTP model below.</p>
    <div>
      <h3>Modeling the Protocol Behavior</h3>
      <a href="#modeling-the-protocol-behavior">
        
      </a>
    </div>
    <p>Tamarin uses a technique known as <a href="https://resources.mpi-inf.mpg.de/departments/rg1/conferences/vtsa18/slides/basin-lecture2.pdf">multiset rewriting</a> to describe protocols. A protocol description is formed of a series of “rules” that can “fire” when certain requirements are met. Each rule represents a discrete step in the protocol, and when a rule fires that means the step was taken. For example, we have a rule representing the gateway generating its long-term public encapsulation key, and for different parties in the protocol establishing secure TLS connections. These rules can be triggered pretty much any time as they have no requirements.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Dx29NTs610pWriwpovIQS/4d5b1466c872a9d3e98d314a0b8763d8/image9-2.png" />
            
            </figure><p>Basic rule for OHTTP gateway key generation</p><p>Tamarin represents these requirements as “facts”. A rule can be triggered when the right facts are available. Tamarin stores all the available facts in a “bag” or multiset. A multiset is similar to an ordinary set, in that it stores a collection of objects in an unordered fashion, but unlike an ordinary set, duplicate objects are allowed. This is the “multiset” part of “multiset rewriting”.</p><p>The rewriting part refers to the output of our rules. When a rule triggers it takes some available facts out of the bag and, when finished, inserts some new facts into the bag. These new facts might fulfill the requirements of some other rule, which can then be triggered, producing even more new facts, and so on<sup>1</sup>. In this way we can represent progress through the protocol. Using input and output facts, we can describe our rule for generating long-term public encapsulation keys, which has no requirements and produces a long-term key as output, as follows.</p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/10/image7-3.png" /><p>A rule requirement is satisfied if there exist output facts that match the rule’s input facts. As an example, in OHTTP, one requirement for the client rule for generating a request is that the long-term public encapsulation key exists. This matching is shown below.</p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/10/image8-3.png" /><p>Let’s put some of these pieces together to show a very small but concrete part of OHTTP as an example: the client generating its encapsulated request and sending it to the relay. This step should produce a message for the relay, as well as any corresponding state needed to process the eventual response from the relay. As a precondition, the client requires (1) the gateway public key and (2) a TLS connection to the relay. 
As mentioned earlier, generating the public key and establishing the TLS connection require no input facts, so those rules can fire at any time.</p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/10/image1-21.png" />
<p></p>
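<p>The mechanics of multiset rewriting can be captured in a short Python sketch: a bag of facts, and rules that consume input facts and produce output facts. The rule and fact names are illustrative, not Tamarin syntax, and for simplicity every fact here is linear (consumed when used), whereas Tamarin also has persistent facts.</p>

```python
# Minimal multiset-rewriting sketch in the spirit of Tamarin.
from collections import Counter

bag = Counter()  # the multiset ("bag") of currently available facts

def fire(rule_inputs, rule_outputs):
    """Fire a rule if its input facts are in the bag; rewrite the bag."""
    if any(bag[f] < n for f, n in Counter(rule_inputs).items()):
        return False                    # requirements not met
    for f in rule_inputs:
        bag[f] -= 1                     # consume input facts
    for f in rule_outputs:
        bag[f] += 1                     # insert new facts
    return True

# Rules with no input facts can fire at any time:
assert fire([], ["GatewayKey(pkG)"])        # gateway generates its key pair
assert fire([], ["TLS(client, relay)"])     # client establishes TLS to relay

# The client's "send request" rule needs both of those facts present:
assert fire(["GatewayKey(pkG)", "TLS(client, relay)"],
            ["Msg(relay, enc_request)", "ClientState(ctx)"])
assert bag["Msg(relay, enc_request)"] == 1  # new fact now in the bag
```

<p>The new facts produced by one rule can satisfy the requirements of another, which is how progress through the protocol is represented.</p>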
    <div>
      <h3>Modeling events in time</h3>
      <a href="#modeling-events-in-time">
        
      </a>
    </div>
    <p>Beyond consuming and producing new facts, each Tamarin rule can also create side effects, called “action facts.” Tamarin records the action facts each time a rule is triggered. An action fact might be something like “a client message containing the contents m was sent at time t.” Sometimes rules can only be triggered in a strict sequence, and we can therefore put their action facts in a fixed time order. At other times multiple rules might have their prerequisites met at the same time, and therefore we can’t put their action facts into a strict time sequence. We can represent this pattern of partially ordered implications as a directed acyclic graph, or DAG for short.</p><p>Altogether, multiset rewriting rules describe the steps of a protocol, and the resulting DAG records the actions associated with the protocol description. We refer to the DAG of actions as the action graph. If we’ve done our job well it’s possible to follow these rules and produce every possible combination of messages or actions allowed by the protocol, and their corresponding action graph.</p><p>As an example of the action graph, let’s consider what happens when the client successfully finishes the protocol. When the requirements for this rule are satisfied, the rule triggers, marking that the client is done and that the response was valid. Since the protocol is done at this point, there are no output facts produced.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6k53NoxUnAriRnQpZxTDYM/a357d7ca7e0131453b794f8719dfa374/image3-8.png" />
            
            </figure><p>Action graph for terminal client response handler rule</p>
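<p>An action graph and the kind of ordering question we ask of it can be sketched concretely. The action names here are illustrative, not taken from our model; the check is a plain reachability query over a DAG of "happens before" edges.</p>

```python
# Toy action graph: edges record "happens before" between action facts.
edges = {
    "ClientSent(req)":          ["RelayForwarded(req)"],
    "RelayForwarded(req)":      ["GatewayDecapsulated(req)"],
    "GatewayDecapsulated(req)": ["ClientDone(resp)"],
}

def happens_before(a, b, graph):
    """True if there is a directed path from action a to action b."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

# "Does the relay's action happen after the client's first action?"
assert happens_before("ClientSent(req)", "ClientDone(resp)", edges)
assert not happens_before("RelayForwarded(req)", "ClientSent(req)", edges)
```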
    <div>
      <h3>Modeling the attacker</h3>
      <a href="#modeling-the-attacker">
        
      </a>
    </div>
    <p>The action graph is core to reasoning about the protocol’s security properties. We can check a graph for various properties, e.g. “does the first action taken by the relay happen after the first action taken by the client?”. Our rules allow for multiple runs of the protocol to happen at the same time. This is very powerful. We can look at a graph and ask “did something bad happen here that might break the protocol’s security properties?”</p><p>In particular, we can prove (security and correctness) properties by querying this graph, or by asserting various properties about it. For example, we might say “for all runs of the protocol, if the client finishes the protocol and can decrypt the response from the gateway, then the response must have been generated and encrypted by an entity which has the corresponding shared secret.”</p><p>This is a useful statement, but it doesn’t say much about security. What happens if the gateway private key is compromised, for example? In order to prove security properties, we need to define our threat model, which includes the adversary and their capabilities. In Tamarin, we encode the threat model as part of the protocol model. For example, when we define messages being passed from the client to the relay, we can add a special rule that allows the attacker to read it as it goes past. This gives us the ability to describe properties such as “for all runs of the protocol in our language the attacker never learns the secret key.”</p><p>For security protocols, we typically give the attacker the ability to read, modify, drop, and replay any message. This is sometimes described as “the attacker controls the network”, or a <a href="https://www.cs.huji.ac.il/~dolev/pubs/dolev-yao-ieee-01056650.pdf">Dolev-Yao</a> attacker. However, the attacker can also sometimes compromise different entities in a protocol, learning state associated with that entity. 
This is sometimes called an extended Dolev-Yao attacker, and it is precisely the attacker we consider in our model.</p><p>Going back to our model, we give the attacker the ability to compromise long-term key pairs and TLS sessions as needed through different rules. These set various action facts that mark the fact that compromise took place.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/28YufQ6WgtS1RkyeH9O2ZI/ad31cbef4844b7dd3cd4c4598f9ad873/image11-2.png" />
            
            </figure><p>Action graph for key compromise rule</p><p>Putting everything together, we have a way to model the protocol behavior, attacker capabilities, and security properties. Let’s now dive into how we applied these to prove OHTTP secure.</p>
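<p>The extended Dolev-Yao attacker can be sketched as a knowledge-closure computation: start from everything sent on the network, add whatever a compromise rule reveals, and apply derivation rules until nothing new can be learned. This is a toy with one derivation rule and illustrative names, not our Tamarin model.</p>

```python
# Toy Dolev-Yao closure: the attacker reads every message on the wire
# and may additionally compromise the gateway's long-term key.
network = [("epk", "ciphertext")]  # what the client sent Out() on the wire

def attacker_knowledge(compromised_gateway: bool):
    know = set()
    for msg in network:
        know.update(msg)               # attacker controls the network
    if compromised_gateway:
        know.add("gateway_sk")         # compromise rule fired
    # Derivation rule: ciphertext + gateway_sk => request plaintext.
    if {"ciphertext", "gateway_sk"} <= know:
        know.add("request")
    return know

# The request stays secret unless the gateway key is compromised:
assert "request" not in attacker_knowledge(compromised_gateway=False)
assert "request" in attacker_knowledge(compromised_gateway=True)
```

<p>The secrecy properties below are exactly statements of this shape: for all traces, the attacker only learns the request if a compromise action appears in the trace.</p>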
    <div>
      <h3>OHTTP Tamarin model</h3>
      <a href="#ohttp-tamarin-model">
        
      </a>
    </div>
    <p>In our model, we give the attacker the ability to compromise the server’s long-term keys and the key between the client and the relay. Against this attacker, we aim to prove these two informal statements stated above:</p><ol><li><p>Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.</p></li><li><p>The gateway cannot distinguish between two requests generated from the same client, and two requests generated from different clients, in the absence of unique per-client keys.</p></li></ol><p>To prove these formally, we express them somewhat differently. First, we assert that the protocol actually completes. This is an important step, because if your model has a bug in it where the protocol can’t even run as intended, then Tamarin is likely to say it’s “secure” because nothing bad ever happens.</p><p>For the core security properties, we translate the desired goals into questions we ask about the model. In this way, formal analysis only provides us proof (or disproof!) of the questions we ask, not the questions we should have asked, and so this translation relies on experience and expertise. We break down this translation for each of the questions we want to ask below, starting with gateway authentication.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L273-L283"><b>Gateway authentication</b></a> Unless the attacker has compromised the gateway’s long term keys, if the client completes the protocol and is able to decrypt the gateway’s response, then it knows that: the responder was the gateway it intended to use, the gateway derived the same keys, the gateway saw the request the client sent, and the response the client received is the one the gateway sent.</p><p>This tells us that the protocol actually worked, and that the messages sent and received were as they were supposed to be. 
One aspect of authentication can be that the participants agree on some data, so although this property seems to be a bit of a grab bag, its parts are all aspects of one authentication property.</p><p>Next, we need to prove that the request and response remain secret. There are several ways in which secrecy may be violated, e.g., if encryption or decryption keys are compromised. We do so by proving the following properties.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L232-L248"><b>Request and response secrecy</b></a> The request and response are both secret, i.e., the attacker never learns them, unless the attacker has compromised the gateway’s long term keys.</p><p>In a sense, request and response secrecy covers the case where the gateway is malicious, because if the gateway is malicious then the “attacker” knows the gateway’s long term keys.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L251-L257"><b>Relay connection security</b></a> The contents of the connection between the client and relay are secret unless the attacker has compromised the relay.</p><p>We don’t have to worry about the secrecy of the connection if the client is compromised because in that scenario the attacker knows the query before it’s even been sent, and can learn the response by making an honest query itself. If your client is compromised then it’s game over.</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L285-L292"><b>AEAD nonce reuse resistance</b></a> If the gateway sends a message to the client, and the attacker finds a different message encrypted with the same key and nonce, then either the attacker has already compromised the gateway, or they already knew the query.</p><p>In translation, this property means that the response encryption is correct and not vulnerable to attack, such as through AEAD nonce reuse. 
This would obviously be a disaster for OHTTP, so we were careful to check that this situation never arises, especially as <a href="https://files.research.cloudflare.com/publication/Singanamalla2021.pdf#subsection.4.5">we’d already detected this issue</a> in <a href="/oblivious-dns/">ODoH</a>.</p><p>Finally, and perhaps most importantly, we want to prove that an attacker can’t link a particular query to a client. We prove a slightly different property which effectively argues that, unless the relay and gateway collude, the attacker cannot link an encrypted query to its decrypted counterpart. In particular, we prove the following:</p><p><a href="https://github.com/cloudflare/ohttp-analysis/blob/main/ohttp.m4#L260-L271"><b>Client unlinkability</b></a> If an attacker knows the query and the contents of the connection sent to the relay (i.e. the encrypted query), then it must have compromised both the gateway and the relay.</p><p>This doesn’t in general prove indistinguishability. There are two techniques an attacker can use to link two queries: direct inference and statistical analysis. Because of the anonymity trilemma we know that we cannot defend against statistical analysis, so we have to declare it out of scope and move on. To prevent direct inference we need to make sure that the attacker doesn’t compromise either the client, or both the relay and the gateway together, which would let it directly link the queries. So is there anything we can protect against? Thankfully there is one thing. We can make sure that a malicious gateway can’t identify that a single client sent two messages. We prove that by not keeping any state between connections. If a returning client acts in exactly the same way as a new client, and doesn’t carry any state between requests, there’s nothing for the malicious gateway to analyze.</p><p>And that’s it! 
If you want to have a go at proving some of these properties yourself our models and proofs <a href="https://github.com/cloudflare/ohttp-analysis">are available on our GitHub</a>, as are our <a href="https://github.com/cloudflare/odoh-analysis">ODoH models and proofs</a>. The <a href="https://tamarin-prover.github.io/">Tamarin prover is freely available too</a>, so you can double-check all our work. Hopefully this post has given you a flavor of what we mean when we say that we’ve proven a protocol secure, and inspired you to have a go yourself. If you want to work on great projects like this check out our <a href="https://www.cloudflare.com/careers/">careers</a> page.</p><hr /><p><sup>1</sup>Depending on the model, this process can lead to an exponential blow-up in search space, making it impossible to prove anything automatically. Moreover, if the new output facts do not fulfill the requirements of any remaining rule(s) then the process hangs.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">3EzLcg5NRcyrIWGBr5KdGX</guid>
            <dc:creator>Christopher Wood</dc:creator>
            <dc:creator>Jonathan Hoyland</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building Confidence in Cryptographic Protocols]]></title>
            <link>https://blog.cloudflare.com/post-quantum-formal-analysis/</link>
            <pubDate>Thu, 24 Feb 2022 17:30:00 GMT</pubDate>
            <description><![CDATA[ This blog post covers efforts to use formal analysis and verification to achieve better assurance for post-quantum algorithms, and touches on Cloudflare's own work in this area. ]]></description>
            <content:encoded><![CDATA[ <p><b>An introduction to formal analysis and our proof of the security of KEMTLS</b></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/393auOPCMHtWuvmZxJZ44s/a634d8c90882e32c599268fddd3cfc6b/Blog-827---KEMs-and-KEMTLS.png" />
            
            </figure><p>Good morning everyone, and welcome to another Post-Quantum–themed blog post! Today we’re going to look at something a little different. Rather than look into the quantum past or future, we’re going to look as far back as the ‘80s and ‘90s to try and get some perspective on how we can determine whether a protocol is or is not secure. Unsurprisingly, this question comes up all the time. Cryptographers like to build fancy new cryptosystems, but just because we, the authors, can’t break our own designs, it doesn’t mean they are secure: it just means we are not smart enough to break them.</p><p>One might at this point wonder why in a post-quantum themed blog post we are talking about security proofs. The reason is simple: the new algorithms that claim to be safe against quantum threats need proofs showing that they actually are safe. In this blog post, not only are we going to introduce how we go about proving a protocol is secure, we’re going to introduce the security proofs of KEMTLS, a version of TLS designed to be more secure against quantum computers, and give you a whistle-stop tour of the formal analysis we did of it.</p><p>Let’s go back for the moment to not being smart enough to break a cryptosystem. Saying “I tried very hard to break this, and couldn’t” isn’t a very satisfying answer, and so for many years cryptographers (and others) have been trying to find a better one. There are some obvious approaches to building confidence in your cryptosystem, for example you could <a href="https://github.com/google/wycheproof">try all previously known attacks</a>, and see if the system breaks. 
This approach will probably weed out any simple flaws, but it doesn’t mean that some new <a href="https://en.wikipedia.org/wiki/Cryptanalysis_of_the_Enigma">attack won’t be found</a> or even that some new twist on an old one <a href="https://ieeexplore.ieee.org/document/8835216">won’t be discovered</a>.</p><p>Another approach you can take is to offer a large prize to anyone who can break your new system; but to do that not only do you need a <a href="https://en.wikipedia.org/wiki/RSA_Factoring_Challenge">big</a> <a href="https://www.microsoft.com/en-us/msrc/sike-cryptographic-challenge">prize</a> that you can afford to give away if you’re wrong, you can’t be sure that everyone would prefer your prize to, for example, selling an attack to cybercriminals, or even to a government.</p><p>Simply trying hard, and inducing other people to do so too still felt unsatisfactory, so in the late ‘80s researchers started trying to use mathematical techniques to prove that their protocol <a href="https://en.wikipedia.org/wiki/Burrows%E2%80%93Abadi%E2%80%93Needham_logic">was secure</a>. Now, if you aren’t versed in theoretical computer science you might not even have a clear idea of what it even means to “prove” a protocol is secure, let alone how you might go about it, so let’s start at the very beginning.</p>
    <div>
      <h3>A proof</h3>
      <a href="#a-proof">
        
      </a>
    </div>
    <p>First things first: let’s nail down what we mean by a <i>proof</i>. At its most general level, a mathematical proof starts with some assumptions, and by making logical inferences it builds towards a statement. If you can derive your target statement from your initial assumptions then you can be sure that, if your assumptions are right, then your final statement is true.</p><p>Euclid’s famous work, <i>The Elements</i>, a standard math textbook for over 2,000 years, is written in this style. Euclid gives five “postulates”, or assumptions, from which he can derive a huge portion of the geometry known in his day. Euclid’s first postulate, that you can draw a straight line between any two points, is never proven, but taken as read. You can take his first postulate, and his third, that you can draw a circle with any center and radius, and use it to prove his first proposition, that you can draw an equilateral triangle given any finite line. For the curious, you can find <a href="https://gutenberg.org/ebooks/21076">public-domain translations of Euclid’s work</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2qV3nhDT4bKQvndrAnbw9N/a167c02b1a7127c6b2e7207a69a44ea8/Screenshot-2022-02-15-at-22.27.54.png" />
            
            </figure><p>Euclid’s method of drawing an equilateral triangle based on the finite line AB, by drawing two circles around points A and B, with the radius AB. The intersection finds point C of the triangle. Original raster file uploader was Mcgill at en.wikibooks SVG: Beao, Public domain, via Wikimedia Commons.</p><p>Whilst it’s fairly easy to intuit how such geometry proofs work, it’s not immediately clear how one could prove something as abstract as the security of a cryptographic protocol. Proofs of protocols operate in a similar way. We build a logical argument starting from a set of assumptions. Security proofs, however, can be much, much bigger than anything in <i>The Elements</i> (for example, our proof of the security properties of <a href="https://thomwiggers.nl/publication/kemtls/kemtls.pdf">KEMTLS</a>, which we will talk about later, is nearly 500,000 lines long) and the only reason we are able to do such complex proofs is that we have something of an advantage over Euclid. We have computers. Using a mix of human-guided theorem proving and automated algorithms we can prove incredibly complex things, such as the security of protocols as the one we will discuss.</p><p>Now we know that a proof is a set of logical steps built from a set of assumptions, let’s talk a bit about how security proofs work. First, we need to work out how to describe the protocol in terms that we can reason about. Over the years researchers have come up with many ways for describing computer processes mathematically, most famously <a href="https://www.wolframscience.com/prizes/tm23/images/Turing.pdf">Alan Turing defined a-machines, which we now know as Turing Machines</a>, which describe a <i>computer program</i> in an <i>algebraic form</i>. A protocol is slightly more complex than a single program. 
A protocol can be seen as a number of computers running a set of computer programs that interact with each other.</p><p>We’re going to use a class of techniques called <a href="https://en.wikipedia.org/wiki/Process_calculus"><i>process algebras</i></a> to describe the interacting processes of a protocol. At its most basic level, algebra is the art of generalizing a statement by replacing specific values with general symbols. In standard algebra, these specific values are numbers, so for example we can write <code>(cos 37)² + (sin 37)² = 1</code>, which is true, but we can generalize it to <code>(cos θ)² + (sin θ)² = 1</code>, replacing the specific value, 37, with the symbol θ.</p><p>Now you might be wondering why it’s useful to replace things with symbols. The answer is it lets us solve entire classes of problems instead of solving each individual instance. When it comes to security protocols, this is especially important. We can’t possibly try every possible set of inputs to a protocol and check nothing weird happens to one of them. In fact, one of the assumptions we’re going to make when we prove KEMTLS secure is that trying every possible value for some inputs is impossible<sup>1</sup>. By representing the protocol symbolically, we can write a proof that applies to all possible inputs of the protocol.</p><p>Let’s go back to algebra. A <i>process algebra</i> is similar to the kind of algebra you might have learnt in high school: we represent a computer program with symbols for the specific values. We also treat functions symbolically. Rather than try and compute what happens when we apply a function <code>f</code> to a value <code>x</code>, we just create a new symbol <code>f(x)</code>. An algebra also provides rules for manipulating expressions. For example, in standard algebra we can <i>transform</i> <code>y + 5 = x² - x</code> into <code>y = x² - x - 5</code>. 
A process algebra is the same: it not only defines a language to describe interacting processes, it also defines rules for how we can manipulate those expressions.</p><p>We can use tools, such as the one we use called <a href="https://tamarin-prover.github.io/">Tamarin</a>, to help us do this reasoning. Every protocol has its own rules for what transformations are allowed. It is very useful to have a tool, like Tamarin, to which we can tell these rules and allow it to do all the work of symbol manipulation. Tamarin does far, far more than that, though.</p><p>A rule, that we tell Tamarin, might look like this:</p>
            <pre><code>rule Register_pk:
  [ Fr(~ltkA) ]
--[ GenLtk($A, ~ltkA)]-&gt;
  [ !Ltk($A, ~ltkA), !Pk($A, pk(~ltkA)), Out(pk(~ltkA)) ]</code></pre>
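            <p>To make the three parts of the <code>Register_pk</code> rule above concrete, here is a toy rendering of it in Python: fresh values, the action log, and the fact bag are modeled explicitly, and <code>pk()</code> stays symbolic, as in Tamarin. This is only an illustration of how the rule behaves, not how Tamarin executes.</p>

```python
# Toy rendering of the Register_pk rule: take a fresh value, record the
# GenLtk action, and put the three output facts into the bag.
import itertools

fresh = (f"~ltk{i}" for i in itertools.count())  # endless supply of Fr() values
bag, actions = [], []

def register_pk(A):
    ltkA = next(fresh)                     # precondition: Fr(~ltkA)
    actions.append(("GenLtk", A, ltkA))    # action fact recorded in the trace
    bag.extend([("!Ltk", A, ltkA),         # private key fact
                ("!Pk", A, f"pk({ltkA})"), # public key fact
                ("Out", f"pk({ltkA})")])   # published to the (attacker's) network

register_pk("$A")
assert ("GenLtk", "$A", "~ltk0") in actions
assert ("Out", "pk(~ltk0)") in bag         # the attacker can read this fact
```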
            <p>This rule is used to represent that a protocol participant has acquired a new public/private key pair. The rule has three parts:</p><ul><li><p>The first part lists the preconditions. In this case, there is only one: we take a <code>Fr</code>esh value called <code>~ltkA</code>, the “long-term key of A”. This precondition is always met, because Tamarin always allows us to generate fresh values.</p></li><li><p>The third part lists the postconditions (what we get back when we apply the rule). Rather than operating on an initial statement, as in high-school algebra, Tamarin instead operates on a model we call a “bag of facts”. Instead of starting with <code>y + 5 = x² - x</code>, we start with an empty “bag”, and from there, apply rules. These rules take facts out of the bag and put new ones in. In this case, we put in:</p><ul><li><p><code>!Ltk($A, ~ltkA)</code>, which represents the private portion of the key, <code>~ltkA</code>, and the name of the participant it was issued to, <code>$A</code>.</p></li><li><p><code>!Pk($A, pk(~ltkA))</code>, which represents the public portion of the key, <code>pk(~ltkA)</code>, and the name of the participant it was issued to, <code>$A</code>.</p></li><li><p><code>Out(pk(~ltkA))</code>, which represents us publishing the public portion of the key, <code>pk(~ltkA)</code>, to the network. Tamarin is based on the <a href="https://en.wikipedia.org/wiki/Dolev%E2%80%93Yao_model">Dolev-Yao model</a>, which assumes the attacker controls the network. Thus, this fact makes <code>$A</code>’s public key available to the attacker.</p></li></ul></li></ul><p>We can only apply a rule if the preconditions are met: the facts we need appear in the bag. By having rules for each step of the protocol, we can apply the rules in order and simulate a run of the protocol. But, as I’m sure you’ve noticed, we skipped the second part of the rule. 
The second part of the rule is where we list what we call <i>actions</i>.</p><p>We use actions to record what happened in a protocol run. In our example, we have the action <code>GenLtk($A, ~ltkA)</code>. <code>GenLtk</code> means that a new Long-Term Key (LTK) has been <code>Gen</code>erated. Whenever we trigger the <code>Register_pk</code> rule, we note this with the two parameters: <code>$A</code>, the party to whom the new key pair belongs; and <code>~ltkA</code>, the private part of the generated key<sup>2</sup>.</p><p>If we simulate a single run of the protocol, we can record all the actions executed in a list. However, at any point in the protocol, there may be multiple rules that can be triggered. A list only captures a single run of a protocol, but we want to reason about all possible runs of the protocol. We can arrange our rules into a tree: every time we have multiple rules that could be executed, we give each one of them its own branch.</p><p>If we could write out this entire tree, it would represent every possible run of the protocol. Because every possible run appears in this tree, if we can show that there are no “bad” runs on this tree, we can be sure that the protocol is “secure”. We put “bad” and “secure” in quotation marks here because we <i>still</i> haven’t actually defined what those terms mean.</p><p>But before we get to that, let’s quickly recap what we have so far. We have:</p><ul><li><p>A protocol we want to prove.</p></li><li><p>A definition of <i>protocol,</i> as a number of computers running a set of computer programs that interact with each other.</p></li><li><p>A technique, <i>process algebras</i>, to describe the interacting processes of the protocol: this technique provides us with symbols and rules for manipulating them.</p></li><li><p>A tree that represents every possible run of the protocol.</p></li></ul><p>We can reason about a protocol by looking at the properties that our tree gives. 
As we are interested in cryptographic protocols, we would like to reason about their security. “Security” is a pretty abstract concept and its meaning changes in different contexts. In our case, to prove something is secure, we first have to say <i>what our security goals</i> are. One thing we might want to prove is, for example, that an attacker can never learn the encryption key of a session. We capture this idea with a <i>reachability</i> lemma.</p><p>A reachability lemma asks whether there is a path in the tree that leads to a specific state: can we “reach” this state? In this case, we ask: “can we reach a state where the attacker knows the session key?” If the answer is “no”, we are sure that our protocol has that property (an attacker never learns the session key), or at least that that property is true in our protocol model.</p><p>So, if we want to prove the security of a cryptographic protocol, we need to:</p><ol><li><p>Define the security goals being proven.</p></li><li><p>Describe the protocol as an interacting process of symbols, rules, and expressions.</p></li><li><p>Build a tree of all the steps the protocol can take.</p></li><li><p>Check that the tree of protocol runs attains the security goals we specified.</p></li></ol><p>This process of creating a model of a program and writing rigorous proofs about that model is called “formal analysis”.</p><p>Writing formal proofs of protocol correctness has been very effective at finding and fixing all kinds of issues. During the design of TLS 1.3, for example, it uncovered a number of serious security flaws that were eventually fixed prior to standardization. However, something we need to be wary of with formal analysis is being over-reliant on its results. It’s very possible to be so taken with the rigour of the process and its mathematical proofs that the result gets overinterpreted. 
Not only can a mistake exist in a proof, even a machine-checked one, but the proof may not actually prove what you think it does. There are many examples of this: <a href="https://en.wikipedia.org/wiki/Needham%E2%80%93Schroeder_protocol">Needham-Schroeder</a> had a <a href="https://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BAN.pdf">proof of security written in the BAN logic</a> before Lowe found <a href="https://www.sciencedirect.com/science/article/abs/pii/0020019095001442">an attack on a case that the BAN logic did not cover</a>.</p><p>In fact, the initial specification of the TLS 1.3 proof made the assumption that nobody uses the same certificate for both a client and a server, even though this is not explicitly disallowed in the specification. This gap led to the <a href="https://eprint.iacr.org/2019/347">“Selfie” vulnerability</a>, where a client could be tricked into connecting to itself, potentially leading to resource exhaustion attacks.</p><p>Formal analysis of protocol designs also tells you nothing about whether a particular implementation correctly implements a protocol. We will talk about this in other <a href="/post-quantum-easycrypt-jasmin">blog posts</a>. Let’s now return to our core topic: the formal analysis of KEMTLS.</p>
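<p>To make the rule-application idea above more tangible, here is a toy Python sketch of multiset rewriting in the style of the <code>Register_pk</code> rule. This is purely illustrative and assumes nothing about Tamarin’s real internals: facts are tuples in a multiset, and firing a rule consumes preconditions, records actions on a trace, and adds postconditions to the bag.</p>

```python
# Toy sketch of multiset rewriting in the style of the Register_pk rule.
# Illustration only: Tamarin's real state representation is far richer.
from collections import Counter

_counter = [0]

def fresh():
    """Generate a fresh value, like Tamarin's Fr(~x) precondition."""
    _counter[0] += 1
    return "~v%d" % _counter[0]

def register_pk(bag, trace, actor):
    """Analogue of Register_pk: consume a fresh value, record the
    GenLtk action, and add the key facts to the bag."""
    ltk = fresh()                          # precondition: Fr(~ltkA)
    trace.append(("GenLtk", actor, ltk))   # action recorded on the trace
    bag[("!Ltk", actor, ltk)] += 1         # persistent private-key fact
    bag[("!Pk", actor, ("pk", ltk))] += 1  # persistent public-key fact
    bag[("Out", ("pk", ltk))] += 1         # public key published to the network
    return ltk

bag, trace = Counter(), []
ltk = register_pk(bag, trace, "$A")
print(trace)  # [('GenLtk', '$A', '~v1')]
```

<p>A fuller simulator would also check that a rule’s preconditions are actually present in the bag before firing; here the only precondition, <code>Fr</code>, is always satisfiable, just as in Tamarin.</p>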
    <div>
      <h3>Proving KEMTLS’s security</h3>
      <a href="#proving-kemtlss-security">
        
      </a>
    </div>
    <p>Now that we have the needed notions, let’s get down to the nitty-gritty: we show you how we proved KEMTLS is secure. KEMTLS is a proposal to perform authentication in TLS handshakes using key exchange (via key encapsulation mechanisms, or KEMs). <a href="/making-protocols-post-quantum">KEMTLS examines</a> the trade-offs between post-quantum signature schemes and post-quantum key exchange, as we discussed in <a href="/making-protocols-post-quantum">other blog posts</a>.</p><p>The main idea of KEMTLS is the following: instead of using a signature to prove that you have access to the private key that corresponds to the (signing) public key in the certificate presented, we derive a shared secret <i>encapsulated to</i> a (KEM) public key. The party that presented the certificate can only derive (<i>decapsulate</i>) it from the resulting <i>encapsulation</i> (often also called the ciphertext) if they have access to the correct private key; and only then can they read encrypted traffic. A brief overview of how this looks in the “traditional arrows on paper” form is given below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4mgeVds1Jxd59Qoy4QT187/0c5b8f84c1e4195904982efe5149400f/image3-24.png" />
            
            </figure><p>Brief overview of the core idea of KEMTLS.</p><p>We want to show that the KEMTLS handshake is secure, no matter how an adversary might mess with, reorder, or even create new protocol messages. Symbolic analysis tools such as Tamarin or <a href="https://bblanche.gitlabpages.inria.fr/proverif/">ProVerif</a> are well suited to this task: as noted above, they allow us to consider every possible combination or manipulation of protocol messages, participants, and key information. We can then write lemmas about the behavior of the protocol.</p>
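<p>To make the “shared secret encapsulated to a public key” idea concrete, here is a sketch of the KEM interface using a toy Diffie-Hellman-style construction over a Mersenne prime. This is a deliberately insecure illustration of the API shape only (real KEMTLS would use a post-quantum KEM): the point is that only the holder of the private key can decapsulate the ciphertext to the same shared secret.</p>

```python
# Toy KEM: keygen / encaps / decaps over an insecure demo group.
# Shows the API shape KEMTLS relies on; not real cryptography.
import hashlib
import secrets

P = 2**127 - 1   # a Mersenne prime; fine for a demo, useless for security
G = 3

def kem_keygen():
    sk = secrets.randbelow(P - 2) + 1
    pk = pow(G, sk, P)
    return pk, sk

def kem_encaps(pk):
    """Derive a shared secret encapsulated to pk; return (ciphertext, secret)."""
    r = secrets.randbelow(P - 2) + 1
    ct = pow(G, r, P)
    ss = hashlib.sha256(str(pow(pk, r, P)).encode()).digest()
    return ct, ss

def kem_decaps(sk, ct):
    """Only the private-key holder can recover the shared secret."""
    return hashlib.sha256(str(pow(ct, sk, P)).encode()).digest()

# The server's certificate carries pk; the client encapsulates to it.
server_pk, server_sk = kem_keygen()
ct, client_ss = kem_encaps(server_pk)
assert kem_decaps(server_sk, ct) == client_ss  # server derives the same secret
```

<p>Because the server can only read subsequent encrypted traffic after a successful decapsulation, deriving the shared secret doubles as the proof of private-key possession that a signature provides in ordinary TLS.</p>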
    <div>
      <h3>Why prove it in Tamarin?</h3>
      <a href="#why-prove-it-in-tamarin">
        
      </a>
    </div>
    <p>There exists a pen-and-paper proof of the KEMTLS handshake. You might ask: why should we still invest the effort of modeling it in a tool like Tamarin?</p><p>Pen-and-paper proofs are in theory fairly straightforward. However, they are <i>very hard</i> to get right. We need to carefully express the security properties of the protocol, and it is very easy to let assumptions lead us to write something that our model does not correctly cover. Verifying that a proof has been done correctly is also very difficult and requires almost as much careful attention as writing the proof itself. In fact, several mistakes were found in the property definitions of the original KEMTLS proof’s model after the paper had been accepted and published at a top-tier security conference.</p><blockquote><p><i>For those familiar with these kinds of game-based proofs, another “war story”: while modeling the ephemeral key exchange, the authors of KEMTLS initially assumed all we needed was an </i><a href="https://en.wikipedia.org/wiki/Ciphertext_indistinguishability#Indistinguishability_under_chosen-plaintext_attack_(IND-CPA)"><i>IND-CPA</i></a><i> secure KEM. After writing out all the simulations in pseudo code (which is not otherwise part of the proof or the paper!), it turned out that we needed an additional oracle to answer a single decapsulation query, resulting in requiring a weaker variant of IND-CCA security of our KEM (namely, IND-1CCA security). Using an “only” IND-CPA-secure KEM turned out to not be secure!</i></p></blockquote><p>Part of the problem with pen-and-paper proofs is perhaps the human tendency to read between the lines: we quickly figure out what is intended by a certain sentence, even if the intent is not strictly clarified. 
Computers do not allow that, to everyone’s familiar frustration: a computer does not do what you wanted, just what you told it to do.</p><p>A benefit of computer code, though, is that all the effort is in writing the instructions. A carefully constructed model and proof result in an executable program: verifying it should be as simple as running it to the end. However, as always, we need to:</p><ol><li><p>Be very careful that we have modeled the right thing and,</p></li><li><p>Note that even the machine prover might have bugs: this second computer-assisted proof is a complement to, and not a replacement of, the pen-and-paper proof.</p></li></ol><p>Another reason why computer proofs are interesting is that they give us the ability to construct extensions. The <a href="https://eprint.iacr.org/2021/779">“pre-distributed keys” extension of KEMTLS</a>, for example, has only been proven on paper in isolation. Tamarin allows us to construct that extension in the same space as the main proof, which will help rule out any cross-functional attacks. This increases the complexity of the proof exponentially, but Tamarin allows us to handle that just by using more computing power. Doing the same on paper requires very, very careful consideration.</p><p>One final reason we wanted to perform this computer analysis is that while the pen-and-paper proof was in the computational model, our computer analysis is in the symbolic model. Computational proofs are “high resolution”, giving very tight bounds on exactly how secure a protocol is. Symbolic models are “low resolution”, giving a binary yes/no answer on whether a protocol meets the security goals (with the assumption that the underlying cryptographic primitives are secure). This might make computational proofs sound strictly better; their downside is that one has to simplify the model in other areas. 
The computational proof of KEMTLS, for example, does not model TLS message formats, which a symbolic model can.</p>
    <div>
      <h3>Modeling KEMTLS in Tamarin</h3>
      <a href="#modeling-kemtls-in-tamarin">
        
      </a>
    </div>
    <p>Before we can start making security claims and asking Tamarin to prove them, we first need to explain to Tamarin what KEMTLS is. As we mentioned earlier, Tamarin treats the world as a “bag of facts”. Keys, certificates, identities, and protocol messages are all facts. Tamarin can take those facts and apply rules to them to construct (or deconstruct) new facts. Executing steps in the KEMTLS protocol is, in a very literal sense, just another way to perform such transformations — and if everything is well-designed, the only “honest” way to reach the end state of the protocol.</p><p>We need to start by modeling the protocol. We were fortunate to be able to reuse the work of <a href="https://tls13tamarin.github.io/TLS13Tamarin/">Cremers et al.</a>, who contributed their significant modeling talent to the TLS 1.3 standardization effort. They created a very complete model of the TLS 1.3 protocol, which showed that the protocol is generally secure. For more details, see <a href="https://tls13tamarin.github.io/TLS13Tamarin/">their paper</a>.</p><p>We modified the ephemeral key exchange by substituting the Diffie-Hellman operations in TLS 1.3 with the appropriate KEM operations. Similarly, we modified the messages that perform the certificate handling: instead of verifying a signature, we send back a KemEncapsulation message with the ciphertext. Let’s have a look at one of the changed rules. Don’t worry, it looks a bit scary, but we’re going to break it down for you. And don’t worry if you do not grasp all the details: we will cover the necessary bits when they come up again, so you can just skip ahead to the next section, “Modeling the adversary”.</p>
            <pre><code>rule client_recv_server_cert_emit_kex:
let
  // … snip
  ss = kemss($k, ~saseed)
  ciphertext = kemencaps($k, ss, certpk)
  // NOTE: the TLS model uses M4 preprocessor macros for notation
  // We also made some edits for clarity
in
  [
    State(C2d, tid, $C, $S, PrevClientState),
    In(senc{&lt;'certificate', certpk&gt;}hs_keys),
    !Pk($S, certpk),
    Fr(~saseed)
  ]
  --[
    C2d(tid),
    KemEncap($k, certpk, ss)
  ]-&gt;
  [
    State(C3, tid, $C, $S, ClientState),
    Out(senc{&lt;'encaps', ciphertext&gt;}hs_keyc)
  ]</code></pre>
            <p>This rule represents the client getting the server’s certificate and encapsulating a fresh key to it. It then sends the encapsulated key back to the server.</p><p>Note that the <code>let … in</code> part of the rule is used to assign expressions to variables. The real meat of the rule starts with the preconditions. As we can see, in this rule there are four preconditions that Tamarin needs to already have in its bag for this rule to be triggered:</p><ul><li><p>The first precondition is <code>State(C2d, …)</code>. This condition tells us that we have some client that has reached the stage <code>C2d</code>, which is what we call this stage in our representation. The remaining variables define the state of that client.</p></li><li><p>The second precondition is an <code>In</code> one. This is how Tamarin denotes messages received from the network. As we mentioned before, we assume that the network is controlled by the attacker. Until we can prove otherwise, we don’t know whether this message was created by the honest server, whether it has been manipulated by the attacker, or even forged. The message contents, <code>senc{&lt;'certificate', certpk&gt;}hs_keys</code>, are symmetrically encrypted ( <code>senc{}</code>) under the server’s handshake key (we’ve slightly edited this message for clarity, and removed various other bits to keep this at least somewhat readable, but you can see the whole definition in <a href="https://github.com/thomwiggers/TLS13Tamarin">our model</a>).</p></li><li><p>The third precondition states the public key of the server, <code>!Pk(S, certpk)</code>. This condition is preceded by a <code>!</code> symbol, which means that it’s a permanent fact that can be consumed many times. Usually, once a fact is removed from the bag, it is gone; but permanent facts remain. 
<code>S</code> is the name of the server, and <code>certpk</code> is the KEM public key.</p></li><li><p>The fourth precondition states the fresh random value, <code>~saseed</code>.</p></li></ul><p>The postconditions of this rule are a little simpler. We have:</p><ul><li><p><code>State(C3, …)</code>, which represents that the client (which was at the start of the rule in state <code>C2d</code>) is now in state <code>C3</code>.</p></li><li><p><code>Out(senc{&lt;'encaps', ciphertext&gt;}hs_keyc)</code>, which represents the action of the client sending the encapsulated key to the network, encrypted under the client’s handshake key.</p></li></ul><p>The four actions recorded in this rule are:</p><ul><li><p>First, we record that the client with thread id <code>tid</code> reached the state <code>C2d</code>.</p></li><li><p>Second and third, we record that the client was running the protocol with various intermediate values. We use the phrase “running with” to indicate that although the client believes these values to be correct, it can’t yet be certain that they haven’t been tampered with, so the client hasn’t yet committed to them.</p></li><li><p>Finally, we record the parameters we put into the KEM with the <code>KemEncap</code> action.</p></li></ul><p>We modify and add such rules to the TLS 1.3 model, so we can run KEMTLS instead of TLS 1.3. As a sanity check, we need to make sure that the protocol can actually be executed: a protocol that cannot run cannot leak your secrets. We use a <i>reachability lemma</i> to do that:</p>
            <pre><code>lemma exists_C2d:
    exists-trace
    "Ex tid #j. C2d(tid)@#j"</code></pre>
            <p>This was the first lemma that we asked Tamarin to prove. Because we’ve marked this lemma <code>exists-trace</code>, it does not need to hold in all traces, all runs of the protocol. It just needs one. This lemma asks if there exists a trace ( <code>exists-trace</code>), where there exists ( <code>Ex</code> ) a variable <code>tid</code> and a time <code>#j</code> (times are marked with <code>#</code>) at which action <code>C2d(tid)</code> is recorded. What this captures is that Tamarin could find a branch of the tree where the rule described above was triggered. Thus, we know that our model can be executed, at least as far as <code>C2d</code>.</p>
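<p>The idea behind such a reachability check can be illustrated with a small search. This is only a sketch, unrelated to Tamarin’s actual proof search and with made-up state names: if any branch of the tree of rule applications records the action we ask about, the exists-trace lemma holds.</p>

```python
# Toy reachability check: breadth-first search over a tiny branching
# "protocol" where each rule moves a client to a new state and records
# an action on the trace. Illustration only.
from collections import deque

RULES = {
    "C0":  [("C1", "SendClientHello")],
    "C1":  [("C2d", "RecvServerCert"), ("Cerr", "RecvAlert")],
    "C2d": [("C3", "SendEncaps")],
}

def reachable(start, action):
    """Does some run of the protocol record `action` on its trace?"""
    queue = deque([(start, ())])
    while queue:
        state, trace = queue.popleft()
        if action in trace:
            return True
        for nxt, act in RULES.get(state, []):
            queue.append((nxt, trace + (act,)))
    return False

print(reachable("C0", "RecvServerCert"))  # True: a trace reaching C2d exists
print(reachable("C0", "RevealLtk"))       # False: no rule records this action
```

<p>A secrecy lemma is the mirror image of this: instead of asking for one trace where something happens, it asserts that in no trace does the attacker learn the key.</p>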
    <div>
      <h3>Modeling the adversary</h3>
      <a href="#modeling-the-adversary">
        
      </a>
    </div>
    <p>In the symbolic model, all cryptography is perfect: if the adversary does not have a particular key, they can not perform any deconstructions to, for example, decrypt a message or decapsulate a ciphertext. Although a proof with this default adversary would show the protocol to be secure against, for example, reordering or replaying messages, we want it to be secure against a slightly stronger adversary. Fortunately, we can model this adversary. Let’s see how.</p><p>Consider an example. We have a rule that honestly generates long-term keys (certificates) for participants:</p>
            <pre><code>rule Register_pk:
  [ Fr(~ltkA) ] 
  --[  GenLtk($A, ~ltkA)  ]-&gt;
  [ !Ltk($A, ~ltkA), 
    !Pk($A, kempk($k, ~ltkA)), 
    Out(kempk($k, ~ltkA))
  ]</code></pre>
            <p>This rule is very similar to the one we saw at the beginning of this blog post, but we’ve tweaked it to generate KEM public keys. It goes as follows: it generates a fresh value, and registers it as the actor <code>$A</code>’s long-term private key symbol <code>!Ltk</code> and <code>$A</code>’s public key symbol <code>!Pk</code>, which we use to model our certificate infrastructure. It also sends ( <code>Out</code> ) the public key to the network such that the adversary has access to it.</p><p>The adversary can not deconstruct symbols like <code>Ltk</code> without rules to do so. Thus, we provide the adversary with a special <code>Reveal</code> rule that takes the <code>!Ltk</code> fact and reveals the private key:</p>
            <pre><code>rule Reveal_Ltk:
   [ !Ltk($A, ~ltkA) ] --[ RevLtk($A) ]-&gt; [ Out(~ltkA) ]</code></pre>
            <p>Executing this rule registers the <code>RevLtk($A)</code> action, so that we know that <code>$A</code>’s certificate can no longer be trusted after <code>RevLtk</code> occurred.</p>
    <div>
      <h3>Writing security lemmas</h3>
      <a href="#writing-security-lemmas">
        
      </a>
    </div>
    <p>KEMTLS, like TLS, is a cryptographic handshake protocol. These protocols have the general goal of generating session keys that we can use to encrypt users’ traffic, preferably as quickly as possible. One thing we might want to prove is that these session keys are secret:</p>
            <pre><code>lemma secret_session_keys [/*snip*/]:
  "All tid actor peer kw kr aas #i.
      SessionKey(tid, actor, peer, &lt;aas, 'auth'&gt;, &lt;kw, kr&gt;)@#i &amp;
      not (Ex #r. RevLtk(peer)@#r &amp; #r &lt; #i) &amp;
      not (Ex tid3 esk #r. RevEKemSk(tid3, peer, esk)@#r &amp; #r &lt; #i) &amp;
      not (Ex tid4 esk #r. RevEKemSk(tid4, actor, esk)@#r &amp; #r &lt; #i)
    ==&gt; not Ex #j. K(kr)@#j"</code></pre>
            <p>This lemma states that if the actor has completed the protocol and the attacker hasn’t used one of their special actions, then the attacker doesn’t know the actor’s read key, <code>kr</code>. We’ll go through the details of the lemma in a moment, but first let’s address some questions you might have about this proof statement.</p><p>The first question that might arise is: “If we are only secure in the case where the attacker doesn’t use their special abilities, then why bother modeling those abilities?” The answer has two parts:</p><ol><li><p>We do not restrict the attacker from using their abilities: they can compromise every key except the ones used by the participants in this session. If they managed to somehow make a different participant use the same ephemeral key, then this lemma wouldn’t hold, and we would not be able to prove it.</p></li><li><p>We allow the attacker to compromise keys used in this session <i>after</i> the session has completed. This means that what we are proving is: an attacker who recorded this session in the past and now has access to the long-term keys (by using their shiny new quantum computer, for example) can’t decrypt what happened in the session. This property is also known as <a href="https://en.wikipedia.org/wiki/Forward_secrecy">forward secrecy</a>.</p></li></ol><p>The second question you might ask is: “Why do we only care about the read key?” We only care about the read key because this lemma is symmetric: it holds for all actors. When a client and server have established a TLS session, the client’s read key is the server’s write key and vice versa. 
Because this lemma applies symmetrically to the client and the server, we prove that the attacker doesn’t know either of those keys.</p><p>Let’s return now to the syntax of this lemma.</p><p>The first line of this lemma is a “For all” statement over seven variables, which means that we are trying to prove that no matter what values these seven variables hold, the rest of the statement is true. These variables are:</p><ul><li><p>the thread id <code>tid</code>,</p></li><li><p>a protocol participant, <code>actor</code>,</p></li><li><p>the person they think they’re talking to, <code>peer</code>,</p></li><li><p>the final read and write keys, <code>kr</code> and <code>kw</code> respectively,</p></li><li><p>the actor’s authentication status, <code>aas</code>,</p></li><li><p>and a time <code>#i</code>.</p></li></ul><p>The next line of the lemma is about the <code>SessionKey</code> action. We record the <code>SessionKey</code> action when the client or the server thinks they have completed the protocol.</p><p>The next lines are about two attacker abilities: <code>RevLtk</code>, as discussed earlier; and <code>RevEKemSk</code>, which the attacker can use to reveal ephemeral secrets. The <code>K(x)</code> action means that the attacker learns (or rather, <code>K</code>nows) <code>x</code>. We then assert that if there does not <code>Ex</code>ist a <code>RevEKemSk</code> or <code>RevLtk</code> action on one of the keys used in the session, then there also does not exist a time when <code>K(kr)</code> holds (when the attacker learns the read key). <a href="https://en.wikipedia.org/wiki/Q.E.D."><i>Quod erat demonstrandum</i></a>. Let’s run the proofs now.</p>
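<p>The forward-secrecy part of this lemma can be illustrated with a toy sketch, with the same caveats as before: a deliberately insecure demo KEM, and a big simplification of KEMTLS, whose real keys mix several shared secrets. The point is that because the session key comes from an ephemeral key pair, later revealing the long-term key does not let the attacker recompute it.</p>

```python
# Toy illustration of forward secrecy: the session key depends on an
# ephemeral KEM key pair, so compromising the long-term key afterwards
# (the RevLtk ability) does not reveal the recorded session. Demo group only.
import hashlib
import secrets

P = 2**127 - 1   # Mersenne prime; demo only, not secure
G = 3

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return pow(G, sk, P), sk

def encaps(pk):
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), hashlib.sha256(str(pow(pk, r, P)).encode()).digest()

def decaps(sk, ct):
    return hashlib.sha256(str(pow(ct, sk, P)).encode()).digest()

lt_pk, lt_sk = keygen()      # server's long-term (certificate) key
eph_pk, eph_sk = keygen()    # fresh ephemeral key for this session only

ct, session_key = encaps(eph_pk)          # client encapsulates to eph_pk
assert decaps(eph_sk, ct) == session_key  # server derives the session key

# The session ends and the ephemeral private key is erased. Later the
# attacker learns lt_sk, but the long-term key cannot decapsulate ct:
assert decaps(lt_sk, ct) != session_key
```

<p>This mirrors the shape of the lemma: the conclusion only excludes attackers who revealed a key <i>before</i> time <code>#i</code>, so compromise after the session is explicitly allowed.</p>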
    <div>
      <h3>Proving lemmas in Tamarin</h3>
      <a href="#proving-lemmas-in-tamarin">
        
      </a>
    </div>
    <p>Tamarin offers two methods to prove security lemmas: it has an autoprover that can try to find the solution for you, or you can do it manually. Tamarin sometimes has a hard time figuring out what is important for proving a particular security property, so manual effort is occasionally unavoidable.</p><p>The manual prover interface allows you to select which goal Tamarin should pursue step by step. A proof quickly splits into separate branches: in the picture below, you see that Tamarin has already been able to prove the branches that are green, leaving us to make a choice for case 1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39wLyAIAVaVCmfgpQRrq7o/798666afd187a98809b4706bb8e62faa/pasted-image-0.png" />
            
            </figure><p>Screenshot from the Tamarin user interface, showing a prompt for the next step in a proof. The branches of the proof that are marked green have already been proven.</p><p>Sometimes whilst working in the manual interface, you realize that there are certain subgoals that Tamarin is trying to prove while working on a bigger lemma. By writing what we call a helper lemma we can give Tamarin a shortcut of sorts. Rather than trying to solve all the subgoals of one big lemma, we can split the proof into more digestible chunks. Tamarin can then later reuse these helper lemmas when trying to prove bigger lemmas; much like factoring out functions while programming. Sometimes this even allows us to make lemmas auto-provable. Other times we can extract the kinds of decisions we’re making and heuristics we’re manually applying into a separate “oracle” script: a script that interacts with Tamarin’s prover heuristics on our behalf. This can also automate proving tricky lemmas.</p><p>Once you realize how much easier certain things are to prove with helper lemmas, you can get a bit carried away. However, you quickly find that Tamarin is being “distracted” by one of the helper lemmas and starts going down long chains of irrelevant reasoning. When this happens, you can hide the helper lemma from the main lemma you’re trying to prove, and sometimes that allows the autoprover to figure out the rest.</p><p>Unfortunately, all these strategies require a lot of intuition that is very hard to obtain without spending a lot of time hands-on with Tamarin. Tamarin can sometimes be a bit unclear about what lemmas it’s trying to apply. We had to resort to tricks, like using unique, highly recognizable variable names in lemmas, such that we can reconstruct where a certain goal in the Tamarin interface is coming from.</p><p>While doing this work, auto-proving lemmas has been incredibly helpful. 
Each time you make a tiny change in either a lemma (or any of the lemmas that are reused by it) or in the whole model, you have to re-prove everything. If we needed to put in lots of manual effort each time, this project would be nowhere near done.</p><p>This was demonstrated by two bugs we found in one of the core lemmas of the TLS 1.3 model. It turned out that after completing the proof, some <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">refactoring</a> changes were made to the <code>session_key_agreement</code> lemma. These changes seemed innocent, but actually changed the meaning of the lemma, so that it didn’t make sense anymore (the original definition did cover the right security properties, so luckily this doesn’t cause a security problem). Unfortunately, this took a lot of our time to figure out. However, after a huge effort, we’ve done it. We have a proof that <a href="https://github.com/thomwiggers/TLS13Tamarin">KEMTLS achieves its security goals</a>.</p>
    <div>
      <h3>Conclusions</h3>
      <a href="#conclusions">
        
      </a>
    </div>
    <p>Formal methods definitely have a place in the development of security protocols; the development process of TLS 1.3 has really demonstrated this. We think that any proposal for new security protocols should be accompanied by a machine-verified proof of its security properties. Furthermore, because many protocols are currently specified in natural language, formal specification languages should definitely be under consideration. Natural language is inherently ambiguous, and the inevitable differences in interpretation that come from that lead to all kinds of problems.</p><p>However, this work cannot be done by academics alone. Many protocols come out of industry, and those who design them will need to do this for themselves. We would be the first to admit that the usability of these tools for non-experts is not all the way there yet — and industry and academia should collaborate on making these tools more accessible for everyone. We welcome and look forward to these collaborations in the future!</p>
    <div>
      <h4>References</h4>
      <a href="#references">
        
      </a>
    </div>
    <ul><li><p><a href="https://dl.acm.org/doi/10.1145/2637166.2637237">“Why does cryptographic software fail?: a case study and open problems”</a> by David Lazar, Haogang Chen, Xi Wang, and Nickolai Zeldovich</p></li><li><p><a href="https://eprint.iacr.org/2019/1393.pdf">“SoK: Computer-Aided Cryptography”</a> by Manuel Barbosa, Gilles Barthe, Karthik Bhargavan, Bruno Blanchet, Cas Cremers, Kevin Liao, and Bryan Parno</p></li><li><p><a href="https://eprint.iacr.org/2020/534/">“Post-quantum TLS without handshake signatures”</a> by Peter Schwabe, Douglas Stebila, and Thom Wiggers (*)</p></li><li><p><a href="https://eprint.iacr.org/2021/779/">“More efficient post-quantum KEMTLS with pre-distributed public keys”</a> by Peter Schwabe, Douglas Stebila, and Thom Wiggers (*)</p></li><li><p><a href="https://londmathsoc.onlinelibrary.wiley.com/doi/10.1112/plms/s2-42.1.230">“On computable numbers, with an application to the Entscheidungsproblem”</a> by Alan Turing</p></li><li><p><a href="https://dl.acm.org/doi/pdf/10.1145/3133956.3134063">“A comprehensive symbolic analysis of TLS 1.3”</a> by Cas Cremers, Marko Horvat, Jonathan Hoyland, Sam Scott, and Thyla van der Merwe (*)</p></li><li><p><a href="https://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BAN.pdf">“A Logic of Authentication”</a> by Michael Burrows, Martin Abadi, and Roger Michael Needham</p></li></ul><p>* The authors of this blog post were authors on these papers.</p><p>.....</p><p><sup>1</sup>Of course, trying every value isn’t technically impossible, it’s just infeasible, so we make the simplifying assumption that it’s impossible, and just say our proof only applies if the attacker can’t just try every value. Other styles of proof that don’t make that assumption are possible, but we’re not going to go into them.</p><p><sup>2</sup>For simplicity, this representation assumes that the public portion of a key pair can be derived from the private part, which may not be true in practice. 
Usually this simplification won’t affect the analysis.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <guid isPermaLink="false">7erNSwLGj4BHA9bhRf2HOq</guid>
            <dc:creator>Thom Wiggers</dc:creator>
            <dc:creator>Jonathan Hoyland</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare and the IETF]]></title>
            <link>https://blog.cloudflare.com/cloudflare-and-the-ietf/</link>
            <pubDate>Wed, 13 Oct 2021 12:59:37 GMT</pubDate>
            <description><![CDATA[ Cloudflare helps build a better Internet through collaboration on open and interoperable standards. This post will describe how Cloudflare contributes to the standardization process to enable incremental innovation and drive long-term architectural change. ]]></description>
            <content:encoded><![CDATA[ <p>The Internet, far from being just a series of tubes, is a huge, incredibly complex, decentralized system. Every action and interaction in the system is enabled by a complicated mass of protocols woven together to accomplish their task, each handing off to the next like trapeze artists high above a virtual circus ring. Stop to think about details, and it is a marvel.</p><p>Consider one of the simplest tasks enabled by the Internet: Sending a message from sender to receiver.</p><p>The location (address) of a receiver is discovered using <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a>, a connection between sender and receiver is established using a transport protocol like TCP, and (hopefully!) secured with a protocol like TLS. The sender's message is encoded in a format that the receiver can recognize and parse, like HTTP, because the two disparate parties need a common language to communicate. Then, ultimately, the message is sent and carried in an IP datagram that is forwarded from sender to receiver based on routes established with BGP.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Z79TEfHR8kGEqa8qMWBCQ/eecb98d60c7bbcbf5baae72ee10d8357/image1-35.png" />
            
            </figure><p>Even an explanation this dense is laughably oversimplified. For example, the four protocols listed are just the start, and ignore many others with acronyms of their own. The truth is that things are complicated. And because things are complicated, how these protocols and systems interact and influence the user experience on the Internet is complicated. Extra round trips to establish a secure connection increase the amount of time before useful work is done, harming user performance. The use of unauthenticated or unencrypted protocols reveals potentially sensitive information to the network or, worse, to malicious entities, which harms user security and privacy. And finally, consolidation and centralization — seemingly a prerequisite for reducing costs and protecting against attacks — makes it challenging to provide high availability even for essential services. (What happens when that one system goes down or is otherwise unavailable, or to extend our earlier metaphor, when a trapeze isn’t there to catch?)</p><p>These four properties — performance, security, privacy, and availability — are crucial to the Internet. At Cloudflare, and especially in the Cloudflare Research team, where we use all these various protocols, we're committed to improving them at every layer in the stack. 
We work on problems as diverse as <a href="https://www.cloudflare.com/network-security/">strengthening network security</a> and privacy with <a href="https://datatracker.ietf.org/doc/html/rfc8446">TLS 1.3</a> and <a href="https://datatracker.ietf.org/doc/html/rfc9000">QUIC</a>, improving DNS privacy via <a href="/oblivious-dns/">Oblivious DNS-over-HTTPS</a>, reducing end-user CAPTCHA annoyances with Privacy Pass and <a href="/introducing-cryptographic-attestation-of-personhood/">Cryptographic Attestation of Personhood (CAP)</a>, performing Internet-wide measurements to understand how things work in the real world, and much, much more.</p><p>Above all else, these projects are meant to do one thing: focus beyond the horizon to help build a better Internet. We do that by developing, advocating, and advancing open standards for the many protocols in use on the Internet, all backed by implementation, experimentation, and analysis.</p>
    <div>
      <h3>Standards</h3>
      <a href="#standards">
        
      </a>
    </div>
    <p>The Internet is a network of interconnected autonomous networks. Computers attached to these networks have to be able to route messages to each other. However, even if we can send messages back and forth across the Internet, much like the storied Tower of Babel, to achieve anything those computers have to use a common language, a lingua franca, so to speak. And for the Internet, standards are that common language.</p><p>Many of the parts of the Internet that Cloudflare is interested in are standardized by the IETF, which is a standards development organization responsible for producing technical specifications for the Internet's most important protocols, including IP, BGP, DNS, TCP, TLS, QUIC, HTTP, and so on. The <a href="https://www.ietf.org/about/mission/">IETF's mission</a> is:</p><blockquote><p>to make the Internet work better by producing high-quality, relevant technical documents that influence the way people design, use, and manage the Internet.</p></blockquote><p>Our individual contributions to the IETF help further this mission, especially given our role on the Internet. We can only do so much on our own to improve the end-user experience. So, through standards, we engage with those who use, manage, and operate the Internet to achieve three simple goals that lead to a better Internet:</p><ol><li><p>Incrementally improve existing and deployed protocols with innovative solutions;</p></li><li><p>Provide holistic solutions to long-standing architectural problems and enable new use cases; and</p></li><li><p>Identify key problems and help specify reusable, extensible, easy-to-implement abstractions for solving them.</p></li></ol><p>Below, we’ll give an example of how we helped achieve each goal, touching on a number of important technical specifications produced in recent years, including DNS-over-HTTPS, QUIC, and (the still work-in-progress) TLS Encrypted Client Hello.</p>
    <div>
      <h3>Incremental innovation: metadata privacy with DoH and ECH</h3>
      <a href="#incremental-innovation-metadata-privacy-with-doh-and-ech">
        
      </a>
    </div>
    <p>The Internet is not only complicated — it is leaky. Metadata seeps like toxic waste from nearly every protocol in use, from DNS to TLS, and even to HTTP at the application layer.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1t1ZVnKH9ZGQnKCgx6I8Pr/ada911c196fb971b19b8a4a3f7767362/image6-14.png" />
            
            </figure><p>One critically important piece of metadata that still leaks today is the name of the server that clients connect to. When a client opens a connection to a server, it reveals the name and identity of that server in many places, including DNS, TLS, and even sometimes at the IP layer (if the destination IP address is unique to that server). Linking client identity (IP address) to target server names enables third parties to build a profile of per-user behavior without end-user consent. The result is a set of protocols that does not respect end-user privacy.</p><p>Fortunately, it’s possible to incrementally address this problem without regressing security. For years, Cloudflare has been working with the standards community to plug all of these individual leaks through separate specialized protocols:</p><ul><li><p><a href="https://datatracker.ietf.org/doc/html/rfc8484">DNS-over-HTTPS</a> encrypts DNS queries between clients and recursive resolvers, ensuring only clients and trusted recursive resolvers see plaintext DNS traffic.</p></li><li><p><a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13">TLS Encrypted Client Hello</a> encrypts metadata in the TLS handshake, ensuring only the client and authoritative TLS server see sensitive TLS information.</p></li></ul><p>These protocols put a barrier between the client and server on one side, and everyone else on the other. However, neither of them prevents the server from building per-user profiles. Servers can track users via one critically important piece of information: the client IP address. Fortunately, for the overwhelming majority of cases, the IP address is not essential for providing a service. For example, DNS recursive resolvers do not need the full client IP address to provide accurate answers, as is evidenced by the <a href="https://datatracker.ietf.org/doc/html/rfc7871">EDNS(0) Client Subnet</a> extension. 
To further reduce information exposure on the web, we helped push further with two more incremental improvements:</p><ul><li><p><a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-07">Oblivious DNS-over-HTTPS</a> (ODoH) uses cryptography and network proxies to break linkability between client identity (IP address) and DNS traffic, ensuring that recursive resolvers have only the minimal amount of information to provide DNS answers -- the queries themselves, without any per-client information.</p></li><li><p><a href="https://datatracker.ietf.org/doc/html/draft-ietf-masque-h3-datagram-04">MASQUE</a> is standardizing techniques for proxying UDP and IP protocols over QUIC connections, similar to the existing <a href="https://www.rfc-editor.org/rfc/rfc7231.html#section-4.3.6">HTTP CONNECT</a> method for TCP-based protocols. Generally, the CONNECT method allows clients to use services without revealing any client identity (IP address).</p></li></ul><p>While each of these protocols may seem only an incremental improvement over what we have today, together, they raise many possibilities for the future of the Internet. Are DoH and ECH sufficient for end-user privacy, or are technologies like ODoH and MASQUE necessary? How do proxy technologies like MASQUE complement or even subsume protocols like ODoH and ECH? These are questions the Cloudflare Research team strives to answer through experimentation, analysis, and deployment together with other stakeholders on the Internet through the IETF. And we could not ask the questions without first laying the groundwork.</p>
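<p>The DoH encoding itself is pleasingly simple. The sketch below builds the RFC 8484 GET form of a query: the binary DNS message is base64url-encoded, without padding, into the <code>dns</code> parameter. The resolver name and helper names here are illustrative, and the message ID is fixed at zero as the RFC suggests for cache friendliness:</p>

```python
import base64
import struct

def dns_query(name: str, qtype: int = 1, qid: int = 0) -> bytes:
    # DNS wire format: a 12-byte header, then one question.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD bit set
    # QNAME: each label is length-prefixed, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode()
                     for label in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)  # IN class

def doh_get_url(resolver: str, name: str) -> str:
    # RFC 8484: base64url-encode the query, strip the '=' padding.
    q = base64.urlsafe_b64encode(dns_query(name)).rstrip(b"=").decode()
    return f"https://{resolver}/dns-query?dns={q}"
```

For a type A query for www.example.com this produces the same <code>dns=</code> value as the worked example in RFC 8484.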
    <div>
      <h3>Architectural advancement: QUIC and HTTP/3</h3>
      <a href="#architectural-advancement-quic-and-http-3">
        
      </a>
    </div>
    <p><a href="https://quicwg.org">QUIC</a> and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34">HTTP/3</a> are transformative technologies. Whilst the TLS handshake forms the heart of QUIC’s security model, QUIC improves on TLS over TCP in many respects, including more encryption (privacy), better protection against active attacks and ossification at the network layer, fewer round trips to establish a secure connection, and generally better security properties. QUIC and HTTP/3 give us a clean slate for future innovation.</p><p>Perhaps one of QUIC’s most important contributions is that it challenges and even breaks many established conventions and norms used on the Internet. For example, the antiquated socket API for networking, which treats the network connection as an in-order bit pipe, is no longer appropriate for modern applications and developers. Modern networking APIs such as Apple’s <a href="https://developer.apple.com/documentation/network">Network.framework</a> provide high-level interfaces that take advantage of the new transport features provided by QUIC. Applications using this or even higher-level HTTP abstractions can take advantage of the many security, privacy, and performance improvements of QUIC and HTTP/3 today with minimal code changes, and without being constrained by sockets and their inherent limitations.</p><p>Another salient feature of QUIC is its wire format. Nearly every bit of every QUIC packet is encrypted and authenticated between sender and receiver. And within a QUIC packet, individual frames can be rearranged, repackaged, and otherwise transformed by the sender.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4qpdpgnX8A8M6iHvf0ECWP/aae602a63abed5400ffa431b4ce3cdce/image2-22.png" />
            
            </figure><p>Together, these are powerful tools to help mitigate future network ossification and enable continued extensibility. (TLS’s wire format ultimately led to the <a href="https://datatracker.ietf.org/doc/html/rfc8446#appendix-D.4">middlebox compatibility mode</a> for TLS 1.3 due to the many middlebox ossification problems that were encountered during early deployment tests.)</p><p>Exercising these features of QUIC is important for the <a href="https://datatracker.ietf.org/doc/html/draft-iab-use-it-or-lose-it-03">long-term health</a> of the protocol and applications built on top of it. Indeed, this sort of extensibility is what enables innovation.</p><p>In fact, we've already seen a flurry of new work based on QUIC: extensions to enable multipath QUIC, different congestion control approaches, and ways to carry data unreliably in the DATAGRAM frame.</p><p>Beyond functional extensions, we’ve also seen a number of new use cases emerge as a result of QUIC. DNS-over-QUIC is an upcoming proposal that complements DNS-over-TLS for recursive-to-authoritative DNS query protection. As mentioned above, MASQUE is a working group focused on standardizing methods for proxying arbitrary UDP and IP protocols over QUIC connections, enabling a number of fascinating solutions and unlocking the future of proxy and VPN technologies. In the context of the web, the WebTransport working group is standardizing methods to use QUIC as a “supercharged WebSocket” for transporting data efficiently between client and server while also depending on the WebPKI for security.</p><p>This list of extensions is, of course, nowhere near complete. The future of the Internet with QUIC is sure to be a fascinating adventure.</p>
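<p>To give a flavour of that wire format, here is a sketch of QUIC's variable-length integer encoding (RFC 9000, Section 16): the top two bits of the first byte select a 1-, 2-, 4-, or 8-byte encoding, so frames can carry lengths and IDs without fixed-width fields. The function names are ours; the behaviour matches the worked examples in the RFC:</p>

```python
def encode_varint(v: int) -> bytes:
    # Two prefix bits give the length; the remaining 6/14/30/62 bits
    # carry the value, big-endian.
    if v < 1 << 6:
        return v.to_bytes(1, "big")                 # prefix 0b00
    if v < 1 << 14:
        return (v | (1 << 14)).to_bytes(2, "big")   # prefix 0b01
    if v < 1 << 30:
        return (v | (2 << 30)).to_bytes(4, "big")   # prefix 0b10
    if v < 1 << 62:
        return (v | (3 << 62)).to_bytes(8, "big")   # prefix 0b11
    raise ValueError("value too large for a QUIC varint")

def decode_varint(b: bytes) -> int:
    length = 1 << (b[0] >> 6)                       # 1, 2, 4, or 8 bytes
    return int.from_bytes(b[:length], "big") & ((1 << (8 * length - 2)) - 1)
```

Small numbers cost one byte on the wire while the format still scales to 62-bit values, one of many details that keep QUIC packets compact and malleable for the sender.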
    <div>
      <h3>Specifying abstractions: Cryptographic algorithms and protocol design</h3>
      <a href="#specifying-abstractions-cryptographic-algorithms-and-protocol-design">
        
      </a>
    </div>
    <p>Standards allow us to build abstractions. An ideal standard is one that is usable in many contexts and contains all the information a sufficiently skilled engineer needs to build a compliant implementation that successfully interoperates with other independent implementations. Writing a new standard is sort of like creating a new Lego brick: it allows us to build things that we couldn’t have built before. For example, one new “brick” that’s nearly finished (as of this writing) is <a href="https://www.ietf.org/archive/id/draft-irtf-cfrg-hpke-12.html">Hybrid Public Key Encryption (HPKE)</a>. HPKE allows us to efficiently encrypt arbitrary plaintexts under the recipient’s public key.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5eWRfVLYtCcnUohsI2X8SE/48ccbddb899b98e65baea220bd7c06f6/image4-21.png" />
            
            </figure><p>Mixing asymmetric and symmetric cryptography for efficiency is a common technique that has been used for many years in all sorts of protocols, from TLS to <a href="https://en.wikipedia.org/wiki/Pretty_Good_Privacy">PGP</a>. However, each of these applications has come up with its own design, with its own security properties. HPKE is intended to be a single, standard, interoperable version of this technique that turns this complex and technical corner of protocol design into an easy-to-use black box. The standard has undergone extensive analysis by cryptographers throughout its development and has numerous implementations available. The end result is a simple abstraction that protocol designers can include without having to consider how it works under the hood. In fact, HPKE is already a dependency for a number of other draft protocols in the IETF, such as <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-esni-13">TLS Encrypted Client Hello</a>, <a href="https://datatracker.ietf.org/doc/html/draft-pauly-dprive-oblivious-doh-07">Oblivious DNS-over-HTTPS</a>, and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-mls-architecture-07.html">Message Layer Security</a>.</p>
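<p>To make the shape of that black box concrete, here is a toy sketch of the hybrid pattern: an asymmetric (Diffie-Hellman) step encapsulates a shared secret, a KDF stretches it into keys, and a symmetric cipher plus MAC protect the payload. This is emphatically not HPKE and is deliberately insecure (a tiny group and home-made primitives); it only illustrates the seal/open API that HPKE standardises. All names are ours:</p>

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group: illustration only. Real HPKE uses KEMs like
# X25519 or P-256, and AEADs like AES-GCM or ChaCha20-Poly1305.
P = 2**127 - 1
G = 5

def _kdf(secret: bytes, info: bytes) -> bytes:
    # HKDF-style extract-then-expand, built from HMAC-SHA256.
    prk = hmac.new(b"toy-hpke-salt", secret, hashlib.sha256).digest()
    return hmac.new(prk, info, hashlib.sha256).digest()

def _stream(key: bytes, n: int) -> bytes:
    # Counter-mode keystream (toy cipher: XOR with this stream).
    out, ctr = b"", 0
    while len(out) < n:
        out += hmac.new(key, ctr.to_bytes(4, "big"), hashlib.sha256).digest()
        ctr += 1
    return out[:n]

def keygen() -> tuple[int, int]:
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def seal(pk: int, msg: bytes) -> tuple[int, bytes, bytes]:
    # Asymmetric step: a fresh ephemeral share encapsulates a secret...
    esk, epk = keygen()
    secret = pow(pk, esk, P).to_bytes(16, "big")
    # ...then derived symmetric keys encrypt and authenticate the payload.
    ct = bytes(a ^ b for a, b in zip(msg, _stream(_kdf(secret, b"enc"), len(msg))))
    tag = hmac.new(_kdf(secret, b"mac"), ct, hashlib.sha256).digest()
    return epk, ct, tag

def open_(sk: int, epk: int, ct: bytes, tag: bytes) -> bytes:
    secret = pow(epk, sk, P).to_bytes(16, "big")
    good = hmac.new(_kdf(secret, b"mac"), ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, good):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _stream(_kdf(secret, b"enc"), len(ct))))
```

The expensive public-key operation happens once per message, while the bulk of the data flows through cheap symmetric operations; that division of labour is the whole point of the hybrid design.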
    <div>
      <h3>Modes of Interaction</h3>
      <a href="#modes-of-interaction">
        
      </a>
    </div>
    <p>We engage with the IETF in the specification, implementation, experimentation, and analysis phases of a standard to help achieve our three goals of incremental innovation, architectural advancement, and production of simple abstractions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0tbHRFLSsWV7qBNiKi4WN/34d3b7742fe21500bcaa4970729bd4e6/image3-20.png" />
            
            </figure><p>Our participation in the standards process hits all four phases. Individuals in Cloudflare bring a diversity of knowledge and domain expertise to each phase, especially in the production of technical specifications. Today, we also published <a href="/exported-authenticators-the-long-road-to-rfc/">a blog post</a> about an upcoming standard that we’ve been working on for a number of years and will be sharing details about how we used formal analysis to make sure that we ruled out as many security issues in the design as possible. We work in close collaboration with people from all around the world as an investment in the future of the Internet. Open standards mean that everyone can take advantage of the latest and greatest in protocol design, whether they use Cloudflare or not.</p><p>Cloudflare’s scale and perspective on the Internet are essential to the standards process. We have experience rapidly implementing, deploying, and experimenting with emerging technologies to gain confidence in their maturity. We also have a proven track record of publishing the results of these experiments to help inform the standards process. Moreover, we open source as much of the code we use for these experiments as possible to enable reproducibility and transparency. Our unique collection of engineering expertise and wide perspective allows us to help build standards that work in a wide variety of use cases. By investing time in developing standards that everyone can benefit from, we can make a clear contribution to building a better Internet.</p><p>One final contribution we make to the IETF is more procedural and based around building consensus in the community. A challenge to any open process is gathering consensus to make forward progress and avoiding deadlock. We help build consensus through the production of running code, leadership on technical documents such as QUIC and ECH, and even logistically by chairing working groups. 
(Working groups at the IETF are chaired by volunteers, and Cloudflare numbers a few working group chairs amongst its employees, covering a broad spectrum of the IETF (and its related research-oriented group, the <a href="https://irtf.org/">IRTF</a>) from security and privacy to transport and applications.) Collaboration is a cornerstone of the standards process and a hallmark of Cloudflare Research, and we apply it most prominently in the standards process.</p><p>If you too want to help build a better Internet, check out some IETF Working Groups and mailing lists. All you need to start contributing is an Internet connection and an email address, so why not give it a go? And if you want to join us on our mission to help build a better Internet through open and interoperable standards, check out our <a href="https://www.cloudflare.com/careers/jobs/?department=Technology%20Research&amp;location=default">open</a> <a href="https://boards.greenhouse.io/cloudflare/jobs/3271134?gh_jid=3271134">positions</a>, <a href="/visiting-researcher-program/">visiting researcher program</a>, and <a href="https://www.cloudflare.com/careers/jobs/?department=University&amp;location=default">many internship opportunities</a>!</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">72sMlOH9eqnKfiCmxSGHnU</guid>
            <dc:creator>Jonathan Hoyland</dc:creator>
            <dc:creator>Christopher Wood</dc:creator>
        </item>
        <item>
            <title><![CDATA[Exported Authenticators: The long road to RFC]]></title>
            <link>https://blog.cloudflare.com/exported-authenticators-the-long-road-to-rfc/</link>
            <pubDate>Wed, 13 Oct 2021 12:59:28 GMT</pubDate>
            <description><![CDATA[ Learn more about Exported Authenticators, a new extension to TLS, currently going through the IETF standardisation process. ]]></description>
<content:encoded><![CDATA[ <p>Our earlier <a href="/cloudflare-and-the-ietf">blog post</a> talked in general terms about how we work with the IETF. In this post we’re going to talk about a particular IETF project we’ve been working on, Exported Authenticators (EAs). Exported Authenticators is a new extension to TLS that we think will prove really exciting. It unlocks all sorts of fancy new authentication possibilities, from TLS connections with multiple certificates attached, to logging in to a website without ever revealing your password.</p><p>Now, you might have thought that, given the innumerable hours that went into the design of TLS 1.3, it couldn’t possibly be improved, but it turns out that there are a number of places where the design falls a little short. TLS allows us to establish a secure connection between a client and a server. The server presents a certificate to the browser, which proves it is authorised to use the name written on the certificate, for example <a href="/">blog.cloudflare.com</a>. One of the most common things we use that ability for is delivering webpages. In fact, if you’re reading this, your browser has already done this for you. The Cloudflare Blog is delivered over TLS, and by presenting a certificate for <a href="/">blog.cloudflare.com</a> the server proves that it’s allowed to deliver Cloudflare’s blog.</p><p>When your browser requests <a href="/">blog.cloudflare.com</a> you receive a big blob of HTML that your browser then starts to render. In the dim and distant past, this might have been the end of the story. Your browser would render the HTML, and display it. Nowadays, the web has become more complex, and the HTML your browser receives often tells it to go and load lots of other resources. 
For example, when I loaded the Cloudflare blog just now, my browser made 73 subrequests.</p><p>As we mentioned in our <a href="/connection-coalescing-experiments">connection coalescing</a> blog post, sometimes those resources are also served by Cloudflare, but on a different domain. In our connection coalescing experiment, we acquired certificates with a special extension, called a Subject Alternative Name (SAN), that tells the browser that the owner of the certificate can act as two different websites. Along with some further shenanigans that you can read about in our <a href="/connection-coalescing-experiments">blog post</a>, this lets us serve the resources for both the domains over a single TLS connection.</p><p>Cloudflare, however, services millions of domains, and we have millions of certificates. It’s possible to generate certificates that cover lots of domains, and in fact this is what Cloudflare used to do. We used to use so-called “<a href="https://dl.acm.org/doi/pdf/10.1145/2976749.2978301">cruise-liner</a>” certificates, with dozens of names on them. But for connection coalescing this quickly becomes impractical, as we would need to know what sub-resources each webpage might request, and acquire certificates to match. We switched away from this model because issues with individual domains could affect other customers.</p><p>What we’d like to be able to do is serve as much content as possible down a single connection. When a user requests a resource from a different domain they need to perform a new TLS handshake, <a href="/how-expensive-is-crypto-anyway/">costing valuable time and resources</a>. Our connection coalescing experiment showed the benefits when we know in advance what resources are likely to be requested, but most of the time we don’t know what subresources are going to be requested until the requests actually arrive. 
What we’d rather do is attach extra identities to a connection after it’s been established, once we know what extra domains the client actually wants. Because the TLS connection is just a transport mechanism and doesn’t understand the information being sent across it, it doesn’t actually know what domains might subsequently be requested. That information is only available to higher-layer protocols such as HTTP. However, we don’t want any website to be able to impersonate another, so we still need to have strong authentication.</p>
    <div>
      <h3>Exported Authenticators</h3>
      <a href="#exported-authenticators">
        
      </a>
    </div>
    <p>Enter Exported Authenticators. They give us even more than we asked for. They allow us to do application layer authentication that’s just as strong as the authentication you get from TLS, and then tie it to the TLS channel. Now that’s a pretty complicated idea, so let’s break it down.</p><p>To understand application layer authentication we first need to explain what the application layer is. The application layer is a reference to the <a href="https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/">OSI model</a>. The OSI model describes the various layers of abstraction we use to make things work across the Internet. When you’re developing your latest web application you don’t want to have to worry about how light is flickered down a fibre optic cable, or even how the TLS handshake is encoded (although that’s a fascinating topic in its own right, let’s leave that for another time.)</p><p>All you want to care about is having your content delivered to your end-user, and using TLS gives you a guaranteed in-order, reliable, authenticated channel over which you can communicate. You just shove bits in one end of the pipe, and after lots of blinky lights, fancy routing, maybe a touch of congestion control, and a little decoding, <i>poof</i>, your data arrives at the end-user.</p><p>The application layer is the top of the OSI stack, and contains things like HTTP. Because the TLS handshake is lower in the stack, the application is oblivious to this process. So, what Exported Authenticators give us is the ability for the very top of the stack to reliably authenticate its partner.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5gnmKyKSeUeRR2kNpasByF/ce60689aede76d539b46a440ac9c87f8/osi-model-7-layers-1.png" />
            
            </figure><p>The seven-layered OSI model</p><p>Now let’s jump back a bit, and discuss what we mean when we say that EAs give us authentication that’s as strong as TLS authentication. TLS, as we know, is used to create a secure connection between two endpoints, but lots of us are hazy when we try and pin down exactly what we mean by “secure”. The TLS standard makes <a href="https://datatracker.ietf.org/doc/html/rfc8446#appendix-E.1">eight specific promises</a>, but rather than get buried in that particular ocean of weeds, let’s just pick out the one guarantee that we care about most: Peer Authentication.</p>
            <pre><code>Peer authentication: The client's view of the peer identity should reflect the server's identity. [...]</code></pre>
            <p>In other words, if the client thinks that it’s talking to <code>example.com</code> then it should, in fact, be talking to <code>example.com</code>.</p><p>What we want from EAs is that if I receive an EA then I have cryptographic proof that the person I’m talking to is the person I think I’m talking to. Now at this point you might be wondering what an EA actually looks like, and what it has to do with certificates. Well, an EA is actually a trio of messages, the first of which is a <code>Certificate</code>. The second is a <code>CertificateVerify</code>, a cryptographic proof that the sender knows the private key for the certificate. Finally, there is a <code>Finished</code> message, which acts as a MAC, and proves the first two messages haven’t been tampered with. If this structure sounds familiar to you, it’s because it’s the same structure as used by the server in the TLS handshake to prove it is the owner of the certificate.</p><p>The final piece of unpacking we need to do is explaining what we mean by tying the authentication to the TLS channel. Because EAs are an application layer construct they don’t provide any transport mechanism. So, whilst I know that the EA was created by the server I want to talk to, without binding the EA to a TLS connection I can’t be sure that I’m talking <i>directly</i> to the server I want.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1TE1tNAHGeIgpWWXLSo6d4/73aab69ccbfdcb00ba7819d8936df1d7/image5-16.png" />
            
            </figure><p>Without protection, a malicious server can move Exported Authenticators from one connection to another.</p><p>For all I know, the TLS server I’m talking to is creating a new TLS connection to the EA Server, and relaying my request, and then returning the response. This would be very bad, because it would allow a malicious server to impersonate any server that supports EAs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1zeVsOVQkSEccH77eRqMQb/0868ebd0c27b34da59f32fa0837e3b29/image2-23.png" />
            
            </figure><p>Because EAs are bound to a single TLS connection, if a malicious server copies an EA from one connection to another it will fail to verify.</p><p>EAs therefore have an extra security feature. They use the fact that every TLS connection is guaranteed to produce a unique set of keys. EAs take one of these keys and use it to construct the EA. This means that if some malicious third-party copies an EA from one TLS session to another, the recipient wouldn’t be able to validate it. This technique is called <a href="https://datatracker.ietf.org/doc/html/rfc5056">channel binding</a>, and is another fascinating topic, but this post is already getting a bit long, so we’ll have to revisit channel binding in a future blog post.</p>
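<p>A toy model makes that failure mode concrete. Below, an HMAC keyed by a random per-connection exporter secret stands in for the real <code>CertificateVerify</code>/<code>Finished</code> construction; the class and function names are ours, for illustration only:</p>

```python
import hashlib
import hmac
import secrets

class ToyTLSConnection:
    """Stand-in for a TLS session. The exporter secret models the unique
    keying material every TLS connection is guaranteed to produce."""
    def __init__(self) -> None:
        self._exporter_secret = secrets.token_bytes(32)

    def exporter(self, label: bytes) -> bytes:
        # Derive a purpose-specific key from this connection's secret.
        return hmac.new(self._exporter_secret, label, hashlib.sha256).digest()

def make_authenticator(conn: ToyTLSConnection, cert: bytes) -> bytes:
    # The MAC is keyed with material only this connection possesses,
    # binding the authenticator to the channel it was created on.
    return hmac.new(conn.exporter(b"exported authenticator"),
                    cert, hashlib.sha256).digest()

def verify(conn: ToyTLSConnection, cert: bytes, auth: bytes) -> bool:
    expected = hmac.new(conn.exporter(b"exported authenticator"),
                        cert, hashlib.sha256).digest()
    return hmac.compare_digest(expected, auth)
```

An authenticator minted on one connection verifies there, but a malicious server that copies it onto a second connection derives a different key, so verification fails.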
    <div>
      <h3>How the sausage is made</h3>
      <a href="#how-the-sausage-is-made">
        
      </a>
    </div>
    <p>OK, now that we know what EAs do, let’s talk about how they were designed and built. EAs are going through the <a href="https://www.ietf.org/standards/process/informal/">IETF standardisation process</a>. Draft standards move through the IETF process starting as Internet Drafts (I-Ds), and ending up as published Requests for Comments (RFCs). RFCs are voluntary standards that underpin much of the global Internet plumbing, and not just for security protocols like TLS. RFCs define DNS, UDP, TCP, and many, many more.</p><p>The first step in producing a new IETF standard is coming up with a proposal. Designing security protocols is a very conservative business, firstly, because it’s very easy to introduce really subtle bugs, and secondly, because if you do introduce a security issue, things can go very wrong, very quickly. A flaw in the design of a protocol can be especially problematic as it can be replicated across multiple independent implementations — for example the <a href="https://kryptera.se/Renegotiating%20TLS.pdf">TLS renegotiation vulnerabilities reported in 2009</a> and the <a href="https://dl.acm.org/doi/10.1145/2382196.2382206">custom EC(DH) parameters vulnerability from 2012</a>. To minimise the risks of design issues, EAs hew closely to the design of the TLS 1.3 handshake.</p>
    <div>
      <h3>Security and Assurance</h3>
      <a href="#security-and-assurance">
        
      </a>
    </div>
    <p>Before making a big change to how authentication works on the Internet, we want as much assurance as possible that we’re not going to break anything. To give us more confidence that EAs are secure, they reuse parts of the design of TLS 1.3. The TLS 1.3 design was carefully examined by dozens of experts, and underwent multiple rounds of formal analysis — more on that in a moment. Using well understood design patterns is a super important part of security protocols. Making something secure is incredibly difficult, because security issues can be introduced in thousands of ways, and an attacker only needs to find one. By starting from a well understood design we can leverage the years of expertise that went into it.</p><p>Another vital step in catching design errors early is baked into the IETF process: achieving rough consensus. Although the ins and outs of the IETF process are worthy of their own blog post, suffice it to say the IETF works to ensure that all technical objections get addressed, and even if they aren’t solved they are given due care and attention. Exported Authenticators were proposed way back in 2016, and after many rounds of comments, feedback, and analysis the TLS Working Group (WG) at the IETF has finally reached consensus on the protocol. All that’s left before the EA I-D becomes an RFC is for a final revision of the text to be submitted and sent to the RFC Editors, leading hopefully to a published standard very soon.</p><p>As we just mentioned, the WG has to come to a consensus on the design of the protocol. One thing that can hold up achieving consensus is concern about security. 
After the Snowden revelations there was a <a href="https://www.mitls.org/downloads/tlsauth.pdf">barrage</a> <a href="https://heartbleed.com/">of</a> <a href="https://www.openssl.org/~bodo/ssl-poodle.pdf">attacks</a> <a href="https://freakattack.com/">on</a> <a href="https://www.imperva.com/docs/HII_Attacking_SSL_when_using_RC4.pdf">TLS 1.2</a>, not to mention some even earlier attacks from academia. Changing how trust works on the Internet can be pretty scary, and the TLS WG didn’t want to be caught flat-footed. Luckily this coincided with the maturation of some tools and techniques we can use to get mathematical guarantees that a protocol is secure. This class of techniques is known as <a href="https://en.wikipedia.org/wiki/Formal_methods">formal methods</a>. To help ensure that people are confident in the security of EAs I performed a formal analysis.</p>
    <div>
      <h3>Formal Analysis</h3>
      <a href="#formal-analysis">
        
      </a>
    </div>
    <p>Formal analysis is a special technique that can be used to examine security protocols. We create a mathematical description of the protocol, the security properties we want it to have, and a model attacker. Then, aided by some sophisticated software, we create a proof that the protocol has the properties we want even in the presence of our model attacker. This approach is able to catch incredibly subtle edge cases, which, if not addressed, could lead to attacks, as has <a href="https://cispa.saarland/group/cremers/downloads/papers/CHSV2016-TLS13.pdf">happened</a> <a href="https://hal.inria.fr/hal-01528752/document">before</a>. Trotting out a formal analysis gives us strong assurances that we haven’t missed any horrible issues. By sticking as closely as possible to the design of TLS 1.3 we were able to repurpose much of the original analysis for EAs, giving us a big leg up in our ability to prove their security. Our EA model is <a href="https://bitbucket.org/jhoyla/tamarin-exported-authenticators/src/master/">available in Bitbucket</a>, along with the proofs. You can check it out using <a href="https://tamarin-prover.github.io/">Tamarin</a>, a theorem prover for security protocols.</p><p>Formal analysis, and formal methods in general, give very strong guarantees that rule out entire classes of attack. However, they are not a panacea. TLS 1.3 was subject to a number of rounds of formal analysis, and yet <a href="https://eprint.iacr.org/2019/347.pdf">an attack</a> was still found. In many ways, though, this attack confirms our faith in formal methods. The attack was found in a blind spot of the proof, showing that attackers have been pushed to the very edges of the protocol. As our formal analyses become more rigorous, attackers will have fewer and fewer places to search for attacks. And as formal analysis has become more practical, more and more groups at the IETF have been asking to see proofs of security before standardising new protocols. 
This hopefully will mean that future attacks on protocol design will become rarer and rarer.</p><p>Once the EA I-D becomes an RFC, then all sorts of cool stuff gets unlocked — for example <a href="https://datatracker.ietf.org/doc/html/draft-sullivan-tls-opaque-01">OPAQUE-EA</a>s, which will allow us to do password-based login on the web without the server ever seeing the password! Watch this space.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5QK1yWnP1qWPVURzf9ZlIk/63325574b90a74a60ed147994cc197fc/image4-22.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Protocols]]></category>
            <category><![CDATA[IETF]]></category>
            <guid isPermaLink="false">6DwixIOBiqkrJfubkZlOGa</guid>
            <dc:creator>Jonathan Hoyland</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare's Ethereum Gateway]]></title>
            <link>https://blog.cloudflare.com/cloudflare-ethereum-gateway/</link>
            <pubDate>Wed, 19 Jun 2019 13:01:00 GMT</pubDate>
            <description><![CDATA[ Today, we are excited to announce Cloudflare's Ethereum Gateway, where you can interact with the Ethereum network without installing any software on your computer. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today, as part of <a href="/welcome-to-crypto-week-2019/">Crypto Week 2019</a>, we are excited to announce Cloudflare's Ethereum Gateway, where you can interact with the Ethereum network without installing any additional software on your computer.</p><p>This is another tool in Cloudflare’s Distributed Web Gateway tool set. Currently, Cloudflare lets you host content on the InterPlanetary File System (IPFS) and access it through your own custom domain. Similarly, the new Ethereum Gateway allows access to the Ethereum network, which you can provision through your custom hostname.</p><p>This setup makes it possible to add interactive elements to sites powered by <a href="https://blockgeeks.com/guides/smart-contracts/">Ethereum smart contracts</a>, a decentralized computing platform. And, in conjunction with the IPFS gateway, this allows hosting websites and resources in a decentralized manner, and has the extra bonus of the added speed, security, and reliability provided by the Cloudflare edge network. You can access our Ethereum gateway directly at <a href="https://cloudflare-eth.com">https://cloudflare-eth.com</a>.</p><p>This brief primer on how Ethereum and smart contracts work has examples of the many possibilities of using the Cloudflare Distributed Web Gateway.</p>
    <div>
      <h3><b>Primer on Ethereum</b></h3>
      <a href="#primer-on-ethereum">
        
      </a>
    </div>
    <p>You may have heard of Ethereum as a cryptocurrency. What you may not know is that Ethereum is so much more. Ethereum is a distributed virtual computing network that stores and enforces smart contracts.</p><p>So, what is a smart contract?</p><p>Good question. An Ethereum smart contract is simply a piece of code stored on the Ethereum blockchain. When the contract is triggered, it runs on the Ethereum Virtual Machine (EVM). The EVM is a distributed virtual machine that runs smart contract code and produces cryptographically verified changes to the state of the Ethereum blockchain as its result.</p><p>To illustrate the power of smart contracts, let's consider a little example.</p><p>Anna wants to start a VPN provider, but she lacks the capital. To raise funds for her venture she decides to hold an Initial Coin Offering (ICO). Rather than design an ICO contract from scratch, Anna bases her contract on <a href="https://ethereum.org/en/developers/docs/standards/tokens/erc-20/">ERC-20</a>. ERC-20 is a template for issuing fungible tokens, perfect for ICOs. Anna sends her ERC-20-compliant contract to the Ethereum network, and starts to sell stock in her new company, VPN Co.</p>
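    <p>To make the ERC-20 idea concrete, here is a toy Python model of the fungible-token ledger that an ERC-20 contract maintains. This is purely illustrative: a real ERC-20 contract is Solidity code running on the EVM, and the names below are our own.</p>

```python
# Toy model of the ledger behind an ERC-20-style fungible token.
# Illustrative only: a real ERC-20 contract is Solidity bytecode on the EVM.

class Token:
    def __init__(self, issuer: str, total_supply: int):
        # The issuer starts out holding the entire supply, as in a simple ICO.
        self.balances = {issuer: total_supply}

    def balance_of(self, owner: str) -> int:
        return self.balances.get(owner, 0)

    def transfer(self, sender: str, to: str, amount: int) -> bool:
        # An ERC-20 transfer fails rather than overdraw the sender's balance.
        if amount < 0 or self.balance_of(sender) < amount:
            return False
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[to] = self.balance_of(to) + amount
        return True

# Anna's ICO: an investor sends funds and receives VPN Co. tokens in return.
vpnco = Token("anna", 1_000_000)
vpnco.transfer("anna", "investor", 500)
```

    <p>The real contract exposes the same operations (<code>balanceOf</code>, <code>transfer</code>) through the standard ERC-20 interface, which is what lets wallets and exchanges handle any compliant token without knowing anything about VPN Co.</p>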
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/125yD5Cd5Q1meFIupmLr5M/77b5e343b8b0d2ac74eb701b9da71349/ico_2x.png" />
            
            </figure><p>Once she's sorted out funds, Anna sits down and starts to write a smart contract. Anna’s contract asks customers to send her their public key, along with some Ether (Ethereum's native currency). She then authorizes the public key to access her VPN service. All without having to hold any secret information. Huzzah!</p><p>Next, rather than set up the infrastructure to run a VPN herself, Anna decides to use the blockchain again, but this time as a customer. Cloud Co. sells managed cloud infrastructure using its own smart contract. Anna programs her contract to send the appropriate amount of Ether to Cloud Co.'s contract. Cloud Co. then provisions the servers she needs to host her VPN. By automatically purchasing more infrastructure every time she has a new customer, her VPN company can scale totally autonomously.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Ir87Td7jTHxaC3kL7oLOH/74246966422e55ddf4cd4e4b3922eeb5/VPN-co-_2x.png" />
            
            </figure><p>Finally, Anna pays dividends to her investors out of the profits, keeping a little for herself.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6tosVrjURmYmMFxqzRAn4t/5e0bc26c6915e1e67cdd6f963afee1a9/slice-of-pie-_2x.png" />
            
            </figure><p>And there you have it.</p><p>A decentralised, autonomous, smart VPN provider.</p><p>A smart contract stored on the blockchain has an associated account for storing funds, and the contract is triggered when someone sends Ether to that account. So for our VPN example, the provisioning contract triggers when someone transfers money into the account associated with Anna’s contract.</p><p>What distinguishes smart contracts from ordinary code?</p><p>The "smart" part of a smart contract is that it runs autonomously. The "contract" part is the guarantee that the code runs as written.</p><p>Because these contracts are enforced cryptographically, maintained in the tamper-resistant medium of the blockchain, and verified by the consensus of the network, they are more reliable than regular contracts, whose enforcement can be disputed.</p>
    <div>
      <h3><b>Ethereum Smart Contracts vs. Traditional Contracts</b></h3>
      <a href="#ethereum-smart-contracts-vs-traditional-contracts">
        
      </a>
    </div>
    <p>A regular contract is enforced by the court system, litigated by lawyers. The outcome is uncertain; different courts rule differently and hiring more or better lawyers can swing the odds in your favor.</p><p>Smart contract outcomes are predetermined and are nearly incorruptible. However, here be dragons: though the outcome can be predetermined and incorruptible, a poorly written contract might not have the intended behavior, and because contracts are immutable, this is difficult to fix.</p>
    <div>
      <h3><b>How are smart contracts written?</b></h3>
      <a href="#how-are-smart-contracts-written">
        
      </a>
    </div>
    <p>You can write smart contracts in a number of languages, some of which are Turing complete, e.g. <a href="https://solidity.readthedocs.io">Solidity</a>. A Turing complete language lets you write code that can evaluate any computable function. This puts Solidity in the same class of languages as Python and Java. The compiled bytecode is then run on the EVM.</p><p>The EVM differs from a standard VM in a number of ways:</p>
    <div>
      <h5>The EVM is distributed</h5>
      <a href="#the-evm-is-distributed">
        
      </a>
    </div>
    <p>Each piece of code is run by numerous nodes. Nodes verify the computation before accepting a block, ensuring that miners who want their blocks accepted must run the EVM honestly. A block is only considered accepted when more than half of the network accepts it. This is the consensus part of Ethereum.</p><h6>The EVM is entirely deterministic</h6><p>This means that the same inputs to a function always produce the same outputs. Because regular VMs have access to file storage and the network, the results of a function call can be non-deterministic. Every EVM has the same start state, thus a given set of inputs always gives the same outputs. This makes the EVM more reliable than a standard VM.</p><p>There are two big gotchas that come with this determinism:</p><ul><li><p>EVM bytecode is Turing complete, and therefore discerning the outputs without running the computation is not always possible.</p></li><li><p>Ethereum smart contracts can store state on the blockchain. This means that the output of the function can vary as the blockchain changes. Although technically this is deterministic, in that the blockchain is an input to the function, it may still be impossible to derive the output in advance.</p></li></ul><p>Smart contracts, however, suffer from the same problem as any other piece of software: bugs. But unlike normal code, where the authors can issue a patch, code stored on the blockchain is immutable. More problematically, even if the author provides a new smart contract, the old one is always still available on the blockchain.</p><p>This means that when writing contracts, authors must be especially careful to write secure code, and include a kill switch to ensure that if bugs do reside in the code, they can be squashed. If there is no kill switch and the smart contract has exploitable vulnerabilities, resources can potentially be stolen from the contract or from other individuals. 
EVM bytecode includes a special <code>SELFDESTRUCT</code> opcode for just this purpose: it deletes a contract and sends all of its funds to a specified address.</p><p>The need to include a kill switch was brought into sharp focus during the <a href="https://en.wikipedia.org/wiki/The_DAO_(organization)">infamous DAO incident</a>. The DAO smart contract acted as a complex decentralized venture capital (VC) fund and, at its peak, held Ether worth 250 million dollars collected from a group of investors. Hackers exploited vulnerabilities in the smart contract and stole Ether worth 50 million dollars.</p><p>Because there is no way to undo transactions in Ethereum, there was a highly controversial “hard fork,” where the majority of the community agreed to accept a block with an “irregular state change” that essentially drained all DAO funds into a special “WithdrawDAO” recovery contract. By convincing enough miners to accept this irregular block as valid, the DAO could return funds.</p><p>Not everyone agreed with the change. Those who disagreed rejected the irregular block and formed the Ethereum Classic network, with both branches of the fork growing independently.</p><p>Kill switches, however, can cause their own problems. For example, when a contract used as a library flips its kill switch, all contracts relying on this contract can no longer operate as intended, even though the underlying library code is immutable. This caused over 500,000 ETH to become <a href="https://www.parity.io/security-alert-2/">stuck in multi-signature wallets</a> when an attacker triggered the kill switch of an underlying library.</p><p>Users of the multi-signature library assumed the immutability of the code meant that the library would always operate as anticipated. 
But smart contracts that interact with the blockchain are only deterministic once the state of the blockchain is taken into account.</p><p>In the wake of the DAO incident, various tools were created that check smart contracts for bugs or enable bug bounties, for example <a href="https://securify.chainsecurity.com/">Securify</a> and <a href="https://thehydra.io/">The Hydra</a>.</p>
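    <p>A minimal sketch of the owner-only kill-switch pattern described above, written here as a toy Python model rather than Solidity (where the <code>kill</code> step would use the <code>SELFDESTRUCT</code> opcode); the class and method names are our own:</p>

```python
# Sketch of the owner-only kill-switch (circuit breaker) pattern.
# Illustrative only: on Ethereum this logic would be Solidity code,
# with SELFDESTRUCT forwarding the contract's remaining funds.

class KillableContract:
    def __init__(self, owner: str):
        self.owner = owner      # only this address may kill the contract
        self.alive = True
        self.balance = 0

    def deposit(self, amount: int) -> None:
        # Once killed, a contract can no longer be interacted with.
        if not self.alive:
            raise RuntimeError("contract has been destroyed")
        self.balance += amount

    def kill(self, caller: str, refund_to: str) -> int:
        # Only the owner may flip the switch; everyone else is rejected.
        if caller != self.owner:
            raise PermissionError("only the owner can kill the contract")
        self.alive = False
        refund, self.balance = self.balance, 0
        # In Solidity, SELFDESTRUCT would send `refund` to `refund_to`.
        return refund
```

    <p>This also illustrates the library problem above: any code that calls <code>deposit</code> breaks permanently once someone flips the switch.</p>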
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/62DZdNA9leTphizaCEyR5k/6073b3bcfb18e38fc049cd06fdab7a57/bug_3x.png" />
            
            </figure><p>Come here, you ...</p><p>Another way smart contracts avoid bugs is by using standardized patterns. For example, ERC-20 defines a standardized interface for producing tokens such as those used in ICOs, and ERC-721 defines a standardized interface for implementing non-fungible tokens. Non-fungible tokens can be used for trading-card-style games like <a href="https://www.cryptokitties.co/">CryptoKitties</a>, a game built on the Ethereum blockchain in which players buy, sell, and breed cats, each cat being unique.</p><p>CryptoKitties is built on a collection of smart contracts that provides an <a href="https://github.com/cryptocopycats/awesome-cryptokitties/tree/master/contracts">open-source Application Binary Interface (ABI)</a> for interacting with the KittyVerse -- the virtual world of the CryptoKitties application. An ABI simply allows you to call functions in a contract and receive any returned data. The <code>KittyBase</code> code may look like this:</p>
            <pre><code>contract KittyBase is KittyAccessControl {
    event Birth(address owner, uint256 kittyId, uint256 matronId, uint256 sireId, uint256 genes);
    event Transfer(address from, address to, uint256 tokenId);
    struct Kitty {
        uint256 genes;
        uint64 birthTime;
        uint64 cooldownEndBlock;
        uint32 matronId;
        uint32 sireId;
        uint32 siringWithId;
        uint16 cooldownIndex;
        uint16 generation;
    }
    [...]
    function _transfer(address _from, address _to, uint256 _tokenId) internal {
    ...
    }
    function _createKitty(uint256 _matronId, uint256 _sireId, uint256 _generation, uint256 _genes, address _owner) internal returns (uint) {
    ...
    }
    [...]
}</code></pre>
            <p>Besides defining what a Kitty is, this contract defines two basic functions for transferring and creating kitties. Both are internal and can only be called by contracts that implement <code>KittyBase</code>. The <code>KittyOwnership</code> contract implements both ERC-721 and <code>KittyBase</code>, and implements an external <code>transfer</code> function that calls the internal <code>_transfer</code> function. This code is compiled into bytecode and written to the blockchain.</p><p>By implementing a standardised interface like ERC-721, smart contracts that aren’t specifically aware of CryptoKitties can still interact with the KittyVerse. The CryptoKitties ABI functions allow users to create distributed apps (dApps) of their own design on top of the KittyVerse, and allow other users to use their dApps. This extensibility helps demonstrate the potential of smart contracts.</p>
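    <p>The internal/external layering can be sketched in Python, with inheritance standing in for Solidity's contract hierarchy. This is a toy analogue, not the real contracts: the bookkeeping and the ownership check are simplified, and the method names are ours.</p>

```python
# Toy Python analogue of the KittyBase / KittyOwnership layering:
# the internal _transfer does the bookkeeping, and the external
# transfer wraps it with the ownership check ERC-721 requires.
# Illustrative only; the real contracts are Solidity on the EVM.

class KittyBase:
    def __init__(self):
        self.kitty_owner = {}   # kitty id -> owner address
        self.next_id = 0

    def _create_kitty(self, owner: str) -> int:
        # Internal: mint a new kitty and record its owner.
        kitty_id = self.next_id
        self.next_id += 1
        self.kitty_owner[kitty_id] = owner
        return kitty_id

    def _transfer(self, frm: str, to: str, kitty_id: int) -> None:
        # Internal: reassign ownership; trusts its caller's checks.
        self.kitty_owner[kitty_id] = to

class KittyOwnership(KittyBase):
    def transfer(self, caller: str, to: str, kitty_id: int) -> None:
        # External entry point: verify the caller owns the kitty first.
        if self.kitty_owner.get(kitty_id) != caller:
            raise PermissionError("caller does not own this kitty")
        self._transfer(caller, to, kitty_id)
```

    <p>Keeping the unchecked bookkeeping internal and exposing only the checked entry point is what lets other contracts interact with kitties safely through the standard interface.</p>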
    <div>
      <h3><b>How is this so different?</b></h3>
      <a href="#how-is-this-so-different">
        
      </a>
    </div>
    <p>Smart contracts are, by definition, public. Everyone can see the terms and understand where the money goes. This is a radically different approach to providing transparency and accountability. Because all contracts and transactions are public and verified by consensus, trust is distributed between the people, rather than centralized in a few big institutions.</p><p>The trust given to institutions is historic in that we trust them because they have previously demonstrated trustworthiness.</p><p>The trust placed in consensus-based algorithms is based on the assumption that most people are honest, or more accurately, that no sufficiently large subset of people can collude to produce a malicious outcome. This is the democratisation of trust.</p><p>In the case of the DAO attack, a majority of nodes <i>agreed</i> to accept an “irregular” state transition. This effectively undid the damage of the attack and demonstrates how, at least in the world of blockchain, perception is reality. Because most people “believed” (accepted) this irregular block, it became a “real,” valid block. Most people think of the blockchain as immutable and trust the power of consensus to ensure correctness; however, if enough people agree to do something irregular, they don't have to keep the rules.</p>
    <div>
      <h3><b>So where does Cloudflare fit in?</b></h3>
      <a href="#so-where-does-cloudflare-fit-in">
        
      </a>
    </div>
    <p>Accessing the Ethereum network and its attendant benefits directly requires running complex software, including downloading and cryptographically verifying hundreds of gigabytes of data, which, apart from producing technical barriers to entry for users, can also exclude people with low-power devices.</p><p>To help those users and devices access the Ethereum network, the Cloudflare Ethereum gateway allows any device capable of accessing the web to interact with the Ethereum network in a safe, reliable way.</p><p>Through our gateway, not only can you explore the blockchain, but if you give our gateway a signed transaction, we’ll push it to the network to allow miners to add it to their blockchain. This means that you can send Ether and even put new contracts on the blockchain without having to run a node.</p><p>"But Jonathan," I hear you say, "by providing a gateway aren't you just making Cloudflare a centralizing institution?"</p><p>That’s a fair question. Thankfully, Cloudflare won’t be alone in offering these gateways. We’re joining alongside organizations, such as <a href="https://infura.io">Infura</a>, to expand the constellation of gateways that already exist. We hope that, by providing a fast, reliable service, we can enable people who never previously used smart contracts to do so, and in so doing bring the benefits they offer to billions of regular Internet users.</p><blockquote><p>"We're excited that Cloudflare is bringing their infrastructure expertise to the Ethereum ecosystem. Infura has always believed in the importance of standardized, open APIs and compatibility between gateway providers, so we look forward to collaborating with their team to build a better distributed web." - E.G. 
Galano, <a href="https://infura.io/">Infura</a> co-founder.</p></blockquote><p>By providing a gateway to the Ethereum network, we help users make the jump from general web-user to cryptocurrency native, and eventually make the distributed web a fundamental part of the Internet.</p>
    <div>
      <h3><b>What can you do with Cloudflare's Gateway?</b></h3>
      <a href="#what-can-you-do-with-cloudflares-gateway">
        
      </a>
    </div>
    <p>Visit <a href="https://cloudflare-eth.com">cloudflare-eth.com</a> to interact with our example app. But to really explore the Ethereum world, access the RPC API, where you can do anything that can be done on the Ethereum network itself, from examining contracts to transferring funds.</p><p>Our gateway accepts <code>POST</code> requests containing JSON. For a complete list of calls, visit the <a href="https://github.com/ethereum/wiki/wiki/JSON-RPC">Ethereum GitHub page</a>. So, to get the number of the most recent block, you could run:</p>
            <pre><code>curl https://cloudflare-eth.com -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'</code></pre>
            <p>and you would get a response something like this:</p>
            <pre><code>{
  "jsonrpc": "2.0",
  "id": 1,
  "result": "0x780f17"
}</code></pre>
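            <p>Any language that can speak HTTP and JSON can do the same without <code>curl</code>. Here is a small Python sketch; the helper names are our own, and the actual network call is shown commented out so the payload-building and decoding logic stands on its own:</p>

```python
import json

# Build the JSON-RPC 2.0 payload for eth_blockNumber, the same body
# the curl example above POSTs to the gateway.
def eth_block_number_payload(request_id: int = 1) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_blockNumber",
        "params": [],
        "id": request_id,
    })

# Ethereum JSON-RPC returns quantities as 0x-prefixed hex strings.
def decode_quantity(result: str) -> int:
    return int(result, 16)

# Decoding the sample response above: "0x780f17" is block 7868183.
block = decode_quantity("0x780f17")

# To actually query the gateway (requires network access):
# import urllib.request
# req = urllib.request.Request(
#     "https://cloudflare-eth.com",
#     data=eth_block_number_payload().encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(urllib.request.urlopen(req))["result"])
```

            <p>The same pattern works for every method in the JSON-RPC list: change <code>method</code> and <code>params</code>, and decode the <code>result</code> field according to its type.</p>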
            <p>We also invite developers to build dApps based on our Ethereum gateway using our API. Our API allows developers to build websites powered by the Ethereum blockchain. Check out <a href="https://developers.cloudflare.com/distributed-web/ethereum-gateway/">developer docs</a> to get started. If you want to read more about how Ethereum works check out this <a href="https://medium.com/@preethikasireddy/how-does-ethereum-work-anyway-22d1df506369">deep dive</a>.</p>
    <div>
      <h3><b>The architecture</b></h3>
      <a href="#the-architecture">
        
      </a>
    </div>
    <p>Cloudflare is uniquely positioned to host an Ethereum gateway, and we have the utmost faith in the products we offer to customers. This is why the Cloudflare Ethereum gateway runs as a Cloudflare customer and we <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">dogfood</a> our own products to provide a fast and reliable gateway. The domain we run the gateway on (<a href="https://cloudflare-eth.com">https://cloudflare-eth.com</a>) uses <a href="https://www.cloudflare.com/products/cloudflare-workers/">Cloudflare Workers</a> to cache responses for popular queries made to the gateway. Responses for these queries are answered directly from the Cloudflare edge, which can result in a ~6x speed-up.</p><p>We also use <a href="/introducing-load-balancing-intelligent-failover-with-cloudflare/">Load balancing</a> and <a href="/argo-tunnel/">Argo Tunnel</a> for fast, redundant, and secure content delivery. With Argo Smart Routing enabled, requests and responses to our Ethereum gateway are tunnelled directly from our Ethereum node to the Cloudflare edge using the best possible routing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JQmJaXMtzmPq4OuuP2897/580a60e6443392ff0da26a94f7842dc2/imageLikeEmbed--1-.png" />
            
            </figure><p>Similar to our <a href="https://cloudflare.com/distributed-web-gateway">IPFS gateway</a>, <a href="https://cloudflare-eth.com">cloudflare-eth.com</a> is an <a href="https://www.cloudflare.com/ssl-for-saas-providers/">SSL for SaaS</a> provider. This means that anyone can set up the Cloudflare Ethereum gateway as a backend for access to the Ethereum network through their own registered domains. For more details on how to set up your own domain with this functionality, see the Ethereum tab on <a href="https://cloudflare.com/distributed-web-gateway">cloudflare.com/distributed-web-gateway</a>.</p><p>With these features, you can use Cloudflare’s Distributed Web Gateway to create a fully decentralized website with an interactive backend that allows interaction with the IPFS and Ethereum networks. For example, you can host your content on IPFS (using something like <a href="https://pinata.cloud">Pinata</a> to pin the files), and then host the website backend as a smart contract on Ethereum. This architecture does not require a centralized server for hosting files or the actual website. Added to the power, speed, and security provided by Cloudflare’s edge network, your website is delivered to users around the world with unparalleled efficiency.</p>
    <div>
      <h3>Embracing a distributed future</h3>
      <a href="#embracing-a-distributed-future">
        
      </a>
    </div>
    <p>At Cloudflare, we support technologies that help distribute trust. By providing a gateway to the Ethereum network, we hope to facilitate the growth of a decentralized future.</p><p>We thank the Ethereum Foundation for their support of a new gateway in expanding the distributed web:</p><blockquote><p>“Cloudflare's Ethereum Gateway increases the options for thin-client applications as well as decentralization of the Ethereum ecosystem, and I can't think of a better person to do this work than Cloudflare. Allowing access through a user's custom hostname is a particularly nice touch. Bravo.” - Dr. Virgil Griffith, Head of Special Projects, Ethereum Foundation.</p></blockquote><p>We hope that by allowing anyone to use the gateway as the backend for their domain, we make the Ethereum network more accessible for everyone; with the added speed and security brought by serving this content directly from Cloudflare’s global edge network.</p><p>So, go forth and build our vision – the distributed crypto-future!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5fM3vAj1tUUkBVxyauQZQq/093ceaa41694d5b6493ac02f173ca5a0/crypto-week-2019-header-circle_2x.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[Crypto Week]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[IPFS]]></category>
            <category><![CDATA[Ethereum]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">4oOx6ebFXMI1k3UBQBDy6j</guid>
            <dc:creator>Jonathan Hoyland</dc:creator>
        </item>
    </channel>
</rss>