Private by design: building privacy-preserving products with Cloudflare's Privacy Edge

When Cloudflare was founded, our value proposition had three pillars: more secure, more reliable, and more performant. Over time, we’ve realized that a better Internet is also a more private Internet, and we want to play a role in building it.

User awareness and expectations of and for privacy are higher than ever, but we believe that application developers and platforms shouldn’t have to start from scratch. We’re excited to introduce Privacy Edge – Code Auditability, Privacy Gateway, Privacy Proxy, and Cooperative Analytics – a suite of products that make it easy for site owners and developers to build privacy into their products, by default.

Building network-level privacy into the foundations of app infrastructure

As you’re browsing the web every day, information from the networks and apps you use can expose more information than you intend. When accumulated over time, identifiers like your IP address, cookies, browser and device characteristics create a unique profile that can be used to track your browsing activity. We don’t think this status quo is right for the Internet, or that consumers should have to understand the complex ecosystem of third-party trackers to maintain privacy. Instead, we’ve been working on technologies that encourage and enable website operators and app developers to build privacy into their products at the protocol level.

Getting privacy right is hard. We figured we’d start in the area we know best: building privacy into our network infrastructure. Like other work we’ve done in this space – offering free SSL certificates to make encrypted HTTP requests the norm, and launching 1.1.1.1, a privacy-respecting DNS resolver, for example – the products we’re announcing today are built upon the foundations of open Internet standards, many of which are co-authored by members of our Research Team.

Privacy Edge – the collection of products we’re announcing today, includes:

Privacy Gateway: A lightweight proxy that encrypts request data and forwards it through an IP-blinding relay
Code Auditability: A solution to verifying that code delivered in your browser hasn’t been tampered with
Private Proxy: A proxy that offers the protection of a VPN, built natively into application architecture
Cooperative Analytics: A multi-party computation approach to measurement and analytics based on an emerging distributed aggregation protocol.

Today’s announcement of Privacy Edge isn’t exhaustive. We’re continuing to explore, research and develop new privacy-enabling technologies, and we’re excited about all of them.

Privacy Gateway: IP address privacy for your users

There are situations in which applications only need to receive certain HTTP requests for app functionality, but linking that data with who or where it came from creates a privacy concern.

We recently partnered with Flo Health, a period tracking app, to solve exactly that privacy concern: for users that have turned on “Anonymous mode,” Flo encrypts and forwards traffic through Privacy Gateway so that the network-level request information (most importantly, users’ IP addresses) are replaced by the Cloudflare network.

How data is encapsulated, forwarded, and decapsulated in the Privacy Gateway system.

So how does it work? Privacy Gateway is based on Oblivious HTTP, an emerging IETF standard, and at a high level describes the following data flow:

The client encapsulates an HTTP request using the public key of the customer’s gateway server, and sends it to the relay over a client<>relay HTTPS connection.
The relay forwards the request to the server over its own relay<>gateway HTTPS connection.
The gateway server decapsulates the request, forwarding it to the application server.
The gateway server returns an encapsulated response to the relay, which then forwards the result to the client.

The novel feature Privacy Gateway implements from the OHTTP specification is that messages sent through the relay are encrypted (via HPKE) to the application server, so that the relay learns nothing of the application data beyond the source and destination of each message.

The end result is that the relay will know where the data request is coming from (i.e. users’ IP addresses) but not what it contains (i.e. contents of the request), and the application can see what the data contains but won’t know where it comes from. A win for end-user privacy.

Delivering verifiable and authentic code for privacy-critical applications

How can you ensure that the code — the JavaScript, CSS or even HTML —delivered to a browser hasn’t been tampered with?

One way is to generate a hash (a consistent, unique, and shorter representation) of the code, and have two independent parties compare those hashes when delivered to the user's browser.

Our Code Auditability service does exactly that, and our recent partnership with Meta deployed it at scale to WhatsApp Web. Installing their Code Verify browser extension ensures users can be sure that they are delivered the code they’re intended to run – free of tampering or corrupted files.

With WhatsApp Web:

WhatsApp publishes the latest version of their JavaScript libraries to their servers, and the corresponding hash for that version to Cloudflare’s audit endpoint.
A WhatsApp web client fetches the latest libraries from WhatsApp.
The Code Verify browser extension subsequently fetches the hash for that version from Cloudflare over a separate, secure connection.
Code Verify compares the “known good” hash from Cloudflare with the hash of the libraries it locally computed.

If the hashes match, as they should under almost any circumstance, the code is “verified” from the perspective of the extension. If the hashes don’t match, it indicates that the code running on the user's browser is different from the code WhatsApp intended to run on all its user's browsers.

How Cloudflare and WhatsApp Web verify code shipped to users isn't tampered with.

Right now, we call this "Code Auditability" and we see a ton of other potential use cases including password managers, email applications, certificate issuance – all technologies that are potentially targets of tampering or security threats because of the sensitive data they handle.

In the near term, we’re working with other app developers to co-design solutions that meet their needs for privacy-critical products. In the long term, we’re working on standardizing the approach, including building on existing Content Security Policy standards, or the Isolated Web Apps proposal, and even an approach towards building Code Auditability natively into the browser so that a browser extension (existing or new) isn't required.

Privacy-preserving proxying – built into applications

What if applications could build the protection of a VPN into their products, by default?

Privacy Proxy is our platform to proxy traffic through Cloudflare using a combination of privacy protocols that make it much more difficult to track users’ web browsing activity over time. At a high level, the Privacy Proxy Platform encrypts browsing traffic, replaces a device’s IP address with one from the Cloudflare network, and then forwards it onto its destination.

System architecture for Privacy Proxy.

The Privacy Proxy platform consists of several pieces and protocols to make it work:

Privacy API: a service that issues unique cryptographic tokens, later redeemed against the proxy service to ensure that only valid clients are able to connect to the service.
Geolocated IP assignment: a service that assigns each connection a new Cloudflare IP address based on the client’s approximate location.
Privacy Proxy: the HTTP CONNECT-based service running on Cloudflare’s network that handles the proxying of traffic. This service validates the privacy token passed by the client, enforces any double spend prevention necessary for the token.

We’re working on several partnerships to provide network-level protection for user’s browsing traffic, most recently with Apple for Private Relay. Private Relay’s design adds privacy to the traditional proxy design by adding an additional hop – an ingress proxy, operated by Apple – that separates handling users’ identities (i.e., whether they’re a valid iCloud+ user) from the proxying of traffic – the egress proxy, operated by Cloudflare.

Measurements and analytics without seeing individual inputs

What if you could calculate the results of a poll, without seeing individuals' votes, or update inputs to a machine learning model that predicted COVID-19 exposure without seeing who was exposed?

It might seem like magic, but it's actually just cryptography. Cooperative Analytics is a multi-party computation system for aggregating privacy-sensitive user measurements that doesn’t reveal individual inputs, based on the Distributed Aggregation Protocol (DAP).

How data flows through the Cooperative Analytics system.

At a high-level, DAP takes the core concept behind MapReduce — what became a fundamental way to aggregate large amounts of data — and rethinks how it would work with privacy-built in, so that each individual input cannot be (provably) mapped back to the original user.

Specifically:

Measurements are first "secret shared," or split into multiple pieces. For example, if a user's input is the number 5, her input could be split into two shares of [10,-5].
The input share pieces are then distributed between different, non-colluding servers for aggregation (in this example, simply summed up). Similar to Privacy Gateway or Private Proxy, no one party has all the information needed to reconstruct any user's input.
Depending on the use case, the servers will then communicate with one another in order to verify that the input is "valid" – so that no one can insert an input that throws off the entire results. The magic of multi-party computation is that the servers can perform this computation without learning anything about the input beyond its validity.
Once enough input shares have been aggregated to ensure strong anonymity and a statistically significant sample size – each server sends its sum of the input shares to the overall consumer of this service to then compute the final result.

For simplicity, the above example talks about measurements as summed up numbers, but DAP describes algorithms for multiple different types of inputs: the most common string input, or a linear regression, for example.

Early iterations of this system have been implemented by Apple and Google for COVID-19 exposure notifications, but there are many other potential use cases for a system like this: think sensitive browser telemetry, geolocation data – any situation where one has a question about a population of users, but doesn't want to have to measure them directly.

Because this system requires different parties to operate separate aggregation servers, Cloudflare is working with several partners to act as one of the aggregation servers for DAP. We’re calling our implementation Daphne, and it’s built on top of Cloudflare Workers.

Privacy still requires trust

Part of what's cool about these systems is that they distribute information — whether user data, network traffic, or both — amongst multiple parties.

While we think that products included in Privacy Edge are moving the Internet in the right direction, we understand that trust only goes so far. To that end, we're trying to be as transparent as possible.

We've open sourced the code for Privacy Gateway's server and DAP's aggregation server, and all the standards work we're doing is in public with the IETF.
We're also working on detailed and accessible privacy notices for each product that describe exactly what kind of network data Cloudflare sees, doesn't see, and how long we retain it for.
And, most importantly, we’re continuing to develop new protocols (like Oblivious HTTP) and technologies that don’t just require trust, but that can provably minimize the data observed or logged.

We'd love to see more folks get involved in the standards space, and we welcome feedback from privacy experts and potential customers on how we can improve the integrity of these systems.

We’re looking for collaborators

Privacy Edge products are currently in early access.

We're looking for application developers who want to build more private user-facing apps with Privacy Gateway; browser and existing VPN vendors looking to improve network-level security for their users via Privacy Proxy; and anyone shipping sensitive software on the Internet that is looking to iterate with us on code auditability and web app signing.

If you're interested in working with us on furthering privacy on the Internet, then please reach out, and we’ll be in touch!

The Cloudflare Blog

Private by design: building privacy-preserving products with Cloudflare's Privacy Edge

Building network-level privacy into the foundations of app infrastructure

Privacy Gateway: IP address privacy for your users

Delivering verifiable and authentic code for privacy-critical applications

Privacy-preserving proxying – built into applications

Measurements and analytics without seeing individual inputs

Privacy still requires trust

We’re looking for collaborators

Fixing request smuggling vulnerabilities in Pingora OSS deployments

Active defense: introducing a stateful vulnerability scanner for APIs

From reactive to proactive: closing the phishing gap with LLMs

How Cloudy translates complex security into human action