The Cloudflare Blog

Shared Dictionaries: compression that keeps up with the agentic web

Alex Krivit — Fri, 17 Apr 2026 13:02:00 GMT

Web pages have grown 6-9% heavier every year for the past decade, spurred by the web becoming more framework-driven, interactive, and media-rich. Nothing about that trajectory is changing. What is changing is how often those pages get rebuilt and how many clients request them. Both are skyrocketing because of agents.

Shared dictionaries shrink asset transfers from servers to browsers so pages load faster with less bloat on the wire, especially for returning users or visitors on a slow connection. Instead of re-downloading entire JavaScript bundles after every deploy, the browser tells the server what it already has cached, and the server only sends the file diffs.

Today, we’re excited to give you a sneak peek of our support for shared compression dictionaries, show you what we’ve seen in early testing, and reveal when you’ll be able to try the beta yourself (hint: it’s April 30, 2026!).

The problem: more shipping = less caching

Agentic crawlers, browsers, and other tools hit endpoints repeatedly, fetching full pages, often to extract a fragment of information. Agentic actors represented just under 10% of total requests across Cloudflare's network during March 2026¹, up ~60% year-over-year.

Every page shipped is heavier than last year and read more often by machines than ever before. But agents aren’t just consuming the web, they’re helping to build it. AI-assisted development means teams ship faster. Increasing the frequency of deploys, experiments, and iterations is great for product velocity, but terrible for caching.

As agents push a one-line fix, the bundler re-chunks, filenames change, and every user on earth could re-download the entire application. Not because the code is meaningfully any different, but because the browser/client has no way to know specifically what changed. It sees a new URL and starts from zero. Traditional compression helps with the size of each download, but it can't help with the redundancy. It doesn't know the client already has 95% of the file cached. So every deploy, across every user, across every bot, sends redundant bytes again and again. Ship ten small changes a day and you've effectively opted out of caching. This wastes bandwidth and CPU in a web where hardware is quickly becoming the bottleneck.

In order to scale with more requests hitting heavier pages that are re-deployed more often, compression has to get smarter.

What are shared dictionaries?

A compression dictionary is a shared reference between server and client that works like a cheat sheet. Instead of compressing a response from scratch, the server says "you already know this part of the file because you’ve cached it before" and only sends what's new. The client holds the same reference and uses it to reconstruct the full response during decompression. The more the dictionary can reference content in the file, the smaller the compressed output that is transferred to the client.

This principle of compressing against what's already known is how modern compression algorithms pull ahead of their predecessors. Brotli ships with a built-in dictionary of common web patterns like HTML attributes and common phrases; Zstandard is purpose-built for custom dictionaries: you can feed it representative content samples and it generates an optimized dictionary for the kind of content you serve. Gzip has neither; it must build dictionaries by finding patterns in real-time as it’s compressing. These “traditional compression” algorithms are already available on Cloudflare today.

Shared dictionaries take this principle a step further: the previously cached version of the resource becomes the dictionary. Remember the deploy problem where a team ships a one-line fix and every user re-downloads the full bundle? With shared dictionaries, the browser already has the old version cached. The server compresses against it, sending only the diff. That 500KB bundle with a one-line change becomes only a few kilobytes on the wire. At 100K daily users and 10 deploys a day, that's the difference between 500GB of transfer and a few hundred megabytes.

Delta compression

Delta compression is what turns the version the browser already has into the dictionary. The protocol looks to when the server first serves a resource, it attaches a Use-As-Dictionary response header, telling the browser to essentially hold onto the file because it’ll be useful later. On the next request for that resource, the browser sends an Available-Dictionary header back, telling the server, "here's what I've got." The server then proceeds to compress the new version against the old one and sends only the diff. No separate dictionary file needed.

This is where the payoff lands for real applications. Versioned JS bundles, CSS files, framework updates, and anything that changes incrementally between releases. The browser has app.bundle.v1.js cached already and the developer makes an update and deploys app.bundle.v2.js. Delta compression only sends the diff between these versions. Every subsequent version after is also just a diff. Version three compresses against version two. Version 47 compresses against version 46. The savings don't reset, they persist across the entire release history.

There's also active discussion in the community about custom and dynamic dictionaries for non-static content. That's future work, but the implications are significant. We'll save that for another post.

So why the wait?

If shared dictionaries are so powerful, why doesn't everyone use them already?

Because the last time they were tried, the implementation couldn't survive contact with the open web.

Google shipped Shared Dictionary Compression for HTTP (SDCH) in Chrome in 2008. It worked well with some early adopters reporting double-digit improvements in page load times. But SDCH accumulated problems faster than anyone was able to fix them.

The most memorable was a class of compression side-channel attacks (CRIME, BREACH). Researchers showed that if an attacker could inject content alongside something sensitive that gets compressed (like a session cookie, token, etc.) the size of the compressed output could leak information about the secret. The attacker could guess a byte at a time, watch whether the asset size shrank, and repeat until they extracted the whole secret.

But security wasn't the only problem, or even the main reason why adoption didn’t happen. SDCH surfaced a few architectural problems like violating the Same-Origin Policy (which ironically is partially why it performed so well). Its cross-origin dictionary model couldn't be reconciled with CORS, and it lacked some specification regarding interactions with things like the Cache API. After a while it became clear that adoption wasn’t ready, so in 2017 Chrome (the only browser supporting at the time) unshipped it.

Getting the web community to pick up the baton took a decade, but it was worth it.

The modern standard, RFC 9842: Compression Dictionary Transport, closes key design gaps that made SDCH untenable. For example, it enforces that an advertised dictionary is only usable on responses from the same-origin, preventing many conditions that made side-channel compression attacks possible.

Chrome and Edge have shipped support with Firefox working to follow. The standard is moving toward broad adoption, but complete cross-browser support is still catching up.

The RFC mitigates the security problems but dictionary transport has always been complex to implement. An origin may have to generate dictionaries, serve them with the right headers, check every request for an Available-Dictionary match, delta-compress the response on the fly, and fall back gracefully when a client doesn't have a dictionary. Caching gets complex too. Responses vary on both encoding and dictionary hash, so every dictionary version creates a separate cache variant. Mid-deploy, you have clients with the old dictionary, clients with the new one, and clients with none. Your cache is storing separate copies for each. Hit rates drop, storage climbs, and the dictionaries themselves have to stay fresh under normal HTTP caching rules.

This complexity is a coordination problem. And exactly the kind of thing that belongs at the edge. A CDN already sits in front of every request, already manages compression, and already handles cache variants (watch this space for a soon-to-come announcement blog).

How Cloudflare is building shared dictionary support

Shared dictionary compression touches every layer of the stack between the browser and the origin. We've seen strong customer interest: some people have already built their own implementations like RFC author Patrick Meenan's dictionary-worker, which runs the full dictionary lifecycle inside a Cloudflare Worker using WASM-compiled Zstandard (as an example). We want to make this accessible to everyone and as easy as possible to implement. So we’re rolling it out across the platform in three phases, starting with the plumbing.

Phase 1: Passthrough support is currently in active development. Cloudflare forwards the headers and encodings that shared dictionaries require like Use-As-Dictionary, Available-Dictionary, and the dcb and dcz content encodings, without stripping, modifying, or recompressing them. The Cache keys are extended to vary on Available-Dictionary and Accept-Encoding so dictionary-compressed responses are cached correctly. This phase serves customers who manage their own dictionaries at the origin.

We plan to have an open beta of Phase 1 ready by April 30, 2026. To use it, you'll need to be on a Cloudflare zone with the feature enabled, have an origin that serves dictionary-compressed responses with the correct headers (Use-As-Dictionary, Content-Encoding: dcb or dcz, Vary on Accept-Encoding, Available-Dictionary), and your visitors need to be on a browser that can use dictionary transport. Today, that means Chrome 130+ and Edge 130+, with Firefox support in progress.

Keep your eyes fixed on the changelog for when this becomes available and more documentation for how to use it.

We’ve already started testing passthrough internally. In a controlled test, we deployed two js bundles in sequence. They were nearly identical except for a few localized changes between the versions representing successive deploys of the same web application. 200 tests were run hitting our San Jose, California PoP with an origin located in Council Bluffs, Iowa serving the dictionary, JS bundle, and dictionary-compressed bundle (note that the dcz compressed bundle was precomputed). For each request we captured TTFB and total download time via curl.

Uncompressed, the asset is 272KB. Gzip brought that down to 92.2KB, a solid 66% reduction. With shared dictionary compression over DCZ, using the previous version as the dictionary, that same asset dropped to 2.6KB. That's a 99% reduction from the uncompressed asset, and still 97% smaller than gzip.

In the same lab test, we measured two timing milestones from the client: time to first byte (TTFB) and full download completion. The timing difference is stark. At P99 on a cache miss, DCZ TTFB is roughly 60 ms faster than gzip. On a cache hit the TTFB gap narrows to a negligible 10ms.

Download completion is where it matters. "Transfer" in the graphs indicates total time curl spent receiving the response minus TTFB. On a cache miss, this time for the DCZ response is 1ms versus 161ms for gzip (99.4% faster - the body essentially arrived alongside the headers). On a cache hit, 1ms versus 54ms (98% faster). The localized changes between versions were already captured by the dictionary, which is exactly the point: for successive deploys of roughly the same application, shared dictionaries eliminate nearly all redundant transfer.

Initial lab results simulating minimal JS bundle diffs, results will vary based on the actual delta between the dictionary and the asset.

Phase 2: This is where Cloudflare starts doing the work for you. Instead of handling dictionary headers, compression, and fallback logic on the origin, in this phase you tell Cloudflare which assets should be used as dictionaries via a rule and we manage the rest for you. We inject the Use-As-Dictionary headers, store the dictionary bytes, delta-compress new versions against old ones, and serve the right variant to each client. Your origin serves normal responses. Any dictionary complexity moves off your infrastructure and onto ours.

To demonstrate this, we've built a live demo to show what this looks like in practice. Try it here: Can I Compress (with Dictionaries)?

The demo deploys a new ~94KB JavaScript bundle every minute, meant to mimic a typical production single page application bundle. The bulk of the code is static between deploys; only a small configuration block changes each time, which also mirrors real-world deploys where most of the bundle is unchanged framework and library code. When the first version loads, Cloudflare's edge stores it as a dictionary. When the next deploy arrives, the browser sends the hash of the version it already has, and the edge delta-compresses the new bundle against it. The result: 94KB compresses to roughly 450 bytes. That's a 99.5% reduction over gzip, because the only thing on the wire is the actual diff.

The demo site includes walkthroughs so you can verify the compression ratios on your own via curl, your browser, or your agent of choice.

Phase 3: The dictionary is automatically generated on behalf of the website. Instead of customers specifying which assets to use as dictionaries, Cloudflare identifies them automatically. Our network already sees every version of every resource that flows through it, which includes millions of sites, billions of requests, and every new deployment. The idea is that when the network observes a URL pattern where successive responses share most of their content, it has a strong signal that the resource is a good candidate for delta compression. If safe to do so, it stores the previous version as a dictionary and compresses subsequent versions against it. No customer configuration. No maintenance.

This is a simple idea, but is genuinely hard. Safely generating dictionaries that avoid revealing private data and identifying traffic for which dictionaries will offer the most benefit are real engineering problems. But Cloudflare has the right pieces: we see the traffic patterns across the entire network, we already manage the cache layer where dictionaries need to live, and our RUM beacon to clients can help give us a validation loop to confirm that a dictionary actually improves compression before we commit to serving it. The combination of traffic visibility, edge storage, and synthetic testing is what makes automatic generation feasible, though there are still many pieces to figure out.

The performance and bandwidth benefits of phase 3 are the crux of our motivation. This is what makes shared dictionaries accessible to everyone using Cloudflare, including the millions of zones that would never have had the engineering time to implement custom dictionaries manually.

The bigger picture

For most of the web's history, compression was stateless. Every response was compressed as if the client had never seen anything before. Shared dictionaries change that: they give compression a memory.

That matters more now than it would have five years ago. Agentic coding tools are compressing the interval between deploys, while also driving a growing share of the traffic that consumes them. While today AI tools can produce massive diffs, agents are gaining more context and becoming surgical in their code changes. This, coupled with more frequent releases and more automated clients means more redundant bytes on every request. Delta compression helps both sides of that equation by reducing the number of bytes per transfer, and the number of transfers that need to happen at all.

Shared Dictionaries took decades to standardize. Cloudflare is helping to build the infrastructure to make it work for every client that touches your site, human or not. Phase 1 beta opens April 30 and we’re excited for you to try it.

_____

^{1Bots =}^~31.3%^{of all HTTP requests. AI = ~}^29-30%^{of all Bot traffic (March 2026).}

Fixing request smuggling vulnerabilities in Pingora OSS deployments

Edward Wang — Mon, 09 Mar 2026 14:00:00 GMT

In December 2025, Cloudflare received reports of HTTP/1.x request smuggling vulnerabilities in the Pingora open source framework when Pingora is used to build an ingress proxy. Today we are discussing how these vulnerabilities work and how we patched them in Pingora 0.8.0.

The vulnerabilities are CVE-2026-2833, CVE-2026-2835, and CVE-2026-2836. These issues were responsibly reported to us by Rajat Raghav (xclow3n) through our Bug Bounty Program.

Cloudflare’s CDN and customer traffic were not affected, our investigation found. No action is needed for Cloudflare customers, and no impact was detected.

Due to the architecture of Cloudflare’s network, these vulnerabilities could not be exploited: Pingora is not used as an ingress proxy in Cloudflare’s CDN.

However, these issues impact standalone Pingora deployments exposed to the Internet, and may enable an attacker to:

Bypass Pingora proxy-layer security controls
Desync HTTP request/responses with backends for cross-user hijacking attacks (session or credential theft)
Poison Pingora proxy-layer caches retrieving content from shared backends

We have released Pingora 0.8.0 with fixes and hardening. While Cloudflare customers were not affected, we strongly recommend users of the Pingora framework to upgrade as soon as possible.

What was the vulnerability?

The reports described a few different HTTP/1 attack payloads that could cause desync attacks. Such requests could cause the proxy and backend to disagree about where the request body ends, allowing a second request to be “smuggled” past proxy‑layer checks. The researcher provided a proof-of-concept to validate how a basic Pingora reverse proxy misinterpreted request body lengths and forwarded those requests to server backends such as Node/Express or uvicorn.

Upon receiving the reports, our engineering team immediately investigated and validated that, as the reporter also confirmed, the Cloudflare CDN itself was not vulnerable. However, the team did also validate that vulnerabilities exist when Pingora acts as the ingress proxy to shared backends.

By design, the Pingora framework does allow edge case HTTP requests or responses that are not strictly RFC compliant, because we must accept this sort of traffic for customers with legacy HTTP stacks. But this leniency has limits to avoid exposing Cloudflare itself to vulnerabilities.

In this case, Pingora had non-RFC-compliant interpretations of request bodies within its HTTP/1 stack that allowed these desync attacks to exist. Pingora deployments within Cloudflare are not directly exposed to ingress traffic, and we found that production traffic that arrived at Pingora services were not subject to these misinterpretations. Thus, the attacks were not exploitable on Cloudflare traffic itself, unlike a previous Pingora smuggling vulnerability disclosed in May 2025.

We’ll explain, case-by-case, how these attack payloads worked.

1. Premature upgrade without 101 handshake

The first report showed that a request with an Upgrade header value would cause Pingora to pass through subsequent bytes on the HTTP connection immediately, before the backend had accepted an upgrade (by returning 101 Switching Protocols). The attacker could thus pipeline a second HTTP request after the upgrade request on the same connection:

GET / HTTP/1.1
Host: example.com
Upgrade: foo


GET /admin HTTP/1.1
Host: example.com

Pingora would parse only the initial request, then treat the remaining buffered bytes as the “upgraded” stream and forward them directly to the backend in a “passthrough” mode due to the Upgrade header (until the response was received).

This is not at all how the HTTP/1.1 Upgrade process per RFC 9110 is intended to work. The subsequent bytes should only be interpreted as part of an upgraded stream if a 101 Switching Protocols header is received, and if a 200 OK response is received instead, the subsequent bytes should continue to be interpreted as HTTP.

^{An attacker that sends an Upgrade request, then pipelines a partial HTTP request may cause a desync attack. Pingora will incorrectly interpret both as the same upgraded request, even if the backend server declines the upgrade with a 200.}

Via the improper pass-through, a Pingora deployment that received a non-101 response could still forward the second partial HTTP request to the upstream as-is, bypassing any Pingora user‑defined ACL-handling or WAF logic, and poison the connection to the upstream so that a subsequent request from a different user could improperly receive the /admin response.

^{After the attack payload, Pingora and the backend server are now “desynced.” The backend server will wait until it thinks the rest of the partial /attack request header that Pingora forwarded is complete. When Pingora forwards a different user’s request, the two headers are combined from the backend server’s perspective, and the attacker has now poisoned the other user’s response.}

We’ve since patched Pingora to switch the interpretation of subsequent bytes only once the upstream responds with 101 Switching Protocols.

We verified Cloudflare was not affected for two reasons:

The ingress CDN proxies do not have this improper behavior.
The clients to our internal Pingora services do not attempt to pipeline HTTP/1 requests. Furthermore, the Pingora service these clients talk directly with disables keep-alive on these Upgrade requests by injecting a Connection: close header; this prevents additional requests that would be sent — and subsequently smuggled — over the same connection.

2. HTTP/1.0, close-delimiting, and transfer-encoding

The reporter also demonstrated what appeared to be a more classic “CL.TE” desync-type attack, where the Pingora proxy would use Content-Length as framing while the backend would use Transfer-Encoding as framing:

GET / HTTP/1.0
Host: example.com
Connection: keep-alive
Transfer-Encoding: identity, chunked
Content-Length: 29

0

GET /admin HTTP/1.1
X:

In the reporter’s example, Pingora would treat all subsequent bytes after the first GET / request header as part of that request’s body, but the node.js backend server would interpret the body as chunked and ending at the zero-length chunk. There are actually a few things going on here:

Pingora’s chunked encoding recognition was quite barebones (only checking for whether Transfer-Encoding was “chunked”) and assumed that there could only be one encoding or Transfer-Encoding header. But the RFC only mandates that the final encoding must be chunked to apply chunked framing. So per RFC, this request should have a chunked message body (if it were not HTTP/1.0 — more on that below).
Pingora was also not actually using the Content-Length (because the Transfer-Encoding overrode the Content-Length per RFC). Because of the unrecognized Transfer-Encoding and the HTTP/1.0 version, the request body was instead treated as close-delimited (which means that the response body’s end is marked by closure of the underlying transport connection). An absence of framing headers would also trigger the same misinterpretation on HTTP/1.0. Although response bodies are allowed to be close-delimited, request bodies are never close-delimited. In fact, this clarification is now explicitly called out as a separate note in RFC 9112.
This is an HTTP/1.0 request that did not define Transfer-Encoding. The RFC mandates that HTTP/1.0 requests containing Transfer-Encoding must “treat the message as if the framing is faulty” and close the connection. Parsers such as the ones in nginx and hyper just reject these requests to avoid ambiguous framing.

^{When an attacker pipelines a partial HTTP request header after the HTTP/1.0 + Transfer-Encoding request, Pingora would incorrectly interpret that partial header as part of the same request, rather than as a distinct request. This enables the same kind of desync attack as described in the premature Upgrade example.}

This spoke to a more fundamental misreading of the RFC particularly in terms of response vs. request message framing. We’ve since fixed the improper multiple Transfer-Encoding parsing, adhere strictly to the request length guidelines such that HTTP request bodies can never be considered close-delimited, and reject invalid Content-Length and HTTP/1.0 + Transfer-Encoding request messages. Further protections we’ve added include rejecting CONNECT requests by default because the HTTP proxy logic doesn’t currently treat CONNECT as special for the purposes of CONNECT upgrade proxying, and these requests have special message framing rules. (Note that incoming CONNECT requests are rejected by the Cloudflare CDN.)

When we investigated and instrumented our services internally, we found no requests arriving at our Pingora services that would have been misinterpreted. We found that downstream proxy layers in the CDN would forward as HTTP/1.1 only, reject ambiguous framing such as invalid Content-Length, and only forward a single Transfer-Encoding: chunked header for chunked requests.

3. Cache key construction

The researcher also reported one other cache poisoning vulnerability regarding default CacheKey construction. The naive default implementation factored in only the URI path (without other factors such as host header or upstream server HTTP scheme), which meant different hosts using the same HTTP path could collide and poison each other’s cache.

This would affect users of the alpha proxy caching feature who chose to use the default CacheKey implementation. We have since removed that default, because while using something like HTTP scheme + host + URI makes sense for many applications, we want users to be careful when constructing their cache keys for themselves. If their proxy logic will conditionally adjust the URI or method on the upstream request, for example, that logic likely also must be factored into the cache key scheme to avoid poisoning.

Internally, Cloudflare’s default cache key uses a number of factors to prevent cache key poisoning, and never made use of the previously provided default.

Recommendation

If you use Pingora as a proxy, upgrade to Pingora 0.8.0 at your earliest convenience.

We apologize for the impact this vulnerability may have had on Pingora users. As Pingora earns its place as critical Internet infrastructure beyond Cloudflare, we believe it’s important for the framework to promote use of strict RFC compliance by default and will continue this effort. Very few users of the framework should have to deal with the same “wild Internet” that Cloudflare does. Our intention is that stricter adherence to the latest RFC standards by default will harden security for Pingora users and move the Internet as a whole toward best practices.

Disclosure and response timeline

- 2025‑12‑02: Upgrade‑based smuggling reported via bug bounty.

- 2026‑01‑13: Transfer‑Encoding / HTTP/1.0 parsing issues reported.

- 2026-01-18: Default cache key construction issue reported.

- 2026‑01‑29 to 2026‑02‑13: Fixes validated with the reporter. Work on more RFC-compliance checks continues.

- 2026-02-25: Cache key default removal and additional RFC checks validated with researcher.

- 2026‑03-02: Pingora 0.8.0 released.

- 2026-03-04: CVE advisories published.

Acknowledgements

We thank Rajat Raghav (xclow3n) for the report, detailed reproductions, and verification of the fixes through our bug bounty program. Please see the researcher's corresponding blog post for more information.

We would also extend a heartfelt thank you to the Pingora open source community for their active engagement, issue reports, and contributions to the framework. You truly help us build a better Internet.

Resolving a request smuggling vulnerability in Pingora

Edward Wang — Thu, 22 May 2025 13:00:00 GMT

On April 11, 2025 09:20 UTC, Cloudflare was notified via its Bug Bounty Program of a request smuggling vulnerability (CVE-2025-4366) in the Pingora OSS framework discovered by a security researcher experimenting to find exploits using Cloudflare’s Content Delivery Network (CDN) free tier which serves some cached assets via Pingora.

Customers using the free tier of Cloudflare’s CDN or users of the caching functionality provided in the open source pingora-proxy and pingora-cache crates could have been exposed. Cloudflare’s investigation revealed no evidence that the vulnerability was being exploited, and was able to mitigate the vulnerability by April 12, 2025 06:44 UTC within 22 hours after being notified.

What was the vulnerability?

The bug bounty report detailed that an attacker could potentially exploit an HTTP/1.1 request smuggling vulnerability on Cloudflare’s CDN service. The reporter noted that via this exploit, they were able to cause visitors to Cloudflare sites to make subsequent requests to their own server and observe which URLs the visitor was originally attempting to access.

We treat any potential request smuggling or caching issue with extreme urgency. After our security team escalated the vulnerability, we began investigating immediately, took steps to disable traffic to vulnerable components, and deployed a patch.

We are sharing the details of the vulnerability, how we resolved it, and what we can learn from the action. No action is needed from Cloudflare customers, but if you are using the Pingora OSS framework, we strongly urge you to upgrade to a version of Pingora 0.5.0 or later.

What is request smuggling?

Request smuggling is a type of attack where an attacker can exploit inconsistencies in the way different systems parse HTTP requests. For example, when a client sends an HTTP request to an application server, it typically passes through multiple components such as load balancers, reverse proxies, etc., each of which has to parse the HTTP request independently. If two of the components the request passes through interpret the HTTP request differently, an attacker can craft a request that one component sees as complete, but the other continues to parse into a second, malicious request made on the same connection.

Request smuggling vulnerability in Pingora

In the case of Pingora, the reported request smuggling vulnerability was made possible due to a HTTP/1.1 parsing bug when caching was enabled.

The pingora-cache crate adds an HTTP caching layer to a Pingora proxy, allowing content to be cached on a configured storage backend to help improve response times, and reduce bandwidth and load on backend servers.

HTTP/1.1 supports “persistent connections”, such that one TCP connection can be reused for multiple HTTP requests, instead of needing to establish a connection for each request. However, only one request can be processed on a connection at a time (with rare exceptions such as HTTP/1.1 pipelining). The RFC notes that each request must have a “self-defined message length” for its body, as indicated by headers such as Content-Length or Transfer-Encoding to determine where one request ends and another begins.

Pingora generally handles requests on HTTP/1.1 connections in an RFC-compliant manner, either ensuring the downstream request body is properly consumed or declining to reuse the connection if it encounters an error. After the bug was filed, we discovered that when caching was enabled, this logic was skipped on cache hits (i.e. when the service’s cache backend can serve the response without making an additional upstream request).

This meant on a cache hit request, after the response was sent downstream, any unread request body left in the HTTP/1.1 connection could act as a vector for request smuggling. When formed into a valid (but incomplete) header, the request body could “poison” the subsequent request. The following example is a spec-compliant HTTP/1.1 request which exhibits this behavior:

GET /attack/foo.jpg HTTP/1.1
Host: example.com

content-length: 79

GET / HTTP/1.1
Host: attacker.example.com
Bogus: foo

Let’s say there is a different request to victim.example.com that will be sent after this one on the reused HTTP/1.1 connection to a Pingora reverse proxy. The bug means that a Pingora service may not respect the Content-Length header and instead misinterpret the smuggled request as the beginning of the next request:

GET /attack/foo.jpg HTTP/1.1
Host: example.com

content-length: 79

GET / HTTP/1.1 // <- “smuggled” body start, interpreted as next request
Host: attacker.example.com
Bogus: fooGET /victim/main.css HTTP/1.1 // <- actual next valid req start
Host: victim.example.com

Thus, the smuggled request could inject headers and its URL into a subsequent valid request sent on the same connection to a Pingora reverse proxy service.

CDN request smuggling and hijacking

On April 11, 2025, Cloudflare was in the process of rolling out a Pingora proxy component with caching support enabled to a subset of CDN free plan traffic. This component was vulnerable to this request smuggling attack, which could enable modifying request headers and/or URL sent to customer origins.

As previously noted, the security researcher reported that they were also able to cause visitors to Cloudflare sites to make subsequent requests to their own malicious origin and observe which site URLs the visitor was originally attempting to access. During our investigation, Cloudflare found that certain origin servers would be susceptible to this secondary attack effect. The smuggled request in the example above would be sent to the correct origin IP address per customer configuration, but some origin servers would respond to the rewritten attacker Host header with a 301 redirect. Continuing from the prior example:

GET / HTTP/1.1 // <- “smuggled” body start, interpreted as next request
Host: attacker.example.com
Bogus: fooGET /victim/main.css HTTP/1.1 // <- actual next valid req start
Host: victim.example.com


HTTP/1.1 301 Moved Permanently // <- susceptible victim origin response
Location: https://attacker.example.com/

When the client browser followed the redirect, it would trigger this attack by sending a request to the attacker hostname, along with a Referrer header indicating which URL was originally visited, making it possible to load a malicious asset and observe what traffic a visitor was trying to access.

GET / HTTP/1.1 // <- redirect-following request
Host: attacker.example.com
Referrer: https://victim.example.com/victim/main.css

Upon verifying the Pingora proxy component was susceptible, the team immediately disabled CDN traffic to the vulnerable component on 2025-04-12 06:44 UTC to stop possible exploitation. By 2025-04-19 01:56 UTC and prior to re-enablement of any traffic to the vulnerable component, a patch fix to the component was released, and any assets cached on the component’s backend were invalidated in case of possible cache poisoning as a result of the injected headers.

Remediation and next steps

If you are using the caching functionality in the Pingora framework, you should update to the latest version of 0.5.0. If you are a Cloudflare customer with a free plan, you do not need to do anything, as we have already applied the patch for this vulnerability.

Timeline

All timestamps are in UTC.

2025-04-11 09:20 – Cloudflare is notified of a CDN request smuggling vulnerability via the Bug Bounty Program.
2025-04-11 17:16 to 2025-04-12 03:28 – Cloudflare confirms vulnerability is reproducible and investigates which component(s) require necessary changes to mitigate.
2025-04-12 04:25 – Cloudflare isolates issue to roll out of a Pingora proxy component with caching enabled and prepares release to disable traffic to this component.
2025-04-12 06:44 – Rollout to disable traffic complete, vulnerability mitigated.

Conclusion

We would like to sincerely thank James Kettle & Wannes Verwimp, who responsibly disclosed this issue via our Cloudflare Bug Bounty Program, allowing us to identify and mitigate the vulnerability. We welcome further submissions from our community of researchers to continually improve the security of all of our products and open source projects.

Whether you are a customer of Cloudflare or just a user of our Pingora framework, or both, we know that the trust you place in us is critical to how you connect your properties to the rest of the Internet. Security is a core part of that trust and for that reason we treat these kinds of reports and the actions that follow with serious urgency. We are confident about this patch and the additional safeguards that have been implemented, but we know that these kinds of issues can be concerning. Thank you for your continued trust in our platform. We remain committed to building with security as our top priority and responding swiftly and transparently whenever issues arise.

Open sourcing Pingora: our Rust framework for building programmable network services

Yuchen Wu — Wed, 28 Feb 2024 15:00:11 GMT

Today, we are proud to open source Pingora, the Rust framework we have been using to build services that power a significant portion of the traffic on Cloudflare. Pingora is released under the Apache License version 2.0.

As mentioned in our previous blog post, Pingora is a Rust async multithreaded framework that assists us in constructing HTTP proxy services. Since our last blog post, Pingora has handled nearly a quadrillion Internet requests across our global network.

We are open sourcing Pingora to help build a better and more secure Internet beyond our own infrastructure. We want to provide tools, ideas, and inspiration to our customers, users, and others to build their own Internet infrastructure using a memory safe framework. Having such a framework is especially crucial given the increasing awareness of the importance of memory safety across the industry and the US government. Under this common goal, we are collaborating with the Internet Security Research Group (ISRG) Prossimo project to help advance the adoption of Pingora in the Internet’s most critical infrastructure.

In our previous blog post, we discussed why and how we built Pingora. In this one, we will talk about why and how you might use Pingora.

Pingora provides building blocks for not only proxies but also clients and servers. Along with these components, we also provide a few utility libraries that implement common logic such as event counting, error handling, and caching.

What’s in the box

Pingora provides libraries and APIs to build services on top of HTTP/1 and HTTP/2, TLS, or just TCP/UDS. As a proxy, it supports HTTP/1 and HTTP/2 end-to-end, gRPC, and websocket proxying. (HTTP/3 support is on the roadmap.) It also comes with customizable load balancing and failover strategies. For compliance and security, it supports both the commonly used OpenSSL and BoringSSL libraries, which come with FIPS compliance and post-quantum crypto.

Besides providing these features, Pingora provides filters and callbacks to allow its users to fully customize how the service should process, transform and forward the requests. These APIs will be especially familiar to OpenResty and NGINX users, as many map intuitively onto OpenResty's "*_by_lua" callbacks.

Operationally, Pingora provides zero downtime graceful restarts to upgrade itself without dropping a single incoming request. Syslog, Prometheus, Sentry, OpenTelemetry and other must-have observability tools are also easily integrated with Pingora as well.

Who can benefit from Pingora

You should consider Pingora if:

Security is your top priority: Pingora is a more memory safe alternative for services that are written in C/C++. While some might argue about memory safety among programming languages, from our practical experience, we find ourselves way less likely to make coding mistakes that lead to memory safety issues. Besides, as we spend less time struggling with these issues, we are more productive implementing new features.

Your service is performance-sensitive: Pingora is fast and efficient. As explained in our previous blog post, we saved a lot of CPU and memory resources thanks to Pingora’s multi-threaded architecture. The saving in time and resources could be compelling for workloads that are sensitive to the cost and/or the speed of the system.

Your service requires extensive customization: The APIs that the Pingora proxy framework provides are highly programmable. For users who wish to build a customized and advanced gateway or load balancer, Pingora provides powerful yet simple ways to implement it. We provide examples in the next section.

Let’s build a load balancer

Let's explore Pingora's programmable API by building a simple load balancer. The load balancer will select between https://1.1.1.1/ and https://1.0.0.1/ to be the upstream in a round-robin fashion.

First let’s create a blank HTTP proxy.

pub struct LB();

#[async_trait]
impl ProxyHttp for LB {
    async fn upstream_peer(...) -> Result> {
        todo!()
    }
}

Any object that implements the ProxyHttp trait (similar to the concept of an interface in C++ or Java) is an HTTP proxy. The only required method there is upstream_peer(), which is called for every request. This function should return an HttpPeer which contains the origin IP to connect to and how to connect to it.

Next let’s implement the round-robin selection. The Pingora framework already provides the LoadBalancer with common selection algorithms such as round robin and hashing, so let’s just use it. If the use case requires more sophisticated or customized server selection logic, users can simply implement it themselves in this function.

pub struct LB(Arc>);

#[async_trait]
impl ProxyHttp for LB {
    async fn upstream_peer(...) -> Result> {
        let upstream = self.0
            .select(b"", 256) // hash doesn't matter for round robin
            .unwrap();

        // Set SNI to one.one.one.one
        let peer = Box::new(HttpPeer::new(upstream, true, "one.one.one.one".to_string()));
        Ok(peer)
    }
}

Since we are connecting to an HTTPS server, the SNI also needs to be set. Certificates, timeouts, and other connection options can also be set here in the HttpPeer object if needed.

Finally, let's put the service in action. In this example we hardcode the origin server IPs. In real life workloads, the origin server IPs can also be discovered dynamically when the upstream_peer() is called or in the background. After the service is created, we just tell the LB service to listen to 127.0.0.1:6188. In the end we created a Pingora server, and the server will be the process which runs the load balancing service.

fn main() {
    let mut upstreams = LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443"]).unwrap();

    let mut lb = pingora_proxy::http_proxy_service(&my_server.configuration, LB(upstreams));
    lb.add_tcp("127.0.0.1:6188");

    let mut my_server = Server::new(None).unwrap();
    my_server.add_service(lb);
    my_server.run_forever();
}

Let’s try it out:

curl 127.0.0.1:6188 -svo /dev/null
> GET / HTTP/1.1
> Host: 127.0.0.1:6188
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 403 Forbidden

We can see that the proxy is working, but the origin server rejects us with a 403. This is because our service simply proxies the Host header, 127.0.0.1:6188, set by curl, which upsets the origin server. How do we make the proxy correct that? This can simply be done by adding another filter called upstream_request_filter. This filter runs on every request after the origin server is connected and before any HTTP request is sent. We can add, remove or change http request headers in this filter.

async fn upstream_request_filter(…, upstream_request: &mut RequestHeader, …) -> Result<()> {
    upstream_request.insert_header("Host", "one.one.one.one")
}

Let’s try again:

curl 127.0.0.1:6188 -svo /dev/null
< HTTP/1.1 200 OK

This time it works! The complete example can be found here.

Below is a very simple diagram of how this request flows through the callback and filter we used in this example. The Pingora proxy framework currently provides more filters and callbacks at different stages of a request to allow users to modify, reject, route and/or log the request (and response).

Behind the scenes, the Pingora proxy framework takes care of connection pooling, TLS handshakes, reading, writing, parsing requests and any other common proxy tasks so that users can focus on logic that matters to them.

Open source, present and future

Pingora is a library and toolset, not an executable binary. In other words, Pingora is the engine that powers a car, not the car itself. Although Pingora is production-ready for industry use, we understand a lot of folks want a batteries-included, ready-to-go web service with low or no-code config options. Building that application on top of Pingora will be the focus of our collaboration with the ISRG to expand Pingora's reach. Stay tuned for future announcements on that project.

Other caveats to keep in mind:

Today, API stability is not guaranteed. Although we will try to minimize how often we make breaking changes, we still reserve the right to add, remove, or change components such as request and response filters as the library evolves, especially during this pre-1.0 period.
Support for non-Unix based operating systems is not currently on the roadmap. We have no immediate plans to support these systems, though this could change in the future.

How to contribute

Feel free to raise bug reports, documentation issues, or feature requests in our GitHub issue tracker. Before opening a pull request, we strongly suggest you take a look at our contribution guide.

Conclusion

In this blog we announced the open source of our Pingora framework. We showed that Internet entities and infrastructure can benefit from Pingora’s security, performance and customizability. We also demonstrated how easy it is to use Pingora and how customizable it is.

Whether you're building production web services or experimenting with network technologies we hope you find value in Pingora. It's been a long journey, but sharing this project with the open source community has been a goal from the start. We'd like to thank the Rust community as Pingora is built with many great open-sourced Rust crates. Moving to a memory safe Internet may feel like an impossible journey, but it's one we hope you join us on.

Early Hints update: How Cloudflare, Google, and Shopify are working together to build a faster Internet for everyone

Alex Krivit — Thu, 23 Jun 2022 15:59:08 GMT

A few months ago, we wrote a post focused on a product we were building that could vastly improve page load performance. That product, known as Early Hints, has seen wide adoption since that original post. In early benchmarking experiments with Early Hints, we saw performance improvements that were as high as 30%.

Now, with over 100,000 customers using Early Hints on Cloudflare, we are excited to talk about how much Early Hints have improved page loads for our customers in production, how customers can get the most out of Early Hints, and provide an update on the next iteration of Early Hints we’re building.

What Are Early Hints again?

As a reminder, the browser you’re using right now to read this page needed instructions for what to render and what resources (like images, fonts, and scripts) need to be fetched from somewhere else in order to complete the loading of this (or any given) web page. When you decide you want to see a page, your browser sends a request to a server and the instructions for what to load come from the server’s response. These responses are generally composed of a multitude of resources that tell the browser what content to load and how to display it to the user. The servers sending these instructions to your browser often need time to gather up all of the resources in order to compile the whole webpage. This period is known as “server think time.” Traditionally, during the “server think time” the browser would sit waiting until the server has finished gathering all the required resources and is able to return the full response.

Early Hints was designed to take advantage of this “server think time” to send instructions to the browser to begin loading readily-available resources while the server finishes compiling the full response. Concretely, the server sends two responses: the first to instruct the browser on what it can begin loading right away, and the second is the full response with the remaining information. By sending these hints to a browser before the full response is prepared, the browser can figure out what it needs to do to load the webpage faster for the end user.

Early Hints uses the HTTP status code 103 as the first response to the client. The “hints” are HTTP headers attached to the 103 response that are likely to appear in the final response, indicating (with the Link header) resources the browser should begin loading while the server prepares the final response. Sending hints on which assets to expect before the entire response is compiled allows the browser to use this “think time” (when it would otherwise have been sitting idle) to fetch needed assets, prepare parts of the displayed page, and otherwise get ready for the full response to be returned.

Early Hints on Cloudflare accomplishes performance improvements in three ways:

By sending a response where resources are directed to be preloaded by the browser. Preloaded resources direct the browser to begin loading the specified resources as they will be needed soon to load the full page. For example, if the browser needs to fetch a font resource from a third party, that fetch can happen before the full response is returned, so the font is already waiting to be used on the page when the full response returns from the server.
By using preconnect to initiate a connection to places where content will be returned from an origin server. For example, if a Shopify storefront needs content from a Shopify origin to finish loading the page, preconnect will warm up the connection which improves the performance for when the origin returns the content.
By caching and emitting Early Hints on Cloudflare, we make an efficient use of the full waiting period - not just server think time - which includes transit latency to the origin. Cloudflare sits within 50 milliseconds of 95% of the Internet-connected population globally. So while a request is routed to an origin and the final response is being compiled, Cloudflare can send an Early Hint from much closer and the browser can begin loading.

Early Hints is like multitasking across the Internet - at the same time the origin is compiling resources for the final response and making calls to databases or other servers, the browser is already beginning to load assets for the end user.

What’s new with Early Hints?

While developing Early Hints, we’ve been fortunate to work with Google and Shopify to collect data on the performance impact. Chrome provided web developers with experimental access to both preload and preconnect support for Link headers in Early Hints. Shopify worked with us to guide the development by providing test frameworks which were invaluable to getting real performance data.

Today is a big day for Early Hints. Google announced that Early Hints is available in Chrome version 103 with support for preload and preconnect to start. Previously, Early Hints was available via an origin trial so that Chrome could measure the full performance benefit (A/B test). Now that the data has been collected and analyzed, and we’ve been able to prove a substantial improvement to page load, we’re excited that Chrome’s full support of Early Hints will mean that many more requests will see the performance benefits.

That's not the only big news coming out about Early Hints. Shopify battle-tested Cloudflare’s implementation of Early Hints during Black Friday/Cyber Monday 2021 and is sharing the performance benefits they saw during the busiest shopping time of the year:

Today, HTTP 103 Early Hints ships with Chrome 103!
Why is this important for #webperf? How did @Shopify help make all merchant sites faster? (LCP over 500ms faster at p50!) ?
Hint: A little collaboration w/ @cloudflare & @googlechrome pic.twitter.com/Dz7BD4Jplp
— Colin Bendell (@colinbendell) June 21, 2022

While talking to the audience at Cloudflare Connect London last week, Colin Bendell, Director, Performance Engineering at Shopify summarized it best: "when a buyer visits a website, if that first page that (they) experience is just 10% faster, on average there is a 7% increase in conversion". The beauty of Early Hints is you can get that sort of speedup easily, and with Early Hints that can be one click away.

You can see a portion of his talk here:

The headline here is that during a time of vast uncertainty due to the global pandemic, a time when everyone was more online than ever before, when people needed their Internet to be reliably fast — Cloudflare, Google, and Shopify all came together to build and test Early Hints so that the whole Internet would be a faster, better, and more efficient place.

So how much did Early Hints improve performance of customers’ websites?

Performance Improvement with Early Hints

In our simple tests back in September, we were able to accelerate the Largest Contentful Paint (LCP) by 20-30%. Granted, this result was on an artificial page with mostly large images where Early Hints impact could be maximized. As for Shopify, we also knew their storefronts were particularly good candidates for Early Hints. Each mom-and-pop.shop page depends on many assets served from cdn.shopify.com - speeding up a preconnect to that host should meaningfully accelerate loading those assets.

But what about other zones? We expected most origins already using Link preload and preconnect headers to see at least modest improvements if they turned on Early Hints. We wanted to assess performance impact for other uses of Early Hints beyond Shopify’s.

However, getting good data on web page performance impact can be tricky. Not every 103 response from Cloudflare will result in a subsequent request through our network. Some hints tell the browser to preload assets on important third-party origins, for example. And not every Cloudflare zone may have Browser Insights enabled to gather Real User Monitoring data.

Ultimately, we decided to do some lab testing with WebPageTest of a sample of the most popular websites (top 1,000 by request volume) using Early Hints on their URLs with preload and preconnect Link headers. WebPageTest (which we’ve written about in the past) is an excellent tool to visualize and collect metrics on web page performance across a variety of device and connectivity settings.

Lab Testing

In our earlier blog post, we were mainly focused on Largest Contentful Paint (LCP), which is the time at which the browser renders the largest visible image or text block, relative to the start of the page load. Here we’ll focus on improvements not only to LCP, but also FCP (First Contentful Paint), which is the time at which the browser first renders visible content relative to the start of the page load.

We compared test runs with Early Hints support off and on (in Chrome), across four different simulated environments: desktop with a cable connection (5Mbps download / 28ms RTT), mobile with 3G (1.6Mbps / 300ms RTT), mobile with low-latency 3G (1.6Mbps / 150ms RTT) and mobile with 4G (9Mbps / 170ms RTT). After running the tests, we cleaned the data to remove URLs with no visual completeness metrics or less than five DOM elements. (These usually indicated document fragments vs. a page a user might actually navigate to.) This gave us a final sample population of a little more than 750 URLs, each from distinct zones.

In the box plots below, we’re comparing FCP and LCP percentiles between the timing data control runs (no Early Hints) and the runs with Early Hints enabled. Our sample population represents a variety of zones, some of which load relatively quickly and some far slower, thus the long whiskers and string of outlier points climbing the y-axis. The y-axis is constrained to the max p99 of the dataset, to ensure 99% of the data are reflected in the graph while still letting us focus on the p25 / p50 / p75 differences.

The relative shift in the box plot quantiles suggest we should expect modest benefits for Early Hints for the majority of web pages. By comparing FCP / LCP percentage improvement of the web pages from their respective baselines, we can quantify what those median and p75 improvements would look like:

A couple observations:

From the p50 values, we see that for 50% of web pages on desktop, Early Hints improved FCP by more than 9.47% and LCP by more than 6.03%. For the p75, or the upper 25%, FCP improved by more than 20.4% and LCP by more than 15.97%.
The sizable improvements in First Contentful Paint suggest many hints are for render-blocking assets (such as critical but dynamic stylesheets and scripts that can’t be embedded in the HTML document itself).
We see a greater percentage impact on desktop over cable and on mobile over 4G. In theory, the impact of Early Hints is bounded by the load time of the linked asset (i.e. ideally we could preload the entire asset before the browser requires it), so we might expect the FCP / LCP reduction to increase in step with latency. Instead, it appears to be the other way around. There could be many variables at play here - for example, the extra bandwidth the 4G connection provides seems to be more influential than the decreased latency between the two 3G connection settings. Likely that wider bandwidth pipe is especially helpful for URLs we observed that preloaded larger assets such as JS bundles or font files. We also found examples of pages that performed consistently worse on lower-grade connections (see our note on “over-hinting” below).
Quite a few sample zones cached their HTML pages on Cloudflare (~15% of the sample). For CDN cache hits, we’d expect Early Hints to be less influential on the final result (because the “server think time” is drastically shorter). Filtering them out from the sample, however, yielded almost identical relative improvement metrics.

The relative distributions between control and Early Hints runs, as well as the per-site baseline improvements, show us Early Hints can be broadly beneficial for use cases beyond Shopify’s. As suggested by the p75+ values, we also still find plenty of case studies showing a more substantial potential impact to LCP (and FCP) like the one we observed from our artificial test case, as indicated from these WebPageTest waterfall diagrams:

These diagrams show the network and rendering activity on the same web page (which, bucking the trend, had some of its best results over mobile – 3G settings, shown here) for its first ten resources. Compare the WebPageTest waterfall view above (with Early Hints disabled) with the waterfall below (Early Hints enabled). The first green vertical line in each indicates First Contentful Paint. The page configures Link preload headers for a few JS / CSS assets, as well as a handful of key images. When Early Hints is on, those assets (numbered 2 through 9 below) get a significant head start from the preload hints. In this case, FCP and LCP improved by 33%!

Early Hints Best Practices and Strategies for Better Performance

The effect of Early Hints can vary widely on a case-by-case basis. We noticed particularly successful zones had one or more of the following:

Preconnect Link headers to important third-party origins (e.g. an origin hosting the pages’ assets, or Google Fonts).
Preload Link headers for a handful of critical render-blocking resources.
Scripts and stylesheets split into chunks, enumerated in preload Links.
A preload Link for the LCP asset, e.g. the featured image on a blog post.

It’s quite possible these strategies are already familiar to you if you work on web performance! Essentially the best practices that apply to using Link headers or elements in the HTML also apply to Early Hints. That is to say: if your web page is already using preload or preconnect Link headers, using Early Hints should amplify those benefits.

A cautionary note here: while it may be safer to aggressively send assets in Early Hints versus Server Push (as the hints won’t arbitrarily send browser-cached content the way Server Push might), it is still possible to _over-_hint non-critical assets and saturate network bandwidth in a similar manner to overpushing. For example, one page in our sample listed well over 50 images in its 103 response (but not one of its render-blocking JS scripts). It saw improvements over cable, but was consistently worse off in the higher latency, lower bandwidth mobile connection settings.

Google has great guidelines for configuring Link headers at your origin in their blog post. As for emitting these Links as Early Hints, Cloudflare can take care of that for you!

How to enable on Cloudflare

To enable Early Hints on Cloudflare, simply sign in to your account and select the domain you’d like to enable it on.
Navigate to the Speed Tab of the dashboard.
Enable Early Hints.

Enabling Early Hints means that we will harvest the preload and preconnect Link headers from your origin responses, cache them, and send them as 103 Early Hints for subsequent requests so that future visitors will be able to gain an even greater performance benefit.

For more information about our Early Hints feature, please refer to our announcement post or our documentation.

Smart Early Hints update

In our original blog post, we also mentioned our intention to ship a product improvement to Early Hints that would generate the 103 on your behalf.

Smart Early Hints will generate Early Hints even when there isn’t a Link header present in the origin response from which we can harvest a 103. The goal is to be a no-code/configuration experience with massive improvements to page load. Smart Early Hints will infer what assets can be preloaded or prioritized in different ways by analyzing responses coming from our customer’s origins. It will be your one-button web performance guru completely dedicated to making sure your site is loading as fast as possible.

This work is still under development, but we look forward to getting it built before the end of the year.

Try it out!

The promise Early Hints holds has only started to be explored, and we’re excited to continue to build products and features and make the web performance reliably fast.

We’ll continue to update you along our journey as we develop Early Hints and look forward to your feedback (special thanks to the Cloudflare Community members who have already been invaluable) as we move to bring Early Hints to everyone.

Third Time’s the Cache, No More

Edward Wang — Fri, 19 Mar 2021 12:00:00 GMT

Caching is a big part of how Cloudflare CDN makes the Internet faster and more reliable. When a visitor to a customer’s website requests an asset, we retrieve it from the customer’s origin server. After that first request, in many cases we cache that asset. Whenever anyone requests it again, we can serve it from one of our data centers close to them, dramatically speeding up load times.

Did you notice the small caveat? We cache after the first request in many cases, not all. One notable exception since 2010 up until now: requests with query strings. When a request came with a query string (think https://example.com/image.jpg?width=500; the ?width=500 is the query string), we needed to see it a whole three times before we would cache it on our default cache level. Weird!

This is a short tale of that strange exception, why we thought we needed it, and how, more than ten years later, we showed ourselves that we didn’t.

Two MISSes too many

To see the exception in action, here’s a command we ran a couple weeks ago. It requests an image hosted on example.com five times and prints each response’s CF-Cache-Status header. That header tells us what the cache did while serving the request.

❯ for i in {1..5}; do curl -svo /dev/null example.com/image.jpg 2>&1 | grep -e 'CF-Cache-Status'; sleep 3; done
< CF-Cache-Status: MISS
< CF-Cache-Status: HIT
< CF-Cache-Status: HIT
< CF-Cache-Status: HIT
< CF-Cache-Status: HIT

The MISS means that we couldn’t find the asset in the cache and had to retrieve it from the origin server. On the HITs, we served the asset from cache.

Now, just by adding a query string to the same request (?query=val):

❯ for i in {1..5}; do curl -svo /dev/null example.com/image.jpg\?query\=val 2>&1 | grep -e 'CF-Cache-Status'; sleep 3; done
< CF-Cache-Status: MISS
< CF-Cache-Status: MISS
< CF-Cache-Status: MISS
< CF-Cache-Status: HIT
< CF-Cache-Status: HIT

There they are - three MISSes, meaning two extra trips to the customer origin!

We traced this surprising behavior back to a git commit from Cloudflare’s earliest days in 2010. Since then, it has spawned chat threads, customer escalations, and even an internal Solutions Engineering blog post that christened it the “third time’s the charm” quirk.

It would be much less confusing if we could make the query string behavior align with the others, but why was the quirk here to begin with?

Unpopular queries

From an engineering perspective, forcing three MISSes for query strings can make sense as a way to protect ourselves from unnecessary disk writes.

That’s because for many requests with query strings in the URL, we may never see that URL requested again.

Some query strings simply pass along data to the origin server that may not actually result in a different response from the base URL, for example, a visitor’s browser metadata. We end up caching the same response behind many different, potentially unique, cache keys.
Some query strings are intended to bypass caches. These “cache busting” requests append randomized query strings in order to force pulling from the origin, a strategy that some ad managers use to count impressions.

Unpopular requests are not a new problem for us. Previously, we wrote about how we use an in-memory transient cache to store requests we only ever see once (dubbed “one-hit-wonders”) so that they are never written to disk. This transient cache, enabled for just a subset of traffic, reduces our disk writes by 20-25%.

Was this behavior from 2010 giving us similar benefits? Since then, our network has grown from 10,000 Internet properties to 25 million. It was time to re-evaluate. Just how much would our disk writes increase if we cached on the first request? How much better would cache hit ratios be?

Hypotheses: querying the cache

Good news: our metrics showed only around 3.5% of requests use the default cache level with query strings. Thus, we shouldn’t expect more than a few percentage points difference in disk writes. Less good: cache hit rate shouldn’t increase much either.

On the other hand, we also found that a significant portion of these requests are images, which could potentially take up lots of disk space and hike up cache eviction rates. Enough napkin math - we needed to start assessing real world data.

A/B testing: caching the queries

We were able to validate our hypotheses using an A/B test, where half the machines in a data center use the old behavior and half do not. This enabled us to control for Internet traffic fluctuations over time and get more precise impact numbers.

Other considerations: we limited the test to one data center handling a small slice of our traffic to minimize the impact of any negative side effects. We also disabled transient cache, which would nullify some of the I/O costs we expected to observe. By turning transient cache off for the experiment, we should see the upper bound of the disk write increase.

We ran this test for a couple days and, as hypothesized, found a minor but acceptable increase in disk writes per request to the tune of 2.5%.

What’s more, we saw a modest hit rate increase of around +3% on average for our Enterprise customers. More hits meant fewer trips to origin servers and thus bandwidth savings for customers: in this case, a -5% decrease in total bytes served from origin.

Note that we saw hit rate increases for certain customers across all plan types. Those with the biggest boost had a variety of query strings across different visitor populations (e.g. appending query strings such as ?resize=100px:*&output-quality=60 to optimize serving images for different screen sizes). A few additional HITs for many different, but still popular, cache keys really added up.

Revisiting old assumptions

What the aggregate hit rate increase implied was that for a significant portion of customers, our old query string behavior was too defensive. We were overestimating the proportion of query string requests that were one-hit-wonders.

Given that these requests are a relatively small percentage of total traffic, we didn’t expect to see massive hit rate improvements. In fact, if the assumptions about many query string requests being one-hit-wonders were correct, we should hardly be seeing any difference at all. (One-hit-wonders are actually “one MISS wonders” in the context of hit rate.) Instead, we saw hit rate increases proportional to the affected traffic percentage. This, along with the minor I/O cost increases, signaled that the benefits of removing the behavior would outweigh the benefits of keeping it.

Using a data-driven approach, we determined our caution around caching query string requests wasn’t nearly as effective at preventing disk writes of unpopular requests compared to our newer efforts, such as transient cache. It was a good reminder for us to re-examine past assumptions from time-to-time and see if they still held.

Post-experiment, we gradually rolled out the change to verify that our cost expectations were met. At this time, we’re happy to report that we’ve left the “third time’s the charm” quirk behind for the history books. Long live “cache at first sight.”