The Cloudflare Blog

Your Worker can now have its own cache in front of it

Dan Lapid — Mon, 06 Jul 2026 13:00:00 GMT

Today we are launching Workers Cache: a tiered cache that sits in front of your Worker, configured by a single line of Wrangler config and the same Cache-Control headers you already know.

When Workers Cache is enabled, every cacheable request to your Worker hits Cloudflare's cache first. If there's a fresh cached response, Cloudflare returns it directly — your Worker doesn't run, and you don't pay CPU time for it. On a miss, your Worker runs, and if your response is cacheable, Cloudflare stores it for the next request. The next request from anywhere on Earth can be served straight from cache.

The whole thing is one config block:

{
  "name": "my-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  "cache": {
    "enabled": true
  }
}

After that, you control caching the way HTTP has always wanted you to — by setting headers on your responses:

return new Response(body, {
  headers: {
    "Cache-Control": "public, max-age=300, stale-while-revalidate=3600",
    "Cache-Tag": "products,product:123",
  },
});

And when content changes, your Worker purges its own cache:

await ctx.cache.purge({ tags: ["product:123"] });

That's the whole API. There is no zone to configure, no rules engine to set up, no separate cache to provision, and no second product to log into. The Worker's code is the configuration surface, and the cache follows the Worker wherever it runs — on a custom domain, on workers.dev, behind a service binding, in a preview, in a Workers for Platforms tenant. One Worker, one cache, configured once.

That's the surface area. There’s a lot underneath: tiered caching across our entire network, full support for stale-while-revalidate so stale responses never block a user, content negotiation via Vary, multi-tenant-safe cache keys via ctx.props, programmatic purges by tag or path prefix, and — the part we think is the biggest unlock — a cache that sits in front of every Worker entrypoint, not just the public one, with per-entrypoint control over which ones cache and which don't. That last piece means you can compose caching directly into the structure of your app: a chain of entrypoints with cache stages slotted in wherever you want them, configured by the code on either side. We'll walk through all of it below.

Workers Cache is available today to every Worker on any plan, enabled in Wrangler.

This is the caching API we've always wanted Workers to have. Here's why it took us this long, what becomes possible because of it, and what's coming next.

Why server-rendered apps need a cache in front

When we introduced Workers in 2017, the pitch was that you could run code on Cloudflare's network to transform requests on their way to your origin. The Worker sat in front of the cache and the origin:

This was the right model for the use cases we were targeting. If you wanted to add a header to every request, rewrite a URL, do an A/B split, or filter traffic before it reached your origin, putting the Worker in front of the cache and the origin gave you full control over what got cached and what didn't. Customers built incredible things with it.

But the world changed. Workers stopped being a thing you bolted onto an origin and started being the origin. Frameworks like Astro, TanStack Start, Next.js, Remix, and SvelteKit all ship a Cloudflare adapter that builds your app as a Worker. There's no origin behind them. The Worker is the server.

When the Worker is the origin, the original architecture has nothing to cache. Every request runs your code, even when the response would be byte-for-byte identical to the one you returned a second ago. The Workers runtime is fast enough that this works — it routinely handles tens of millions of requests per second without breaking a sweat — but "fast enough to render every request" still costs you latency on every page load and CPU time on every invocation. And on a server-rendered app, every page load is, by definition, a render.

Workers Cache flips the architecture. Cloudflare's cache now sits in front of the Worker:

On a cache hit, your Worker doesn't run at all. Cloudflare returns the cached response and your CPU billing stays at zero. On a miss, your Worker runs once, populates the cache, and the next request — from anywhere — gets served from cache without invoking your code.

This is what was missing for server-side rendering on Workers. You used to have to choose between two unsatisfying options:

Prerender everything at build time ("static site generation"). Fast page loads, but every change requires a full rebuild and redeploy. For a docs site with a few thousand pages, that's 5–10 minutes. For a large e-commerce site, it's worse — and the build runs every single time you touch anything.
Render every page on every request. Up-to-date content, but every page load pays the rendering cost and every visitor pays the latency.

Workers Cache gives you a third option: server-render on demand, cache the rendered response, refresh it on a time-to-live (TTL) you choose. The first request to a new page still renders. Every subsequent request, until the cache expires, is served as if the page were static. When the cache expires, the next request triggers a re-render — and with stale-while-revalidate, even that one doesn't wait.

You get the speed of a static site without the build time, and the freshness of server rendering without the cost. No framework-specific machinery like Incremental Static Regeneration. Just HTTP caching, working the way it was designed to work, in front of code that was designed to be the origin.

`stale-while-revalidate` is the part that makes it feel instant

The stale-while-revalidate directive tells Cloudflare that when a cached response expires, it's allowed to serve the stale copy immediately while it refreshes the response in the background. Cloudflare shipped full support for stale-while-revalidate earlier this year, and it's the directive that turns "we cache your Worker" into "your Worker's site feels static."

Without it, the first request after a cache entry expires has to wait for the Worker to render the page from scratch. The user sees that latency. With it, the first request after expiration gets the stale page immediately (with a Cf-Cache-Status: UPDATING header), and the Worker runs in the background to refill the cache. Every user, including the one who triggered the refresh, gets a cache-speed response.

In practice, this looks like:

 export default {
  async fetch(request) {
    const html = await renderPage(request);
    return new Response(html, {
      headers: {
        "Content-Type": "text/html; charset=utf-8",
        // Treat as fresh for 5 minutes; serve stale for up to an hour
        // while a background refresh runs.
        "Cache-Control": "public, max-age=300, stale-while-revalidate=3600",
      },
    });
  },
};

The mental model that makes this click:

Fresh window (max-age): Cloudflare serves the cached response. Your Worker doesn't run.
Stale window (stale-while-revalidate): Cloudflare serves the cached response. Your Worker runs in the background to refresh it. No user waits.
Outside both windows: Cloudflare runs your Worker to generate a fresh response, and the user waits for that one render.

You pick the windows. For a product catalog that updates every few minutes, max-age=300, stale-while-revalidate=3600 means visitors basically never wait, and your Worker still runs often enough to keep content fresh. For a blog archive that almost never changes, max-age=86400, stale-while-revalidate=2592000 means your Worker runs once a day per page.

The first request to a brand-new page is the only one that pays the full render cost. After that, the page behaves like static output for visitors, while your Worker still owns how the page gets generated.

One URL, many representations: `Vary` works

Real apps rarely return the same bytes to every client. The same product page might be HTML for a browser and JSON for an API client. The same image might be WebP for clients that support it and JPEG for the ones that don't. The same homepage might come back in English, French, or Japanese depending on the user.

Doing this without a cache is easy — your Worker just reads the request header and returns the right thing. Doing it with a cache is where it usually gets ugly. Most caches give you two bad options: cache nothing on URLs that have multiple representations, or cache one representation and serve it to everyone.

Workers Cache supports the standard HTTP Vary header, which is the right way to solve this. When your Worker returns a response with Vary: Accept-Encoding (or Accept, or Accept-Language, or any other request header), Cloudflare stores a separate cached variant per distinct combination of those headers — and only returns a variant whose stored values match the incoming request.

export default {
  async fetch(request) {
    const accept = request.headers.get("Accept") ?? "";
    const wantsWebp = accept.includes("image/webp");

    const body = wantsWebp ? await fetchWebpImage() : await fetchJpegImage();

    return new Response(body, {
      headers: {
        "Content-Type": wantsWebp ? "image/webp" : "image/jpeg",
        "Cache-Control": "public, max-age=3600",
        // Cache a separate variant per distinct Accept header value.
        Vary: "Accept",
      },
    });
  },
};

One URL, two cached variants. A browser that sends Accept: image/webp,*/* gets the WebP. A browser that sends Accept: image/jpeg gets the JPEG. Both come from cache. Your Worker writes both variants on the first request to each, and then runs zero times for either after that.

This is the well-trodden HTTP standard for content negotiation, and Workers Cache implements it the way RFC 9110 and RFC 9111 describe. There's no allowlist of what headers you can Vary on. You list whatever you need, and Cloudflare keys variants on the verbatim values. The docs go through the edge cases — how to keep variant fan-out under control by normalizing headers in a gateway Worker, why purges invalidate all variants of a URL together, and the one case (Vary: *) that disables caching entirely.

This is your Worker's cache, not your zone's

Before we get to what becomes possible with all this, there's a conceptual shift worth naming.

Cloudflare has had a cache forever. It's configured at the zone level: Cache Rules, Page Rules, the cached-file-extensions list, Cache Reserve, Tiered Cache topology, custom cache keys. All of it is set per zone, and historically a Worker had to either fit into that zone's configuration or work around it.

Workers Cache is different. It's your Worker's cache — it belongs to the Worker, not to a zone. This has a bunch of consequences that turn out to matter:

There is no zone configuration to manage. Cache Rules, cache level settings, the file-extensions list, Page Rules — none of them apply to Workers Cache. The Worker's Cache-Control headers are the configuration.
The cache follows the Worker, not the hostname. A Worker that's bound to api.example.com, api.example.net, and invoked over a service binding shares one cache across all three. A request to /users/42 hits the same cached entry regardless of which way in it came.
The cache works on workers.dev. It works in preview URLs (each preview gets its own cache, so testing a change doesn't poison production). It works in Workers for Platforms (each user Worker has its own cache, isolated from the dispatcher and from other tenants). All of these used to be second-class citizens for caching. They aren't anymore.
Purges are scoped to the Worker’s entrypoint. When you call ctx.cache.purge({ purgeEverything: true }), you're only purging your Worker entrypoint's cache. No risk of nuking your zone's other content. No risk of one Worker's deploy invalidating another's data.

What you configure about caching, you configure in code: which paths get longer TTLs (branch on the path and set a different max-age), which requests bypass the cache (return Cache-Control: private), how the cache key is shaped (control what gets into ctx.props, normalize the URL in a gateway Worker before dispatching). The Worker you already wrote is the configuration surface.

The full docs go deep on this in Workers Cache: your Worker's cache.

Two tiers, every Worker, no configuration

Workers Cache is regionally tiered by default. There are two layers:

A lower tier in the Cloudflare data center closest to the user. Every data center that receives traffic for your Worker has its own lower-tier cache.
An upper tier that aggregates fills across the whole network. There are fewer of these, and every lower tier consults the upper tier on a miss.

A request hits the lower tier first. On a hit, the response is served and that's the end of it. On a miss, the lower tier asks the upper tier. On a hit there, the response is returned and also stored in the lower tier on the way back. Only if both tiers miss does your Worker actually run — and the response from that run gets stored in both tiers.

The reason this matters is that the first request anywhere in the world populates the upper tier. Every subsequent request, from any data center, can be served from the upper tier without your Worker running — even if the lower tier at that data center has never seen the request before. Cache hit ratios are dramatically higher than they would be with a single flat cache layer, which is exactly what you want when your Worker is the origin.

This is the same topology that powers Tiered Cache for zones today, except you don't configure it. There is no dialog for "turn on tiered cache for my Worker." Every Worker that has caching enabled gets tiering for free.

If your Worker uses Smart Placement, the cache composes cleanly with it: tiers are consulted first, and only if both miss does Smart Placement route execution close to your origin. We have more to say about how those layers interact, including a few rough edges we're planning to smooth out, in the docs.

Run your app near the user and near the data

There's a recurring tension in web performance that nobody has fully resolved: you want your code to run close to the user (because the round-trip between user and server is on the critical path), and you want your code to run close to the data (because every database query is also a round-trip). Pick one, and the other gets slow.

We've spent years chasing both. Our network puts us within ~50ms of about 95% of the world's Internet users. Smart Placement and Placement Hints let you keep your code close to your data without ever having to think about cloud regions. But until now, the two pieces didn't fully compose. You could do "near the user" or "near the data," and if you wanted both halves of your app to be in the right place at the same time, you had to be a Cloudflare expert. We knew we could do better.

Workers Cache is the piece that closes the gap. Because the cache belongs to the Worker (not the zone), and because service bindings and ctx.exports calls between Workers go through the cache, you can build an app as a chain of Workers — each one running where it should run — with the cache as the seam between them.

The architecture looks like this:

Worker A runs near the user. It handles the cheap, latency-sensitive parts of every request: authentication, rate limiting, routing, header normalization, rendering the outer "shell" of an HTML page that doesn't depend on data.
Worker B runs near the data, courtesy of Smart Placement or an explicit Placement Hint. It does the heavy work: server-rendering pages that fetch data, reading product catalogs, generating search results, aggregating APIs, expensive transforms.
Workers Cache sits in front of Worker B. When Worker A calls Worker B over a service binding, Cloudflare checks Worker B's cache first. On a hit, Worker A receives the response and Worker B doesn't run at all — no data-center hop, no database query, no rendering work.

The cache hit path becomes: user → Worker A near the user → cache hit for Worker B → response. The data hop is paid only on a miss. Your hot pages run at the speed of code-in-front-of-the-user, and your cold pages still benefit from running near the data when they do execute.

You don't have to architect anything special to get this. Write your app as two Workers, point one at the other with a service binding, turn caching on in Worker B’s wrangler.jsonc file, and you're done.

Multi-tenant by default, with `ctx.props`

If you're caching a Worker that returns user-specific data — say, an API that serves different content per logged-in user — you need a way to make sure one user can never see another user's cached response. The standard solution is "don't cache authenticated requests," and Cloudflare's automatic bypass for Authorization headers does exactly that. But "don't cache anything" gives up the entire performance win.

Workers Cache solves this by making the caller's ctx.props part of the cache key. When one Worker calls another over a service binding and passes ctx.props with a user ID, tenant ID, or any other identifier, callers with different props get separate cache entries. One user's response can never leak into another user's cache.

import { WorkerEntrypoint } from "cloudflare:workers";

interface Props { userId: string; }

export default class Backend extends WorkerEntrypoint {
  async fetch(request: Request): Promise {
    // ctx.props.userId is part of the cache key. User A and User B
    // requesting the same URL get separate cached entries.
    const { userId } = this.ctx.props;
    const data = await loadUserData(userId);

    return new Response(JSON.stringify(data), {
      headers: {
        "Content-Type": "application/json",
        "Cache-Control": "public, max-age=300",
      },
    });
  }
}

The typical pattern is to authenticate the request in a gateway Worker, strip the Authorization header, set the authenticated user's ID into ctx.props, and then call the cached backend Worker. The gateway runs on every request (it has to, to authenticate), but the expensive backend only runs when there's no cache entry for that user yet. Auth'd APIs go from "uncacheable" to "cached per user with full safety," and the cache key does the isolation for you. The docs walk through this in detail in Multi-tenant safety with ctx.props and the example in Per-user authenticated responses.

Other CDNs make you choose between correctness and hit ratio: key the cache by each user’s token, or send every request back to origin for authorization. Workers Cache lets you share cached API responses at the edge while preserving per-request authorization boundaries. We don’t know of another CDN that offers this as a built-in model for authenticated, multi-tenant APIs. We’re pretty proud of it.

A cache between every Worker entrypoint

Here is the part of Workers Cache that we think is the biggest unlock, and it's the part that's hardest to see if you're thinking about it as "a CDN cache that happens to work in front of Workers."

Workers Cache sits in front of every Worker entrypoint — the default export, every named WorkerEntrypoint, and every call between entrypoints in the same Worker via ctx.exports. That last clause is the one that changes what you can build.

When one entrypoint calls another via ctx.exports, the cache evaluates that call the same way it would evaluate a request from a browser. A hit returns the cached response and the callee never runs. A miss runs the callee and stores its response under its own cache key — keyed by the callee's entrypoint, path, query string, and ctx.props. The caller still runs on every request, but anything it hands off to the callee is memoized independently.

You decide, per entrypoint, which ones cache. In your Wrangler config, the exports map lets you turn caching on or off for each entrypoint by name ("default" is the default export). Opt an entrypoint in to cache the responses it produces; opt one out to keep it running on every request. A gateway or router entrypoint — anything that authenticates, normalizes, or dispatches — should be opted out, so it always runs, and its own output is never served from cache.

That gives you a primitive you can compose. You can author a Worker as a chain of small entrypoints — auth, normalization, routing, the expensive read, the data layer — and let Workers Cache slot in wherever you want it. Each cached entrypoint is a unit of memoization with its own key, its own TTL, and its own tag namespace for purging. Anything you would want to configure about caching — when it runs, what it keys on, when it invalidates — is expressed as ordinary Worker code: which entrypoint you call, what request you forward, what ctx.props you pass, what Cache-Control you set.

To make this concrete, here's a single Worker that does three things you couldn't easily do together on any other platform: it authenticates every request, caches the expensive backend behind a multi-tenant-safe cache key, and invalidates that cache when data changes.

Caching is configured per entrypoint. The gateway must run on every request — both to authenticate and because a cached gateway response would skip that auth check — so we disable caching on the default entrypoint and enable it only on the inner one:

{
  "name": "my-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  "cache": { "enabled": true },
  "exports": {
    // The gateway runs on every request — don't cache it.
    "default": { "type": "worker", "cache": { "enabled": false } },
    // Cache the expensive inner entrypoint.
    "CachedBackend": { "type": "worker", "cache": { "enabled": true } }
  }
}

import { WorkerEntrypoint } from "cloudflare:workers";

interface Env { API_TOKEN: string; }
interface Props { userId: string; }

// Inner entrypoint: the expensive work. Workers Cache sits in front
// of this — on a hit, this code never runs.
export class CachedBackend extends WorkerEntrypoint {
  async fetch(request: Request): Promise {
    // ctx.props.userId is part of the cache key, so this is cached
    // separately for every user.
    const { userId } = this.ctx.props;
    const data = await loadExpensiveData(userId);

    return new Response(JSON.stringify(data), {
      headers: {
        "Content-Type": "application/json",
        "Cache-Control": "public, max-age=300, stale-while-revalidate=3600",
        "Cache-Tag": `user:${userId}`,
      },
    });
  }

  // Invalidate a user's cached response. purge() is scoped to the
  // entrypoint that calls it, so it must run inside CachedBackend —
  // the entrypoint that owns the cached response.
  async invalidate(userId: string): Promise {
    await this.ctx.cache.purge({ tags: [`user:${userId}`] });
  }
}

// Outer entrypoint: runs on every request to authenticate and route.
// Caching is disabled for it in Wrangler config (above), so it always
// runs and the auth check is never skipped by a cache hit.
export default {
  async fetch(request, env, ctx): Promise {
    const userId = await authenticate(request, env);
    if (!userId) return new Response("Unauthorized", { status: 401 });

    // Invalidate this user's cache on writes, from the entrypoint that
    // owns it.
    if (request.method === "POST") {
      await handleWrite(request, userId);
      await ctx.exports.CachedBackend.invalidate(userId);
      return new Response("OK");
    }

    // For reads: strip Authorization (otherwise Cloudflare's automatic
    // bypass fires and nothing caches), then dispatch to the cached
    // backend with the authenticated user's identity in ctx.props.
    const forwarded = new Request(request);
    forwarded.headers.delete("Authorization");

    return ctx.exports.CachedBackend.fetch(forwarded, {
      props: { userId },
    });
  },
} satisfies ExportedHandler;

The whole thing is one Worker. One source file. One deploy. But there are two execution stages — caching is turned off for the gateway and on for the backend in one small exports block — and a cache sits between them, keyed per user, invalidated by the write path, and serving stale during background refreshes. The cache stage isn't something you bolted on. It's a layer of the program, written in code.

The patterns this composes into are open-ended. The same shape works for:

Caching a Durable Object. Wrap the Durable Object behind an entrypoint, set Cache-Control on the response, and reads stop touching the Durable Object on a hit. Writes go to the DO directly and purge the cache by tag. The DO stays unaware that caching is happening.
Normalizing Accept-Encoding before Vary. The outer entrypoint restores the original encoding from request.cf.clientAcceptEncoding (Cloudflare's front line normalizes it for cache efficiency) and forwards to a cached entrypoint that varies on the real value. Hit ratios stay high; clients get the right encoding.
Stripping tracking parameters before caching. The outer entrypoint canonicalizes the URL — or sets a custom cache key with cf.cacheKey on the ctx.exports call — so the cached inner entrypoint sees only the canonical form, and ?utm_source=anything collapses to a single cache entry.

Stack them. A single Worker can have an outer entrypoint that authenticates and routes, a normalization entrypoint that strips tracking parameters and restores encoding headers, a cached entrypoint that fronts a Durable Object, and a separate cached entrypoint for an unauthenticated public API — each connected by a cache stage you didn't configure, just decided where to put. The Examples page in the docs walks through several of these end-to-end.

We don't know of another platform where you can do this. CDN caches sit in front of an origin. Function platforms run functions. We don't know of another platform that gives you a cache that sits inside a single deployable unit, between the parts of your application, with each cache stage configured by the code on either side of it. That's what Workers Cache is. And because it composes with everything else the platform already gives you — Smart Placement, Durable Objects, service bindings, ctx.props, ctx.exports — the patterns you can build are open-ended. We've barely scratched the surface in this post.

First-class support in your framework

If you're building with Astro, the Cloudflare adapter wires up Workers Cache for you. Just add the cacheCloudflare provider to your configuration:

// astro.config.mjs
import { defineConfig } from "astro/config";
import cloudflare from "@astrojs/cloudflare";
import { cacheCloudflare } from "@astrojs/cloudflare/cache";

export default defineConfig({
  adapter: cloudflare(),
  output: "server",
  experimental: {
    cache: { provider: cacheCloudflare() },
    routeRules: {
      "/products/*": { maxAge: 300, swr: 3600, tags: ["products"] },
      "/blog/*":     { maxAge: 60,  swr: 86400, tags: ["blog"] },
    },
  },
});

The adapter enables the cache, sets the right headers on the responses Astro generates, attaches Cache-Tag values for invalidation, and gives you a cache.invalidate() helper for purging tags when content changes. Astro pages that opt into server rendering automatically get the "render once, cache, refresh in the background" flow described above — no per-route configuration required, no framework-specific runtime layer to learn.

We're working with the maintainers of other frameworks to ship the same integration. If you build a framework adapter for Cloudflare, the Workers Cache APIs are exactly what you'd want them to be — header-driven configuration, programmatic purges, no platform-specific concepts to model.

See your cache on the same dashboard as your Worker

Caching is only useful if you can see what it's doing. The Workers Observability dashboard now surfaces cache hit information per invocation:

You can see, per Worker:

Cache hit ratio over time. The number you want trending up after you enable caching.
Hits, misses, updates, bypasses broken down. If your hit ratio is low, this is where you find out why — too many BYPASS responses (because something is setting a cookie?), too many MISS responses (because the cache key is partitioning more than you thought?), too many UPDATING responses (because max-age is shorter than your traffic interval?).

Because all of this lives on the same dashboard as your Worker's other observability — logs, exceptions, CPU time, request counts — you don't have to context-switch between looking at your zone and your Worker to understand what's happening.

Billing

Cache hits don't run your Worker, and they don't bill CPU time. They do count as a request at the standard Workers request rate, the same as any other invocation. Cache misses and bypasses bill normally — request + CPU time, exactly as they would without caching.

Outcome	Request charge	CPU time charge
Cache `HIT` (Worker does not run)	Standard rate	Not billed
Cache `MISS` (Worker runs)	Standard rate	Billed
Cache `BYPASS` (Worker runs)	Standard rate	Billed
Static asset request	Standard rate	Not billed
Worker-to-worker invocation	Standard rate	Billed if the Worker runs

There's no separate Workers Cache SKU and no per-GB cache storage fee. Tiered caching, purges, stale-while-revalidate, and the analytics described above are all included. If a request would have run your Worker and Workers Cache serves it as a hit instead, you still pay the standard request rate, but you pay no CPU time for that request. Because of this, that cache hit costs less than rendering the same response in your Worker.

One thing to watch: when caching is enabled, requests that are normally free — static asset requests and worker-to-worker invocations through service bindings or ctx.exports — are billed at the standard request rate, because each one now consults the cache in front of your Worker.

What's next

Things we know we want to do next:

Smarter co-location with Smart Placement. Today, Cloudflare chooses the upper-tier cache and Smart Placement target separately. On a full miss, the request may travel between Cloudflare locations twice: once to check the upper tier, and again to run your Worker near its data. We're working to coordinate those choices, so a miss only makes that long-distance trip once.
Larger response size limits. At launch, all responses follow the Free plan’s cacheable size limit (512 MB), regardless of your account. That’s temporary — the standard per-plan cache limits will apply once we finish a few rollout steps.
More framework integrations. Astro has built-in integration with Workers Cache. We’re working with maintainers to add similar integrations to other frameworks, including TanStack Start and Next.js via Vinext.
An API to mark cached responses stale. ctx.cache.purge() removes matching responses from cache. We’re looking at a ctx.cache.invalidate() API that makes matching responses behave as expired, so the next request can still get a fast stale response with stale-while-revalidate while your Worker refreshes the cache in the background.

Try it

Workers Cache is available today to every Worker on any plan.

To get started, add "cache": { "enabled": true } to your wrangler.jsonc, redeploy, and start setting Cache-Control headers. The Workers Cache documentation walks through the full feature surface — including the quickstart, cache keys, purging, composition patterns and examples, and debugging.

Workers used to run in front of the cache. Now they can also run behind it. Use whichever side you need — or, with service bindings, both at once.

We can't wait to see what you build.

Announcing the Monetization Gateway: charge for any resource behind Cloudflare via x402

Rohin Lohe — Wed, 01 Jul 2026 13:00:00 GMT

Today, we are announcing the Cloudflare Monetization Gateway, an engine that will give Cloudflare customers the ability to charge for any asset protected by Cloudflare: web pages, datasets, APIs, or MCP tools.

It will provide a single control plane to manage payment policies and access controls across your applications, while also protecting your origin from high payment volumes by handling payment verification and enforcement at the edge. At launch, payments will settle in stablecoins over x402, the open protocol we are building with a coalition of more than 25 industry leaders via the x402 Foundation.

The evolving business model of the web

For 30 years, the web has run on a simple economic bargain: trading content for human attention. That attention has been monetized through advertising, subscriptions, and e-commerce. This bargain funded the Internet as we know it.

But as agents become the dominant Internet users, the model is breaking. An agent does not look at ads or need to maintain a monthly subscription to all the tools it wants to access. It reads a page or consumes a data feed once, takes what it needs, and moves on. Across the web, AI crawlers already request content anywhere from a hundred to tens of thousands of times for every visitor they send back.

This reality demands a new model: usage-based pricing for everything. If attention and e-commerce are moving from websites to AI harnesses and AI-written software, then agents should pay for the inputs they need — training data, inference content, developer tooling, and API usage. The natural unit of payment for software is the request, the token, or the outcome, not the seat or the month. A few examples of what that could look like:

A few cents per web search, billed per call
\$0.001 base fee plus a \$0.01 per MB charge for an upload endpoint
\$0.99 per resolved support escalation, paid only when the work succeeds

This is the same shift behind paying creators when an answer engine uses their content — a fair exchange of value whenever content or a resource is used, priced on neutral rails built for the purpose. People often envision an agent buying high-priced assets like web domains, but most of what an agent pays for sits upstream of any checkout, and is priced far lower.

Some of the Internet already works this way. Cloud and APIs have been sold by the call and by the hour for years, but only to a known buyer: a user signs up, they are issued an API key, and they incur usage-based metered billing. Content mostly skipped payment and ran on advertising instead. These business models have never been able to serve unverified buyers for sub-cent transactions because the payment rails cost too much and took too long to settle. Below a certain price, collecting the payment cost more than the payment was worth.

Historically, usage-based billing was difficult to implement. Businesses needed to effectively become payments companies, running their own accounting to track internal usage in a robust and auditable way. Tracking this usage required significant overhauls of backend systems. Many instead chose per-seat pricing because it is simpler and frequently more profitable.

Agents flip this dynamic. A single agent can do the work of an entire team around the clock, making a flat one-time fee disconnected from actual consumption. At the same time, an agent can make thousands of micropayments without friction, while asking a person to approve each payment would be impossibly burdensome. Usage-based price points are where agents live and where stablecoin-based micropayments shine. That's because stablecoins (such as Open USD and USDC) allow buyers to transfer tiny sums across the Internet, incurring negligible fees and settling in less than a second. This is not feasible with other payment rails today.

Here’s where we can help. Cloudflare has spent years building usage-based accounting for our own billing systems and for our customers’ analytics. We can dramatically simplify the implementation of usage-based billing for web-based assets thanks to our position as a proxy layer between buyers and sellers. As shown below, with Cloudflare supporting usage-based billing, the evidence of payment can move into the request itself, and the payment validation and the request paths merge.

And here’s the benefit to you: the metering, the payment exchange, and the settlement move off your origin. What stays with you is what matters — your rules, your prices, and your revenue. You will not need to onboard the buyer or stand up a billing system. You will write a rule and agentic buyers will pay for what they use.

A refresher on x402

Last year on Content Independence Day, we gave site owners one-click control over which AI crawlers could reach their content, and with Pay Per Crawl we let them charge crawlers for it. The Monetization Gateway is the next step: instead of only charging crawlers for content, you will be able to charge any caller for any resource, from an API to data to an MCP tool call, and you will not have to build the payment machinery yourself.

x402 is an open protocol that makes it possible to pay over HTTP, named for the 402 status code it finally puts to use. The x402 exchange is simple: a client requests a payment-gated resource. Instead of serving it, the server responds with 402 Payment Required and a small payload that states the price, the accepted asset, and where to pay. The client pays and repeats the request with proof of payment attached. A facilitator verifies, and the server returns the resource. It all happens inside ordinary HTTP requests and responses, with no redirect to a checkout page and no separate payment API to call. Settlement happens peer-to-peer, so any funds that a buyer sends to a seller are directly deposited to the seller’s wallet. We are designing the Monetization Gateway to keep payment overhead low and are aiming for sub-second payment settlement.

^{x402 Payment Flow: AI Agent ↔ APIServer ↔ Blockchain, Source:}^{x402 Readme on GitHub}

Two properties make x402 a good fit for machine payments. The payment amounts can be small, down to fractions of a cent, because the protocol adds almost no overhead. And the buyer needs no account with the seller, because the payment itself is the credential. x402 is rail agnostic, but it is a natural fit for stablecoins, which can settle in under a second for a fraction of a cent with zero chargebacks.

What the Monetization Gateway does

The Monetization Gateway will provide a flexible payment rules API that will allow you to express exactly when you want a caller to pay to access your digital resources.

Here’s how it will work. Tokens, APIs, MCP tool calls, and data already flow through that path. You will decide, as precisely as you want, which of that traffic has to pay. And you will be able to enforce your decisions by writing expressions, similar to expressions that you already write for other Cloudflare rules, in a simple, dedicated product API. The Monetization Gateway will scale with Cloudflare’s global network across 330+ cities, which means that the x402 handshake will occur in close proximity to your buyer. This will reduce request latency and protect your origin.

A few examples of planned capabilities:

Charge for specific REST verbs: Require payment on calls to a specific route, for example $0.01 for every GET or POST request to /api/premium/*.
Variable pricing: Charge variable amounts for tasks of varying complexity, for example, image generation might charge any amount up to $2, depending on the compute used.
Charge only unauthenticated callers: Intercept HTTP 401 "Unauthorized" responses from your origin and return 402 "Payment Required" instead with pricing and payment instructions.

When a request matches, the Monetization Gateway will verify payment before letting it through. You will be able to set these rules in the dashboard, or manage them as code through the Cloudflare API and Terraform, so a paid endpoint is just another part of your infrastructure config.

The Monetization Gateway will initially allow users to require buyers to pay for services and resources in stablecoins. Sellers will be able to use the stablecoins they accumulate for their own transactions or redeem the stablecoins for equivalent fiat currency in their bank account. Using the Monetization Gateway offers a way to increase the addressable market for your products. With the Gateway, agents can request your resource, be told the price, pay, and get the response. No signup, no API key, no prior relationship required. You will decide how much you need to know about that buyer, and you will have the flexibility to require agents to authenticate with Web Bot Auth and apply usage-based pricing against accounts they already hold.

Where we see this going

The Monetization Gateway will turn the request into a payment and give Cloudflare customers new revenue opportunities, but where this goes is far bigger.

An agent is software that acts autonomously on a user’s behalf, and agents are starting to act on their own. Soon they will carry wallets and buy what they need without a person in the loop: a dataset, an API call, a tool, a block of compute. Some of those resources will be free, and some will require proof of who the agent is and who it acts for, through verified agent identity. Many will require both an identity and a payment, and Cloudflare is one of the few places that will be able to settle all of it inside a single request, by verifying the agent, applying the rule, and checking the payment before the origin ever sees the call. The agent becomes the primary buyer on the Internet, and the request becomes the transaction.

There is an enormous amount of value moving across the Internet today that goes unmonetized or undermonetized, not because no one would pay for it, but because the tools to charge for it have never existed. Every useful API call, every answer, every tool invocation an agent makes has value, and almost none of it is paid for today. That is the opportunity in front of us, and it is what the Monetization Gateway will unlock.

This is what we are building toward: an agent-first Internet with Internet-scale settlement built in. Where the people who make something worth paying for get paid by the software that uses it, automatically. And where the smallest new API can reach the same buyers, on the same terms, as the largest company on the web, and the independent creator is paid by the large language models that use their work. That is the next business model of the Internet, and we are building to power it.

Sign up for our waitlist

The Monetization Gateway waitlist is open now for Cloudflare customers. If you’re interested in monetizing your web page, dataset, API, or MCP tool with usage-based pricing, please join our early access list.

Content Independence Day, one year on: building the business model for the agentic Internet

Arielle Weiss — Wed, 01 Jul 2026 13:00:00 GMT

One year ago, we declared Content Independence Day. At the time, we could see what many in the industry were beginning to sense: the fundamental economics of the Internet were shifting. AI adoption was accelerating, publishers were experiencing rapid declines in referral traffic, and AI companies were crawling the web at unprecedented scale, often without clearly declaring intent, and almost always without compensation.

We changed the defaults. For all new domains on Cloudflare, AI training crawlers would be blocked by default unless domain owners chose otherwise. We didn't do this to wall off the web. We did it because we believed a healthier ecosystem required transparency, control, scarcity, and ultimately, a market where high-quality content could be valued and exchanged fairly.

A year later, that market has emerged. But the transformation of the Internet has happened even faster than we anticipated. In this report, we share key data points that illustrate how quickly the business model of the Internet has shifted – and what this new content market means for publishers and site owners.

Part I: The Internet has changed – faster than anyone expected

The vertical adoption curve

AI is not just another technology cycle. It is a platform shift happening at more than 2x the speed that smartphones were adopted. In just 3.5 years, over 30% of humanity — 2.5 billion active users — has adopted regular use of generative AI. The adoption curve isn't merely steep: it's going vertical.

The decline of the open web

Never before have we seen such a rapid change in how humans interact with information, perform work, and spend time online.

The way people use the Internet is changing dramatically. Today, for every hour spent online searching for information, only 15 minutes is spent on the open web. Traditional search behavior is collapsing as users shift to AI-driven discovery and consumption. Instead of visiting multiple sites to source and compare information, users simply type a prompt and receive a nearly instantaneous, consolidated answer.

The agentic Internet is here

This year, agent traffic crossed a historic threshold for the first time: more than 50% of traffic on the Internet is now non-human. This shift has staggering implications for publishers, content owners, and the future of the open web.

Crawlers have changed their purpose

When looking at the crawlers Cloudflare identifies by purpose, the composition of crawler traffic tells the story clearly:

52% of crawler requests are now for AI training as of June 2026, up from 22% in Spring 2025.
Mixed-use crawlers (those blending search, agent use, and training) represent over 36% of activity.
Pure search crawling now represents a small and declining share of overall crawler activity, despite remaining critical for publisher visibility.

As AI training becomes a primary driver of crawler activity, the ability to distinguish between discovery and training becomes increasingly important. Mixed-use crawlers blur that distinction, putting content owners in a difficult position: choose between remaining discoverable in the agentic era, and giving away their most valuable content without compensation.

The old business model is gone

For decades, the economic model of the open web was straightforward. Content creators exchanged access to their content for visibility in search engines, which returned referral traffic. That traffic became the primary mechanism through which publishers, creators, and businesses generated economic value.

But today, that exchange is breaking down. Content is still being crawled, indexed, and used — but increasingly without corresponding traffic being returned to the source. As AI systems answer questions, compare products, conduct research, and complete tasks directly, information across the open web is increasingly becoming part of AI training and retrieval systems. The existential question this raises is simple: if content is consumed without audiences ever visiting the source, how do content creators sustain themselves?

The implications are industry-agnostic

The earliest industries to feel the impact were news organizations and media companies. Today, similar dynamics are impacting businesses across retail, software, IT, and finance. Some of the most heavily crawled categories have seen human traffic decline as much as 40% in less than one year.

Many publishers are now preparing for what they call "Google Zero" — a world where little to no traffic comes from search referrals.

The implications extend to essentially every industry. Any organization that publishes proprietary information on the Internet will need to understand how to operate in an agentic era. This dynamic matters not just to content owners, but to all of us. The Internet is a critical part of the global economy and one of the world's most important public resources for surfacing information. Ensuring it remains healthy and sustainable is essential for all.

Part II: The market has emerged

What we built

When we launched Content Independence Day, we committed to three things:

Transparency and control for site owners, enabling them to define how their content is accessed and monetized.
Tools that create scarcity, shifting the balance of power back to content owners.
A marketplace where content creators and AI companies of all sizes can discover, license, and determine the value of content more efficiently.

One year later, a market for monetized content is here, and the conditions for a dynamic marketplace are forming.

Transparency and control created scarcity

Historically, publishers have had limited visibility into how AI companies accessed and used their content. As referral traffic declined, that lack of visibility became an economic problem prompting publishers to seek new ways to capture value.

Cloudflare's attribution, business intelligence, and enforcement tools gave publishers visibility into AI consumption at the network level — an enforcement mechanism far more effective than voluntary standards like robots.txt. For the first time, publishers could determine how their content was accessed and monetized. That control created scarcity, and drove a supply-and-demand content economy.

Scarcity created leverage

Publishers that exercised control over access successfully created scarcity, giving them negotiating leverage that led to better deals. For the first time, publishers gained operator-level attribution data — evidence of how often LLMs attempted to access their content, which competitive LLMs were crawling, what their most in-demand URLs were, and what their crawl-to-referral ratios looked like. This reduced information asymmetry in licensing discussions and enabled publishers to negotiate from a position of knowledge.

Leverage is changing the balance of power

This leverage has empowered our customers. As they have gained greater visibility into how AI systems access and use their content, they’ve become better equipped to understand the implications for their businesses and more confidently articulate the value of the information, brand, and audiences they have built.

As the balance of power between content owners and AI companies begins to change, a licensing economy is emerging:

More than 50 publisher-AI agreements have been signed since 2023.
Major AI companies now actively license content, increasingly recognizing the value of differentiated and premium content.
Collective licensing models continue to emerge and scale.
Large publishers are securing meaningful licensing agreements, demonstrating that content has real economic value within the AI ecosystem.

The conversation is no longer whether content should be compensated. The conversation now is how.

The market is maturing, but inefficiencies remain

Early licensing agreements proved demand exists, but licensing today remains largely bespoke and unlikely to fully replace lost referral, advertising, and affiliate revenue. As a result, publishers are increasingly optimizing for AI consumption alongside traditional human discovery while exploring new monetization pathways.

Supply and demand remain difficult to match efficiently, and while there’s an understanding that not all content carries the same value, content valuation is still unresolved.

The Google convergence problem

No discussion of this market is complete without addressing Google's unique role. Google remains the dominant gateway to online discovery, accounting for approximately 88% of referral traffic. But increasingly, Google is helping users consume content directly within Google-owned AI experiences.

Discovery and consumption serve fundamentally different purposes. Search drives users to content, while AI-powered experiences increasingly summarize and reuse it without requiring users to visit the source. Website owners view these activities differently because one generates traffic, while the other increasingly substitutes for it.

These differences become especially important when site owners are deciding who should be allowed to access their content and for what purpose. Most leading AI companies separate discovery crawlers from training crawlers, making it relatively simple for publishers to enable content access for one purpose or the other. Google does not. Today, Google has access to about 2x more information than leading AI companies because Google leverages a mixed-use bot that makes it difficult for customers to participate in Google's search ecosystem without also participating in Google's AI ecosystem.

Unlike other AI providers, Google’s mixed-use crawler also limits transparency for site owners. Because discovery and AI access are combined into a single crawler, publishers cannot tell why Google is accessing their content or distinguish between traffic used for search and traffic used for AI experiences. They also lose the visibility and evidence that comes from being able to allow or block these activities independently at the network level.

This dynamic has accelerated demand for greater transparency and control, as well as new monetization models to better serve both content owners and AI companies of all sizes.

Part III: A unique view of the ecosystem

Cloudflare sits at the intersection of the emerging agentic economy.

More than 20% of the web sits behind Cloudflare’s network. Of the world's most-visited websites, 36% rely on our network, and more than 40% of the Fortune 500 are Cloudflare customers. Nearly 80% of leading AI companies use Cloudflare, alongside thousands of developers and emerging AI companies.

This unique position gives us visibility into both sides of the market. We see the content owners creating content, the AI companies consuming it, and the signals increasingly connecting them. That perspective has given us a unique view into how the market has evolved over the past year, and what it now requires.

Part IV: Lessons from an emerging market

As publishers and AI companies adapt to a new agentic economy, Cloudflare has gained a clearer understanding of what the ecosystem now needs.

Transparency must become the standard

Content owners increasingly need visibility and control over who is accessing their content, how it is being used, and for what purpose. AI companies increasingly recognize that transparency builds trust and reduces friction with publishers. Visibility and enforcement are no longer security concerns alone — they have become business requirements that directly influence licensing negotiations and commercial decision making.

To help make transparency the standard, Cloudflare is continuing to invest in enhanced attribution, measurement, and publisher controls that give content owners greater visibility into and control over how their content is accessed and used.

As the industry shifts toward greater transparency, we believe that verifiable bot self-identification and declarations of crawl intent are fundamental to a sustainable ecosystem. Today, more than one-third of crawler activity on our network still comes from mixed-use bots that make it impossible for content owners to distinguish crawl intent. We are actively engaging with the ecosystem and investing in tooling to help drive that number to zero by this time next year.

Better AI requires better signals

Over the past year, it has become increasingly clear that AI companies need more than access to content. They need better ways to determine what to access, when to access it, and how frequently it has changed. Indiscriminate crawling wastes compute for AI companies and creates unnecessary bandwidth burden for publishers, reducing efficiency across the ecosystem.

We believe better answers require better intelligence. We are investing in real-time freshness signals with richer trust, quality, and relevance to help AI companies discover differentiated information while reducing unnecessary crawling across the web.

Markets need better discovery before better pricing

We believe better discovery must precede better pricing. In order for the market to mature, publishers and AI companies need better information about one another. We are investing in richer market intelligence, content signaling, and capabilities that improve discovery between both sides of the ecosystem, laying the foundation for more scalable market mechanisms over time.

Part V. Building the infrastructure for the agentic Internet

One year ago, Content Independence Day introduced a simple idea: content owners should have greater control over how AI companies access and use their information.

Over the past twelve months, that control helped give rise to a market. Transparency created scarcity. Scarcity created leverage. Leverage accelerated licensing. What was once a theoretical discussion about the future of AI and content has become an active market, with publishers, AI companies, and technology providers all adapting to a new set of economic realities.

The market is now entering a new phase that demands new infrastructure. As the Internet becomes increasingly agentic, the underlying systems that support it must evolve to handle permissions, licensing, and commercial transactions at scale. Content owners and AI companies need more efficient ways to connect and exchange value. We believe these capabilities will converge into programmable, scalable mechanisms for content discovery and monetization – reducing friction while unlocking richer forms of value exchange.

Cloudflare's role is to build the infrastructure and business intelligence, and contribute to the standards that allow the market to determine value more efficiently and help publishers and AI companies participate in a healthier, more dynamic content economy.

The Internet has always evolved. This evolution is faster and more consequential than most. But with the right infrastructure, the right incentives, and a commitment to transparency, we believe the agentic Internet can become more sustainable, more efficient, and better for everyone.

Methodology: The data in this report is compiled from Cloudflare Radar and the Cloudflare Investor Day 2026 Presentation.

Cloudflare Radar is a hub showcasing global Internet traffic, attack, and technology trends and insights. Powered by data from Cloudflare's global network, Radar was created to help anyone understand what is happening on the Internet from a security, performance and usage perspective.

Cloudflare's unique understanding of the Internet comes from its global network — one of the world's largest, spanning 330+ cities in 100+ countries — and aggregated and anonymized data from Cloudflare's 1.1.1.1 public DNS Resolver, widely used as a fast and private way to browse the Internet. More than 20% of the web sits behind Cloudflare’s network.

Making AI search smarter

Matthew Conroy — Wed, 01 Jul 2026 13:00:00 GMT

Search drives most experiences on the web. It's how we get things done, and how nearly everything on the web gets found — the creators, the merchants, the answer to whatever you just typed into a box. For nearly 30 years, that discovery journey ran on a simple bargain: let a search engine crawl your content, and it sends you visitors. You turned those visitors into a business — through ads, subscriptions, or just the audience itself. Being discoverable and getting paid were the same thing. A year ago, on the first Content Independence Day, we drew a line to defend that bargain in the AI era. But a line in the sand was only a first step. Since then, the prevalence of AI search in consumers’ lives has only accelerated as more than 50% of traffic online is non-human. The threat is no longer a handful of training crawlers you can block; it's search itself being rebuilt around AI answers.

Today's answer engines read your page and hand the user a summary, so the visit — and the revenue that depended on it — isn’t needed. We see it firsthand, and independent research backs it up: a 2025 Pew Research Center study found that when Google shows an AI summary, users clicked on a traditional search result link just 8% of the time (about half as often as when there's no summary) and clicked a link inside the summary only 1% of the time. That leaves our customers in a bind: opt out of AI and be hard to find, or opt in and deliver significant value to users while seeing increasingly little in return. Our customers want to be found and compensated for the value they provide, and right now they're forced to choose.

Today, we’ve announced new bot options to help our customers better control who can access their site and what they can do with it. But blocking was only step one: saying "no" protects content without rebuilding the business models that sustain it. So, it’s time to start building the new economic model of the Internet, starting with search.

Rebuilding the bargain

Transparency and control are the foundation, but more is needed. In 2025, we laid out our foundation via a set of responsible AI bot principles: bots should be transparent about who they are and what they're for, respect site owners' choices, and act in good faith. Our tools hold bots to that bar. But enforcing good bot behavior doesn't make AI search any better for the people relying on it, and it doesn't send a dollar back to the creator whose work made the answer possible. We can do more than help the web say "no"; we can help rebuild what it says "yes" to.

So today, we're announcing two initiatives that move from defense to offense and start putting both halves of that old bargain back together.

Make AI search smarter: By using the signals we see across our global network, like what's fresh, what's high quality, and what's actually changed, we can help answer engines surface the most relevant content and reduce unwanted crawling. People searching get better answers, while costs are reduced for both AI companies and site owners if webpages are only recrawled when they’ve changed.

Pay creators for the value they provide: When your work is used to answer someone's question, you should be rewarded instead of just being scraped for free. And you should be able to see what's being used and what people are asking. This should be a real revenue stream, and an incentive to keep producing original content worth finding.

Making search smarter

Today we're launching a research program to make AI search smarter and stop our customers footing the bill for crawls that produce nothing new.

More than 20% of the web sits behind Cloudflare’s network, which gives us a unique perspective. We can tell which pages have genuinely changed and which ones people and agents are flocking to. Through this program, we will explore using signals our customers have chosen to share about the freshness of their content, and we will combine those with our own insight into traffic flows, both human and bot. For answer engines, that's a roadmap to high-quality content. For our customers, it provides a view of what users are actually asking, and how their content shows up in AI results. The aim is to measure two things: how much these signals help answer engines to surface fresher, higher-quality content, and how much unnecessary crawling they cut out.

That second benefit, cutting unnecessary crawling, is bigger than it sounds. Cloudflare data suggests that more than 50% of crawl traffic from good bots goes to re-fetching pages that haven't changed — and that number is likely to climb as crawl volumes do. A signal that just says "nothing's changed here" lets a crawler skip the trip. That saves the answer engine compute. More importantly, it saves site owners from serving and paying for requests they never needed to.

The program is neutral by design: our goal is to make it work for every answer engine willing to play fair. It's limited to search. We aren't sharing any content, and nothing is used to train foundation models. We intend to publish what we learn, including the benefits to site owners such as better content discoverability and reduced server strain. We plan to make the capability broadly available later this year and reduce unnecessary crawling across our network.

From Pay Per Crawl to Pay Per Use

Last year we launched Pay Per Crawl so publishers could charge AI companies for crawling their content. It was a real start, but crawling is a crude measure of value. A single page might be crawled once and then cited in thousands of answers, or crawled over and over and never used at all. Creators want to be paid fairly for the value they provide.

So we're starting to shape Pay Per Crawl into Pay Per Use. We're running experiments with top AI companies, like Ceramic.ai and You.com, and the arrangement is straightforward: organizations can bring their payment models and easily scale them to content owners across the Cloudflare network.

Ceramic has built what it calls a "pay-per-query" model, so publishers who opt in can be paid when their content appears in Ceramic's search results. This means payment is designed to follow the value the work delivers rather than the number of times a crawler happens to fetch it.

"To scale the future of AI search, we need a partner with massive reach and a shared commitment to transparency and fair compensation,” says Anna Patterson, founder and CEO of Ceramic.ai. “Cloudflare allows us to easily and programmatically scale our operations. By bringing our pay-per-query model to their network, we ensure millions of content owners can seamlessly opt in to be compensated every single time their content appears in our search results."

In addition to compensation, content owners participating in the Cloudflare/Ceramic program will unlock new reporting to help with answer engine optimization (AEO). Customers can finally see the top queries leading to their content appearing in search results, the specific webpage and snippet, their average search result ranking position, and more. This is the first of many products we’ll be launching to help our customers with discoverability.

This is just one emerging approach. Another comes from You.com: agents can pay on demand for a specific piece of premium content they need, without any upfront commitment. New payment models from AI providers are being tested (e.g., Pay per Query, Pay per Result, etc.) and we have the infrastructure to support them all.

We want to be honest that this is an experiment. There’s a lot to learn, including exactly how this holds up at the scale of the Internet. We'll work that out with our partners and our customers as we go, and share what we learn. But the goal is clear: AI search companies get fresher, better-grounded answers, and the customers whose work makes the answers possible get paid when they help. Cloudflare's job in all of this is to provide the infrastructure layer that makes this market flourish.

We think this is a more natural fit for where the economics of search are heading. The old, human web optimized search to save time — providing excerpts, ten blue links, and a click. The agentic Internet is different: an agent can read fast and search continuously. Search is becoming something an agent does dozens of times to answer a single question, closer to a utility than a destination. In that world, the unit that matters isn't the crawl or the click. It's the outcome. Pricing the outcome, and paying the people who made it possible, is how the web continues to thrive.

The headline we want to earn

A year ago on Content Independence Day, the headline was a default ‘no’: AI can’t crawl without compensation. This year, our focus is on giving our users more products and controls to say ‘yes’ and bring more benefits with it.

Today's announcements are just the beginning. Cloudflare’s research project is designed to see if our signals produce better results with less crawling. Pay Per Use is a promising direction we’ll experiment with alongside partners who believe that content creators deserve fair compensation for their work. This is how the last 30 years of the web got built too: somebody runs the pilot that turns "the model is broken" into "here's the new model," one experiment at a time. We believe there’s value to our customers to be discoverable in this new agentic era, and to optimize their content for maximum discovery. But they should be able to do this without giving away their most valuable creative assets for free.

The web is changing, and the business models it’s relied on are changing with it. The old Internet was open, neutral, and worth contributing to. We have a rare chance to keep it that way, and to build the business models that fund it in the future. Smarter answers for humans and agents asking the questions. A fair deal for the people whose skill, creativity, and commitment makes the answers worthwhile. That’s how we pursue Cloudflare’s mission: to help build a better Internet.

Happy Content Independence Day!

Building on the open, agent-ready web? If you are interested in learning more about the Ceramic and You programs, please fill out this form. If you're building an answer engine and want to crawl smarter, we’d love to hear from you too: aeo@cloudflare.com.

Your site, your rules: new AI traffic options for all customers

Jin-Hee Lee — Wed, 01 Jul 2026 13:00:00 GMT

One year ago, we declared the first Content Independence Day, and we gave website owners the means to take back control of their content. The deal between crawlers and website owners that had held up for 30 years — we crawl you, and you get referrals — was no longer true. AI was taking everything and sending back nothing, presenting an existential threat to website owners. And so we launched a one-click "Block AI Bots" option, along with a Pay-Per-Crawl marketplace.

A lot has changed in a year. Last July, conversations around “AI bots” centered around blocking AI training without compensation, pointing to the win–lose deal where content was used for model training with no value driven back to the website owner. But a desire for more nuance has emerged: Content owners still want to be able to protect their content, and they should be compensated for the original content that they work hard to create, curate, and share. We also know that locking down content isn’t a one-size-fits-all solution; website owners want more options than resorting to “block all automation, every time.”

If you run a small site, the problem isn’t just that someone could train models on your content — it's that nobody can find you in the first place. So you have to make a Faustian bargain: either show up in search and let AI train on you, or risk losing discoverability. This unfairly advantages incumbent search providers if they use the same bots for both search and training; and this unfair advantage incentivizes new players to be evasive as they try to close the competitive gap.

Now, AI can be anything

Today, AI can be in anything. Google search has changed from being sorted by AI to being a full answer engine that answers your question directly on the results page. And Google is not unique in this position — this is the direction in which “search” is moving.

We could debate the cutoff for what qualifies as “AI” today, just to find that the standard changes tomorrow. So, instead of defining a bot primarily as “AI” or not, our updated approach to classification will ask deeper questions about bot or agent behavior: What are they doing on my site? What are they storing? And how will they reshare my content?

A pragmatic taxonomy

To address these questions, we need a more nuanced view — a pragmatic taxonomy that aligns with the AI use cases our customers care about. So we are opening the discussion beyond AI training alone and focusing on three AI use cases that we want all customers to be able to manage:

Search: any behavior that collects or indexes your content, so it can answer questions about it later. The key is that Search is proactively building a database of your site to later respond to queries with. Site owners should expect to get referral traffic or other equitable compensation as a result.
Agent: automated behavior that is acting, usually in real time, on a person's behalf, to get something done right now. This includes chat fetch bots (e.g., ChatGPT-User) and browser-use agents (e.g., Gemini or Claude driving Chrome). The key is that it visits your web application in order to complete a job, and often there's a human waiting on the other end.
Training: a crawler taking your content to train or fine-tune a model. The key is that your data is permanently absorbed into the underlying architecture of the AI to improve its capabilities.

Many popular crawlers on the web fall into one of the classifications above; some fall into multiple. We classify plenty of other behaviors beyond the three above — including ads verification, feed fetching, and agentic transactions (more on this below). But we believe it should be simple for all website owners to manage access for these three AI-centered use cases. We believe that bot operators should separate their crawlers because that creates more transparency for website owners: allowing them to better understand why a given crawler is visiting them, as well as to better manage the access they extend to that crawler. If a company runs automation that builds Search indexes, acts as an Agent, and collects data to Train their models, then we strongly encourage that company to separate the automation into three separate crawlers.

We want a classification system that is scalable and representative of the world of automated traffic as it evolves. Tracking a bot’s purposes is nothing new, but our new taxonomy involves a few updates that better represent the state of bot traffic today. Most notably, we want to recognize that bots that have multiple purposes should be tracked with all purposes, not just one of them.

New options to manage AI traffic

We want to provide more options for managing different kinds of AI traffic, to all website owners on the Cloudflare network.

The managed preset to “Block AI bots” that we’ve announced in the past included single-purpose bots that crawled data for model training, as shown below:

^{Screenshot of the existing setting to manage AI bot traffic on July 1, 2025.}

But not all AI use is the same, and we want our customers to have the controls they need. So, we’re launching the ability to manage AI traffic based on three major use cases: Search, Agent, and Training crawlers. With these new options, our customers can more finely tune how they manage AI bot traffic — including customers on our Free tier.

^{Screenshot of the new options to manage AI bot traffic on July 1, 2026.}

Setting new defaults

On September 15, 2026, we’ll be setting new defaults for each of these three classifications. For all new domains onboarding to Cloudflare, the categories of Training and Agent will be blocked by default on the pages that display ads, while Search will remain allowed by default.

An ad is a signal that a website owner meant for a person to land there and see it — something monetizable that fuels the business. So, on those pages, we treat human attention as the end goal, and keep away the bots that may prevent this attention (i.e., Training and Agent bots). On the other hand, Search is the behavior that most naturally funnels back visitors, and we believe it’s in the interest of most site owners to allow this.

Another change that will apply on September 15 is that multi-purpose crawlers (specifically those that combine Search with Training) will be allowed/blocked according to all of their behaviors, in line with our call for transparency for website owners. Since the defaults will be enforced by the most restrictive applicable rules, multi-purpose crawlers such as Googlebot, Applebot, and BingBot will be blocked by customers who have selected to block Training (either through the new options to manage AI traffic, or through the legacy Block AI bots service).

Of course, customer choice is paramount: if a website owner wants to opt out of these new default configurations, they can easily mark this in their Security settings any time leading up to September 15, which will confirm that they want no changes on Training crawlers that also crawl for Search purposes. We’ll also continue to notify customers of the upcoming change to defaults as we approach September 15 to ensure that customers who want to choose settings different from the defaults have the opportunity to do so.

BotBase: a new visibility plane for Enterprise customers

We’re also excited to launch a major visibility update as a new feature of Enterprise Bot Management. As Cloudflare’s directory of tracked bots has grown, so has the desire to manage these bots in sensible groupings and to understand more detail about a particular bot.

Introducing BotBase. BotBase is our new database tracking all known bots, including Verified bots and agents. This database provides a comprehensive, searchable view of our entire directory of bots, directly on the Cloudflare dashboard. We’re tackling visibility first, but, later this year, we’ll expand BotBase to provide a direct control center for known automated content on your website.

With this new view, Enterprise Bot Management customers can see the full catalogue of all Verified bots/agents and where they are classified in this updated taxonomy — a view we’ve never shown dynamically on the Cloudflare dashboard before. Customers who want to precisely target a specific bot can also easily filter for all traffic from this bot, plus copy the detection ID to use in Security rules. All of this is now live within a dedicated page, which can be accessed through the Bot Management configuration card.

As we built BotBase, we wanted to account for all of the pieces of information that would allow us to build scalable, powerful insights from bot to bot. One of these pieces is a cornerstone for our updated taxonomy, which is based on what a bot may do on your site — its behavior. We separate these classifications as shared below, and each bot is classified with one or more of these behaviors.

Bot classification	Behaviors and uses
*Search*	*Crawling to scan your site to help it appear in search engine results*
*Agent*	*User-directed agents visiting a page on behalf of a human*
*Training*	*Crawling to train or fine-tune models*
Transact	Checkout actions on behalf of users
Data Collection	Includes price scraping, competitive intelligence gathering, and third-party analytics
Security Testing	Includes vulnerability scanning and penetration testing
SEO	SEO crawling, site auditing, accessibility checks
Ads Verification	Ad placement verification, ad fraud detection
Social / Link Preview	Link previews for social platforms and messaging apps
Feed Fetching	Includes RSS readers, podcast aggregators, and news feed bots
Monitoring & Operations	Includes uptime monitoring, webhooks, and health checks

^{Bold italicized rows indicate the new configurable options that are available to all customers.}

How does a crawler use my content?

Another piece of information we’ve heard is important to our customers is a bot’s content use — what a bot may keep and reshare after it has crawled your content. To address this, we are building capabilities for Bot Management customers to select and block based on the “content use.” This setting can be set to one of three levels, from least to most permissive:

immediate — interact, but store and reuse nothing
reference (default) — index, excerpt, and link back
full — summarize and reproduce

These values can be combined with bot classifications to express nuanced rules, such as “allow all bots that are used for Search, SEO, and Ads Verification, but only up to the reference use level.” This allows website owners to make decisions in sensible groupings rather than manage individual bot-by-bot rules.

To further support this, starting today, we're testing a new signal, use, that extends Content Signals and lives in your robots.txt. This extends the three fields of the first version of Content Signals with a fourth, optional field that expresses the same preference as above:

use=immediate
use=reference
use=full

As with all other items listed in the robots.txt file, the values of content use signal a website owner’s preference, rather than issuing blocks directly. We’re now adding support for this extension: all customers who have already enabled managed robots.txt — which prepends the preference to robots.txt that crawling for search is okay, but that crawling for training is not — will now have the additional preference of use=reference added to their robots.txt.

# Cloudflare Managed content with original Content Signals

User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

^{The contents of Cloudflare managed robots.txt with the original Content Signals values.}

# Cloudflare Managed content with the new content-use signal

User-agent: *
Content-Signal: search=yes,ai-train=no,use=reference
Allow: /

^{The contents of Cloudflare managed robots.txt with the added parameter.}

We’re also starting to track content uses for every bot in BotBase, and when we discover a bot abusing these signals, it will lose the “Verified” status, resulting in it no longer being allowed. Today, bots that reproduce in full cannot have the Verified status.

What does it mean for a bot to be Verified?

Speaking of “Verified,” the definition of Verified is being updated to reflect the upcoming changes to default allow and block baselines. Previously, all Verified bots were allowed by default, which was reflected in our basic Bot Fight Mode offering to block unwanted automatic traffic and in our rule templates for Enterprise Bot Management customers.

Starting today, we’re adjusting this to add nuance: non-verified bots are still default blocked, but we are no longer viewing Verified as “default allowed.” Now, the Verified label makes a bot allowable with its relevant category, meaning the allowed category (e.g., allowing Search) will determine what is allowed to access a website.

To balance this change, we’re opening up the process of becoming a Verified bot, and making it more transparent, too. To "Verify" a bot, a bot operator needs to show two things: that you represent yourself honestly, and you don't abuse the access that honesty earns. And to make this easier on bot operators, we’re currently building management tools for bot operators to better ensure they are accurately represented by Cloudflare’s classification system (to be announced in the near future).

^{A preview screenshot of the upcoming platform built directly for bot operators who are part of or want to be a part of BotBase, the next generation of the Cloudflare Bots Directory.}

Experimenting with transitive trust

One more piece: The bot (or agent) at your door increasingly isn't run by the company that built it. A platform like Cloudflare’s Developer Platform runs automations for thousands of different operators at once, ranging from enterprises to a developer you've never heard of. You might trust Stripe, but you don't necessarily trust everyone who wired Stripe's tools into a weekend project.

We call the case of (site owner → bot owning company → end user) a matter of transitive trust, and we're proposing to utilize the existing Forwarded header as defined in RFC 7239 that rides along with the request and allows “proxy components to disclose information lost in the proxying process.”

This is similar to what X-Forwarded-For does for IP addresses, or X-Forwarded-Host does to preserve the original Host header. So when a website owner says, "Allow this operator," that preference will hold, whether the operator comes to you directly or through three layers of intermediaries that are trusted. More details can be found in our documentation, with a brief example to show the format below.

Forwarded: for="openai"

Adding the extension with content-use discussed above, the header addition would look something like the below, specifying how the operator says they will use the content they access:

Forwarded: for="openai";use="reference"

This also lines up the incentive model we want to foster. Losing trusted status across the more than 20% of web domains that sit behind Cloudflare is a deterrent with teeth. Trust becomes something you can carry with you, and something you can lose.

However, as bot traffic blends with human traffic, it’s possible that this system of transitive trust doesn’t carry beyond the users who can afford to be identifiable. The measures we are proposing today help to convey trust, but they won’t fit the entire web for all time. Small sources of traffic need privacy, and companies that want to preserve their own privacy commitments should be able to explore fair building blocks for the future of an agentic Internet, such as private rate limiting.

Set your terms today

These are small changes that move in the same direction: site owners get more control over who uses their content, and how. We believe the new defaults we discussed today and will soon implement are ones that encourage transparency and are more reflective of where the world is going.

Of course, the ebbs and flows of the web will continue shifting under us, and we'll keep adjusting with it. But the direction won't change, because it's the one Cloudflare started with: a web ecosystem built around trust. Where the people who make things can decide how they're used — and one where being honest about what you do earns you more access, not less.

These new options to manage AI traffic are live now, and can be configured by all existing customers in their zone Settings. Not on Cloudflare yet? Start for free to set the traffic controls that you want today.

Happy Content Independence Day.

Unmasking the crawls with Attribution Business Insights

Jin-Hee Lee — Wed, 01 Jul 2026 06:00:00 GMT

Original content is the lifeblood of conversations and curiosities. Imagine a world without it: we could find a thousand ways to regurgitate the same material that’s already been created, but we would witness the decline of fresh ideas and arguments.

Website owners fuel the ecosystem of ideas, news, and interesting tidbits, but they face the increasingly complex challenge of managing traffic to their websites and being paid for their content. While some bot traffic is clearly malicious, it isn’t always obvious when a particular AI crawler is helping or harming your business. To answer this, site owners need granular, reliable data to differentiate between traffic that provides value, and traffic that strains resources while eroding the foundation of their business model: actual humans consuming their content.

At Cloudflare, we hold a core belief: website owners have the right to control access to their content. We want to help website owners maintain their high-quality content and regulate AI traffic.

To provide much-needed clarity and help website owners take control, we’re excited to announce the new Attribution Business Insights dashboard — designed with business decision-makers and publishers in mind.

The new economics of the Internet

For decades, the business model of the Internet relied on a straightforward, unspoken agreement: website owners allowed search engines to crawl their content and, in return, search engines sent readers back to their pages. This symbiotic relationship, where traditional search engines operated with a balanced "crawl-to-referral" ratio, generated the pageviews needed to sustain advertising, affiliate revenue, and subscriptions. Search index crawlers would scan your content a couple of times for each referral sent, so making your website available to crawlers had a clear pipeline to additional revenue. We can think of this as the SEO (Search Engine Optimization) era.

Today, the explosive rise of AI crawlers and agents has broken this contract, plunging the digital publishing industry into an unprecedented crisis. The Internet is risking a transition into a "zero-click" ecosystem where AI chatbots scrape original content to synthesize instant answers — completely bypassing the original sources. We’ve already seen a marked shift from the SEO-only world into an AEO (Answer Engine Optimization) world, and now conversations around GEO (Generative Engine Optimization) are taking center stage.

The imbalance of this new reality is made clear by the crawl-to-referral ratios we see across the Internet today. While traditional search engines had a more balanced ratio of crawls to legitimate visitors referred, major AI crawlers operate on a drastically different, extractive scale. Bots from leading AI companies have been observed with a range of crawl-to-referral ratios: we noted ratios of 118:1 up to nearly 50,000:1 around the time of our Content Independence Day in 2025. In other words, an AI crawler might have crawled your premium content tens of thousands of times just to send back a single visitor. This ratio is fundamentally unfair.

For publishers, this creates a double hit: first, they’re losing out on the crucial referral traffic, ad impressions, and direct audience relationships that fund content creation and journalism. Second, they’re forced to bear the rising infrastructure costs of hosting and serving content to automated bots that offer no commercial value in return. The era in which it makes sense to allow all crawlers in the hopes of being discovered is over.

Introducing Attribution Business Insights

We want website owners to have the facts — the cold, hard numbers to understand which bots are helping their business and which bots are harming it. We also want to make this analysis easier than ever, which is why we’ve designed Attribution Business Insights to cut the noise, focusing on the details that our customers have told us are most important.

Today, the Attribution Business Insights dashboard is available to all Cloudflare Bot Management customers. The new dashboard is designed to deliver a targeted view of bot traffic flowing to your website; unlike traditional analytics tools that may require extensive manual filtering, this dashboard provides you with key insights right away.

We set out to answer the most pressing questions for site owners today: How should you think about AI traffic on your websites? What is the value of different audiences — including humans, non-AI bots, and AI bots? And most importantly, what is your data being used for?

^{The new Attribution Business Insights dashboard view, which includes insights about bot traffic overall, a site-wide crawl-to-referral ratio, and the distribution of AI bot traffic vs. organic traffic.}

To answer these questions, the dashboard displays a powerful array of data and insights:

Bot traffic to content pages: View your overall bot vs. human traffic, as well as the volume of all bots successfully accessing content.
Crawl-to-referral ratios: See your site-wide crawl-to-referral ratio on the scale of 24 hours, seven days, or 30 days. You can also see crawl-to-referral ratios per bot operator (per company that owns one or more bots).
Top bots breakdown: A list of top bots by volume, including their country of origin, bandwidth they take up on your website, and whether you’re currently blocking or allowing them.
Updated classification based on crawler behavior: We go beyond a generic label of “AI Crawler” by classifying crawlers with our updated taxonomy, whether it’s Training (i.e., training the next version of an LLM chatbot), Search (i.e., refreshing databases for Retrieval-Augmented Generation), or Agent (i.e., used in agentic interaction to return answers to an end user).

From data to business strategy

You shouldn’t have to be a security expert to understand how AI crawlers affect your business. If website owners want to spend just a few minutes ingesting the high-level insights, they can walk away with a clear temperature check of the effectiveness of their content security policy.

For those who want to do a little more digging to understand how AI companies are making use of their content — or collect information to guide how they want their relationships with AI companies to develop — we show a more granular view organized by bot operator.

^{Breakdown of bot activity on a website, with important details for each bot such as type, crawl-to-referral ratio, and current action.}

By having a consolidated view of companies seeking to access content on your website, you can develop a better baseline of crawler activity. We want this data to equip our customers to step into any business conversation with the facts on their side. Tell Company1 that their crawl volume is twenty times that of Company4’s, and that Company4 is already compensating you for content. Revisit the way that Company2 licenses your content based on their recent activity. This new dashboard propels business conversations to move forward.

How does this new layer of visibility tie into the existing tools you have to protect your website from abuse? In line with other features of Bot Management, the action step still happens in Security rules. To avoid adding noise to the control plane, Attribution Business Insights is intended to be a hub for thoughtful, filtered analytics, rather than another place to take action. This dashboard serves as a central source of information, allowing you to investigate before then taking an action in the same rule engine that governs other abuse mitigations. We also want to be loud and clear about inviting business decision-makers into this dashboard, acknowledging that conversations around AI traffic have a wider set of stakeholders than only security-specialized users.

What’s next

The Attribution Business Insights dashboard is the next critical step in providing website owners with the transparency and control they need to manage evolving AI bot threats, and more broadly, shape the new dynamics of the Internet. We’re already investigating the next iteration with close publishing partners to create a visibility plane that covers security from the perspective of the website owner with valuable, original content to share.

A sneak preview below includes a new view to dissect crawler activity per-article to reveal the appetite that AI companies have for different pieces of content, different campaigns, and so on.

^{Breakdown of most popular articles, according to traffic volume. Shows key metrics such as AI bot traffic vs. other bot traffic vs. human traffic, both direct and from a referral.}

Visibility is the first piece, and there’s more to come to empower website owners to take control of their content in this new age. We encourage all customers of Cloudflare Bot Management — especially those driving business conversations — to access this today for a fresh take on analytics.

How we built saga rollbacks for Cloudflare Workflows

Vaishnav Kavitha — Thu, 25 Jun 2026 13:00:00 GMT

Cloudflare Workflows allows you to build durable, multi-step applications with built-in retries and state persistence across long-running processes. When a Workflow executes, each step can call external systems, retry failures, and persist state across restarts. But if one step fails, it may leave earlier work from completed steps in an inconsistent or partial state.

Today we’re shipping saga rollbacks for Workflows, allowing you to declare rollback logic within the step itself, in case of failure.

For example, consider a workflow for transferring funds between accounts at two different banks:

Debit from account at Bank A
Credit to account at Bank B
Send email confirmation to both account owners

What happens if Step 2, the credit to account at Bank B, fails? Once the debit succeeds at Bank A, the transaction is committed and the money has left its system. As the orchestrator of the transaction, you cannot simply “undo” the operation in Bank A's system. Instead, the money must be credited back to the account at Bank A through a new operation that semantically reverses the first one.

This pairing of an operation and its compensation logic is called the saga pattern.

Before today, developers had to implement their own compensation logic to track what succeeded, what failed, and what actions should be taken upon failure, outside of the steps’ direct definitions. Now, you can define compensation logic for each step.do() as an argument within the steps themselves, maintaining your workflow’s durability for the rollback as well.

// track what completed so we know what to undo
let debitA;
let creditB;
try {
  debitA = await step.do("debit-bank-a", () => bankA.debit(from, amount));
  creditB = await step.do("credit-bank-b", () => bankB.credit(to, amount));
  await step.do("notify", () => notifyBoth(from, to, amount));
} catch (error) {
  // unwind in reverse. each undo is its own durable step,
  // must be idempotent, and must keep going if one fails.
  if (creditB) {
    try {
      await step.do("reverse-credit-b", () => bankB.debit(to, amount, creditB.id));
    } catch (e) {
      await alertOnCall("reverse-credit-b failed", e);
    }
  }
  if (debitA) {
    try {
      await step.do("refund-debit-a", () => bankA.credit(from, amount, debitA.id));
    } catch (e) {
      await alertOnCall("refund-debit-a failed", e);
    }
  }
  throw error;
}

^{Without rollbacks}

// each step ships with its own undo. add a step,
// add its rollback right here. no growing catch
// block, no manual ordering, no replay logic.
await step.do("debit-bank-a", () => bankA.debit(from, amount), {
  rollback: async ({ output }) => bankA.credit(from, amount, output.id),
});
await step.do("credit-bank-b", () => bankB.credit(to, amount), {
  rollback: async ({ output }) => bankB.debit(to, amount, output.id),
});
await step.do("notify", () => notifyBoth(from, to, amount));

^{With rollbacks}

Try it out

To use rollbacks, just pass an options object containing a rollback function as the last argument to step.do().

const debit = await step.do(
  "debit-account-a",
  async () => {
    return await bankA.debit({
      accountId: fromAccountId,
      amount,
      idempotencyKey: `${transferId}:debit-account-a`,
    });
  },
  {
    rollback: async () => {
      await bankA.credit({
        accountId: fromAccountId,
        amount,
        idempotencyKey: `${transferId}:rollback-debit-account-a`,
      });
    },
  }
);

// The idempotency keys make both the forward operations and rollback operations safe to retry without duplicating the transfer

const credit = await step.do(
  "credit-account-b",
  async () => {
    return await bankB.credit({
      accountId: toAccountId,
      amount,
      idempotencyKey: `${transferId}:credit-account-b`,
    });
  },
  {
    rollback: async ({ output }) => {
      if (output === undefined) {
        return;
      }

      await bankB.debit({
        accountId: toAccountId,
        amount,
        idempotencyKey: `${transferId}:rollback-credit-account-b`,
      });
    },
  }
);


// If we fail here, we may want to revert all previous payments. Users should not have to wrap their code in complex try-catch logic just to revert two small payments (see below)

await step.do("send-confirmation", async () => {
  await sendTransferConfirmation({ ... });
});

Rollback functions should be idempotent, just like regular Workflow steps. If you refund a charge, use the payment provider's idempotency key. If you release inventory, make the release safe to call more than once.

If any step fails, the rollback handlers will execute in reverse step-start order. It sounds simple: run the undo steps when something fails. In practice, there are a few details that make the API and execution model important.

1. The failed step may still need rollback. A failed step.do() can still be rollback-eligible if it registered a rollback handler.

The rollback will not start if user code catches an error and the Workflow continues, but if a step error is caught and the Workflow later fails for another reason, rollback can still run for previously registered handlers, which execute in reverse step-start order.

Why? The step may have partially interacted with an external system before failing. For example, a payment provider may capture a charge, but the step may fail before returning the chargeId to Workflows. That is why rollback handlers receive output, but must handle output === undefined.

2. Rollback only starts when the Workflow fails. Adding a rollback handler does not mean every step error triggers rollback. If user code catches an error and continues, the Workflow continues. Rollback starts when the Workflow itself is about to fail terminally.

When rollback starts, Workflows finds eligible step.do() calls, runs their rollback handlers, then records the final Workflow failure.

3. Ordering has to be predictable. For sequential Workflows, rollback order feels obvious:

Reserve inventory.
Charge card.
Create shipment.
If shipment fails, refund the card and release the inventory.

Parallel steps make this more subtle. Completion order can differ from start order, so Workflows uses reverse step-start order instead of reverse completion order.

The practical rules are:

Any started or completed steps with rollback handlers are eligible.
The failing step.do() is also eligible if it registered a rollback handler.
Handlers run in reverse step-start order, not completion order.

How we designed the API

Once we had the expected behavior in mind, we had to add this new pattern into the Workflows API. Rollbacks went through a few iterations before we landed on rollback options.

Why not a fluent or builder API?

The first approach was a fluent form: step.do(...).rollback(...) It reads well. The forward action and the compensation sit next to each other, and the call site looks like ordinary JavaScript chaining.

The problem is that step.do() already has an important meaning: it starts a durable step and returns a Promise for the step output. In Workers, promise-like values are especially meaningful because Workers RPC supports promise pipelining, a pattern inherited from systems like Cap'n Proto.

Promise pipelining lets code call a method on a future value before that value has fully returned to the caller. For example:

const session = api.authenticate(apiKey);
const name = await session.whoami();

Here, session is not the real session object yet. It is more like a handle to the session that will exist soon. When you call session.whoami(), Workers can send that call to the remote side early and say: “once authentication creates the session, call whoami() on it.”

That saves a round trip. The caller does not need to wait for authenticate() to fully finish before asking for whoami().

We considered a fluent API:

step.do("charge-card", chargeCard).rollback(refundCharge);

To a reader, that can look like “call .rollback() on the result of charge-card.” But rollback is not part of the step’s output. It is part of the step.do() options, registered before the step starts, so Workflows knows how to compensate the step if a later step fails.

A fluent API also makes step timing harder to reason about. Today, step.do() starts the step when it is called, so developers can start a step, do other work, and await the first step later:

const first = step.do("first", () => serviceA.call());

await step.do("second", () => serviceB.call());

await first;

With today’s execution model, first starts immediately, before second. A fluent API would complicate that. Workflows would need to wait and see whether .rollback() gets attached before it knows the full step definition. That could delay when the step is sent to the engine.

In the earlier example, first could start at await first instead of at step.do("first", ...), after second has already completed.

That makes concurrent Workflows harder to reason about: step timing would depend on when the returned Promise is consumed, not just where step.do() is called.

We also considered a builder-style API:

const charge = await step
	.saga("charge")
	.do(() => chargeCard())
	.rollback(() => refundCharge())
	.run();

A builder API avoids the Promise ambiguity. It also gives us an obvious place for future step-level options, and makes it clear that the forward action and rollback action belong to the same saga step.

But it adds ceremony. Every step needs a final .run(), forgetting .run() would be easy and hard to spot without tooling, and simple one-step cases start to look like configuration chains. It also introduces a new step.saga() builder, breaking from the existing step. pattern. Most importantly, it makes step.do() feel like an older API rather than the primary Workflows primitive. The goal of rollback was to extend step.do(), not replace it.

Rollback as step metadata

step.do(..., { rollback })

Ultimately, we chose the explicit form where rollback is metadata on the step.

This way, each rollback is defined within the forward step itself. Each handler receives the error that caused the rollback to start, the step context, and the output, which is either the persisted value returned by the forward step (which can be undefined) or undefined if the step failed before persisting a value.

Rollbacks emit lifecycle events, so you can tell whether compensation started, which rollback handler failed, and whether rollback completed successfully.

Crucially, the original Workflow failure remains separate: rollback is what Workflows does after the failure, not the reason the Workflow failed.

Just as you can define custom retry and timeout behavior in the step configuration via WorkflowStepConfig, you add rollback-specific values in rollbackConfig.

{
  rollback: async ({ output }) => {
    await bankA.credit({ accountId: fromAccountId, amount, transferId: `${transferId}-reversal` });
  },
  rollbackConfig: {
    retries: { limit: 10, delay: '30 seconds', backoff: 'exponential' },
    timeout: '2 minutes',
  },
}

This matches the lifecycle-event mental model we wanted. A step.do() already describes a durable unit of work that Workflows records, retries, and later shows in logs. Rollback is another lifecycle behavior for that same unit of work. It should travel with the step definition, not live in a separate wrapper or builder.

The step still starts when step.do() normally starts.
The returned promise still represents the step output.
Concurrent Workflow code keeps the same execution model.
Retry and timeout options for rollback live next to the rollback handler.
Existing step.do() calls keep working exactly as they do today.

This shape is slightly more explicit than the fluent API, but that explicitness is useful. The operation and its compensation are still in one place, and the API does not introduce a new step builder or a new kind of promise. Developers who already understand step.do() only need to learn one additional options object.

This is less magical, but it is simpler to adopt, and clearer to understand.

How it works under the hood

Rollback feels like a small API addition, but it changes what Workflows needs to record about each step.

A regular step.do() already has a durable record. Workflows records that the step started, whether it completed, what it returned, and whether it should be skipped instead of repeated if the Workflow resumes later.

Rollbacks add one more thing to that record: whether the step registered compensation logic.

This means Workflows has two pieces of information to bring together if the Workflow fails.

The first is durable step history. The Workflow engine stores data to know what ran, what completed, what output was saved, and whether rollback was registered.

The second is the rollback handler itself, which is the function written to compensate for that step. Workflows does not save the text of that function as data. Instead, it keeps a callable reference to the handler while the Workflow is running.

In Workers RPC, this kind of callable reference is called a stub. A stub lets one part of the system call code that is running somewhere else. Stubs also have lifetimes such that they can be disposed when a call or execution context ends. If you need to keep a stub past that point, Workers RPC provides a dup() method, which creates another handle to the same target.

For rollback, that model is useful. The durable step history records what needs compensation. The rollback stub gives Workflows a way to invoke the compensation code. And because rollback handlers may need to outlive the immediate step.do() call that registered them, Workflows keeps its own callable reference to the handler for the rollback phase.

In the common case, when a Workflow enters rollback in the same engine lifetime, Workflows already has the rollback stubs it needs. It can use the durable step history to find eligible steps, then invoke the rollback stubs that were registered during forward execution.

This gets more subtle when Workflows has to recover after a restart.

If the engine is evicted, crashes, or restarts while rollback is needed, Workflows still has the durable step history, but it may no longer have the in-memory rollback stubs. To recover, Workflows uses replay: a recovery mode where it can re-run the Workflow code without re-executing completed forward step bodies.

When replay reaches a completed step.do(), Workflows reads the persisted result instead of running the step body again. For rollback recovery, Workflows only needs to rebuild handlers for steps that had rollback attached and are eligible for rollback. As those step.do() calls are encountered, their rollback options can register the callable stubs again

That lets Workflows recover the rollback handlers it needs without duplicating the original external side effects.

With those pieces in place, rollback can work whether the handler is still available in memory or has to be rebuilt during recovery.

When the workflow is about to fail, Workflows does not ask your application to reconstruct what happened. It already has the step history. It can look at the persisted record and answer the important questions:

Which steps started?
Which steps finished?
Which failed step may still need cleanup?
Which steps registered rollback handlers?
What output should each rollback handler receive?
What order should compensation run in?

Then Workflows invokes each rollback stub with a rollback context: the original error, the step context, and the step output, if one was persisted.

The ordering detail matters. In normal JavaScript, especially with Promise.all(), completion order is not always the same as start order. If step A starts first and step B starts second, step B might finish first. For rollback, Workflows uses the persisted start order as the stable source of truth, then unwinds it in reverse.

Rollback handlers also run through Workflows' normal step machinery. That means compensation gets the same operational properties you expect from Workflows: retries, timeouts, lifecycle events, logs, and a final recorded outcome. If a rollback handler keeps failing after its configured retries, Workflows records the rollback outcome as failed, stops running the remaining rollback handlers, and the Workflow instance ultimately ends in the Errored state.

This is the main difference between saga rollbacks and a catch block. A catch block only knows what is still in memory at its exact point in your JavaScript execution. Workflows rollback uses persisted step history to decide what already happened, invokes the stubs it already has in the common case, and safely rebuilds missing stubs during recovery when it needs to.

That is also why the API puts rollback on step.do() itself. Rollback is not a separate global error handler — it is metadata attached to the durable unit of work Workflows already understands.

What’s next

Our first iteration of rollbacks includes:

Explicit per-step rollback handlers for step.do()
Sequential rollback execution
Retry and timeout configuration for compensation

Next, we want to explore:

Rollback support for waitForEvent
Support for parallel rollback execution
Rollback support for Python Workflows

When a multi-step application fails halfway through, the hardest part is often not knowing that it failed. It is knowing what already happened, and what needs to happen next.

Saga rollbacks let you put that answer directly beside each step. If you are building multi-step applications with Workflows, try saga rollbacks and tell us what compensation patterns you want next. Get started with the Workflows documentation and share feedback in the Cloudflare Community.

Unlocking the Cloudflare app ecosystem with OAuth for all

Sam Cabell — Wed, 24 Jun 2026 06:00:00 GMT

Cloudflare provides services that help run 20% of the web, but we don’t do it alone. Developers on our platform use a myriad of tools and services from other companies too. Cloudflare provides a rich API for our platform that enables developers to create automations, CI/CD, and integrations that glue together the various parts of their infrastructure. Earlier this month, we announced self-managed OAuth, making it easier for customers to create and manage their own OAuth clients for delegated access to the Cloudflare API.

Cloudflare isn’t new to OAuth. If you’ve used Wrangler, or used integrations from partners like PlanetScale, then you’ve already used it. However, until now, third-party OAuth was only available through a small number of manually onboarded integrations, and was not available to developers more broadly. That meant developers building their own integrations had to rely on API tokens, which are harder to manage and a poor fit for many delegated application flows.

Over the last year, we onboarded a growing number of early partners while improving the consent, revocation, and security model behind Cloudflare OAuth. But as our Developer Platform grew and agentic tools drove demand for delegated access, it became clear that opening up OAuth to all customers was critical to the success of our platform.

With self-managed OAuth, developers can now offer a standard OAuth flow where customers grant scoped access directly, making it easier to build SaaS integrations, internal developer platforms, and agentic tools while giving users clearer consent, easier revocation, and more control over what an application can do.

Scaling the ecosystem securely

While our earlier OAuth solution was sufficient for a small number of carefully managed partners, we realized that our permissions model, our consent experience, and our ways of mitigating potential abuse vectors were not mature enough.

Earlier this year we updated our consent experience to make it clearer which application is requesting access, and what permissions it will receive. We also added revocation to the dashboard so developers can easily control which applications have access to their data, and made app ownership more visible to prevent OAuth phishing attacks.

Opening self-managed OAuth to all customers also required major upgrades to our underlying OAuth engine. This process required a large amount of planning to do with minimal user interruption, while also ensuring data stability and security.

Planning the upgrade to our OAuth engine

Years ago, we deployed Hydra, an open-source OAuth engine, to power Cloudflare OAuth under the hood. That deployment served us well when usage was limited, but as the developer platform grew and agentic workflows became more common, it became clear that we needed a major upgrade to unlock new capabilities and improve performance.

As we planned the upgrade, we decided to do two smaller sequential upgrades rather than doing one large upgrade. First, we would move to the latest 1.X release, evaluate any behavior or performance changes, and then proceed with the 2.X upgrade.

During our upgrade planning, it became clear that even the 1.X upgrade would still impact customers because the Hydra database required extensive schema migrations that:

Created indexes in a manner that would claim an exclusive lock on critical tables, preventing active users from performing important OAuth operations
Added columns to critical tables, and moved other columns to new tables

There was also a quirk in the version of Hydra we were using in which the SDK would perform SELECT * operations, causing deserialization issues with the schema changes.

To prevent user impact, we rewrote the SQL migrations to use features such as CREATE INDEX CONCURRENTLY, and built a custom version of Hydra which selected explicit columns rather than SELECT *.

With the latest 1.X upgrade planned out, we now needed to create a plan for the even larger 2.X upgrade. We identified three potential options, and weighed the benefits and drawbacks of each one. Doing an in-place upgrade was not going to work for us, due to the sheer amount of schema changes the major version bump brought with it. We decided that a blue-green strategy would work, but there was more that needed to be done than simply flipping a switch to start using the new version. The upgrade and migration process would take multiple hours, and we needed the system to continue functioning correctly in that time window.

The first blue-green option would involve disabling writes to the database, preventing any new authorizations from occurring. This means they would not be lost in the transition, but it also meant that nobody would be able to use existing OAuth apps unless they already had a valid credential. It also presented another large problem: if users needed to revoke access from an application for any reason, it would not be possible while the upgrade was being performed.

To combat these issues, we came up with a way to leave writes to the database enabled, at the cost of losing some of them in the switch to the green version. The first thing to solve was minimizing the number of writes for new tokens. There was an operational lever we pulled: increasing the expiry time of tokens to multiple hours. This would allow apps that received new tokens before the upgrade to continue using them without needing to refresh.

With reducing writes solved, we needed to come up with a way to not lose any revocations our users performed during the upgrade window. To do this, we created a queue system (using Cloudflare Queues!) which, after a revocation event, would have a record written into the queue with information about that revocation. This would allow us to drain the queue with the database flipped to the green version, replaying all revocation events that took place in the time window in which they would have been lost. This was critical to get right, otherwise applications that users had revoked would inadvertently have their access restored.

Executing the upgrade

Upgrading to 1.X

From an operational point of view, our first upgrade to the last 1.X release went off without any hitches. Our custom database migrations ran faster than we expected, with no user impact. We had to do a hard cutover to the new version because the old version was unable to introspect tokens that were created by the newer version.

After the cutover, we saw an increase in refresh token errors that we had not seen before. This ended up being due to stricter refresh invalidation behaviors in the new version; if a refresh token was reused, Hydra would invalidate the whole access and refresh token chain. This is problematic for Wrangler and MCP clients. These clients both have a high request volume, and a single reused refresh token would invalidate the entire session.

We mitigated this by adding refresh token coalescing behavior to our Worker which routes OAuth traffic to the correct destination. This allowed us to briefly cache the refresh token request before it reached Hydra, so that if we detected a retry we could short-circuit the request and respond without invalidating the tokens. Fortunately, 2.X versions of Hydra have a configurable “refresh token grace period”, which resolves this by allowing a refresh token to be retried for a period of time without invalidating the whole chain.

Upgrading to 2.X

Since multiple hours of high user-facing impact would not be acceptable, we had our blue-green upgrade strategy set. At a high level, this sounds simple; the migrations would run on a copy of our production database, and then cut over along with the new Hydra version after they complete. In reality, there were a lot more moving parts:

Enable revocation replay capture queue
Copy and restore our database to the new target
Targeted data cleanup — existing data violated some new constraints introduced in the newer versions, which could prevent migrations from succeeding
Perform cutovers on the Hydra service along with two additional critical internal systems simultaneously to prevent any errors
Post-cutover monitoring and validation

We chose an upgrade window when Hydra had the lowest request volume per second to minimize lost token writes. Other than some timeout tuning, our production migrations ran well against the new database: the net runtime in production was approximately three hours. After the migrations completed, we carefully rolled out the new version of the Hydra service, along with two additional system configs to flip our systems to use the new SDK version.

Shortly after cutting traffic over, we observed that a data cleanup job in our authorization service (which relies on the Hydra consent session API) was being overeager in its purging of OAuth policy data. After investigation, we discovered that there was an issue in one of the Hydra migrations that corrupted the state of certain valid OAuth sessions, which resulted in the migration marking them as invalid. The valid sessions being corrupted caused a disagreement between Hydra and our authorization service, manifesting as an increase in 403s. To mitigate this, we did data restorations and began work on improvements for OAuth authorization behaviors to remove reliance on static policy data.

Beyond the data cleanup issue, there were some additional small fixes more driven by specific client behaviors which we landed quickly.

With the Hydra version upgrade complete, OAuth traffic has remained stable with improved system performance and reliability for our customers. It also brought production onto the same foundation our newer OAuth APIs had already been validated against in staging, clearing the way for our self-managed OAuth release on June 3.

Performance improvements

After completing a large upgrade like this, it is always rewarding and illuminating to look at some broad metrics about the impact. We gathered additional metrics during the database migrations, and observed considerable performance improvements after the upgrade was complete.

Database

Metric	Approx. Value
Rows updated	132.5M
Rows inserted	114.7M
Temp bytes	136.97GB
Transaction commits	22.2k

Hydra performance

Metric (avg)	Before	After	Change
API P95	185ms	101ms	-45%
RSS memory	888MB	763MB	-14%
Go heap alloc	449MB	271MB	-40%
Goroutines	4015	3076	-23%
CPU	1.07 cores	0.67 cores	-37%

Self-managed OAuth for all

Opening up OAuth to all customers is an important step toward a broader Cloudflare app ecosystem. Today, any Cloudflare customer can create their own OAuth applications and build integrations on top of Cloudflare. We’re extremely excited to launch Cloudflare self-managed OAuth for all.

To get started, take a look at our documentation or jump straight to the OAuth apps page in the dashboard and create your first OAuth app.

The White House's post-quantum executive order is an important milestone. It’s time to get to work

Sharon Goldberg — Tue, 23 Jun 2026 18:25:18 GMT

On June 22, 2026, President Trump signed Executive Order 14412, "Securing the Nation Against Advanced Cryptographic Attacks." The order sets a December 31, 2030, deadline for federal agencies to transition their most sensitive systems to post-quantum encryption, and a December 31, 2031, deadline for post-quantum authentication. The EO also directs federal contractors to comply with post-quantum Federal Information Processing Standards (FIPS) by the end of 2030.

We welcome this executive order. The U.S. government has a long track record of using federal leadership and procurement to drive adoption of new technologies across the broader industry. We've seen this work with IPv6, with routing security and the Resource Public Key Infrastructure (RPKI), and with DNSSEC, and we’re glad to see this tradition continue with post-quantum cryptography.

The EO is especially important at this moment because the timeline for Q-Day, the day that quantum computers can break the public-key cryptography used across the Internet, has been accelerated. In April 2026, Cloudflare moved our own target for full post-quantum security to 2029, following research breakthroughs from Google and Oratomic. This EO updates guidance from 2024, when the National Institute of Standards and Technology (NIST) stated that the classical public key cryptography used across the Internet (namely RSA and Elliptic Curve Cryptography, which can be broken once powerful quantum computers become available) should be deprecated by 2030 and disallowed by 2035.

The Internet’s transition to post-quantum encryption is well underway, while the transition to post-quantum authentication has only just begun. Today, over two-thirds of browser traffic to Cloudflare's network is protected with post-quantum encryption, and most of our products support post-quantum key agreement. Our SASE platform, Cloudflare One, provides post-quantum encryption across all major on-ramps and off-ramps, including TLS, MASQUE, and IPsec. We've recently started deploying post-quantum authentication and aim to be fully post-quantum secure by 2029. The EO is an excellent foundation and builds on work from the previous two Administrations. We've been doing the work the EO is asking federal agencies to do since 2019, we have some thoughts on what the order gets right, we see opportunities for the Office of Management and Budget (OMB) to strengthen and facilitate cost-effective agency migration, and we provide a roadmap for how organizations and agencies can advance their transition most effectively.

The EO’s requirements for federal systems

The bulk of the EO's binding requirements are aimed at two categories of federal systems: High Value Assets (HVAs) and high impact systems. HVAs are federal information or systems designated by OMB as the government's crown jewels: systems whose compromise would significantly affect national security, foreign relations, or public confidence. These include databases that hold millions of federal employee records, systems that process classified intelligence, or platforms that manage federal financial transactions. Meanwhile, high impact systems are those where confidentiality, integrity, or availability is rated "high" under FIPS 199, meaning a breach could cause severe harm including loss of life, major financial damage, or significant degradation of an agency's ability to carry out its mission.

The EO has the power to bind federal agencies, but not other organizations (i.e., critical infrastructure, state, local, tribal and territorial governments, academia, civil society). That’s why the EO only gives these deadlines to federal agencies:

Date	Requirement
July 2026	Each federal agency head identifies a PQC migration lead and provides their name and contact details to OMB and the National Cyber Director.
September 2026	OMB issues guidance requiring each agency to: (1) review their inventory of HVAs and high impact systems; (2) plan for PQC migration; and (3) submit that plan to OMB and the National Cyber Director.
December 2030	All HVAs and high impact systems must be transitioned to PQC for key establishment.
December 2031	All HVAs and high impact systems must be transitioned to PQC for digital signatures.

National Security Systems are explicitly excluded from these deadlines. They are on a separate, classified track managed by the NSA with deadlines between 2030 and 2033 already set in 2022.

Two migrations: encryption and authentication. Both should begin now.

The EO splits the PQC migration into two phases: post-quantum key establishment (encryption) by 2030, and post-quantum digital signatures and certificates (authentication) by 2031. This accurately reflects the availability of post-quantum encryption across the Internet today. Our own deadline for full post-quantum readiness (including authentication) is 2029, but we are amongst the earliest adopters in the industry.

We are also happy to see the EO focusing on NIST-standardized post-quantum cryptographic algorithms and not Quantum Key Distribution (QKD), since QKD does not operate at Internet scale due to its need for specialized hardware and dedicated physical links between sender and receiver.

Now let’s have a deeper look at the two migrations called for and required in the EO: post-quantum encryption and post-quantum authentication.

Post-quantum encryption is needed today to stop harvest-now-decrypt-later attacks, where an adversary collects encrypted traffic today and decrypts it later once quantum computers are powerful enough. Post-quantum encryption is especially valuable for organizations handling data that will still have value to adversaries 3-10 years from now, like government agencies, banks, healthcare organizations, defense contractors, and telecom providers.

Post-quantum authentication stops an adversary that has a quantum computer from forging certificates to impersonate servers, generating malicious code signatures, or gaining unauthorized access to systems. Post-quantum authentication is needed only after Q-Day risk materializes, because it stops attacks that are possible only once a cryptographically-relevant quantum computer (CRQC) exists.

It’s important to put the migration timelines in context with advancements in quantum computing. In addition to yesterday’s EO on post-quantum security, President Trump also signed an EO to accelerate deployment and commercialization of quantum computing, sensing, and networking. The fact that the EO sets a 2031 deadline for post-quantum authentication tells us something important: the U.S. government believes there is a non-negligible chance that a CRQC could be operational around that time.

Road to Quantum Safety

What about the state of these two technologies? The migration to post-quantum authentication is a bigger challenge than post-quantum encryption for a few reasons, including:

Post-quantum ML-DSA digital signatures are larger than classic digital signatures, which could have an impact on performance of some systems, for instance in short-lived TLS connections. That’s why we are working with Google Chrome on Merkle Tree Certificates to solve the performance problem for TLS.
The dependency chain for post-quantum authentication is longer, requiring coordinated upgrades across clients, servers, certificate authorities, certificate transparency logs, root stores, and browsers.
There is only limited ecosystem deployment of post-quantum authentication so far, as compared to the much broader deployment of post-quantum encryption.

It is interesting that the EO sets a one-year gap between the encryption and authentication deadlines. One extra year of calendar time is tight, so this work cannot proceed sequentially. The ecosystem needs to start working on both of these targets concurrently, or we will miss this 2031 deadline.

Cryptographic deployment across the Internet cannot happen without standards developed by the Internet Engineering Task Force (IETF). They are working to transition their protocols to post-quantum cryptography. The TLS community is ahead, with the IETF PLANTS working group making good progress on post-quantum certificates for TLS. There is much work to do here, and we look forward to supporting the IETF in its efforts.

Supply chain pressure that helps everyone

The EO includes requirements for federal contractors, which may turn out to be the most impactful part of the EO.

Namely, the FAR Council must publish proposed rules requiring "covered contractors" to comply with NIST FIPS incorporating PQC algorithms by December 31, 2030 (Sec. 6(c)). The FAR Council must also publish proposed rules requiring contractors to implement vulnerability disclosure programs that cover cryptographic vulnerabilities (Sec. 6(d)). These proposed rules need to go through notice-and-comment rulemaking, but the EO has a December 31, 2030, target which is still important. This deadline is one year earlier than federal agencies are required to complete their post-quantum authentication migration, so that federal contractors will be ready before agencies hit their own deadlines.

Federal agencies can only migrate to PQC if the products they buy support PQC. To put this into practice, CISA released its Product Categories for Technologies That Use Post-Quantum Cryptography Standards, drawing a clear line between technologies where PQC is already "widely available" versus those still "transitioning." The "widely available" list includes cloud platforms (IaaS, PaaS), web browsers and servers, chat and messaging software, and endpoint security products like full disk encryption. For these categories, CISA's guidance is clear: organizations should procure only PQC-capable products. The "transitioning" list, where PQC is not yet widely available, includes networking hardware (routers, firewalls, switches), identity and access management systems (HSMs, certificate authorities, identity providers), email servers and clients, and database systems.

By telling contractors their products must be PQC-compliant by 2030, and directing agencies to immediately favor PQC-capable vendors in mature markets, the federal framework forces the vendor ecosystem to ship PQC-capable products on a fixed timeline. Products that vendors build to federal requirements will end up used by hospitals, banks, universities, and small businesses, which makes PQC support more broadly available. Cloudflare is among the many vendors subject to these requirements, and because networking software and cloud services are already designated by CISA as widely available PQC categories, we've already shipped post-quantum encryption across most of our products at no extra cost.

Critical infrastructure and PQ for everyone

The EO also speaks to critical infrastructure: energy, financial services, water, transportation, telecommunications, healthcare, and other systems whose failure would have a serious or significant impact on the country. While the EO has no hard migration deadline for critical infrastructure owners and operators, the EO directs certain federal agencies to "assist" critical infrastructure owners and operators with their PQC migration plans (Sec. 5(a)).

While the EO focuses mostly on federal agencies and critical infrastructure in the U.S., post-quantum cryptography is important to every Internet-connected individual and organization. Harvest-now-decrypt-later attacks are a risk today. And after Q-Day, the risk of unauthorized access by an adversary armed with a quantum computer will impact any organization, big or small. When we launched free universal SSL in 2014, our CEO Matthew Prince wrote:

Having cutting-edge encryption may not seem important to a small blog, but it is critical to advancing the encrypted-by-default future of the Internet. Every byte, however seemingly mundane, that flows encrypted across the Internet makes it more difficult for those who wish to intercept, throttle, or censor the web.

We feel the same way about post-quantum cryptography. That’s why every post-quantum upgrade we build is available to all customers, on every plan, at no additional cost.

Opportunities for OMB’s implementation guidance

The EO sets the direction, and now OMB has 90 days to provide important clarifications and operational guidance to achieve the most effective PQC migration across federal agencies (Sec. 4(b)). Based on what we've learned from our own PQC migration, here are a few elements that we suggest that guidance should include:

Define what it means to “transition.” The EO requires agencies to "transition" their systems to PQC, but it never defines what "transition" means. Does it mean the system supports PQC algorithms? That it prefers them? Or that classical cryptography has been disabled entirely?

These are very different security postures. A system that supports ML-KEM but still allows a classical-only TLS handshake is vulnerable to downgrade attacks. An adversary capable of intercepting traffic could force the connection back to classical key exchange. The system would have "transitioned" to PQC in name, but still be vulnerable to the same quantum attacks the order is trying to prevent.

History is instructive. When SSLv3 was deprecated after the POODLE attack in 2014, servers kept SSLv3 enabled for backwards compatibility, allowing attackers to force connections to downgrade and then exploit SSLv3's weaknesses. It took years for the ecosystem to actually turn SSLv3 off. To avoid repeating this pattern, we need a clear definition of “done” that includes disabling quantum-vulnerable cryptography to prevent downgrades.

Crypto agility: Crypto agility is the ability to swap cryptographic algorithms without re-architecting your systems. The EO mandates migrating to specific NIST crypto standards, but says nothing about building systems that can swap cryptographic algorithms if these algorithms need to change in the future. Crypto agility doesn't mean supporting every algorithm at once. It means building systems so that when the community converges on a better algorithm in the future, the upgrade is a configuration change, not a re-architecture. The OMB should include this in its guidance.

CBOM or quantum impact inventory? The EO directs CISA and NIST to publish guidance on the minimum elements for a cryptographic bill of materials (CBOM) within 270 days (Sec. 5(d)). A CBOM is an inventory of the cryptographic algorithms, protocols, and implementations used in a given hardware or software product, similar to a software bill of materials (SBOM).

In theory, CBOMs are a good idea. In practice, we'd caution against treating exhaustive cryptographic inventories as a prerequisite for action. A detailed CBOM of every algorithm in every library in every product takes a long time to produce, it can take federal agencies an entire procurement cycle of discovery tooling and consulting, and it potentially becomes stale by the time the inventory is complete. Also, a CBOM doesn’t list systems that should be using cryptography but are not. And a CBOM lists keys without an understanding of their purpose, making them less useful for organizations trying to understand the risk associated with a quantum-vulnerable key.

We think that a quantum impact inventory is a more productive framing. What would be the impact if the system or its data is compromised? How likely is that to happen? What measures can be taken to mitigate the risk, whether a drop-in replacement, a software update, or a compensating control like tunneling traffic over bulk post-quantum connection or isolating it from the Internet? How feasible is each option and what dependency chain does it create? Identifying these informs where to take action first. You can fill in the details of a full CBOM over time if that makes sense for your organization, but you should start by discovering your most exposed and impactful systems.

Making post-quantum cryptography affordable to all. True national resilience fails if post-quantum cryptography is treated as a gated luxury rather than a universal baseline. OMB policy must resist vendor lock-in or toll booths that leave underfunded critical infrastructure behind or increase technical debt at federal agencies.

What to do now: don't wait for 2030

You do not have to wait for 2030 or an exhaustive cryptographic inventory to start your migration. History has shown that updating cryptography is hard and can take a long time; other organizations should start sorting out their migrations as well. So as we wait for OMB guidance for federal agencies, here’s what we recommend for all organizations:

Protect your Internet traffic now. Start with traffic that crosses the public Internet, because that is the easiest for adversaries to harvest now and the most immediately at risk. If your web traffic flows through Cloudflare, your connections are largely protected with post-quantum encryption. If your enterprise network uses Cloudflare One, your private network traffic is also protected. If your provider doesn't support post-quantum encryption, switch to one that does. Even if the individual applications running inside your network haven't been upgraded yet, start tunneling your traffic through post-quantum encrypted infrastructure to protect it in bulk, even if individual systems are not yet inventoried and upgraded.

Update procurement. Make "post-quantum encryption by default, at no additional cost, with a clear roadmap for post-quantum authentication and crypto agility" a requirement in every technology procurement. If your vendor charges extra for post-quantum security or doesn't have a roadmap or plan, ask why or find another vendor.

Quantum impact inventory. For traffic that stays inside your private network perimeter and is not exposed to the public Internet, the harvest-now-decrypt-later risk is lower because an adversary would need to be on your network to capture it. But you still need to know what cryptography your internal systems use, so you can plan your migration. Use a quantum impact inventory as a tool to prioritize your efforts, for example focusing on systems or connections that handle sensitive data or are exposed on the public Internet.

Plan for authentication now. The 2031 deadline for post-quantum authentication will come faster than you think. Start identifying your long-lived keys, root certificates, and code-signing infrastructure. These are the highest-priority targets for a quantum attacker, and they have the longest dependency chains to upgrade. Now is a great time to update your software libraries and automate certificate provisioning even if post-quantum certificates are not yet available in your ecosystem. And make sure your vendors are planning to be ready for the looming post-quantum authentication deadline.

Aligning policy and international standards

At the same time, work should also start now on aligning global government policy with international standards. We were glad to see that Section 5(b) directs the State Department to engage foreign governments and industry groups to encourage adoption of NIST-standardized PQC algorithms.

Here’s why this matters. Cryptography migrations cannot be run in a vacuum, with each country operating within its own borders. A TLS connection between a U.S. person and a server abroad only works if both ends negotiate the same cryptography. NIST has been running open international cryptographic competitions for decades. The AES competition (1997-2001) produced the encryption standard used across the Internet today, selecting a cipher designed by Belgian cryptographers. The SHA-3 competition (2007-2012) produced the latest hash standard, selecting an algorithm designed by a Belgian-Italian team. The PQC competition (2016-2024) followed the same open model: anyone could submit, anyone could analyze, and the winning algorithms were designed by international teams. ML-KEM, the key agreement standard now being deployed across the Internet, was created largely by European cryptographers. These are open, internationally vetted algorithms. NIST organized the competitions, but the results belong to the global cryptographic community.

The risk ahead is fragmentation. If different jurisdictions mandate different algorithms, the result is cipher bloat and increased attack surface: more code to write, test, and audit, more surface for downgrade attacks, and slower deployment for everyone. We've seen this happen firsthand in IPsec, where the lack of an interoperable standard led vendors to ship proprietary PQ key agreement algorithms that couldn’t interoperate, delaying the migration by years. The TLS community went the opposite way, converging on a single hybrid key agreement (X25519MLKEM768), and deployment followed quickly.

We are big fans of NIST, and especially its leadership in vetting standards globally and standardizing cryptography worldwide. We encourage the Trump Administration to work with Congress to ensure that NIST has appropriate resources, staffing, and tooling to meet current and emerging deliverables in this EO and others, like America's AI Action Plan.

We'd like to see State Department-led engagement drive real alignment: adoption of the same NIST algorithms across allied nations, alignment on timelines, and mutual recognition of cryptographic algorithms and modules. The Internet is one network, and its cryptography should be one standard.

Speeding up CMVP

As a final note, the EO directs NIST to revise the processes used by the Cryptographic Module Validation Program (CMVP) to accelerate validations of cryptographic modules (Sec. 6(b)). Having bumped up against the CMVP program for years, we are extremely happy to see this in the order.

CMVP exists for a good reason. Federal agencies and their contractors need a way to verify that the cryptography inside a product actually does what it claims: that AES is implemented correctly or that random number generators have enough entropy. CMVP has been tuned for a steady state where cryptography doesn’t change much.

Going forward, CMVP needs to be adjusted to accept the realities of the impending migration. We welcome the FedRAMP update stream that allows updated modules to be used immediately before final validation. This allows faster adoption of post-quantum cryptography, and correction of implementation errors that were missed in validation. Similar allowances for CMVP are essential.

Go forth and PQ all the things

This post-quantum EO is a meaningful step. It sets real deadlines and creates supply chain pressure that will accelerate adoption across the industry.

For organizations starting their own migration, we suggest you start by protecting your public Internet traffic along with updates to your procurement requirements, followed by a quantum impact inventory to figure out where to focus next. Do not let cryptography inventory slow you down from deploying post-quantum encryption across your most sensitive systems immediately.

Cryptographic deployment across the Internet depends on standards developed by the IETF. The TLS community is further along, but there is lots more work to do across other protocol communities, and we look forward to supporting those efforts.

Let us go forth and PQ all the things, quickly and together. Free TLS helped encrypt the web. Free post-quantum cryptography will help secure it for what comes next.

You can get started now on Cloudflare by visiting our PQC page.

How we found a bug in the hyper HTTP library

Deanna Lam — Mon, 22 Jun 2026 18:00:00 GMT

The Images service, built in Rust on Workers, runs on every machine in Cloudflare’s edge network. To handle client connections, we use hyper, an open-source HTTP library for Rust.

Last year, we introduced the Images binding to enable custom, programmatic workflows for processing remote images in Workers. At the end of 2025, we rearchitected the binding to provide a more direct, local connection between the Workers runtime and the Images service.

Shortly after rollout, we received reports that transformation requests from the binding were failing — but only intermittently and only for larger images. Even stranger, the responses for these requests returned a 200 status without any errors logged. The image data was simply cut short: A response that should have been two megabytes might arrive with a few hundred kilobytes instead.

We spent six weeks chasing a nearly invisible bug — a race condition that occurred only under specific conditions — in the hyper library that impacted how the Images binding returned processed image data back to the client. In the end, it took four lines of code to fix it.

Hops, handoffs, and hyper

When developers build on Cloudflare, they compose full-stack applications from a set of platform services that are accessible to Workers through bindings. Bindings provide direct APIs to resources on the Developer Platform like compute, storage, AI inference, and media processing.

The Images binding decouples image optimization from delivery; you can transcode, composite, or manipulate images without needing to return the output as an HTTP response. It also lets you apply optimization parameters in any order, rather than following the fixed sequence imposed by the URL interface. Here, a worker can pass image data directly to the Images API, chain operations together, and get the processed result back as a stream:

const result = await env.IMAGES
  .input(image)
  .transform({ width: 800, rotate: 90 })
  .output({ format: "image/avif" });
return result.response();

At a high level, this is how image data moves through our various services:

^{The pipe represents a socket connection between the intermediary and Images, where data is handed off from one process to the next through the kernel’s buffer.}

The binding communicates with Images through a socket connection managed by the Workers runtime. A socket connection is a communication channel between two processes. Each end of the socket has buffers that are managed by the operating system’s kernel; these buffers are temporary holding areas where data sits after one side writes it but before the other side reads it.

Hyper manages the connection on the Images service’s side, reading incoming requests from the socket and writing responses back to it.

When a request uses the Images binding, the Images service reads the input, performs the requested optimization operations, and encodes the result. It then passes the entire encoded image to hyper as a single in-memory block.

Hyper writes this response data into its own internal buffer. At this point, hyper considers the encoding work as complete, since it has all the bytes that it needs to send. The next step is to flush its internal buffer to the socket’s outbound buffer, moving the data from the Images service to the intermediary on the other end.

If the reader on the other end is fast, then hyper can flush everything in one pass — the outbound buffer will have room because the reader is consuming data as quickly as it arrives. Once all data is sent, hyper issues a shutdown on the socket, signaling that the connection is finished and no more data will be written. But if the reader is slower (even by a few milliseconds), then the outbound buffer fills up, and hyper needs to wait until there’s room to continue writing.

Taking the local

All incoming traffic on Cloudflare's network passes through FL, an internal intermediary service that runs security and performance features and routes requests to the appropriate backend. When we first launched the binding, image data flowed from the Workers runtime, through FL, to the Images service.

This path was a natural fit for our initial release and follows the same architecture as our URL interface. Over time, though, this coupling with FL became a constraint: Every change to the binding had to follow FL’s release cycle.

In December 2025, the Images team replaced FL with a new intermediary service, an internal worker binding that runs on the same machine. In the original architecture, data moved through FL over network sockets; this path carried the overhead of FL’s full processing pipeline, such as DNS lookups and routing.

The internal binding replaced these with Unix sockets to directly connect the services on the same machine, bypassing FL and the overhead of the network stack. This made the request path to Images faster and gave the team independent control over binding releases.

Within days of the rollout, we received our first customer report.

200 OK (not OK)

The first sign of trouble came from a customer with a non-standard setup: two layers of image processing, where one pipeline was nested inside another.

First, their worker used the Images binding to composite multiple large source images from R2 — a JPEG background plus PNG overlay layers — into a single combined JPEG. Second, they further compressed, transcoded, and resized the result through the URL interface.

^{The bug originated in the inner pipeline’s return path, where the response was truncated before reaching the outer pipeline.}

The inner pipeline (transformation binding) handled compositing. The outer pipeline (transformation URL) handled delivery optimizations like scaling and format conversion. This layered approach meant that when the inner pipeline silently returned a truncated response, the only visible error appeared one level up:

error reading a body from connection: end of file before message length reached

The outer pipeline received HTTP 200 from the inner one, with a Content-Length header that promised several megabytes. The actual body was only a fraction of that: In one request, only ~200 KB arrived out of an expected 3.3 MB. The error surfaced in the outer pipeline, but the truncation could have originated in the binding, the intermediary service, the Images service, or somewhere in between.

When a browser receives a truncated image, the result is visible. Depending on the format, the image either renders partially (e.g., with the bottom half missing or gray) or fails to decode entirely, instead displaying a broken image.

Debugging in the dark

From here, we worked inward through the request path, testing each layer to isolate where the truncation was happening. Some of these efforts hit dead ends; others left breadcrumbs that narrowed the search:

Building a reproduction. We built a worker that mimicked the customer’s nested setup, then stripped away layers until we could trigger the bug with the binding alone. A small script let us fire requests in batches. In one early run, 19 out of 25 requests failed. The amount of data that did arrive — roughly 200 KB — was suspiciously close to the size of the socket buffer in production. This confirmed that the problem wasn’t tied to the customer’s configuration and gave us a reliable way to trigger the bug on demand.
Investigating timeouts. Early on, we suspected the truncation might be related to timeout behavior (i.e., the connection was being closed after a time limit). This theory didn’t hold, as the truncation wasn’t correlated with request duration.
Updating hyper version. When the bug was first reported, we were running 0.14.x, while the latest hyper version was around 1.8.x. We tested across hyper versions 0.14, 1.7, and 1.8, just in case the most obvious answer was the correct (and easiest) one. But the bug appeared in each version, which meant that there wasn’t an upstream fix.
Reproducing locally. We ran local integration tests on macOS and a Debian VM. Even under considerable load, our local requests never triggered any failure. Making direct curl requests to the binding socket and replaying captured requests always seemed to work. The bug only appeared on the full production path when there was real concurrency and a real Workers runtime client on the other end of the socket. This led us to suspect the runtime itself.
Ruling out the Workers runtime. We examined the HTTP client that the Workers runtime uses to communicate with Images through the binding socket. None of the traces from either side of the connection showed any syscalls that indicated an unexpected close or early termination. We observed that the client behaved correctly and multiple other services used the same client without issues.
Distributed tracing. By inspecting request traces end-to-end, we confirmed that the truncated body was already present before it reached the outer transformation layer in the customer’s setup. That narrowed the problem to the inner pipeline — the binding path through the Images service.
Instrumenting the intermediary service. We added instrumentation to the intermediary service to measure body sizes before forwarding the response data. The bodies were already truncated by the time they left the Images service, so the intermediary was ruled out.
Deeper tracing within the Images service. At the service level, the request was processed, the image was properly encoded, and the response was sent with HTTP 200.

The only consistent signal was that the bug was timing-dependent: It appeared only on the production path, with real concurrency, and only for larger images.

A kernel of truth

Tools for application-level debugging told only what the system thought it was doing. But according to the system, everything was fine: Tracing said the response was sent; logging reported no errors, and the Images service returned 200 on every request.

To see what the system was actually doing, we attached strace to the Images service. strace records the syscalls that a process makes to the kernel, which could show us exactly which bytes were written, when a shutdown was called, and whether the client sent any termination signal.

Setting up the trace was delicate. strace works by intercepting syscalls as they happen, which adds a small amount of timing overhead to each one. Filtering for a narrow set of syscalls kept that overhead minimal. Broadening the filter, however, slowed the process just enough to shift the timing between the flush and the shutdown check — and make the bug disappear entirely. That alone reinforced our theory that the issue was timing-sensitive.

Using a reproduction worker, we triggered the bug and compared the syscall output between successful and failing requests.

In a successful request, the response is written in chunks as the socket buffer allows, with shutdown called only after all the data is sent. For example, this may look like:

sendto(42, "HTTP/1.1 200 OK\r\nContent-Length: 14991808\r\n...", ...) = 219264
sendto(42, "\xff\xd8\xff\xe0...", 292352) = 292352
// ... keeps writing until buffer drains ...
sendto(42, "...", 292352) = 292352
shutdown(42, SHUT_WR) = 0

When we reproduced the bug, a failing request looked like:

sendto(42, "HTTP/1.1 200 OK\r\nContent-Length: 14991808\r\n...", ...) = 219264
shutdown(42, SHUT_WR) = 0

Here, there is only one write — just enough for the headers and a sliver of the body — before the shutdown is immediately called. Out of a 14.9 MB response, only about 219 KB was sent. The remaining ~14.8 MB of image data never left hyper’s internal buffer, nor was there any termination signal from the client between the write and the shutdown. Instead, the Images service prematurely shut down the connection on its own, genuinely believing it was finished.

The failing requests confirmed that the bug was a race condition that triggered intermittently. Whether a request succeeded or failed depended on whether the flush and shutdown operations overlapped, which changed from request to request. When the buffer was still full at the exact moment that hyper decided the connection was finished, data was lost.

^{When the reader consumes slower than hyper writes, the outbound buffer fills up. If hyper shuts down the connection before the buffer drains, then only a fraction of the response makes it to the intermediary; this incomplete data gets forwarded back to the Workers runtime and the client.}

The December rearchitecture didn't introduce this bug, which had been present in hyper for years across multiple major versions. But the new intermediary changed who was reading on the response side of the socket. Our working theory is that FL, the previous intermediary, consumed data fast enough that the socket buffer rarely filled during a response. The new reader read at a pace that occasionally let the buffer fill during larger responses.

These few milliseconds of backpressure, introduced by an improvement that made everything else faster, were all it took to surface a flaw that had been hiding in plain sight.

Inside the dispatch loop

Hyper's HTTP/1 connection lifecycle is driven by a state machine in a file called dispatch.rs. It runs a loop that reads requests, writes responses, flushes the write buffer to the socket, and decides when to shut down. In simplified form:

fn poll_loop(&mut self, cx: &mut Context<'_>) -> Poll> {
    loop {
        let _ = self.poll_read(cx)?;
        let _ = self.poll_write(cx)?;
        let _ = self.poll_flush(cx)?;

        if !self.conn.wants_read_again() {
            return Poll::Ready(Ok(()));
        }
    }
}

More precisely, the let _ before poll_flush is where the bug lives.

In Rust, let _ = expr discards the expression's result, including Poll::Pending, the signal that the flush isn’t done yet. The flush might still have megabytes sitting in its buffer, but the loop never finds out.

When a request fails, this is the exact sequence of events:

The Images service finishes encoding the image and hands the entire response to hyper as a single in-memory block.
Hyper writes the block into its internal buffer and marks its write state as Writing::Closed. From an encoding standpoint, the work is done — there is nothing left to encode.
Hyper calls poll_flush to move the buffered data to the socket. In our previous example, the socket accepted about 219 KB. The remaining ~14.8 MB stays in hyper's buffer. The socket is full, so the kernel returns Poll::Pending.
poll_loop discards the Poll::Pending with let _.
It checks wants_read_again(). The full request was already received, so this returns false.
poll_loop returns Poll::Ready(Ok(())), signaling that the loop is finished, even though the flush is not.
poll_shutdown() fires. The SHUT_WR syscall is issued.
The client receives 219 KB and an EOF (end-of-file) indicating that the connection is closed, even though it expects 14.9 MB.

In the second step, hyper marks the write operation as complete as soon as the response body is buffered (i.e., when encoding is finished), rather than when it has actually been flushed. Most of the time, the flush completes in a single pass and this distinction is invisible. On the rare occasions when the socket buffer is full, the flush has to wait — even though hyper doesn't. The bytes are still sitting in hyper’s buffer, waiting to be flushed to the socket. Hyper proceeds to shut down the connection with this data still in the buffer.

This also explains why curl never triggered the bug. Curl reads data as fast as it arrives: The socket buffer never fills, the flush always completes immediately, and the discarded return value is harmless. The production path, with a reader that occasionally paused for a few milliseconds, was the only configuration where the buffer filled at exactly the wrong moment.

Don’t forget to flush

After weeks of investigation, the fix itself was conceptually simple. Hyper needed to check whether the flush was actually done before moving on.

Our reproduction worker confirmed that the bug existed, but it couldn't tell us why a given request failed. Before writing the fix, we needed a test that could trigger the exact socket conditions inside hyper.

We knew the conditions that triggered the bug: a socket that accepts one chunk of data and then blocks. To test with a controlled scenario, we built a custom wrapper around a TCP stream that simulated a full socket buffer. The wrapper accepted 8 KB on the first write, then returned Poll::Pending on every subsequent write, mimicking a reader that stopped draining the buffer.

The test sent a 500 KB response through this constrained socket and checked whether hyper called shutdown while 492 KB was still buffered. Without a fix, it did. With the fix, it waited.

Initially, we applied the fix in hyper’s dispatch loop. Instead of discarding the result of poll_flush, we checked to see whether the flush was actually done:

let flush_result = self.poll_flush(cx)?;

if flush_result.is_pending() {
    return Poll::Pending;
}

if !self.conn.wants_read_again() {
    return Poll::Ready(Ok(()));
}

If the flush hasn't completed, then the loop returns Poll::Pending to the asynchronous runtime. The runtime waits for the socket to become writable, then wakes the task back up to continue the flush. The connection shuts down only after all data has been sent.

When we deployed this fix, we observed that every byte was written and the shutdown was called only after the buffer was actually empty. The customer who made the first report also confirmed that the issue disappeared.

While our initial solution worked, the dispatch loop wasn’t the right place for the fix. Returning Poll::Pending early could slow down other operations on the same connection by reducing how frequently reads are polled, causing unintended backpressure. It also doesn't correctly handle keepalive connections, where a single connection handles multiple requests in sequence — these should remain reusable even while the previous response is still being flushed. Neither issue affected our particular service (where keepalive is disabled), but both could affect other hyper users if the fix were contributed upstream.

We traced through hyper's connection lifecycle and found a more targeted approach. Rather than changing how the dispatch loop behaves, we applied the fix at the point where shutdown is actually called. Before shutting down the socket, hyper should first flush any remaining data in its buffer:

pub(crate) fn poll_shutdown(
    &mut self,
    cx: &mut Context<'_>,
) -> Poll> {
    ready!(self.poll_flush(cx)?);
    Pin::new(&mut self.io).poll_shutdown(cx)
}

This leaves the dispatch loop unchanged. It adds a flush only at the exact point where data loss would otherwise occur — the moment before shutdown.

What stayed with us

None of the tools at the application level surfaced any errors, crashes, or log entries that provided useful clues. Application-level observability can have a blind spot for bugs that live below its awareness.

The failure occurred intermittently, scaled with response size, couldn’t be reproduced with simple tools like curl, and disappeared when we observed the system more closely. These signals pointed to a timing-dependent bug in the connection layer, not in the application logic.

Our breakthrough came from using kernel-level tooling with strace, the one layer that records what actually happened on the socket. The underlying bug lived in the few milliseconds between a partial flush and a premature shutdown — a window that opened only after we made the system faster.

We merged our fix and the deterministic test into hyperium/hyper via PR #4018. It will be available in a future hyper release, ensuring that any service using hyper’s HTTP/1 implementation won’t lose response data to the same race condition.

In the meantime, we’re running an internal fork with the patch applied. This fix stabilized the binding’s architecture, creating a reliable foundation to expand its functionality.

The Images binding initially covered only transformations of remote images. Earlier this month, we announced that the Images binding now supports operations for hosted images, giving developers a unified way to build media-rich applications on Cloudflare.

Read more about how the binding works in our documentation.

Temporary Cloudflare Accounts for AI agents

Sid Chatterjee — Fri, 19 Jun 2026 13:00:00 GMT

Everyone's writing code with AI agents today. But the moment an agent needs to deploy something — and needs to sign up and create an account — it slams face-first into a wall built for humans: a browser-based OAuth flow, a dashboard to click through, an API token to copy-paste, a multi-factor authentication prompt to satisfy. For an interactive copilot sitting next to a developer, that's annoying. For a background agent, it's a hard stop.

Today we're rolling out Temporary Cloudflare Accounts for Agents.

Agents can now deploy websites, APIs, and agents right away, without first needing to sign up for an account.

Any agent can now run wrangler deploy --temporary and deploy a Worker to Cloudflare. This temporary deployment stays live for 60 minutes, during which time you can claim the temporary account, making it permanently your own. If you don't, it expires on its own.

Our goal? Let your agent code and ship.

Why frictionless deployments matter for AI agents

Frictionless temporary accounts matter more than it might first seem:

Background AI sessions have no human in the loop, and are becoming the norm. Any auth step that needs a browser, a copy-paste, or "click here in 60 seconds" means an agent gets stuck and may choose to deploy elsewhere.
Trial-and-error is the agent's superpower. Agents need a tight write → deploy → verify loop. They need cheap, throwaway deployment targets, so they can curl their own output and decide whether they got it right.
Agent platforms are building their own ways for deploying code to "just work" without extra steps or credentials. People are starting to expect that this process just works, without the need to sign up for other services that they've not used before or heard of.

How it works

Temporary accounts are built around Wrangler, our Developer Platform command-line interface (CLI) tool that lets developers bootstrap new projects, manage their configurations and resources, and deploy and update them.

Wrangler usage is widely documented online and agents know how to use it very well. But if you hadn’t yet signed in and granted Wrangler permission to your Cloudflare account, when the agent tried to deploy, it would get stuck at the sign-up and authentication step. And you might rightly ask: How do agents and LLMs know that this new --temporary flag in Wrangler exists, so that they actually use it without a human explicitly telling them to do so?

To solve this, we updated Wrangler to prompt the agent with a message that tells it about the --temporary flag:

When the agent discovers this, and then runs wrangler deploy again with the --temporary flag, Cloudflare provisions a temporary account for the agent to use, gives Wrangler an API token to work with, and provides a claim URL that the agent can give back to the human.

Let’s go over every step of the flow

Deploying and iterating on a new project

Make sure you’re using the latest Wrangler release, fire up your favorite coding agent, and write a prompt to deploy a "hello world" app in build mode:

Make a very simple hello world Cloudflare Worker in TypeScript and deploy it using wrangler, don't ask me questions, do the best you can

The agent will run wrangler, pick up the --temporary flag from the output messages, build your script, and deploy it instantly, no human in the loop required:

As you can see, the agent wrote the script, deployed it using the --temporary flag, curled the preview link it got from the output, and verified that the result matches the code.

This is great, but agentic coding is often not about one single deployment. A session can go through a cycle of multiple code changes. This is not a problem: the agent can iterate on the Worker script and redeploy the changes as many times as it wants (within the 60-minute claim window). Type this prompt:

Now change hello world to "hello cloudflare" and redeploy

Look at the agent changing the source code, reusing the previously created temporary account, redeploying a new version and rechecking the result:

Claiming the account

At any point, you can claim the temporary account and make it yours permanently. When you click the claim link you will be taken to a page where you can either sign up for or sign in to Cloudflare, and then claim the temporary account that your Worker was deployed to. This includes claiming not just Workers, but resources like databases and other bindings, too.

If you do not claim these temporary accounts within 60 minutes, they will be automatically deleted.

The road to frictionless agentic deployments

This is just one way we’re eliminating the signup barrier for agents. We recently announced a partnership with Stripe and a new protocol we co-designed that lets agents provision Cloudflare on behalf of their users — creating an account, starting a subscription, registering a domain, and getting an API token to deploy code, with no copy-pasting tokens or entering credit card details. Last month, we collaborated with WorkOS on the launch of auth.md, which anyone can adopt, to let agents provision new accounts using well-established, existing OAuth standards.

There’s a ton going on in this space, and we’re excited to keep making it easier for agents to use Cloudflare, and for developers to make their own apps agent-ready. Temporary accounts are one more step toward frictionless agentic deployments — stay tuned for more.

Temporary accounts have some limitations, and their capabilities may change over time; check the developer documentation for more information and then go build something. Point your agent at Cloudflare, see how far it gets, and tell us what we can improve or what delights you — share what you’ve built on X or hop into the Cloudflare Community.

Build your own vulnerability harness

Dan Jones — Thu, 18 Jun 2026 17:59:40 GMT

A few weeks ago, we published our initial findings from Project Glasswing, looking at what happens when you point frontier security models at an enterprise codebase. We also explored how our defensive structures adapt to protect our infrastructure and customers from threats posed by frontier AI. Since then, the AI ecosystem has continued to shift rapidly — developers who've built tightly around a single model have already experienced what happens when that model is no longer available or gets superseded by a more capable one. These market shifts only reinforce our core thesis: no matter which underlying model is leading the pack on any given day, the future of agentic workflows will not be found in standalone models, prompts, or single-agent sessions.

Moving from a localized security "skill" to a continuous, fleet-wide scanning pipeline requires an architecture where models are treated as interchangeable components. Relying on a single model inherently limits defensive coverage, as the same system will tend to look at code paths through the exact same lens. To counter this, models should be frequently interchanged and cross-tested. By varying the models across the pipeline — such as using one model for initial discovery and an entirely different one for validation — we can ensure that vulnerabilities are cross-checked by distinct sets of logic. Furthermore, a true enterprise-scale harness must look beyond isolated repositories to trace vulnerabilities across cross-repo dependencies, ultimately filtering thousands of raw candidates down to a trusted, triaged queue of actionable fixes.

This post serves as a practical look at how to build that model-agnostic layer, focusing on how we manage state controls, eliminate false positives, and coordinate end-to-end triage at scale.

Two objections, up front

The first post made the case for why generic coding agents can't do this job. The main issue is that agents only hold one hypothesis at a time, fill their context window after covering a sliver of a real repo, and then lose information during context compaction. For more details, read that post.

Before we move forward, we would like to answer two likely questions.

"Why not use subagents instead of a harness?" Subagents are useful, and they are a good starting point. But security analysis needs hundreds of separate investigations that survive across runs, don't share a context window, and can be re-scoped and cross-referenced later. It needs persistence, deduplication, resumability, and eventually fleet-wide dependency tracing. That's an orchestration problem, and a prompt can't get you there.

"Is this blog post just an ad for frontier models?" No. Our approach centers on the harness, not the model. When it comes to vulnerability discovery, we run it with whatever frontier model is currently best at what we need. When we point different models at the same target, they each turn up a different share of the bugs. The harness is the bit that lasts. If you build your own system, design it to be model-agnostic from day one. This will allow you the freedom to use any model of choice without constraints.

It all starts with a skill

We started with a ~450-line security-audit skill that we ran on a single repository, and adjusted the prompts until we surfaced real bugs. Later, we added the orchestration that became the plumbing of the entire system. The real value lives in the prompts themselves, and our prompts continue to carry the initial skill's attacker scenarios, bug classes, and anti-pattern detections nearly unchanged.

The skill was written to run a 7-phase audit in one session:

Three parallel research agents do recon and write an architecture.md.
One Hunter agent runs per class attack, trying to break the code rather than review it.
Adversarial validators try to disprove each finding.
The survivors are written up as a human-readable vulnerability report.
They're also emitted as findings.json against a schema, and a mechanical check validates that file.
Finally, a fresh agent independently re-verifies every finding against the source.
The surviving, re-verified findings are submitted to the ingest API.

That first skill maps almost directly onto the later harness:

Skill phase	Harness stage
Recon agents write `architecture.md`	Recon
Hunters run per attack class	Hunt
Validators disprove findings	Validate
Surviving findings become a report	Report
`findings.json` is checked mechanically for schema adherence, not correctness	Mechanical validation of line numbers and functions in findings
Fresh agent re-verifies findings	Independent validation

The skill worked, but it quickly revealed its limits. Looking at the coverage metrics, a single run finds only about half the bugs you'd catch across multiple runs. In our experience the ones it did find skewed toward the simpler and less subtle. Once your process is basically "run it ten times and diff by hand," you probably need to start looking at a real harness.

While running and fine-tuning the skill, we ran into three walls:

Context exhaustion: An hour in, the context window fills up and the model will cannibalize its own memory, instantly forgetting the bugs it spent all morning tracking down. We broke this bottleneck by externalizing the state entirely, treating the LLM as a stateless compute engine.

Persistence: A crash mid-run means starting over. Losing hours of work to one AI rate-limit error or connection flakiness is an incredibly expensive way to realize you need a better architecture.

Cross-repo reasoning: A single repo session is completely blind to the relationships between applications that consume it, and the number of bugs that surface when you inspect the interface between components is probably more than one might expect.

ADVICE: A real but minimal harness consists of just Recon, Hunt, and Validate stages kept in a database, alongside a separate Validator that can't file its own findings. You should skip cross-repo tracing entirely until you have more than one repository that matters. Skip a dedicated Deduplication agent until you are actively drowning in noise. Start with a skill in your development environment, get your prompts working well, and only build the next architectural stage when not having it is the specific thing slowing you down.

Codifying the skill into a pipeline

Most AI security write-ups in this space are about a single repo or a curated benchmark; running a whole fleet this way, with cross-repo tracing, isn't something we've seen written up elsewhere. Our codebase spans a massive mix of languages — Rust, Go, C, Lua, TypeScript and Python, alongside various configuration management systems, static configs, and all sorts of additional context. So we had to come up with something new that worked for us. Going from that first slash-command run to a fleet scanner that could cover 128 distinct repos, automatically finding and interrogating relevant dependencies, took about six weeks. Codification was mostly mechanical: we lifted each phase of the skill into its own agent, put a database behind it and an orchestrator in front. The mapping was almost one-to-one.

The entire fleet runs on one unified harness with no per-language tuning and traces the dependencies between repos. While offloading syntax to a model makes the system language-agnostic, the differentiator is its ability to trace dependencies between repos. The harness itself doesn’t care if it’s looking at C pointers or a TypeScript file; it focuses on the higher-level logic of security orchestration. This allows us to scale across hundreds of different codebases, without having to write custom language parsing.

A two-stage vulnerability research workflow

Our entire vulnerability research workflow is built on a two-stage operational framework: the Vulnerability Discovery Harness (VDH) and the Vulnerability Validation System (VVS).

The VDH functions as our discovery engine, proactively scanning codebases to surface potential security issues. Once bugs enter the VVS, which allows multiple harnesses to feed into it, they go through stages of Deduplication, Judgment, and finally Fixing, as we’ll talk about later.

We use one model for VDH, but we use a completely different model for VVS, so the models are effectively double-checking each other. There is an obvious security benefit to this: by forcing Model B (VVS) to judge the output of Model A (VDH), you ensure that the finding is evaluated by an entirely different set of logical weights and training data — one that acts as an unbiased, adversarial third party whose sole job is to ruthlessly stress-test Model A's assumptions. And operationally, we benefit from treating model providers like interchangeable commodities. Model providers can change temperature, caching, and inference effort budgets over time, even within one model version. Instead of building a system that depends on a model behaving predictably over time, our harness is built to absorb downstream volatility without breaking.

Stage 1: Vulnerability Discovery Harness (VDH)

The first post covered what each agent/stage is for, so we'll talk about the parts it didn't: the glue between stages, and the handful of details that decide whether any of it works.

Agent/stage	Primary Role	Sub-agents / Tooling
Recon	Maps out the target architecture and maps potential threat vectors	3 parallel Recon sub-agents write `architecture.md`
Hunt	Runs per-class attacks, compiles fragments, probes binaries	It spawns siblings (these handle between 9% and 20% of fleet-wide tasks depending on the model). It reaches out to and writes to the Wishlist tool.
Validate	Mechanically checks the finding, then adversarially disproves it	Runs in two passes: plain code handles the initial schema/path checks, then a single isolated agent tries to disprove the finding before it can be filed.
Gapfill	Generates new hunt tasks for empty coverage cells	Enqueues fresh hunt tasks for any under-tested (area × attack-class) cells that still look thin
Dedup	Identifies and consolidates overlapping findings	Combines deterministic code and agents to cluster findings by root cause, folding them together in real time
Trace	Walks dependency graph; spawns consumer-repo tasks	Walks the graph to add hunt tasks inside every identified consumer repo to make sure cross-repo bugs are caught
Feedback	Learns from pre-existing reports and optimizes future runs	Takes validation failures, shallow runs, and repeated misses, and instantly rewrites queued prompts to make future tasks sharper.
Report	Renders human-readable report	Just a script, no model required

^{Table 1: Vulnerability Discovery Harness (VDH)}

Stages four through eight run as a continuous producer-consumer loop. As the initial hunt progresses, the Gapfill, Feedback and Trace agents generate new tasks; Dedup folds overlapping findings back together and the rest of the loop keeps consuming the queue. This ensures a vulnerability discovered late in the cycle is still validated, reported and checked against other code to make sure it doesn't contain the same bug, all within the same run.

Splitting the pipeline this way guarantees strict context controls. If you fill the context window, the model starts hallucinating. We keep each agent’s job hyper-focused, keeping context usage below 25% of the total window. A naive “read all files” approach will blow past this limit every single time.

One thing that caught us out was that persistence needs to be factored in before parallelism. You do not want to throw away a five-hour run because of an unforeseen error. Every stage writes to one SQLite database keyed by (run_id, repo, stage). Any stage can resume, retry, or get pulled into a later run without redoing work. Findings are streamed and saved as they happen, so a crash costs you the task in flight and nothing else.

ADVICE: Sometimes a transient API error comes back as text in the (200 OK) response stream instead of throwing a code exception. To the orchestrator, this looks exactly like a task that finished cleanly. You must explicitly classify the response text, not just trust the exception type, or you end up logging empty runs as successes.

Dynamic threat modeling

During the Recon stage, the agent writes the threat model instead of being handed one. Beyond about ten built-in attack classes (many forms of injection, memory corruption, protocol parsing, timing side channels, and others), the Recon agent can invent repo-specific classes on the spot, each with its own methodology. It writes a custom taxonomy tailored specifically to that codebase, which is used to more tightly scope the Hunter agents.

Reading source code isn’t enough to understand how it behaves under stress, especially for subtle undefined-behavior bugs in C and other lower-level languages. The Hunter agents move past code reading and transition into active execution. They compile fragments, build small versions, and attack them. The biggest jump in quality came from giving Hunters a sandbox (built on unshare) to crash binaries.

ADVICE: If the harness itself runs inside Docker, that sandbox needs seccomp=unconfined and apparmor=unconfined or it will silently fail to start. It’s a one-line fix that saves you a day of head-scratching if you aren't an expert in nested containerization, like us.

Micro-forks and the wishlist

Beyond the core pipeline stages, we added two specialized mechanisms that grant the Hunters significant autonomy to adapt their focus and request external resources without derailing an ongoing analysis:

Sibling Forking: This helps ensure that if a Hunter agent trips over an interesting code path that is outside the current scope, it doesn’t wander off track. It uses a tool call to fork a sibling agent with a precise structural seed. Fleet-wide, this accounts for roughly 9% of tasks, though the rate is highly model-dependent — from near-zero to about a fifth, depending on which model is hunting.

The Wishlist: When an agent needs a tool it doesn't have, often a Validator confirming a Proof of Concept (PoC) or a Hunter wanting to build something (like a specific build environment, a VM, or some prod config files), it writes to a central wishlist. It provides enough context for the system to automatically re-run that exact task once a human provides the dependency. Some of these can be partly self-healing: if the container needs to be rebuilt with some changes, this can autonomously happen after the run by having a generic coding harness monitor the logs.

The wishlist has been written to 25,472 times across 128 repos since the wishlist was added, and it's the main way the agents talk back to us. One that landed while we were writing this: "I need a FreeBSD VM to confirm this PoC end-to-end."

Fleet-wide cross-repo tracing

After the initial cleanup, a Tracer agent checks how different software components are connected. It looks for a specific path: can a potential attacker send harmful input from the outside to a vulnerable part of the system? If the answer is yes, the Tracer agent automatically spawns fresh hunt tasks inside the consumer repository. To make this work, you need a unified, cross-repo symbol index and an accurate dependency graph. This allows you to uncover deep, systemic flaws that a standard single-repo scan would miss.

Running our harness across an entire fleet of repos revealed two lessons that only surfaced when this was done at scale.

First, deduplication is its own problem, big enough to need its own agents. When you are scanning a handful of repositories, you can manually eyeball overlapping bugs. Simple string matching or file-path checks won't save you here. Determining whether two complex logic flaws are actually the exact same root bug sounds trivial, but it isn't. It requires so much cognitive reasoning that we had to deploy dedicated Dedup agents just to clean up the noise, along with their own heuristics and ways of reducing the work.

The second is to not wire in static analysis early. We plumbed Semgrep all the way through, and the Hunters invoked it zero times in a month of runs. They would rather read and run the code. The wishlist, by contrast, was the single most-used tool in the system. It's worth paying attention to what the agents actually reach for, rather than what you think they'll want.

Making findings you can trust

The agent will edit the source code so its own exploit works, then triumphantly report the bug it just created. It will write a test that proves something entirely tautological like “exec() executes things, therefore critical vulnerability”. Or it builds an exploit that runs fine but proves nothing, because the threat model behind it is nonsense. If your harness doesn't actively fight this, all you've built is a faster way to produce junk.

A Hunter has to state the threat model before it's allowed to file anything. It has to define exactly who the attacker is, and what boundary the vulnerability crosses or what assumption it breaks. The output schema ordering enforces it. This requirement eliminates the vacuous findings, the "if a user has database write access, they can write to the database" kind.

Every confirmed finding ships with a PoC written as a test that runs against the original, untouched codebase. This prevents the agent from editing the source files to force an exploit to land. If there is no working PoC, we treat the finding as fake. In practice, that's a Hunter compiling a thirty-line parsing loop, running it with memory protection enabled, and demonstrating that the incorrect read stride is originating from a stack address rather than the expected message body. You can re-run it yourself. Furthermore, every confirmed finding must also ship a proposed patch. What actually reaches our review queue is a verified bug, a working test, and a functional git diff, not just a vague text description of a problem.

Before an exploit path survives, deterministic code (written in plain code, not another model) mechanically verifies that the cited files and paths actually exist, and confirms that both the patch and the test parse correctly. This Validator cannot log findings of its own; its sole job is to aggressively disprove the Hunter's theory. If a Hunter is allowed to grade its own homework, it will confidently validate everything it outputs.

We don't claim a false-negative rate for our system. There's no labeled set of every real bug in a codebase, so any claimed recall number is entirely speculative. What we can watch is whether re-runs keep turning up new bugs (they do) and whether coverage is still growing across runs. It’s all a proxy, as you don’t know for sure how many bugs exist in a single codebase, but it’s a good-enough way of measuring effectiveness.

Stage 2: Vulnerability Validation System (VVS)

A finding coming out of the harness is just the start of the triage process, with all discoveries landing in a single, shared VVS that currently holds 13,841 findings across 145 repos in total. Triaging that volume is its own massive engineering problem, and it matters just as much as the hunting. That triage engine runs on a different model from the harness, broken down into three distinct jobs.

Agent/stage	Primary role	Spawns/ sub-agents/tooling
Dedup	Identifies if a vulnerability is already in the system, or raised as internal Jira ticket already	Deterministic: plain code builds inverted indexes over files, functions, trust boundaries, and rare tokens, then hands each finding a short candidate list Probabilistic: Dedup agent reasons over that short list, Stable cross-run key reopens existing records
Judgment	Production reachability and validation	Single agent — builds context about the bug from MCP servers, to get the shape of what the service looks like in production. Searches the wiki, Jira, git, config, and all available other sources to try and understand whether a bug is truly applicable to our production environment, and then score the vulnerability against this. It also validates the bug against source code to understand if the bug still exists on the latest main branch.
Fixing	Generates patches, runs regression tests	Runs the regression test before and after (filtered to the affected test; full suite only when per-test filtering isn't available). It requires a clean fail→pass flip on the target test to clear the gate. If the post-patch test fails, or if a global run detects downstream regressions, the commit is automatically blocked and flagged for human intervention.

^{Table 2: Vulnerability Validation System (VVS)}

Deduping

Comparing every single finding against every other finding using an LLM scales at O(N^2), which falls apart completely at scale. To keep the model off the critical path, deterministic code builds inverted indexes over the structured data (touched files/functions, trust boundary, rare tokens) to generate a short list of real candidates. Only then does an agent look at that short list to see if a single fix would close several of them. Stable cross-run keys ensure re-found bugs reopen existing records rather than spawning new ones.

Contextual judgment

Judgment is a second, independent pass over what survived. The agent rechecks the latest information, pulling from deployment, environment, and config context to determine if the code path is reachable in prod, and identify the repo owner. This process filters "exploitable now" from "real but latent" and from "real but filed against the wrong component." It's moving a pile of chaotic findings into a risk-driven orchestration workflow.

Automated fixing

The Fixer takes the proposed patch and unit tests, rewrites them to match the repo’s style, applies the diff, and runs targeted tests. A clean fail→pass flip is the ideal and the only auto-cleanup case; a failing post-patch test blocks the commit. The Fixer never merges code on its own; a human must review the branch. This gate is the non-negotiable, human-in-the-loop safeguard that enables a clean, unbreakable cryptographic trail for change management compliance. Left to patch freely, a model will happily fix a security bug while quietly breaking an unrelated feature or adding dozens of new bugs.

Across all three triage jobs, each agent is confined to one narrow task wrapped in deterministic bookkeeping code, and nothing writes to production without a human signing off on a dry run. While this pipeline moves the engineering bottleneck from finding bugs to reviewing and landing fixes, the Fixer remains the youngest and slowest part of the system.

What it costs

Running hundreds of agents over a fleet of repos is not cheap, but at least the shape of the spend is predictable. Almost all of the compute budget goes directly into the hunt stage. This makes Gapfill our cost-to-coverage lever, as each additional pass costs roughly half as much as the initial hunt.

Because the cost per repository varies wildly, we budget per repo rather than per run. We enforce a strict task cap per repository and spin up a worker pool of anywhere from 50 to 200 workers. That way you can spend money on the repos that are actually finding things, and not waste it on the ones that aren't.

It's also why, for us, the big scans are a periodic backlog sweep and not a per-PR check. A full scan of a complex repo can take hours; the worst run took just over 14 hours. Cheaper, smaller harnesses are the right tool for that job.

How we tell it's working

We measure our system’s effectiveness by tracking how efficiently our automated pipeline filters deliberate engineering noise into high-quality, actionable findings. Because we intentionally tune our Hunters to over-report subtle primitives that could be chained into larger attacks, our true indicator of success is how sharply we can refine that initial mountain of raw data, before it ever reaches a human.

To gauge this, we track exactly how many raw findings survive each validation stage over time. Thanks to better context injection from our Recon phase, our initial validation rejection rate dropped from 40% down to 11%, while the share of high-integrity findings climbed from 35% to 58% (representing ~12,057 lifetime findings).

Here's the lifetime breakdown from raw candidates to actionable findings, at the point in time this blog post was written.

Vulnerability Discovery Harness (VDH)

Raw candidates: Everything the discovery harness emitted before independent validation.
Needs repro: Findings that appeared plausible but required manual reproduction before being trusted.
Rejected at validation: The validator disproved the threat model, exploit path, affected code, or evidence.
Duplicates: Candidates collapsed onto another finding from the same harness.
Survived validation: Findings that passed the independent validation gate and moved into the VVS.
Bugs that went elsewhere: Findings deliberately routed outside this flow.

Vulnerability Validation System (VVS)

Another vulnerability harness: Other automated sources feeding the same validation system.
Total bugs in system: The combined pool after ingest.
Duplicates: Findings the dedup pass identified as already covered by another canonical finding or ticket.
Wrong repo / other / not a risk: The noise bucket: misattributed findings, defense-in-depth, or latent risks.
Bugs sent to teams: Finalized, clean findings ready for remediation.
Judged Internet-exploitable: High-urgency findings a realistic attacker could trigger in production.
Not judged Internet-exploitable: Lower-urgency, actionable bugs (production issues, dependency risks, or config errors).
Final severity split: The categorization used to assign priority for the engineering teams.

The core metric of the harness isn’t a speculative recall score — it’s keeping the number of unconfirmed findings in front of real humans as close to zero as possible. The architecture needs to be a relentless filtering funnel.

Out of 20,799 raw candidates generated by VDH, only about 12,057 survived validation.
When these were pushed into the VVS, joining findings from another harness, the central pool was brought to 13,841.
The Dedup agent folded away 5,442 findings as duplicates.
1,154 were routed to the queue as ‘wrong-repo’ or ‘low-risk’ and were recycled back into the system where appropriate.
Ultimately this left 7,245 actionable findings for engineering teams to act on.

Traditional compliance rules dictate arbitrary remediation windows based entirely on a static CVSS score (e.g., "Fix all Highs in 30 days"). Our contextual judgment layer turns this compliance checkbox into actual risk management.

The architecture is capable of tracking findings back to their origin, meaning that fixing a single root cause resolves an entire cluster of findings rather than just patching individual issues. VDH system performance is also measured by dividing repos into (area x attack-class) cells and running the Gapfill agent iteratively until it stops producing findings. Whenever we update an underlying prompt, we test it against a held-out repository to see if that total coverage cell number actually moves.

The harness wires automated health signals to catch system failures early in the pipeline. If a hunt finished suspiciously fast and fails to spawn sub-hunts or gap tasks, it usually indicates a crashed dependency rather than a clean codebase. To remedy this, the system flags any Hunter agent that finishes with zero findings as “shallow” and immediately requeues it for a new run.

Finally, our system’s robustness is reinforced by the independent triage pass described earlier. By re-judging all submissions with a different model and separate logical weights, we ensure an unbiased, adversarial verification that is decoupled from the specific model used for discovery, providing a trust layer that persists regardless of which model is in use.

None of this is finished. We change our system constantly, and it is nowhere near a perfect science. But raw candidate findings are cheap now, and the only work worth doing is turning them into sound, verifiable code fixes.

Building your own harness means accepting that AI models are volatile, but your orchestration layer doesn't have to be. By decoupling your security logic from any single provider, forcing adversarial verification, and automating your triage pipeline, you can turn a mountain of LLM noise into a reliable, fleet-wide defense engine.

Our “North Star” metrics: measuring real-world velocity

Every codebase is a little different, so to show you how this actually works in the real world, we mapped out a realistic benchmark based on a standard repo run. Keep in mind that this represents a single pass on one repo; over time, as the continuous fleet-wide loop deduplicates, filters, and recycles findings, it reduces the volume of lifetime candidates by roughly 65%.

Engineering hours saved via automated patching: Rather than focusing on static baselines, we measure the health of our pipeline by its technical throughput, processing velocity, and its ability to eliminate the manual triage bottleneck:

Initial Validation Cut: For a standard repository (~30k lines of code), this yields 100 initial findings, with a full run taking 3-4 hours, maintaining a hyperfocused context window throughout.
Compression: The Deduplication and Contextual Judgment Layers process these candidates in parallel. Within 3 hours, the system compresses and refines the batch of findings from ~100 raw candidates to 80 distinct, high-fidelity bugs.
Remediation: The automated Fixer processes these 80 distinct bugs at an average rate of 5 minutes per bug. In total, the system can discover, validate, deduplicate, and open functional pull requests in approximately 14 hours.

Shrinking mean-time-to-resolve for critical flaws: Of course, you can’t dump 80 patches into production all at once without breaking things. To keep deployments safe, our system uses a tiered rollout:

Critical Exposure Containment: The system isolates the critical, high, and exploitable bugs (avg. 10 out of 80). We fast-track these for a human review and introduce them into release cycles, getting them fully patched in production in 5 days.
Incremental Hardening: The remaining latent risks, minor config anomalies, and lower-urgency bugs are incrementally rolled into prod over a 15-20 day window to guarantee platform stability.

How we’re handling all of this patching

These findings are the result of an isolated, ring-fenced research experiment designed to stress-test our code. They do not represent active, unpatched vulnerabilities in our live production environment.

Because the harness runs constantly in our test environments, these specific numbers are completely out of date by the time you're reading this. Every single bug surfaced by the pipeline came attached to a working test case to demonstrate the bug and a draft patch. Our security teams are systematically processing the reports and applying the necessary fixes, meaning the Cloudflare products you use every day are already actively hardened against these vectors.

Along with this blog post, we’re releasing the initial skill we used to develop the harness, it’s been slightly cleaned up before release so it’s easier to understand and integrate, but the skill itself remains substantially the same. Hopefully the harness itself will follow shortly. This could be a starting point for your own vulnerability harness, your own skill, or whatever suits your needs best: github.com/cloudflare/security-audit-skill

If your team is working on the same problems and would like to compare notes, reach out to us at security-ai-research@cloudflare.com.

Celebrating 12 years of Project Galileo

Jocelyn Woolbright — Thu, 18 Jun 2026 13:00:00 GMT

Twelve years ago this month, Cloudflare launched an ambitious project built on a simple idea: people shouldn’t be knocked offline just because someone more powerful disagrees with them. Today, Project Galileo provides free access to cybersecurity services to more than 3,400 websites belonging to journalists, human rights defenders, and other nonprofit organizations in 120 countries. We continue to believe that a better Internet is one where anyone with an idea can reach a global audience.

Each year on the anniversary of Project Galileo, we announce new products, programs, and strategic partnerships. To celebrate our 12th anniversary this year, we’re publishing our first comprehensive report on cyberattacks targeting civil society, releasing case studies that explore the security needs of 16 Project Galileo participants, and announcing new project partners.

Introducing a new annual report on cyberattacks against global civil society

Because Project Galileo now includes 3,400 domains belonging to organizations in over 120 countries, Cloudflare has access to unique data regarding the cyber threats, attacks, and trends targeting civil society — a critical pillar of global democracy. In addition, because the Cloudflare network spans more than 335 cities in 125 countries and more than 20% of the web sits behind it, we were also able to compare attacks targeting civil society with those targeting the Internet more broadly. The full report can be explored here.

This year’s data demonstrates that civil society organizations were targeted more frequently, and often more intensely, than other Internet users. Cyberattacks often coincided with critical moments in civil society’s work, such as publishing investigative reporting or conducting public advocacy. Our key findings include:

DDoS attacks were the most common cyber threat against civil society. Their defining feature was duration, with some spanning days and weeks.
Civil society groups faced attempts to exploit website vulnerabilities at a rate more than seven times higher than other Cloudflare customers. Media organizations were disproportionately impacted.
Journalists operating in exile faced a rate of malicious traffic that was nearly four times higher than journalism organizations overall.
Nearly 10% of all emails Cloudflare processed for civil society included potential phishing material.

We conclude our report with a call to action: ensure simple and affordable cybersecurity for all, expand transparency about cyberattacks and Internet shutdowns, and embed AI and post-quantum protections into security tools by default. We hope this report can serve as a resource for civil society, policymakers, and the broader public seeking to understand and respond to cyberattacks. Moving forward, we plan to produce it annually, allowing us to compare cyber threat trends over time.

In addition to the report, Cloudflare released the following qualitative case studies that add context about each organization’s security needs.

Organization	Description	Country/Region of Operation
SHARE Foundation	Nonprofit advocating for privacy, free expression, and other digital rights.	Serbia
Hledaczvirat	Online platform/database for finding lost and found pets, connecting owners with animal shelters.	Czech Republic
Iran Watch / The Wisconsin Project	Research project tracking Iran's weapons capabilities and nonproliferation issues, run by the Wisconsin Project on Nuclear Arms Control.	United States
Bulletin of Atomic Scientists	Nonprofit media organization covering nuclear risk, climate change, and disruptive tech.	United States
The Royal Meteorological Society	Society for weather and climate science, supporting meteorology research, education, and professional accreditation.	United Kingdom
Project Ainita	An engineering collective that develops tools and research for human rights organizations, lawyers, and activists operating in high-risk environments.	Global
Ukraine War Archive	Digital archive documenting and preserving evidence of war crimes and events from the Russia-Ukraine war.	Ukraine
Our World in Data	Research and data publication on global issues like poverty, health, and climate.	United Kingdom
Hague Institute for Innovation of Law	Think-and-do tank focused on user-friendly justice systems and resolving justice problems for people worldwide.	Netherlands
Center for American Progress	Progressive public policy research and advocacy think tank.	United States
Sea Shepherd Brazil	Brazilian chapter of Sea Shepherd, marine conservation organization protecting ocean wildlife and ecosystems.	Brazil
elTOQUE	Independent digital media outlet covering Cuba, including news, economics, and exchange rate tracking.	Global
Humanitix	Nonprofit ticketing platform donating booking fees to children's education and health charities.	Australia
Organized Crime and Corruption Reporting Project (OCCRP)	Global investigative journalism network exposing organized crime and corruption.	Netherlands
Activist Rights	Legal information resource for activists on their rights and legal risks during protest and campaigning.	Australia
China Digital Times	Bilingual news website covering censorship, human rights, and politics in China.	United States

Welcoming new partners

Project Galileo relies on its 59 civil society partners to be a success. Every single organization that applies to the program is reviewed and approved by one of these partners. These groups volunteer their time and expertise, often reviewing multiple applications per day, to help make sure our services go to deserving organizations.

Over time, these relationships have not only helped grow Project Galileo into the program it is today, but also launched entirely new initiatives, like our email security partnership with Protect.ngo (formerly CyberPeace Institute) or our work supporting Internet measurement at public schools through UNICEF's Giga project.

For several years, one of our goals for Project Galileo has been to reach more organizations in regions outside North America and Europe. Part of that effort has been attending regional events like RightsCon in Costa Rica (2023) and Taiwan (2025) to speak directly with local digital rights organizations. We have also welcomed new partners who bring their own active networks and communities into the program. For example, last year we announced two new partners in the Asia-Pacific region: EngageMedia and the OpenCulture Foundation.

Because of the new services we recently added to Project Galileo to help local news organizations protect their content from AI crawlers, our partnership focus this year was groups serving journalists. To that end, we are proud to announce three new partners:

Organization	Description	Country/Region of Operation
International Center for Journalists	Nonprofit focused on promoting high-quality independent journalism. Provides training, fellowships, mentorship, and financial support to journalists, specializing in helping reporters leverage digital technologies.	Based in the United States and supporting journalists in 180+ countries.
Media Cluster Norway	Innovation hub focused on next-generation media technology. Provides collaborative research spaces, funding opportunities, business incubation, and networking events for 100+ creators and local newsrooms.	Norway
NGO-ISAC	Nonprofit network focused on protecting civil society from cybersecurity threats. Provides threat intelligence, defensive coordination, training, and support to its network of over 1,000 nonprofit organizations.	United States

Continuing to protect civil society around the world

Today’s new report, case studies, and new partners are all aimed at working toward Project Galileo’s fundamental goal: ensuring that cyberattacks do not silence organizations working in vulnerable, essential areas like journalism and human rights.

As we look to the future, we remain committed to finding new ways to expand our protections to at-risk groups worldwide. If your organization is looking for protection under Project Galileo, please visit cloudflare.com/galileo.

Bringing more agent harnesses and frameworks to Cloudflare, starting with Flue

Thomas Gauvin — Wed, 17 Jun 2026 19:35:00 GMT

2026 is the year agent harnesses go to production. The software that controls the model’s access to the outside world — harnesses like Codex, Claude Code, OpenCode, Pi, and Project Think — has matured to the point where teams are deploying agents as real, load-bearing infrastructure, not just prototypes.

But building agents that survive production is hard.

We learned this firsthand building Project Think as our first-party agent harness. In working with our customers to run agents in production, we found a common set of distributed systems problems that every agent faces when running in the cloud. When an agent is interrupted, how can it automatically and gracefully resume from where it left off, without losing context or wasting tokens? How can agents run untrusted code securely? How can agents use the tools they were trained for?

A harness can’t solve these problems on its own. They’re tied to state, storage and compute — which means they’re dependent on the platform the agent runs on. That’s why we’re taking our learnings from hardening Project Think for production and bringing them to the Cloudflare Agents SDK as a base layer. Durable execution, dynamic code execution, a durable filesystem and dynamic workflows, now available to any harness building on Agents SDK.

At the same time, a new layer has emerged above the harness. Frameworks like Flue wrap a harness with the project structures, conventions, integrations and developer experience that make agents productive to build.

To solve these scaling challenges, there’s a new, three-layer stack that is emerging for building production-grade AI. Here is how the pieces fit together, moving from the user-facing developer experience down to the underlying platform primitives:

The framework (Flue) — the project structure, the conventions, the integrations, the CLI and the developer experience for building agents.
The harness (Pi, Project Think) — the agentic loop that calls tools, reads results, manages context and keeps going until the task is done.
The runtime/platform (the Cloudflare Agents SDK) — the compute, state, and storage primitives everything above depends on

The Agents SDK is that bottom layer: it makes primitives like durable execution available to any harness and any framework. Flue, our new open-source framework from the team behind Astro, is the first to build on it. Here’s how.

Flue

Flue shipped 1.0 Beta this week, built on the Pi harness, the same harness that OpenClaw is built on. What makes it different as an agent framework is the approach: you don’t script what your agent does, you describe what it knows. Define the context an agent needs — its model, skills, sandbox, and instructions — and it solves whatever task you give it, autonomously. There’s no orchestration loop to write.

This declarative model is what makes writing agents easy: here’s a triage agent that intercepts a bug report, reproduces it in a sandbox, and diagnoses the issue in under 25 lines.

The Flue developer experience

Flue’s power comes from the fact that agents don’t live in isolation. They are built to exist where your users already work, and integrate with your preferred tooling:

Anywhere agents: Drop your agents into Slack, GitHub, Linear, or Discord with pre-configured Channels that handle event verification and dispatch boilerplate automatically.
Headless, but UI-ready: Agents shouldn’t live in a black box. Flue agents can run completely headlessly for background tasks, but @flue/react provides native frontend hooks that stream an agent’s state, tool execution, and live messages straight into your frontend application, without you having to build custom real-time plumbing from scratch.
Ecosystem-ready: Flue makes it easy to add and upgrade integrations with commands like flue add channel slack, generating a Markdown blueprint that your own coding agent can read, modify, and cleanly integrate straight into your codebase.

Designed for production, not just prototypes

Moving an agent out of a local terminal and into a production ecosystem introduces traditional distributed systems failures. Host crashes, API timeouts from LLM providers, and unexpected restarts threaten to erase the short-term memory of a running agent turn.

Flue solves this via Durable Streams. Each event in the execution history is added to an append-only log. By processing every prompt, tool response and model choice as an unchangeable ledger, an agent’s state is never volatile. If a process dies, another simply picks up the log and continues from the exact step it left off.

Deploy anywhere, including Cloudflare

Flue is a multi-cloud framework. On Node.js, each agent runs as a long-lived process. You can deploy it to any VM or container, run it in GitHub Actions, or embed it on an existing server. But when you target Cloudflare, each agent becomes a Durable Object.

By running each Flue agent inside its own Durable Object, Cloudflare can automatically scale to as many agents as you need, each with their own isolated storage and compute. You don’t have to provision servers, manage sticky sessions, or worry about noisy neighbors. And when Flue agents are deployed to Cloudflare, they get durable execution using Agents SDK’s runFiber(), stash(), and onFiberRecovered() methods. Flue also uses @cloudflare/codemode and @cloudflare/shell for sandboxed code execution against a durable workspace.

What harnesses need out of an agentic platform

Flue’s Cloudflare target works so effectively because it maps cleanly to the core primitives we built into the Agents SDK. You can even dig into the Flue source code to understand how Pi, the underlying harness, is adapted to work on Cloudflare Agents SDK.

Here’s how Flue leverages the Agents SDK under the hood, and what it takes to run any modern agent harness reliably at scale.

Every agent harness needs durable execution

An agent turn is not a single request. The model streams tokens, calls tools, waits for results, maybe asks a human for approval, or delegates work to a subagent. That sequence can take seconds or minutes, and at any point the process can be interrupted or crash. When that happens, all of the agent state that was in memory is gone: the streaming connection, the pending tool calls, where the agent was in its turn. Sure, the conversation history is persisted on disk, but the user sees a spinner that never resolves. That’s a broken user experience.

Fibers solve this problem by providing a native checkpointing mechanism directly inside the Agent’s underlying Durable Object. runFiber() records the progress to the Durable Object’s SQLite storage before the work in the Agent turn starts and checkpoints with stash() as the turn advances. When a fresh agent instance boots after an interruption, onFiberRecovered() delivers the last checkpoint, so your agent knows a turn was interrupted, where it got to, and can decide how to continue.

import { Agent } from "agents";
import type { FiberRecoveryContext } from "agents";

class MyAgent extends Agent {
  async doWork() {
    await this.runFiber("my-task", async (ctx) => {
      const step1 = await expensiveOperation();
      ctx.stash({ step1 });

      const step2 = await anotherExpensiveOperation(step1);
      this.setState({ ...this.state, result: step2 });
    });
  }

  async onFiberRecovered(ctx: FiberRecoveryContext) {
    if (ctx.name !== "my-task") return;

    const { step1 } = (ctx.snapshot ?? {}) as { step1?: unknown };
    if (step1) {
      const step2 = await anotherExpensiveOperation(step1);
      this.setState({ ...this.state, result: step2 });
    }
  }
}

Flue uses runFiber() on its Cloudflare target for exactly this. With the onFiberRecovered() hook, your harness can decide how to resume the execution of the turn, whether it attempts a full reconstruction model like Project Think that repairs turn state or whether it replays certain parts of the turn.

Executing code is better than overloading agents with tools

Agent harnesses give models access to the outside world through tools. But tool surfaces grow fast, and models get worse at selecting the right tool as the list gets longer and the context window fills up with tool definitions. A better pattern: give the model one tool that executes code. The model writes a TypeScript function that calls the APIs it needs, and the harness runs it. We wrote about this when we introduced Code Mode.

The question is where that code runs. To run LLM-generated code securely, you need a sandbox. But typical sandboxes would be slow, cost-prohibitive and inefficient to run each tool call. That’s why the Agents SDK provides @cloudflare/codemode, which wraps Dynamic Workers, to execute LLM-generated code in its own Worker isolate with only the bindings you provide.

Code Mode creates a fresh Dynamic Worker for each snippet, runs it, and discards it. Isolates start in under 10ms and $0.002 per load, resulting in drastically faster and cheaper cost of execution than booting a container every time your agent needs to execute a short piece of code. Flue uses @cloudflare/codemode on its Cloudflare target to power its code tool. The agent writes JavaScript against the workspace and runs it with Code Mode.

You don’t need a full container for most workspace tasks

Agent harnesses often need a filesystem, whether it’s to read files, write outputs, search through code and understand diffs. Coding agents in particular live in the filesystem. But if the harness is running in a serverless environment, how can it get a durable filesystem that persists across executions?

The usual answer is a container. That works, but it’s expensive for what agents mostly do. The majority of filesystem operations in an agent turn are text. Consider a review agent that reads files, greps through source code, or perhaps writes a patch. You don’t need a full Linux boot for that.

@cloudflare/shell gives your agent a durable virtual filesystem inside its Durable Object, backed by SQLite. It provides typed file operations — read, write, edit, search, grep, diff — that agent harnesses can use as tools.

Instead of calling individual tools, a Flue agent running on the Cloudflare target writes JavaScript against the workspace virtual file state API. By running more operations within the Durable Object, the agent benefits from the isolate model’s more efficient execution process, entirely avoiding container overhead:

async () => {
  const files = await state.glob("src/**/*.ts");
  const results = [];
  for (const file of files) {
    const content = await state.readFile(file);
    const todos = content.match(/\/\/ TODO:.*/g);
    if (todos) results.push({ file, todos });
  }
  return results;
}

This translates into a faster and more cost-efficient sandbox environment for agents that need to run shell and filesystem operations to get their work done. And for agents that need a full OS, to run npm install, git, or compilers, Cloudflare Containers provides that. We’re also building @cloudflare/workspace, to keep the virtual file system of a given Durable Object in sync with a container’s, allowing for seamless transition from lightweight Workers to a Linux environment only when it needs one.

Dynamic Workflows: let agents write their own workflows to repeat tasks consistently

But what happens when an agent needs to do more than read files or execute single code snippets? What happens when it needs to orchestrate a massive, multi-step pipeline that must repeat consistently over time, like a code review that successfully resolves bugs or a research workflow that produces good results? A harness can’t provide durable multi-step execution on its own. It needs the platform to persist each step, retry failures, and resume after interruptions.

This pattern is gaining traction. Claude Code recently shipped dynamic workflows, where Claude writes a JavaScript script at runtime to hand off work to dozens of subagents, and the runtime executes it durably. @cloudflare/dynamic-workflows provides this for any harness running on the Agents SDK. Your agent generates a workflow at runtime, and the Workflows engine persists each step, retries failures, and can sleep for hours or wait for external events like human approval.

From the Agent class, runWorkflow() connects your agent to the Workflows engine. The agent kicks off the workflow and can go to sleep. The workflow calls back into the agent via RPC to report progress, update state, or request approval. When the workflow finishes, the agent wakes up with the result.

Direct access to the Cloudflare ecosystem

Beyond compute and storage, agent harnesses need access to external capabilities: web browsing, email, memory, search, inference. A harness shouldn't have to integrate each of these separately, manage API keys for each, or worry about credentials leaking through agent-generated code.

The Agent class gives your harness access to the rest of Cloudflare through bindings: AI Gateway for per-agent spend tracking and limits, Browser Run for web automation, Email Service for inbox workflows, Agent Memory for persistent recall, AI Search for retrieval, Containers for workloads that need a full OS, and inference across 14+ model providers. Bindings grant capabilities without exposing credentials: your agent uses them, but the keys never enter agent-generated code.

Bring your agents to the agentic cloud

We know this approach works because it is the exact architectural foundation we used to build Project Think, our first-party agent harness. While Project Think remains our highly optimized, out-of-the-box solution for native Cloudflare agent experiences, the Agents SDK ensures that the broader open-source ecosystem can leverage those exact same battle-tested primitives, including Flue.

If you're building agents today with Flue, you can deploy in just a few clicks to Cloudflare. And if you're building your own agent harness or you’re building an agent framework, target the Agents SDK and get the platform integration for free.

Agents SDK: developers.cloudflare.com/agents
Flue: flueframework.com, npm install @flue/runtime
Think: docs
Cloudflare Community: community.cloudflare.com

Introducing the Cloudflare One stack: agent-powered deployment

AJ Gerstenhaber — Wed, 17 Jun 2026 13:00:00 GMT

Adopting or migrating to a Zero Trust network architecture can be a daunting task. Before a single policy changes, teams have to recall how their network is actually built: which applications exist, their authentication and authorization constructs, how traffic flows between them, and any assumptions the current architecture makes. This hands-on process requires practitioners to decode the intent behind every security and routing policy in place.

Today, we’re releasing the Cloudflare One stack, a set of skills you give to your agent to configure, deploy, and manage your Zero Trust environment for you. This toolkit is designed to help automate the process of learning an entirely new security suite and mapping your existing one into Cloudflare.

Cloudflare has worked with thousands of customers through exactly this process. That repetition built expertise on where migrations stall, what questions come up every time, and what it takes to move forward. The Cloudflare One stack packages that expertise and makes it more accessible than ever.

The agent gap in network security

Teams are already using agents to write code, triage alerts, and automate workflows. Organizations are increasingly asking for Cloudflare-provided tooling to help agents execute on security workflows. On their own, agents are not trained on the nuances of an organization's specific network topology or vendor configurations.

By providing prescriptive and authoritative guidance, organizations can layer this context into their existing toolkit to make better use of the security products they are already deploying.

Cloudflare has long been the easiest-to-deploy SASE vendor in the market. The stack extends that philosophy to agents: it gives them the context, tools, and structured reasoning they need to operate on your security infrastructure.

What is the Cloudflare One stack?

The Cloudflare One stack is a collection of skills that can be used with any agent. As with any skill, you can use them standalone, layer in your own context, or build tooling on top. It was purpose-built to help security practitioners across the entire lifecycle of evaluating, deploying, and managing Cloudflare One.

The stack was built by synthesizing hand-curated knowledge from employees with tens of thousands of hours of experience working with customers on Cloudflare One products. It contains tools for planning, managing, and implementing your user and agent security infrastructure on Cloudflare. It also contains handpicked logic for migrating from legacy vendors like Zscaler and Palo Alto Networks.

When used in conjunction with the Cloudflare code mode MCP server, the stack gives agents a typed interface to the Cloudflare API. Agents can query your live account, inspect configurations, and make changes through a curated set of Cloudflare-recommended workflows rather than ad-hoc API calls.

What’s in the stack?

The Cloudflare One stack ships as two lightweight skill files: cloudflare-one and cloudflare-one-migration. Together they cover migrating to, building an implementation for, managing, and troubleshooting your Cloudflare One deployment:

Remote access and VPN replacement with Cloudflare Access
User, network, device, and data security with Cloudflare Gateway
Connectivity with Cloudflare Tunnel, Cloudflare Mesh, and Cloudflare WAN
Migration guidance with explicit detail for moving from other SASE vendors
Network diagram interpretation and generation, so you can visualize proposed changes to your network in a way that is easy for you and your team to understand
Vendor concept translation, which maps concepts between SASE vendors to reduce the barrier to evaluating and switching providers
Troubleshooting and operations, with the Digital Experience Monitoring (DEX) toolkit and automated rule recommendations

How it works

The stack is available in the Cloudflare Skills repository. Each skill file contains structured knowledge, decision trees, and tool definitions that agents load automatically when the context matches. Give this to your agent and let it help you set up, configure, and manage your Zero Trust environment:

The cloudflare-one skill covers general product guidance. For example, if you ask an agent for the best way to replace your VPN infrastructure with Cloudflare Tunnel or Cloudflare Mesh, the skill knows how to:

Inventory your existing VPN applications and identify which connectivity model each requires
Map each application to the appropriate Cloudflare primitive — self-hosted Access application, Tunnel-connected service, or Mesh-connected network segment
Generate a recommended deployment sequence that minimizes disruption during cutover
Produce a configuration summary your team can review before making any changes

The cloudflare-one-migration skill covers vendor-to-vendor translation. For example, if you ask an agent to migrate your Zscaler Private Access applications to Cloudflare Access, the skill knows how to:

Map Zscaler application definitions to Cloudflare Access application definitions
Transform Zscaler user groups and policies into Cloudflare Access policies
Use the Cloudflare API to create the equivalent resources in your account
Generate a summary of what was migrated and what requires manual review

The migration logic in the stack is the same logic used in Cloudflare's Descaler and Deskope programs. Those programs have already moved enterprise customers from Zscaler and Netskope to Cloudflare One in hours rather than months. The stack makes that capability available to any customer or partner, at any time, without waiting for a scheduled engagement.

More ways to use the stack

The Cloudflare One stack can also:

Recommend security rules based on traffic seen in your live account
Automatically migrate your existing Zscaler Private Access applications into self-hosted Cloudflare Access applications
Investigate anomalies in your secure web gateway HTTP logs and build rules to resolve issues users are seeing
Report on user stability with the DEX toolkit and take actions to improve user latency in key scenarios

Whether you are loading the skill from an agent or building custom tooling on top, the Cloudflare One stack handles all of these use cases and more.

For partners, too

While this simplifies ongoing management for customers who have already adopted the Cloudflare One product suite, it is also a tool for the Cloudflare partner network. Partners can use it to help their customers deploy faster, manage more effectively, troubleshoot with increased accuracy, and drive issues to resolution.

What's next

You can start using the Cloudflare One stack today. To get the most out of the stack, pair it with the Cloudflare code mode MCP server. The MCP server gives your agent live access to the Cloudflare API through a single, compressed interface that keeps authentication credentials out of the model context.

The Cloudflare One stack will continue to expand as Cloudflare One products evolve. New skills for additional migration sources and more advanced troubleshooting workflows are already in development.

As we learn more about how customers and partners utilize these skills files, we plan to build more robust tooling around these skills. If you are a customer or partner and want to share feedback on what the stack should handle next, reach out through your account team or open an issue in the repository.

Cloudflare DMARC Management is now generally available

Ayush Kumar — Tue, 16 Jun 2026 13:00:00 GMT

When we first launched DMARC Management, it was driven by a simple belief: every domain on the Internet deserves strong email authentication, and cost should never be the reason it doesn't happen. As part of our mission to help build a better Internet, we made DMARC Management available for free to every Cloudflare customer. We wanted to give everyone the tools to understand and improve their DMARC posture without needing to hire an email security consultant or parse XML report files by hand.

Today, we are taking that commitment further. Cloudflare DMARC Management is now generally available, with a redesigned experience built to help you reach full DMARC enforcement as easily as possible.

^{The DMARC Management dashboard offers a unified view of your email authentication posture.}

What email authentication actually does for you

Every time someone receives an email "from" your domain, their email provider asks a simple question: did the real owner of this domain actually send this? Without a way to answer that question, anyone can send an email pretending to be you and your recipients will have no way to tell the difference.

Email authentication is the set of DNS records that answers that question. There are four protocols that protect your domain:

SPF (Sender Policy Framework) tells receiving mail servers which IP addresses and services are allowed to send email on behalf of your domain.
DKIM (DomainKeys Identified Mail) attaches a cryptographic signature to every email you send, so receiving servers can verify the message hasn't been tampered with in transit.
DMARC (Domain-based Message Authentication, Reporting, and Conformance) ties SPF and DKIM together and tells receiving servers what to do when an email fails authentication: let it through, quarantine it, or reject it outright. It also sends you reports on who is sending email as your domain.
BIMI (Brand Indicators for Message Identification) lets you display your brand logo next to your emails in supported inboxes, but only if your DMARC policy is strong enough.

When all four are configured correctly, spoofed emails get blocked before they reach anyone's inbox and your legitimate emails are far more likely to be delivered. When they're missing or misconfigured, you're exposed to brand impersonation and deliverability penalties from the large email mailbox providers.

DMARC is no longer optional

DMARC has always been important. But over the past two years, the stakes have gotten significantly higher. Google, Microsoft, and Yahoo have all announced or implemented stricter email authentication enforcement. Domains that do not have proper DMARC, SPF, and DKIM records configured (or worse, have them configured incorrectly) are increasingly seeing their legitimate emails land in spam folders or get rejected outright. What was once a best practice is now a requirement. Poor email hygiene directly translates to poor deliverability, and for many businesses, that means lost revenue and missed communications.

The message from the industry is clear: if you send email from your domain, you need these records configured correctly. The grace period is over.

The problem: DMARC is confusing, and mistakes are costly

Here is the challenge. The journey from p=none (monitor only, no emails are blocked) to p=quarantine (suspicious emails are sent to spam) to p=reject (unauthenticated emails are blocked outright) is filled with uncertainty. Enable enforcement too early, and you risk breaking legitimate email flows from third-party services you forgot were sending on your behalf. Move too slowly, and you leave your domain exposed to spoofing, and now, to deliverability penalties from the very providers your customers use.

Most organizations know they need DMARC enforcement. But actually getting there requires understanding aggregate XML reports, identifying every legitimate sending source across your infrastructure, and building enough confidence that tightening your policy will not break anything.

We built Cloudflare DMARC Management so that any customer can navigate this journey on their own. No need for professional services engagement. No spreadsheet analysis of aggregate reports. No guessing which IP address belongs to which vendor. The goal is to make the path to full DMARC enforcement as self-service as possible, giving you the visibility and confidence to tighten your policy without breaking anything.

^{DMARC reports show sending source alignment across your domain.}

What we shipped

Deeper report visibility with source investigation

We redesigned the reporting experience to make it easier to understand what is happening with your email traffic. You can now see at a glance which sending sources are passing or failing DMARC, SPF, and DKIM alignment and drill deeper than ever before.

Every report now surfaces the source IP address alongside the sending service, giving you the granularity to distinguish between legitimate infrastructure and unauthorized senders. You can now open any IP address directly in our Investigate tab, which surfaces all the threat intelligence Cloudflare has on that address — reputation data, geolocation, autonomous system number (ASN) details, and any known associations with malicious activity.

This turns your DMARC reports from a passive data feed into an active investigation tool.

^{Drilling into a sending source reveals IP-level detail and Cloudflare threat intelligence in the Investigate tab.}

What you see	What it tells you
Source IP address	The specific infrastructure sending email on behalf of your domain
Sending service name	The organization or provider behind the IP
DMARC / SPF / DKIM alignment	Whether each authentication check passed or failed for that source
Investigate tab	Cloudflare threat intelligence: reputation, geolocation, ASN, and known threat associations

Email authentication record status

One of the most common questions customers ask is: "Are my records set up correctly?"

Until now, answering that question required manually inspecting DNS TXT records and understanding what each tag and value means across multiple specifications. With this release, you can see the status of every email authentication record you need: DMARC, DKIM, SPF, and BIMI, in a single view.

Each record type gets a clear pass, warning, or fail status based on automated analysis. You can drill into any record to see specific findings about what we detected and recommendations for how to fix it. If your DKIM key is malformed, we flag it. If you are missing a BIMI record and your DMARC policy is strong enough to support one, we let you know that too.

^{Record analysis cards show pass, warning, or fail status for each email authentication record, with actionable recommendations.}

The recommendations are written in plain language, not RFC jargon. The goal is to make it obvious what action to take next, regardless of your email security expertise.

Record	What we check
SPF	Multiple records, lookup limits, permissive +all, missing mechanisms
DKIM	Key formatting, missing or malformed public keys
DMARC	Policy strength, monitoring vs. enforcement, reporting configuration
BIMI	Logo URL format, Verified Mark Certificate (VMC) presence

SPF lookup audit

This one addresses a problem that silently breaks email for more organizations than you would expect. The SPF specification (RFC 7208) imposes a hard limit of 10 DNS lookups per SPF evaluation. Every include:, a, mx, redirect, and exists mechanism in your SPF record counts toward that limit, and so do the nested lookups inside each include:. Exceed 10 and receiving mail servers return a permerror, which means your SPF check fails entirely.

Most people have no idea they are over the limit until their email starts getting rejected.

DMARC Management now lets you audit your SPF record and see exactly how many lookups it incurs. You can drill into every mechanism in the record, see which include:chains are the most expensive, and identify where to consolidate or flatten your record to get back under the limit.

^{The SPF lookup audit traces every DNS lookup in your SPF record, showing exactly where you are against the 10-lookup limit.}

Getting started

To use DMARC Management, you need to have your domain's DNS on Cloudflare. Then you can turn on DMARC Management under the Email tab for that domain in the Cloudflare dashboard.

1. Navigate to your domain in the Cloudflare dashboard.

2. Go to Email > DMARC Management.

3. Follow the setup wizard to start receiving DMARC reports.

4. Review your record analysis and recommendations.

5. Work toward p=quarantine (suspicious emails go to spam) or p=reject (unauthenticated emails are blocked entirely) at your own pace.

What's next

We are going to continue building on top of DMARC Management, and our goal is to keep it accessible. We have several things planned that we are excited to ship: deeper forensic reporting, smarter recommendations, and tighter integration with the rest of the Cloudflare platform.

If you are not yet using Cloudflare for your DNS, you can get started here. Once your domain is on Cloudflare, DMARC Management is available immediately with no additional configuration or cost required.

Your domain is either protected or it isn't. Head to Email > DMARC Management in your Cloudflare dashboard to get started.

Growing the Cloudflare AI team with talent from Ensemble AI

Alex Reneau — Mon, 15 Jun 2026 13:00:00 GMT

Today, we’re excited to share that key members of the team at Ensemble AI are joining Cloudflare to help accelerate our work in AI infrastructure and make it easier for developers to run powerful AI models efficiently at scale.

Ensemble AI, founded in 2023 in San Francisco, has spent the last few years focused on one of the most important challenges in AI: making large models faster, smaller, and more cost-effective to serve, without sacrificing quality. The team has developed new approaches to model compression and efficient inference that are designed to reduce the memory, compute, and deployment overhead of large language models and multimodal architectures.

As AI becomes a core part of how developers build applications, the economics of inference matter more than ever. Models are getting larger; workloads are becoming more dynamic. And customers increasingly expect AI to be available everywhere: globally distributed, fast, reliable, and affordable. Bringing the Ensemble AI team into Cloudflare strengthens our ability to make that possible.

Incorporating Ensemble’s expertise

The team at Ensemble AI has focused on preserving the structure inside modern AI models while reducing the cost of running them. Instead of treating model efficiency as only a quantization or hardware problem, Ensemble has explored new model building blocks that can make neural networks more compact and efficient at the architectural level.

A core part of this work is NdLinear, a drop-in replacement for standard linear layers in transformer models that operates directly on multidimensional activations rather than flattening structure away. This enables models to preserve meaningful axes, such as heads, channels, spatial dimensions, or other structured representations, while reducing parameter count and compute. Ensemble has also developed NdLinear-LoRA, an efficient adaptation method designed to reduce the trainable parameters required for fine-tuning large models.

These approaches complement other efficiency techniques, including quantization and vector quantization. Together, they point toward a future where developers can run capable AI models with substantially lower memory, compute, and cost requirements.

Making AI inference more efficient

Cloudflare Workers AI gives developers access to serverless GPU-powered inference on Cloudflare’s global network. As developers build more AI-native applications, the ability to serve models efficiently becomes a critical part of the platform.

Inference cost is one of the biggest barriers to scaling AI applications. Every improvement in model size, memory footprint, throughput, and GPU utilization can make AI more accessible to developers and more economical for customers. This is especially important as AI workloads expand beyond simple text generation into agents, multimodal models, personalization, fine-tuning, retrieval, and reinforcement learning.

We are deepening our investment in the core machine learning capabilities needed to make Workers AI faster, more flexible, and more cost-efficient. This builds on top of our existing work on improving model efficiency, including our inference engine Infire, tensor compression techniques like Unweight, and our platform for running extra large language models. The team will focus on improving the economics of serving large language models and other advanced AI architectures, with an emphasis on model efficiency, GPU utilization, and scalable deployment.

Building for the next generation of AI workloads

AI infrastructure is entering a new phase. Developers no longer need only access to models; they need infrastructure that can run models reliably, affordably, and close to users. They need the ability to experiment with different model sizes, fine-tuning approaches, and deployment patterns without being blocked by cost or operational complexity.

Cloudflare is uniquely positioned to help solve this. Our global network, developer platform, and serverless architecture give us the foundation to bring AI closer to where applications already run. The Workers AI Machine Learning Engineering team will help us improve the efficiency layer underneath that experience.

By combining Cloudflare’s global infrastructure with Ensemble’s work in model compression and efficient architectures, we can continue building a platform where developers can deploy AI applications with lower cost, better performance, and less operational overhead.

What’s next

Together, we will continue building the infrastructure needed to make AI more efficient, accessible, and useful for developers everywhere. Our goal is simple: help developers run powerful AI workloads at global scale while improving the economics of inference across the Cloudflare platform. If you want to join us in our mission, check out our careers page.

Scaling Security Insights: how we achieved a 10x increase in global scanning capacity

Dave Baxter — Fri, 12 Jun 2026 13:00:00 GMT

Security Insights provides actionable security recommendations for every Cloudflare account. To find these insights, we perform regular scans for all accounts, zones, and DNS records, looking for potential security risks and misconfigurations.

However, two key issues emerged. First, our scans were too infrequent. Scans were only being performed every week or two, and therefore newly introduced security risks could remain undetected for up to two weeks. Second, automatic scanning was opt-in for many free plan accounts – meaning lots of accounts weren’t being scanned at all.

The risks of infrequent or nonexistent scans are rising: as automated attacks accelerate, the window for detecting security misconfigurations is shrinking. Making sure that we’re finding these issues for all of our customers is crucial to our aim of building a better Internet for everyone.

We calculated that to increase our scanning frequencies and enable automatic scanning for all accounts, we would need to increase our scanning throughput by around 10x on average – from 10 scans per second to 100 per second. But our system was already struggling with its load: millions of events were filling up our backlog waiting to be processed; our API was frequently timing out; our processes were crashing. We needed to fix our system, and we needed to make it scale.

This is the story of how we increased scanning throughput for Security Insights by more than 10x, enabled security insights for millions of customers, and doubled our scanning frequency for all customers. Read on to find out how we achieved these improvements.

How we scan for security insights

At a high level, our automatic security scans are triggered by a scheduler. When an account or zone is due for a scan, the scheduler publishes a message (or messages) to Apache Kafka, an open-source distributed event streaming platform. These messages fan out to a number of checkers: specialized Go microservices that scan specific assets or configurations.

For every message, each checker sends its results (the security insights that it found) to our internal API, which then persists these in a Postgres database.

Making it scale

Scaling Kafka

Apache Kafka is not strictly a queue: it is a partitioned event stream (though recently gained queue semantics). Within a partition, messages must be consumed and processed in order. This differs from typical queues where messages may be consumed in order but are processed out-of-order. As a result, we can only have one active consumer per partition within a consumer group.

This has two consequences for us:

Messages that are slow to process block the consumer from progressing to the next message
For each checker, we can only have as many consumers as there are partitions (each checker has its own consumer group)

We could have tried to scale by adding more partitions. However, this would have increased resource usage for the Kafka broker itself, which is shared by many other services. We reserved this as a last resort, aiming to improve our code and architecture first.

Introducing parallel processing

Although we can only consume messages in order, there is nothing stopping us from consuming multiple messages at once.

We changed our checkers to consume messages in batches, processing each message in a separate goroutine. The trade-offs are that we’d have more work to re-do if our process crashed midway through a batch, and our memory usage would be slightly increased. In our case, these were both acceptable.

Avoiding head-of-line blocking

Some messages processed by a few of our checkers take much longer to process than others. For example, one account/zone may have far more assets than another. In the worst case, these messages can take minutes or hours to process compared to the average case of seconds or milliseconds.

We opted for a very simple approach: splitting our consumer groups and checkers in two – the ‘slow lane’ and the ‘fast lane’. We could determine quickly whether a message would be slow or fast to process. If the ‘fast lane’ checker encounters a slow message, it skips it.

This solved the problem: slow messages had the dedicated resources and time to be processed with minimal delay, and fast messages were able to proceed at their regular fast pace.

Optimizing our database queries

Every insight we find gets written to our Postgres database. This is handled by a single API endpoint that our checkers invoke with a list of insights. The implementation looked like this:

for _, issue := range issues {
	_, err = tx.Exec(ctx, `INSERT INTO table ... VALUES ($1, $2, ...) ON CONFLICT DO UPDATE ...`, ...)
	if err != nil {
		return err
	}
}

The astute reader will notice that for large sets of insights, this code makes a round trip to the database per insight. With a maximum observed size of 500,000, this was half a million round trips, queries, and transactions in a single API call.

We initially tried the gold standard for bulk inserts in Postgres: COPY into a temporary table. However, we found that this approach led to bloat in the Postgres system tables.

We settled on a hybrid approach:

Using UNNEST when the number of issues was below a threshold
Using COPY when the number of issues exceeded this threshold

This provided the best of both worlds: reasonably fast inserts for huge sets of insights (seconds), and even faster inserts (milliseconds) for small sets of insights.

Investigating our API timeouts

We noticed several strange behaviours in our internal API as we tried to scale:

A large number of requests were triggering client-side timeouts
Many checkers were spending 20-90% of their processing time on a single API call
When triggering a large volume of scans, our throughput would start high and deteriorate

All of these problems had the same root cause: latency.

Our primary database is located in Portland, Oregon. Our API, however, was running active-active in both Portland and Amsterdam. Even at the speed of light, the round-trip latency between Portland and Amsterdam would be 50 milliseconds.

As a result of this latency, database queries from the Amsterdam API instance took much longer, holding connections from our client-side connection pool open. With the large volume of requests that we were making to the API, the connection pool was quickly becoming exhausted, leading to timeouts waiting for a free connection. Our average API call completed in 10 ms in Portland, but almost 3 seconds in Amsterdam!

But why the drop in message throughput? Each checker process gets assigned a set of partitions of the Kafka stream to consume. Our API is load-balanced. Since we hold the connection open throughout the life of the process, some processes had a connection to the Amsterdam API, and others had a connection to the Portland API. The partitions linked to Portland were processed quickly, but the ones consumed by the Amsterdam-bound processes were lagging behind:

^{Kafka lag (number of messages waiting to be processed within a single consumer group) by partition for one of our checkers. Note that we have 30 partitions in this case. Exactly 15 partitions can be seen lagging behind (the lines that reach or approach zero later than around 03/10 03:00). This is because the load balancer splits traffic evenly between our API endpoints.}

This was a simple fix: we switched our API to active-passive, ensuring the active API followed our primary database. Our latency problems disappeared overnight.

Rethinking the scheduler

We’d scaled Kafka. We’d optimised our database queries. We’d fixed our API. However, we still had a problem: we needed to be sure our scans would be roughly uniformly distributed in time. It wasn’t feasible to queue all of our scans at the same time, as our Kafka topic uses a time-based retention policy: the scans would pile up in Kafka, and eventually be deleted before they could be processed.

Our scheduler was not good at uniformly distributing our scans. The number of scans that would be triggered at a given time was spiky and unpredictable. At certain points throughout the week, hundreds of thousands of scans would be triggered within minutes of each other. What was going on?

The scheduler triggers scans on fixed recurring periods. In pseudocode, the scheduler looked like this:

Loop forever:
    Find accounts where last_scheduled_at + scanning frequency <= now
    For each account:
        Trigger scan for account
        Trigger scan for all zones in the account
        Update last_scheduled_at = now

We quickly noticed that last_scheduled_at was similar for a large number of accounts in our database, which was responsible for some of this unevenness.

However, even with perfectly even distribution, increasing our scanning frequency would have compounded this problem. For example, changing the scanning frequency from every 15 days to every seven days would mean 53% of accounts would suddenly be due for a scan.

There was a further problem with this logic. Some accounts have a very large number of zones. When these accounts were scheduled, there was a cascade of scans for all of their zones. This was saturating our Kafka partitions and leading to delays for scans of much smaller accounts.

To fix these problems, we made three key changes:

Schedule zones independently of accounts: each zone gets its own last_scheduled_at field.
Randomize the last_scheduled_at time for existing accounts and zones.
Introduce adaptive rate limiting for scan scheduling.

Scheduling zones independently was an obvious way to solve the problem of large accounts. Randomizing the last_scheduled_at time (and ensuring that no scans were delayed during this process) allowed us to fix the existing unevenness in our database.

Adaptive rate limiting is slightly more interesting. Rate limiting would allow us to solve the problem of a spike in scans when we change scanning frequencies. For example, if we wanted to increase our scanning frequency to every 7 days, and we had 50 million accounts, then a rate limit of ~83 scans/second would ensure that they were spread out evenly across 7 days.

But what if we added 10 million more accounts? Then, this rate limit would force us to take 8 days to scan all of these accounts. This is where the adaptive part comes in: the rate limit is asynchronously recalculated every half-hour based on the total number of accounts and zones we have, and our scanning frequencies. This ensures we continue scanning on time even if we onboard thousands or millions more accounts and zones.

func computeRate(free, pro, biz, ent int64) rate.Limit {
   r := float64(free)/freeScanInterval.Seconds() +
      float64(pro)/proScanInterval.Seconds() +
      float64(biz)/bizScanInterval.Seconds() +
      float64(ent)/entScanInterval.Seconds()


   // Guard against zero counts. We always want to schedule at least one scan per second.
   if r < 1 {
      r = 1
   }


   // Increase rate limit beyond the 'perfect' value, to have a buffer in case of any downtime
   // or spikes in load.
   r *= rateLimitBufferFactor


   return rate.Limit(r)
}

Where we stand today

^{With these fixes, our 7-day moving average throughput per checker over time rose by more than 10x.}

Before these improvements, we were executing around 10 scans per second. The gap between this and our target throughput of 100 scans per second seemed vast. We discussed throwing more resources at the problem, throwing more partitions at our Kafka topic – even throwing out our entire architecture.

But our fixes made all the difference. Today, Security Insights sustains over 120 scans per second during peak scheduling, exceeding our 10x improvement goal. Our internal API is no longer timing out, and our Kafka lag metrics look much healthier. These scalability improvements have allowed us to turn on automatic scanning for all free accounts and zones and increase the scanning frequency for all customers:

Free: every 7 days
Pro and Business: every 3 days
Enterprise: daily

The improved system stability has given us confidence to build new features that we were previously constrained from creating. We’ve added the ability to perform granular on-demand scans. You can now manually re-scan a Cloudflare account, zone, insight, or insight type.

^{Starting a granular on-demand scan from the}^{Security Overview page}^{in the Cloudflare dashboard}

The lesson we learned is that it’s crucial to deeply understand the existing system before throwing anything away. By looking closely at our code, SQL queries, logs, and metrics (especially metrics!), we were able to increase our capacity without simply adding more pods or partitions. By questioning our assumptions, digging into weird-looking metrics, and refusing to take the easy shortcuts (such as increasing API client-side timeouts), we built a more stable and resilient system.

Throwing more resources at the problem might sometimes be the answer, but at Cloudflare, we believe in engineering our way out of problems.

Security Insights scans are enabled by default on all Cloudflare plans. Log in to the Cloudflare dashboard today to review and manage your security insights.

Route public traffic to private applications with Cloudflare

Enrique Somoza — Wed, 10 Jun 2026 13:00:00 GMT

For most of the Internet’s history, public and private infrastructure operated as separate worlds. Public applications lived behind content delivery networks (CDNs) and web application firewalls (WAFs). Private applications lived behind virtual private networks (VPNs), firewalls, and separate operational stacks. We think that distinction is becoming obsolete.

Many of the applications organizations care about are not public websites. They are internal APIs, AI agent backends, MCP servers, operational tools, and services that were never designed to be exposed to the public Internet. Yet these applications still need modern security, performance, and programmability services. Security should be a property of the traffic reaching an application, not an accident of where the application happens to sit.

Until now, applying those services to private applications often required public IPs, firewall exceptions, connector software, or complex networking. As a result, many private applications missed out on capabilities such as WAF, bot management, rate limiting, caching, traffic acceleration, rewrites, and Workers, despite needing the same protections and controls as public-facing applications.

Today, we're launching Application Services for Private Origins in closed beta for eligible Enterprise customers. Customers can now securely route traffic to private origins without exposing those origins to the public Internet. This allows Cloudflare's security, performance, and programmability services to protect applications running on private networks, just as they do for public Internet applications.

WAF rules, bot management, rate limiting, caching, rewrites, and Workers can now sit in front of private origins without requiring public IP exposure, inbound firewall rules, or cloudflared running on the origin.

Four use cases, one application layer

This routing model builds on connectivity patterns Cloudflare already supports today through Cloudflare Tunnel, Cloudflare One Client, and private network integrations. For years, Cloudflare Tunnel has allowed customers to route public traffic to private applications through cloudflared. This new capability extends the same model to existing Cloudflare WAN or Cloudflare Mesh connectivity without requiring connector software running on the origin.

Much of that connectivity is orchestrated through Cloudflare’s private networking routing layer that determines how traffic reaches private destinations across Cloudflare Tunnels, Virtual Networks, Cloudflare Mesh, and other connectivity models. Customers can define their routing behavior through APIs and the dashboard instead of managing separate networking stacks for each product.

We have extended Cloudflare’s private networking layer directly into the application services stack, allowing security and performance proxy infrastructure to treat private IPs as valid origin targets for public hostnames. As a result, the same private IPs previously reachable only through Cloudflare Tunnel, Cloudflare One, Cloudflare Mesh, or Cloudflare WAN can now sit behind Cloudflare’s security, performance, and programmability services the same way public origins already do.

This also creates a more unified model across Cloudflare products. Workers VPC bindings and Spectrum private origin routing now rely on the same underlying private connectivity layer, giving customers a single source of truth for controlling how private traffic moves through their Cloudflare environment.

Application traffic now falls into four combinations based on where users come from and where applications live:

The combination on the upper right is what Cloudflare has always done: users on the Internet reach applications on the Internet, with Cloudflare in the middle. The bottom right is Cloudflare One: users on private networks reach public services securely.

The upper left is what we are shipping today. The bottom left, private-to-private, is what we are building toward next.

What is shipping today

Until now, getting public traffic to a private origin often meant making tradeoffs. Customers could use Cloudflare Tunnel, which runs cloudflared, our connector software, on or near the origin, or Cloudflare Load Balancing with private origin pools for health checks and failover. In many cases, organizations also maintained parallel infrastructure such as public-facing load balancers, reverse proxies, mTLS between hops, and TLS termination across multiple layers. As a result, applying Cloudflare's full Application Services stack to private applications often required additional complexity, operational overhead, or separate products. Application Services for Private Origins removes those tradeoffs.

What was missing was a path for customers who already operate Cloudflare WAN (IPsec tunnels, GRE tunnels, CNI links) or Cloudflare Mesh. They had built private connectivity into Cloudflare for site-to-site networking and Zero Trust, and they wanted to use that same connectivity for public traffic to private origins. That is what Application Services for Private Origins delivers.

When you toggle Use private network routing on a proxied A or AAAA record, Cloudflare's WAF, rate limiting, caching, bot management, and transform rules all run as normal on Cloudflare’s network. The only difference is the final hop: instead of reaching the origin over the public Internet, Cloudflare routes the connection through your existing private network connectivity.

The toggle is enabled automatically for RFC 1918 private IPv4 ranges (10.x.x.x, 172.16.x.x–172.31.x.x, and 192.168.x.x), RFC 6598 CGNAT ranges (100.64.x.x–100.127.x.x), and RFC 4193 Unique Local IPv6 Addresses (FC00::/7), since these addresses are only reachable within private networks. For public IP addresses that are reachable only through your private network or tunnel, you can enable the toggle manually.

What the API looks like

For customers automating deployments through the API, private routing is simply an additional attribute on a standard DNS record.

POST /zones/{zone_id}/dns_records
{
 "type": "A",
 "name": "app.example.com",
 "content": "10.0.0.50",
 "ttl": 300,
 "proxied": true,
 "use_private_routing": true
}

Behind the scenes, Cloudflare's proxy platform determines where to send traffic for app.example.com by querying Cloudflare's Origin API. The response includes metadata indicating that the destination should be reached through a private network path:

{
 "zone_name": "example.com",
 "ipv4_addresses": ["10.0.0.50"],
 "use_private_routing": true
}

The use_private_routing flag is the key signal. When our proxy sees it, instead of attempting to connect directly to the private IP address over the public Internet, it hands the request to our private networking layer, which then routes the connection across the customer's existing private network connectivity, whether that's IPsec, GRE, Cloudflare Tunnel, CNI, or Cloudflare Mesh.

Beyond HTTP: Spectrum and Workers VPC

The same routing model now extends beyond HTTP applications. The origin does not have to be a web server. It can be a TCP database, a UDP logging endpoint, or a private API that Workers call directly. The common thread is that Cloudflare sits between your traffic and your private network, applying the same security, performance, and routing layer regardless of protocol or where the request originated.

Spectrum, Cloudflare's Layer 4 proxy, can now sit in front of TCP and UDP services running on private IPs. Instead of creating a load balancer pool as an intermediary, Spectrum applications can specify a virtual_network_id directly on the origin configuration. When you create a Spectrum application, you can include the virtual network ID alongside your private origin IP:

{
 "protocol": "tcp/22",
 "dns": {
   "type": "CNAME",
   "name": "ssh.example.com"
 },
 "origin_direct": ["tcp://10.0.0.50:22"],
 "virtual_network_id": "fab9ac85-491b-44c8-b7ae-dd44d4f4672e"
}

When you create or update a Spectrum application with a private origin and virtual network, Cloudflare verifies that the IP address matches a route in your Cloudflare Tunnel before the configuration is saved. If no matching route exists, the API rejects the request and the application is not created. Once saved, Spectrum hands the connection to your virtual network, which routes it through the associated tunnel, via the same path that HTTP traffic uses when you enable private network routing on a DNS record. In this initial release, Spectrum private origins are supported through Cloudflare Tunnel. Support for additional private network connectivity options will follow in future releases.

This means you can now put Spectrum in front of any TCP/UDP service running on a private IP. The service stays private. No public IP, connector software, or load balancer required.

Workers VPC closes the loop for code running on Cloudflare. A binding tells the Workers runtime to route through the same private path as DNS records. Browsers, mobile apps, Workers, and AI agents all reach your private origins through Cloudflare: DNS records for Internet traffic, bindings for Workers.

What comes next

Public-to-private routing is in closed beta today, and we are targeting GA (General Availability) in Q4 2026.

Beyond GA, we are building toward private-to-private traffic flows: users, services, and AI agents on private networks securely reaching applications on other private networks, with Cloudflare’s application services sitting in the middle.

We are moving toward a model where the same Cloudflare infrastructure can secure traffic regardless of whether the user or the origin is public.

The end state is a world where an employee on Cloudflare One Client accessing wiki.company.internal gets the same WAF, rate limiting, and bot management protections as a customer accessing a public API. An AI agent consuming a proprietary internal API runs through the same security stack as a browser. Service-to-service traffic across clouds and data centers gets the same controls as Internet traffic, even when neither the user nor the server sits on the public Internet.

Get started today

Routing to private origins is available today in closed beta for eligible Enterprise customers. Reach out to your Cloudflare account team to request access. Once enabled, follow our developer documentation, which walks through the full setup. You will need Cloudflare One connectivity (IPsec, GRE, CNI, or Cloudflare Mesh) and a return route for Cloudflare’s source IP range 100.64.0.0/12 in your private network.

Questions or feedback? Join the conversation in our community forums or reach out to your account team.

Defend against frontier cyber models: Cloudflare's architecture as customer zero

Rohit Chenna Reddy — Tue, 09 Jun 2026 06:00:00 GMT

A few weeks ago, we wrote about Project Glasswing and what we observed when we pointed cyber frontier models at our own code. Since then, we’ve seen that the part of the post that has resonated most deeply is the argument that the architecture around the vulnerability matters more than the speed of the patch.

In the conversations we've had with CISOs and security teams since, the questions have been consistent: what does our architecture actually look like, what should we monitor for, where do we start, and how can Cloudflare help?

Before getting into the details: the architecture below is built almost entirely from Cloudflare's own products, because Cloudflare security is customer zero for the security products we build. The Cloudflare stack already exists in front of our code, employees, and customer-facing applications. If you're a Cloudflare customer, every layer below is available to you today. If you're not, the principles still apply to whatever stack you've built.

What a cyber frontier model actually changes

In the previous post, we showed how a cyber frontier model like Mythos changes the attacker’s timeline. It can find vulnerabilities, reason through exploit chains, and generate working proofs faster than earlier models. While models like Mythos do not change the shape of an intrusion — reconnaissance, initial access, lateral movement, persistence, and exfiltration still have to happen — the difference is in the speed and scale. When pointed at the open web, a model can find and hit low-hanging fruit quickly. Against a hardened target, it still has to probe, and adapt, and it often produces more noise than a careful human operator would.

Discovery, exploit chain construction, and proof-of-concept generation used to be the gating constraints on producing a working attack. A frontier model handles all three in a fraction of the time. Work that used to be slow and methodical is now fast and indiscriminate.

While AI is accelerating how fast developer teams at Cloudflare and many other companies can ship code, the security team’s work has not compressed the same way. An attacker only needs one opening to get in, while security teams need to find and close them all. Writing a fix, regressing it, and shipping it without breaking the code around it has constraints that AI doesn't remove. We learned this the hard way when we let an AI coding assistant write its own patches against our own bugs, as we described at the end of the previous post. Some of those patches fixed the original bug while quietly breaking something else the code depended on.

As these models become more competent and capable, our main focus from a threat standpoint comes down to three things. Each one shapes the architecture we walk through in the rest of this post.

The first is the speed of discovery. Frontier models make it easier to search large bodies of public code, including the open-source libraries that many companies depend on. That does not mean every bug in a library is exploitable, or that library bugs are where most vulnerabilities live. Exploitability still depends on how the code is used, whether attacker-controlled input can reach the vulnerable path, and the protections that sit around it. But widely used open-source libraries and frameworks give attackers a shared surface to study at scale. When a real, reachable vulnerability exists there, a model can help find it, reason about possible exploit paths, and generate proof-of-concept variants faster than maintainers and defenders can review every downstream use. The gap between when an attacker discovers a vulnerability and when defenders learn it exists is what worries us most. If you are not running these models against your own code, it is safe to assume someone else is.
The second is exploit volume and adaptation. A model can produce thousands of variations of a single exploit and run reconnaissance at the same scale. All that volume gives an attacker an advantage, but it won’t necessarily get them past signature-based detections. Many of those iterations will have the same underlying signature, so a rule that catches the first one will catch the rest. Adaptation is how they will get past signature-based detections. Ask a model to show you a SQL injection, and it will return a textbook example. Tell it there is a WAF in the way, and it will start probing, learning what gets blocked, and rewriting the payload until it can slip past the rule blocking it.
The third is the impact when a vulnerability is inevitably exploited. No architecture catches everything. After the vulnerability is exploited, the question we ask ourselves is: where can the attacker get to with one identity, one path, or one credential, before something else stops them? If the answer is "anywhere they want," the vulnerability was never the problem. The architecture around the vulnerability was.

Cloudflare’s superpower: visibility

We see roughly a fifth of the web and that tells us, in real time, which payloads are mutating, which patterns are picking up, and where attacker tooling is moving next. Two teams turn that visibility into defense.

First is Cloudforce One, our threat intelligence, research, and operations team, which sits within the Cloudflare security organization. They turn what we see across the network into insights the rest of the stack can act on: tracked adversaries, emerging campaigns, and indicators of compromise (IOCs). The hard part of this work was never knowing what is malicious — it was the delay in mitigation. Knowledge of a new threat normally has to travel from a threat report, into a feed, and then into a company’s defense before it can be used to block anything. Attackers have learned to move faster than that. Our network closes that gap: Cloudflare customers can now use Cloudforce One threat intelligence directly within the WAF to block high-risk traffic.

Second is the team that owns the WAF engine that does the actual detecting: the managed rulesets that run in front of our own properties and are available to every Cloudflare customer, the machine learning behind WAF Attack Score, and the relationships that sometimes let us ship a rule before a CVE is publicly disclosed. The team is globally distributed and moves fast, releasing rules within hours of a proof-of-concept of an attack becoming known. Once a detection is deployed, it reaches our entire network, along with every Cloudflare customer, in under 30 seconds. React2Shell is a recent example: a managed WAF rule was protecting our own properties, and everyone else's on Cloudflare, hours before the official advisory was published.

The scoring layer, the defenses we put in front of the application, and the containment around the vulnerability all build on what these two teams see.

Scores over signatures

Signature-based defenses were built for a world where novel exploits were scarce and variations took weeks. Cloudflare's traditional SLA from a fresh proof-of-concept to a live, deployed rule has been 12 hours. With the advent of frontier models, this is not good enough anymore. Detections need to be in place before a CVE is discovered. This is why we layer ML-based detection in front of the traditional signature-based WAF.

The model is trained on a large body of past attack traffic, and it catches new variants of vulnerabilities before they're publicly known. A novel SQL injection or remote code execution chain is almost always a rearrangement of attack shapes the model has seen before, even when the specific exploit is brand new. We run the model on every request and assign a WAF Attack Score between 1 and 99, based on how closely the request resembles those underlying shapes, not against a list of known-bad signatures. The lower the score, the more aggressively we treat the request. That score determines whether we let the request through. We apply a similar scoring methodology to AI prompts with AI Security for Apps: rather than check each prompt against a list of known malicious prompts, we score how closely a prompt resembles an actual attack.

The architecture around the vulnerability

Those capabilities only matter once they're stacked in front of an application, and the first layer in our defense-in-depth approach is the WAF. Anything that matches a known-bad pattern gets dropped before it reaches the application, which clears the bulk of the obvious traffic and lets the more specialized layers below focus on what's left. On the API surface, we run a positive security model through API Shield. Instead of trying to anticipate every bad request, we describe what a valid request to each API looks like, either from the API's own definition or learned from our real traffic, and anything that doesn't fit doesn't get through. This neutralizes the advantage of frontier AI models: because we only permit validated traffic, generating thousands of new attack variations fails to bypass the system.

^{Cloudflare’s layered architecture}

Bot Management catches probing traffic on our network before frontier models can build a map. It scores every request on how likely it is to be automated, using the same signals across our whole network: how the client behaves, whether it looks like a real browser, and whether the connection matches a known-bad pattern. An attack only lands if it can find a soft spot.

Zero Trust Network Access is used for every internal application. The implicit trust of being inside the network is replaced with explicit per-request identity and policy for every employee accessing every tool. The value of this was clear when one of our engineers shipped a misconfigured tool. A flat network would have exposed everything on the same segment, but in our deployment, the exposure stopped at the tool itself. We built Require Access Protection afterwards so newly deployed or misconfigured applications can't be reachable before an access policy is in place.

IdP Federation makes that secure by default posture easier to keep consistent across every Cloudflare account — which becomes even more necessary when more people are shipping internal tools quickly. Instead of asking each team to wire up SSO separately, we configure our identity provider (IdP) once and share it across the organization. New accounts get SSO automatically, recipient-side IdP connections are read-only, and Access policies in each account still evaluate the resulting identity as part of the normal request flow.

MCP Server Portal gives teams a controlled way to connect AI agents to enterprise systems. Agents access MCP servers that are centrally managed through a single portal, with every action logged. That way when an agent acts on someone's behalf, we know what it did, what it touched, and whether it should have been allowed to. The full picture of how we built it is in our post on enterprise MCP.

AI Gateway runs in front of our internal AI tools the same way AI Security for Apps runs in front of customer-facing AI features, with the same scoring and the same visibility. Inside the company, the visibility piece is more useful than the blocking, because we needed to see what engineers were actually building before we could write meaningful policy on it.

Where your teams can start

Frontier models can help attackers find vulnerabilities, adapt payloads, and move faster, but they still have to pass through the layered defense you deploy in front of your application. That is where teams should start:

Put inspection in front of public applications.
Define what valid API traffic looks like.
Use bot detection to limit automated probing.
Require identity and access policy before any internal tool is reachable.

For AI and agentic systems:

Route model traffic through a gateway.
Keep agents connected through approved MCP servers.
Log what they do.

The goal is to make sure that when one layer misses, the next layer limits what the attacker can see, reach, or change.

That is the point of the architecture around the vulnerability: to limit the scope of an attack. The vulnerability may be what starts the attack, but the architecture determines how far it can go.

How do we know this approach works?

Plenty of security stacks look impenetrable on a whiteboard but fall over in practice. That is why we test ours continuously, both at the perimeter and inside our environment, with our red team involved across both.

At the perimeter, frontier models are one tool we use to test our application security stack as an adaptive attacker. These models sit alongside the rest of our red team and detection workflows including: manual testing, threat intelligence, observed traffic patterns, proof-of-concept analysis, and signals from our own network. Together, those inputs help us decide where to aim testing: newly launched products, recently changed surfaces, and the paths an attacker is most likely to probe first. The most important part is the process that follows. When something gets through, we identify the gap, use the right mix of tools to understand it, write the rule or mitigation, ship the update, and test again to make sure the gap is closed.

Inside the environment, our red team starts from the assumption that the perimeter has already failed. They look at what has changed, where sensitive systems carry risk, and whether one compromised identity, path, or credential can reach farther than it should. When we change the architecture based on what they find, they run the scenario again against the new version to confirm the gap is actually closed.

We confirm that this architecture is working by continuously testing its behavior during failures, rather than relying on the perfection of individual layers.

If your team is working on the same problems and would like to compare notes, reach out to us at security-ai-research@cloudflare.com.

The Cloudflare Blog

Your Worker can now have its own cache in front of it

Why server-rendered apps need a cache in front

stale-while-revalidate is the part that makes it feel instant

One URL, many representations: Vary works

This is your Worker's cache, not your zone's

Two tiers, every Worker, no configuration

Run your app near the user and near the data

Multi-tenant by default, with ctx.props

A cache between every Worker entrypoint

First-class support in your framework

See your cache on the same dashboard as your Worker

Billing

What's next

Try it

Announcing the Monetization Gateway: charge for any resource behind Cloudflare via x402

The evolving business model of the web

A refresher on x402

What the Monetization Gateway does

Where we see this going

Sign up for our waitlist

Content Independence Day, one year on: building the business model for the agentic Internet

Part I: The Internet has changed – faster than anyone expected

The vertical adoption curve

The decline of the open web

The agentic Internet is here

Crawlers have changed their purpose

The old business model is gone

The implications are industry-agnostic

Part II: The market has emerged

What we built

Transparency and control created scarcity

Scarcity created leverage

Leverage is changing the balance of power

The market is maturing, but inefficiencies remain

The Google convergence problem

Part III: A unique view of the ecosystem

Part IV: Lessons from an emerging market

Transparency must become the standard

Better AI requires better signals

Markets need better discovery before better pricing

Part V. Building the infrastructure for the agentic Internet

Making AI search smarter

Rebuilding the bargain

Making search smarter

From Pay Per Crawl to Pay Per Use

The headline we want to earn

Your site, your rules: new AI traffic options for all customers

Now, AI can be anything

A pragmatic taxonomy

New options to manage AI traffic

Setting new defaults

BotBase: a new visibility plane for Enterprise customers

How does a crawler use my content?

What does it mean for a bot to be Verified?

Experimenting with transitive trust

Set your terms today

Unmasking the crawls with Attribution Business Insights

The new economics of the Internet

Introducing Attribution Business Insights

From data to business strategy

What’s next

How we built saga rollbacks for Cloudflare Workflows

Try it out

How we designed the API

Why not a fluent or builder API?

Rollback as step metadata

How it works under the hood

What’s next

Unlocking the Cloudflare app ecosystem with OAuth for all

Scaling the ecosystem securely

Planning the upgrade to our OAuth engine

Executing the upgrade

Upgrading to 1.X

Upgrading to 2.X

Performance improvements

Database

Hydra performance

Self-managed OAuth for all

The White House's post-quantum executive order is an important milestone. It’s time to get to work

`stale-while-revalidate` is the part that makes it feel instant

One URL, many representations: `Vary` works

Multi-tenant by default, with `ctx.props`