
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Fri, 03 Apr 2026 17:08:06 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Sandboxing AI agents, 100x faster]]></title>
            <link>https://blog.cloudflare.com/dynamic-workers/</link>
            <pubDate>Tue, 24 Mar 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ We’re introducing Dynamic Workers, which allow you to execute AI-generated code in secure, lightweight isolates. This approach is 100 times faster than traditional containers, enabling millisecond startup times for AI agent sandboxing. ]]></description>
            <content:encoded><![CDATA[ <p>Last September we introduced <a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a>, the idea that agents should perform tasks not by making tool calls, but instead by writing code that calls APIs. We've shown that simply converting an MCP server into a TypeScript API can <a href="https://www.youtube.com/watch?v=L2j3tYTtJwk"><u>cut token usage by 81%</u></a>. We demonstrated that Code Mode can also operate <i>behind</i> an MCP server instead of in front of it, creating the new <a href="https://blog.cloudflare.com/code-mode-mcp/"><u>Cloudflare MCP server that exposes the entire Cloudflare API with just two tools and under 1,000 tokens</u></a>.</p><p>But if an agent (or an MCP server) is going to execute code generated on-the-fly by AI to perform tasks, that code needs to run somewhere, and that somewhere needs to be secure. You can't just <code>eval() </code>AI-generated code directly in your app: a malicious user could trivially prompt the AI to inject vulnerabilities.</p><p>You need a <b>sandbox</b>: a place to execute code that is isolated from your application and from the rest of the world, except for the specific capabilities the code is meant to access.</p><p>Sandboxing is a hot topic in the AI industry. For this task, most people are reaching for containers. Using a Linux-based container, you can start up any sort of code execution environment you want. Cloudflare even offers <a href="https://developers.cloudflare.com/containers/"><u>our container runtime</u></a> and <a href="https://developers.cloudflare.com/sandbox/"><u>our Sandbox SDK</u></a> for this purpose.</p><p>But containers are expensive and slow to start, taking hundreds of milliseconds to boot and hundreds of megabytes of memory to run. 
You probably need to keep them warm to avoid delays, and you may be tempted to reuse containers across tasks, compromising security.</p><p><b>If we want to support consumer-scale agents, where every end user has an agent (or many!) and every agent writes code, containers are not enough. We need something lighter.</b></p><h6>And we have it.</h6>
    <div>
      <h2>Dynamic Worker Loader: a lean sandbox</h2>
      <a href="#dynamic-worker-loader-a-lean-sandbox">
        
      </a>
    </div>
    <p>Tucked into our Code Mode post in September was the announcement of a new, experimental feature: the Dynamic Worker Loader API. This API allows a Cloudflare Worker to instantiate a new Worker, in its own sandbox, with code specified at runtime, all on the fly.</p><p><b>Dynamic Worker Loader is now in open beta, available to all paid Workers users.</b></p><p><a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Read the docs for full details</u></a>, but here's what it looks like:</p>
            <pre><code>// Have your LLM generate code like this.
let agentCode: string = `
  export default {
    async myAgent(param, env, ctx) {
      // ...
    }
  }
`;

// Get RPC stubs representing APIs the agent should be able
// to access. (This can be any Workers RPC API you define.)
let chatRoomRpcStub = ...;

// Load a worker to run the code, using the worker loader
// binding.
let worker = env.LOADER.load({
  // Specify the code.
  compatibilityDate: "2026-03-01",
  mainModule: "agent.js",
  modules: { "agent.js": agentCode },

  // Give agent access to the chat room API.
  env: { CHAT_ROOM: chatRoomRpcStub },

  // Block internet access. (You can also intercept it.)
  globalOutbound: null,
});

// Call RPC methods exported by the agent code.
await worker.getEntrypoint().myAgent(param);
</code></pre>
            <p>That's it.</p>
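            <p>For the example above to run, the calling Worker needs a worker loader binding in its configuration. A minimal sketch of the <code>wrangler.jsonc</code> entry (the binding name becomes <code>env.LOADER</code>; see the docs linked above for the exact, current syntax):</p>
            <pre><code>{
  "worker_loaders": [
    { "binding": "LOADER" }
  ]
}
</code></pre>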
    <div>
      <h3>100x faster</h3>
      <a href="#100x-faster">
        
      </a>
    </div>
    <p>Dynamic Workers use the same underlying sandboxing mechanism that the entire Cloudflare Workers platform has been built on since its launch, eight years ago: isolates. An isolate is an instance of the V8 JavaScript execution engine, the same engine used by Google Chrome. They are <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/"><u>how Workers work</u></a>.</p><p>An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's around 100x faster and 10x-100x more memory efficient than a typical container.</p><p><b>That means that if you want to start a new isolate for every user request, on-demand, to run one snippet of code, then throw it away, you can.</b></p>
    <div>
      <h3>Unlimited scalability</h3>
      <a href="#unlimited-scalability">
        
      </a>
    </div>
    <p>Many container-based sandbox providers impose limits on global concurrent sandboxes and rate of sandbox creation. Dynamic Worker Loader has no such limits. It doesn't need to, because it is simply an API to the same technology that has powered our platform all along, which has always allowed Workers to seamlessly scale to millions of requests per second.</p><p>Want to handle a million requests per second, where <i>every single request</i> loads a separate Dynamic Worker sandbox, all running concurrently? No problem!</p>
    <div>
      <h3>Zero latency</h3>
      <a href="#zero-latency">
        
      </a>
    </div>
    <p>One-off Dynamic Workers usually run on the same machine — the same thread, even — as the Worker that created them. No need to communicate around the world to find a warm sandbox. Isolates are so lightweight that we can just run them wherever the request landed. Dynamic Workers are supported in every one of Cloudflare's hundreds of locations around the world.</p>
    <div>
      <h3>It's all JavaScript</h3>
      <a href="#its-all-javascript">
        
      </a>
    </div>
    <p>The only catch, vs. containers, is that your agent needs to write JavaScript.</p><p>Technically, Workers (including dynamic ones) can use Python and WebAssembly, but for small snippets of code — like that written on-demand by an agent — JavaScript will load and run much faster.</p><p>We humans tend to have strong preferences on programming languages, and while many love JavaScript, others might prefer Python, Rust, or countless others.</p><p>But we aren't talking about humans here. We're talking about AI. AI will write any language you want it to. LLMs are experts in every major language. Their training data in JavaScript is immense.</p><p>JavaScript, by its nature on the web, is designed to be sandboxed. It is the correct language for the job.</p>
    <div>
      <h3>Tools defined in TypeScript</h3>
      <a href="#tools-defined-in-typescript">
        
      </a>
    </div>
    <p>If we want our agent to be able to do anything useful, it needs to talk to external APIs. How do we tell it about the APIs it has access to?</p><p>MCP defines schemas for flat tool calls, but not programming APIs. OpenAPI offers a way to express REST APIs, but it is verbose, both in the schema itself and the code you'd have to write to call it.</p><p>For APIs exposed to JavaScript, there is a single, obvious answer: TypeScript.</p><p>Agents know TypeScript. TypeScript is designed to be concise. With very few tokens, you can give your agent a precise understanding of your API.</p>
            <pre><code>// Interface to interact with a chat room.
interface ChatRoom {
  // Get the last `limit` messages of the chat log.
  getHistory(limit: number): Promise&lt;Message[]&gt;;

  // Subscribe to new messages. Dispose the returned object
  // to unsubscribe.
  subscribe(callback: (msg: Message) =&gt; void): Promise&lt;Disposable&gt;;

  // Post a message to chat.
  post(text: string): Promise&lt;void&gt;;
}

type Message = {
  author: string;
  time: Date;
  text: string;
}
</code></pre>
            <p>Compare this with the equivalent OpenAPI spec (which is so long you have to scroll to see it all):</p><pre>
openapi: 3.1.0
info:
  title: ChatRoom API
  description: &gt;
    Interface to interact with a chat room.
  version: 1.0.0

paths:
  /messages:
    get:
      operationId: getHistory
      summary: Get recent chat history
      description: Returns the last `limit` messages from the chat log, newest first.
      parameters:
        - name: limit
          in: query
          required: true
          schema:
            type: integer
            minimum: 1
      responses:
        "200":
          description: A list of messages.
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Message"

    post:
      operationId: postMessage
      summary: Post a message to the chat room
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - text
              properties:
                text:
                  type: string
      responses:
        "204":
          description: Message posted successfully.

  /messages/stream:
    get:
      operationId: subscribeMessages
      summary: Subscribe to new messages via SSE
      description: &gt;
        Opens a Server-Sent Events stream. Each event carries a JSON-encoded
        Message object. The client unsubscribes by closing the connection.
      responses:
        "200":
          description: An SSE stream of new messages.
          content:
            text/event-stream:
              schema:
                description: &gt;
                  Each SSE `data` field contains a JSON-encoded Message object.
                $ref: "#/components/schemas/Message"

components:
  schemas:
    Message:
      type: object
      required:
        - author
        - time
        - text
      properties:
        author:
          type: string
        time:
          type: string
          format: date-time
        text:
          type: string
</pre><p>We think the TypeScript API is better. It's fewer tokens and much easier to understand (for both agents and humans).  </p><p>Dynamic Worker Loader makes it easy to implement a TypeScript API like this in your own Worker and then pass it in to the Dynamic Worker either as a method parameter or in the env object. The Workers Runtime will automatically set up a <a href="https://blog.cloudflare.com/capnweb-javascript-rpc-library/"><u>Cap'n Web RPC</u></a> bridge between the sandbox and your harness code, so that the agent can invoke your API across the security boundary without ever realizing that it isn't using a local library.</p><p>That means your agent can write code like this:</p>
            <pre><code>// Thinking: The user asked me to summarize recent chat messages from Alice.
// I will filter the recent message history in code so that I only have to
// read the relevant messages.
let history = await env.CHAT_ROOM.getHistory(1000);
return history.filter(msg =&gt; msg.author == "alice");
</code></pre>
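            <p>The other side of this pattern is the host's implementation of the <code>ChatRoom</code> interface. Here is a minimal in-memory sketch; the constructor argument and the plain <code>dispose()</code> method are illustrative simplifications, and on Workers a class like this would typically extend <code>RpcTarget</code> from <code>cloudflare:workers</code> so its methods can be called across the RPC boundary:</p>
            <pre><code>// In-memory ChatRoom: a sketch of what the harness might pass to the
// Dynamic Worker as env.CHAT_ROOM.
class InMemoryChatRoom {
  constructor(author) {
    this.author = author;
    this.log = [];
    this.subscribers = new Set();
  }

  // Get the last `limit` messages of the chat log.
  async getHistory(limit) {
    return this.log.slice(-limit);
  }

  // Subscribe to new messages; call dispose() on the result to
  // unsubscribe.
  async subscribe(callback) {
    this.subscribers.add(callback);
    return { dispose: () => this.subscribers.delete(callback) };
  }

  // Post a message to chat, fanning it out to subscribers.
  async post(text) {
    const msg = { author: this.author, time: new Date(), text };
    this.log.push(msg);
    for (const cb of this.subscribers) cb(msg);
  }
}
</code></pre>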
            
    <div>
      <h3>HTTP filtering and credential injection</h3>
      <a href="#http-filtering-and-credential-injection">
        
      </a>
    </div>
    <p>If you prefer to give your agents HTTP APIs, that's fully supported. Using the <code>globalOutbound</code> option to the worker loader API, you can register a callback to be invoked on every HTTP request, in which you can inspect the request, rewrite it, inject auth keys, respond to it directly, block it, or anything else you might like.</p><p>For example, you can use this to implement <b>credential injection</b> (token injection): When the agent makes an HTTP request to a service that requires authorization, you add credentials to the request on the way out. This way, the agent itself never knows the secret credentials, and therefore cannot leak them.</p><p>Using a plain HTTP interface may be desirable when an agent is talking to a well-known API that is in its training set, or when you want your agent to use a library that is built on a REST API (the library can run inside the agent's sandbox).</p><p>With that said, <b>in the absence of a compatibility requirement, TypeScript RPC interfaces are better than HTTP:</b></p><ul><li><p>As shown above, a TypeScript interface requires far fewer tokens to describe than an HTTP interface.</p></li><li><p>The agent can write code to call TypeScript interfaces using far fewer tokens than equivalent HTTP.</p></li><li><p>With TypeScript interfaces, since you are defining your own wrapper interface anyway, it is easier to narrow the interface to expose exactly the capabilities that you want to provide to your agent, both for simplicity and security. With HTTP, you are more likely implementing <i>filtering</i> of requests made against some existing API. This is hard, because your proxy must fully interpret the meaning of every API call in order to properly decide whether to allow it, and HTTP requests are complicated, with many headers and other parameters that could all be meaningful. It ends up being easier to just write a TypeScript wrapper that only implements the functions you want to allow.</p></li></ul>
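    <p>As a concrete illustration of credential injection, here is the core decision a <code>globalOutbound</code> handler might make, sketched over plain objects. The names here (<code>ALLOWED_HOSTS</code>, the token) are illustrative, and on Workers this logic would run in a <code>fetch()</code> handler against real <code>Request</code> objects:</p>
            <pre><code>// Hosts the sandboxed code is allowed to reach.
const ALLOWED_HOSTS = new Set(["api.example.com"]);

function filterAndInject(req, token) {
  const host = new URL(req.url).host;
  if (!ALLOWED_HOSTS.has(host)) {
    // Everything else is blocked; the sandbox has no other network path.
    throw new Error("outbound request to " + host + " blocked");
  }
  // Add the credential on the way out. The agent's code never sees the
  // token, so it cannot leak it.
  return {
    url: req.url,
    headers: { ...req.headers, Authorization: "Bearer " + token },
  };
}
</code></pre>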
    <div>
      <h3>Battle-hardened security</h3>
      <a href="#battle-hardened-security">
        
      </a>
    </div>
    <p>Hardening an isolate-based sandbox is tricky, as it is a more complicated attack surface than hardware virtual machines. Although all sandboxing mechanisms have bugs, security bugs in V8 are more common than security bugs in typical hypervisors. When using isolates to sandbox possibly-malicious code, it's important to have additional layers of defense-in-depth. Google Chrome, for example, implemented strict process isolation for this reason, but it is not the only possible solution.</p><p>We have nearly a decade of experience securing our isolate-based platform. Our systems automatically deploy V8 security patches to production within hours — faster than Chrome itself. Our <a href="https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/"><u>security architecture</u></a> features a custom second-layer sandbox with dynamic cordoning of tenants based on risk assessments. <a href="https://blog.cloudflare.com/safe-in-the-sandbox-security-hardening-for-cloudflare-workers/"><u>We've extended the V8 sandbox itself</u></a> to leverage hardware features like MPK. We've teamed up with (and hired) leading researchers to develop <a href="https://blog.cloudflare.com/spectre-research-with-tu-graz/"><u>novel defenses against Spectre</u></a>. We also have systems that scan code for malicious patterns and automatically block them or apply additional layers of sandboxing. And much more.</p><p>When you use Dynamic Workers on Cloudflare, you get all of this automatically.</p>
    <div>
      <h2>Helper libraries</h2>
      <a href="#helper-libraries">
        
      </a>
    </div>
    <p>We've built a number of libraries that you might find useful when working with Dynamic Workers: </p>
    <div>
      <h3>Code Mode</h3>
      <a href="#code-mode">
        
      </a>
    </div>
    <p><a href="https://www.npmjs.com/package/@cloudflare/codemode"><code>@cloudflare/codemode</code></a> simplifies running model-generated code against AI tools using Dynamic Workers. At its core is <code>DynamicWorkerExecutor()</code>, which constructs a purpose-built sandbox with code normalization to handle common formatting errors, and direct access to a <code>globalOutbound</code> fetcher for controlling <code>fetch()</code> behavior inside the sandbox — set it to <code>null</code> for full isolation, or pass a <code>Fetcher</code> binding to route, intercept, or enrich outbound requests from the sandbox.</p>
            <pre><code>// Imports: assuming both helpers ship from @cloudflare/codemode, and
// generateText from the AI SDK ("ai").
import { DynamicWorkerExecutor, createCodeTool } from "@cloudflare/codemode";
import { generateText } from "ai";

const executor = new DynamicWorkerExecutor({
  loader: env.LOADER,
  globalOutbound: null, // fully isolated
});

const codemode = createCodeTool({
  tools: myTools,
  executor,
});

return generateText({
  model,
  messages,
  tools: { codemode },
});
</code></pre>
            <p>The Code Mode SDK also provides two server-side utility functions. <code>codeMcpServer({ server, executor })</code> wraps an existing MCP Server, replacing its tool surface with a single <code>code()</code> tool. <code>openApiMcpServer({ spec, executor, request })</code> goes further: given an OpenAPI spec and an executor, it builds a complete MCP Server with <code>search()</code> and <code>execute()</code> tools, the same pattern used by the Cloudflare MCP Server, which is better suited to larger APIs.</p><p>In both cases, the code generated by the model runs inside Dynamic Workers, with calls to external services made over RPC bindings passed to the executor.</p><p><a href="https://www.npmjs.com/package/@cloudflare/codemode"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h3>Bundling</h3>
      <a href="#bundling">
        
      </a>
    </div>
    <p>Dynamic Workers expect pre-bundled modules. <a href="https://www.npmjs.com/package/@cloudflare/worker-bundler"><code>@cloudflare/worker-bundler</code></a> handles that for you: give it source files and a <code>package.json</code>, and it resolves npm dependencies from the registry, bundles everything with <code>esbuild</code>, and returns the module map the Worker Loader expects.</p>
            <pre><code>import { createWorker } from "@cloudflare/worker-bundler";

const worker = env.LOADER.get("my-worker", async () =&gt; {
  const { mainModule, modules } = await createWorker({
    files: {
      "src/index.ts": `
        import { Hono } from 'hono';
        import { cors } from 'hono/cors';

        const app = new Hono();
        app.use('*', cors());
        app.get('/', (c) =&gt; c.text('Hello from Hono!'));
        app.get('/json', (c) =&gt; c.json({ message: 'It works!' }));

        export default app;
      `,
      "package.json": JSON.stringify({
        dependencies: { hono: "^4.0.0" }
      })
    }
  });

  return { mainModule, modules, compatibilityDate: "2026-01-01" };
});

await worker.getEntrypoint().fetch(request);
</code></pre>
            <p>It also supports full-stack apps via <code>createApp</code> — bundle a server Worker, client-side JavaScript, and static assets together, with built-in asset serving that handles content types, ETags, and SPA routing.</p><p><a href="https://www.npmjs.com/package/@cloudflare/worker-bundler"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h3>File manipulation</h3>
      <a href="#file-manipulation">
        
      </a>
    </div>
    <p><a href="https://www.npmjs.com/package/@cloudflare/shell"><code>@cloudflare/shell</code></a> gives your agent a virtual filesystem inside a Dynamic Worker. Agent code calls typed methods on a <code>state</code> object — read, write, search, replace, diff, glob, JSON query/update, archive — with structured inputs and outputs instead of string parsing.</p><p>Storage is backed by a durable <code>Workspace</code> (SQLite + R2), so files persist across executions. Coarse operations like <code>searchFiles</code>, <code>replaceInFiles</code>, and <code>planEdits</code> minimize RPC round-trips — the agent issues one call instead of looping over individual files. Batch writes are transactional by default: if any write fails, earlier writes roll back automatically.</p>
            <pre><code>import { Workspace } from "@cloudflare/shell";
import { stateTools } from "@cloudflare/shell/workers";
import { DynamicWorkerExecutor, resolveProvider } from "@cloudflare/codemode";

const workspace = new Workspace({
  sql: this.ctx.storage.sql, // Works with any DO's SqlStorage, D1, or custom SQL backend
  r2: this.env.MY_BUCKET, // large files spill to R2 automatically
  name: () =&gt; this.name   // lazy — resolved when needed, not at construction
});

// Code runs in an isolated Worker sandbox with no network access
const executor = new DynamicWorkerExecutor({ loader: this.env.LOADER });

// The LLM writes this code; `state.*` calls dispatch back to the host via RPC
const result = await executor.execute(
  `async () =&gt; {
    // Search across all TypeScript files for a pattern
    const hits = await state.searchFiles("src/**/*.ts", "answer");
    // Plan multiple edits as a single transaction
    const plan = await state.planEdits([
      { kind: "replace", path: "/src/app.ts",
        search: "42", replacement: "43" },
      { kind: "writeJson", path: "/src/config.json",
        value: { version: 2 } }
    ]);
    // Apply atomically — rolls back on failure
    return await state.applyEditPlan(plan);
  }`,
  [resolveProvider(stateTools(workspace))]
);</code></pre>
            <p>The package also ships prebuilt TypeScript type declarations and a system prompt template, so you can drop the full <code>state</code> API into your LLM context in a handful of tokens.</p><p><a href="https://www.npmjs.com/package/@cloudflare/shell"><u>Learn more about the library and how to use it.</u></a></p>
    <div>
      <h2>How are people using it?</h2>
      <a href="#how-are-people-using-it">
        
      </a>
    </div>
    
    <div>
      <h4>Code Mode</h4>
      <a href="#code-mode">
        
      </a>
    </div>
    <p>Developers want their agents to write and execute code against tool APIs, rather than making sequential tool calls one at a time. With Dynamic Workers, the LLM generates a single TypeScript function that chains multiple API calls together, runs it in a Dynamic Worker, and returns the final result back to the agent. As a result, only the output, and not every intermediate step, ends up in the context window. This cuts both latency and token usage, and produces better results, especially when the tool surface is large.</p><p>Our own <a href="https://github.com/cloudflare/mcp-server-cloudflare">Cloudflare MCP server</a> is built exactly this way: it exposes the entire Cloudflare API through just two tools — search and execute — in under 1,000 tokens, because the agent writes code against a typed API instead of navigating hundreds of individual tool definitions.</p>
    <div>
      <h4>Building custom automations </h4>
      <a href="#building-custom-automations">
        
      </a>
    </div>
    <p>Developers are using Dynamic Workers to let agents build custom automations on the fly. <a href="https://www.zite.com/"><u>Zite</u></a>, for example, is building an app platform where users interact through a chat interface — the LLM writes TypeScript behind the scenes to build CRUD apps, connect to services like Stripe, Airtable, and Google Calendar, and run backend logic, all without the user ever seeing a line of code. Every automation runs in its own Dynamic Worker, with access to only the specific services and libraries that the endpoint needs.</p><blockquote><p><i>“To enable server-side code for Zite’s LLM-generated apps, we needed an execution layer that was instant, isolated, and secure. Cloudflare’s Dynamic Workers hit the mark on all three, and out-performed all of the other platforms we benchmarked for speed and library support. The NodeJS compatible runtime supported all of Zite’s workflows, allowing hundreds of third party integrations, without sacrificing on startup time. Zite now services millions of execution requests daily thanks to Dynamic Workers.” </i></p><p><i>— </i><b><i>Antony Toron</i></b><i>, CTO and Co-Founder, Zite </i></p></blockquote>
    <div>
      <h4>Running AI-generated applications</h4>
      <a href="#running-ai-generated-applications">
        
      </a>
    </div>
    <p>Developers are building platforms that generate full applications from AI — either for their customers or for internal teams building prototypes. With Dynamic Workers, each app can be spun up on demand, then put back into cold storage until it's invoked again. Fast startup times make it easy to preview changes during active development. Platforms can also block or intercept any network requests the generated code makes, keeping AI-generated apps safe to run.</p>
    <div>
      <h2>Pricing</h2>
      <a href="#pricing">
        
      </a>
    </div>
    <p>Dynamically loaded Workers are priced at $0.002 per unique Worker loaded per day (as of this post’s publication), in addition to the usual CPU time and invocation pricing of regular Workers.</p><p>For AI-generated "code mode" use cases, where every Worker is a unique one-off, this means the price is $0.002 per Worker loaded (plus CPU and invocations). This cost is typically negligible compared to the inference cost of generating the code.</p><p>During the beta period, the $0.002 charge is waived. As pricing is subject to change, please always check our Dynamic Workers <a href="https://developers.cloudflare.com/dynamic-workers/pricing/"><u>pricing</u></a> page for the most current information.</p>
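    <p>For a sense of scale, here is that pricing worked through with an illustrative volume (the daily figure below is hypothetical; the rate is the one quoted above, as of publication):</p>
            <pre><code>const PRICE_PER_UNIQUE_WORKER = 0.002; // USD per unique Worker loaded per day

// Say an agent platform loads 50,000 unique one-off Workers per day:
const uniqueWorkersPerDay = 50000;
const dailyLoaderCost = uniqueWorkersPerDay * PRICE_PER_UNIQUE_WORKER;
// about $100 per day, before the usual CPU time and invocation charges
</code></pre>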
    <div>
      <h2>Get Started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>If you’re on the Workers Paid plan, you can start using <a href="https://developers.cloudflare.com/dynamic-workers/">Dynamic Workers</a> today. </p>
    <div>
      <h4>Dynamic Workers Starter</h4>
      <a href="#dynamic-workers-starter">
        
      </a>
    </div>
    <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p>
<p>Use this “hello world” <a href="https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers">starter</a> to get a Worker deployed that can load and execute Dynamic Workers. </p>
    <div>
      <h4>Dynamic Workers Playground</h4>
      <a href="#dynamic-workers-playground">
        
      </a>
    </div>
    <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers-playground"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>You can also deploy the <a href="https://github.com/cloudflare/agents/tree/main/examples/dynamic-workers-playground">Dynamic Workers Playground</a>, where you can write or import code, bundle it at runtime with <code>@cloudflare/worker-bundler</code>, execute it through a Dynamic Worker, and see real-time responses and execution logs.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/32d0ficYALnSneKc4jZPja/0d4d07d747fc14936f16071714b7a8e5/BLOG-3243_2.png" />
          </figure><p>Dynamic Workers are fast, scalable, and lightweight. <a href="https://discord.com/channels/595317990191398933/1460655307255578695"><u>Find us on Discord</u></a> if you have any questions. We’d love to see what you build!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mQOJLnMtXULmj6l3DgKZg/ef2ee4cef616bc2d9a7caf35df5834f5/BLOG-3243_3.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[MCP]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">1tc7f8AggVLw5D8OmaZri5</guid>
            <dc:creator>Kenton Varda</dc:creator>
            <dc:creator>Sunil Pai</dc:creator>
            <dc:creator>Ketan Gupta</dc:creator>
        </item>
        <item>
            <title><![CDATA[Powering the agents: Workers AI now runs large models, starting with Kimi K2.5]]></title>
            <link>https://blog.cloudflare.com/workers-ai-large-models/</link>
            <pubDate>Thu, 19 Mar 2026 19:53:16 GMT</pubDate>
            <description><![CDATA[ Kimi K2.5 is now on Workers AI, helping you power agents entirely on Cloudflare’s Developer Platform. Learn how we optimized our inference stack and reduced inference costs for internal agent use cases.  ]]></description>
            <content:encoded><![CDATA[ <p>We're making Cloudflare the best place for building and deploying agents. But reliable agents aren't built on prompts alone; they require a robust, coordinated infrastructure of underlying primitives. </p><p>At Cloudflare, we have been building these primitives for years: <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> for state persistence, <a href="https://developers.cloudflare.com/workflows/"><u>Workflows</u></a> for long-running tasks, and <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Workers</u></a> or <a href="https://developers.cloudflare.com/sandbox/"><u>Sandbox</u></a> containers for secure execution. Powerful abstractions like the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a> are designed to help you build agents on top of Cloudflare’s Developer Platform.</p><p>But these primitives only provided the execution environment. The agent still needed a model capable of powering it. </p><p>Starting today, Workers AI is officially in the big models game. We now offer frontier open-source models on our AI inference platform. We’re starting by releasing <a href="https://www.kimi.com/blog/kimi-k2-5"><u>Moonshot AI’s Kimi K2.5</u></a> model <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5"><u>on Workers AI</u></a>. With a full 256k context window and support for multi-turn tool calling, vision inputs, and structured outputs, the Kimi K2.5 model is excellent for all kinds of agentic tasks. By bringing a frontier-scale model directly into the Cloudflare Developer Platform, we’re making it possible to run the entire agent lifecycle on a single, unified platform.</p><p>The heart of an agent is the AI model that powers it, and that model needs to be smart, with high reasoning capabilities and a large context window. Workers AI now runs those models.</p>
    <div>
      <h2>The price-performance sweet spot</h2>
      <a href="#the-price-performance-sweet-spot">
        
      </a>
    </div>
    <p>We spent the last few weeks testing Kimi K2.5 as the engine for our internal development tools. Within our <a href="https://opencode.ai/"><u>OpenCode</u></a> environment, Cloudflare engineers use Kimi as a daily driver for agentic coding tasks. We have also integrated the model into our automated code review pipeline; you can see this in action via our public code review agent, <a href="https://github.com/ask-bonk/ask-bonk"><u>Bonk</u></a>, on Cloudflare GitHub repos. In production, the model has proven to be a fast, efficient alternative to larger proprietary models without sacrificing quality.</p><p>Serving Kimi K2.5 began as an experiment, but it quickly became critical after reviewing how the model performs and how cost-efficient it is. As an illustrative example: we have an agent that does security reviews of Cloudflare’s codebases. This agent processes over 7B tokens per day, and using Kimi, it has caught more than 15 confirmed issues in a single codebase. Doing some rough math, if we had run this agent on a mid-tier proprietary model, we would have spent $2.4M a year for this single use case, on a single codebase. Running this agent with Kimi K2.5 cost just a fraction of that: we cut costs by 77% simply by making the switch to Workers AI.</p><p>As AI adoption increases, we are seeing a fundamental shift not only in how engineering teams are operating, but how individuals are operating. It is becoming increasingly common for people to have a personal agent like <a href="https://openclaw.ai/"><u>OpenClaw</u></a> running 24/7. The volume of inference is skyrocketing.</p><p>This new rise in personal and coding agents means that cost is no longer a secondary concern; it is the primary blocker to scaling. When every employee has multiple agents processing hundreds of thousands of tokens per hour, the math for proprietary models stops working. 
Enterprises will look to transition to open-source models that offer frontier-level reasoning without the proprietary price tag. Workers AI is here to facilitate this shift, providing everything from serverless endpoints for a personal agent to dedicated instances powering autonomous agents across an entire organization.</p>
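<p>As a sanity check, the rough math above hangs together. The sketch below reproduces it; note that the per-million-token price is implied by the figures in this post, not a published rate:</p>

```python
# Back-of-envelope check of the cost figures above.
# The implied blended price is derived from this post's numbers,
# not from any provider's published price list.
TOKENS_PER_DAY = 7e9             # "over 7B tokens per day"
PROPRIETARY_ANNUAL_COST = 2.4e6  # "$2.4M a year"
SAVINGS = 0.77                   # "we cut costs by 77%"

tokens_per_year = TOKENS_PER_DAY * 365
implied_price_per_mtok = PROPRIETARY_ANNUAL_COST / (tokens_per_year / 1e6)
kimi_annual_cost = PROPRIETARY_ANNUAL_COST * (1 - SAVINGS)

print(f"~${implied_price_per_mtok:.2f} per million tokens (implied mid-tier rate)")
print(f"~${kimi_annual_cost:,.0f} per year after the 77% cut")
```

<p>At this volume, even small per-token price differences compound into hundreds of thousands of dollars per year for a single agent.</p>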
    <div>
      <h2>The large model inference stack</h2>
      <a href="#the-large-model-inference-stack">
        
      </a>
    </div>
    <p>Workers AI has served models, including LLMs, since its launch two years ago, but we’ve historically prioritized smaller models. Part of the reason was that for some time, open-source LLMs fell far behind the models from frontier model labs. This changed with models like Kimi K2.5, but to serve this type of very large LLM, we had to make changes to our inference stack. We wanted to share with you some of what goes on behind the scenes to support a model like Kimi.</p><p>We’ve been working on custom kernels for Kimi K2.5, built on top of our proprietary <a href="https://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/"><u>Infire inference engine</u></a>, to optimize how we serve the model. Custom kernels improve the model’s performance and GPU utilization, unlocking gains that would otherwise go unclaimed if you were just running the model out of the box. There are also multiple techniques and hardware configurations that can be leveraged to serve a large model. Developers typically use a combination of data, tensor, and expert parallelization techniques to optimize model performance. Strategies like disaggregated prefill, in which you separate the prefill and generation stages onto different machines for better throughput and higher GPU utilization, are also important. Implementing these techniques and incorporating them into the inference stack takes a lot of dedicated experience to get right.</p><p>Workers AI has already done this experimentation, arriving at serving techniques that yield excellent throughput on Kimi K2.5. A lot of this does not come out of the box when you self-host an open-source model. The benefit of using a platform like Workers AI is that you don’t need to be a Machine Learning Engineer, a DevOps expert, or a Site Reliability Engineer to do the optimizations required to host it: we’ve already done the hard part; you just need to call an API.</p>
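<p>To build intuition for why a single machine is not enough, here is a rough memory sketch with illustrative round numbers: a 1-trillion-parameter model at 8-bit weights and 80 GB of HBM per GPU. These are numbers for intuition only, not Kimi K2.5’s actual specifications or our actual hardware configuration:</p>

```python
# Rough memory math for serving a very large model.
# Illustrative round numbers only, not an actual deployment configuration.
import math

params = 1e12           # 1T parameters (illustrative)
bytes_per_param = 1     # 8-bit quantized weights
hbm_per_gpu = 80e9      # 80 GB HBM per GPU (an H100-class card)
usable_fraction = 0.7   # leave headroom for KV cache and activations

weight_bytes = params * bytes_per_param
gpus_for_weights = math.ceil(weight_bytes / (hbm_per_gpu * usable_fraction))

print(f"weights alone: {weight_bytes / 1e9:.0f} GB")
print(f"minimum GPUs just to hold the weights: {gpus_for_weights}")
```

<p>The weights alone exceed any single GPU by an order of magnitude, which is why they (and, for MoE models, the experts) must be sharded across many devices with tensor and expert parallelism.</p>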
    <div>
      <h2>Beyond the model — platform improvements for agentic workloads</h2>
      <a href="#beyond-the-model-platform-improvements-for-agentic-workloads">
        
      </a>
    </div>
    <p>In concert with this launch, we’ve also improved our platform and are releasing several new features to help you build better agents.</p>
    <div>
      <h3>Prefix caching and surfacing cached tokens</h3>
      <a href="#prefix-caching-and-surfacing-cached-tokens">
        
      </a>
    </div>
    <p>When you work with agents, you are likely sending a large number of input tokens as part of the context: this could be detailed system prompts, tool definitions, MCP server tools, or entire codebases. Inputs can be as large as the model context window, so in theory, you could be sending requests with almost 256k input tokens. That’s a lot of tokens.</p><p>When an LLM processes a request, the request is broken down into two stages: the prefill stage processes input tokens and the output stage generates output tokens. These stages are usually sequential, where input tokens have to be fully processed before you can generate output tokens. This means that sometimes the GPU is not fully utilized while the model is doing prefill.</p><p>With multi-turn conversations, when you send a new prompt, the client sends all the previous prompts, tools, and context from the session to the model as well. The delta between consecutive requests is usually just a few new lines of input; all the other context has already gone through the prefill stage during a previous request. This is where prefix caching helps. Instead of doing prefill on the entire request, we can cache the input tensors from a previous request, and only do prefill on the new input tokens. This saves a lot of time and compute from the prefill stage, which means a faster Time to First Token (TTFT) and a higher Tokens Per Second (TPS) throughput as you’re not blocked on prefill.</p><p>Workers AI has always done prefix caching, but we are now surfacing cached tokens as a usage metric and offering a discount on cached tokens compared to input tokens. (Pricing can be found on the <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/"><u>model page</u></a>.) We also have new techniques for you to leverage in order to get a higher prefix cache hit rate, reducing your costs.</p>
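<p>A quick sketch of the prefill arithmetic for a multi-turn session shows why this matters. The numbers below are illustrative, not measurements:</p>

```python
# Illustrative prefill arithmetic for prefix caching.
# Numbers are made up for intuition; real savings depend on the workload.
context_tokens = 250_000  # prior prompts, tool definitions, codebase context
new_tokens = 500          # the delta added by the latest turn

total_input = context_tokens + new_tokens
# With a full prefix cache hit, only the new tokens need prefill.
prefill_skipped = context_tokens / total_input

print(f"prefill work skipped on this turn: {prefill_skipped:.1%}")
```

<p>When the delta between turns is small relative to the cached prefix, nearly all of the prefill compute disappears, which is where the TTFT and cost wins come from.</p>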
    <div>
      <h3>New session affinity header for higher cache hit rates</h3>
      <a href="#new-session-affinity-header-for-higher-cache-hit-rates">
        
      </a>
    </div>
    <p>In order to route to the same model instance and take advantage of prefix caching, we use a new <code>x-session-affinity</code> header. When you send this header, you’ll improve your cache hit ratio, leading to more cached tokens and subsequently, faster TTFT, TPS, and lower inference costs.</p><p>You can pass the new header like below, with a unique string per session or per agent. Some clients like OpenCode implement this automatically out of the box. Our <a href="https://github.com/cloudflare/agents-starter"><u>Agents SDK starter</u></a> has already set up the wiring to do this for you, too.</p>
            <pre><code>curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5" \
  -H "Authorization: Bearer {API_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-session-affinity: ses_12345678" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is prefix caching and why does it matter?"
      }
    ],
    "max_tokens": 2400,
    "stream": true
  }'
</code></pre>
            
    <div>
      <h3>Redesigned async APIs</h3>
      <a href="#redesigned-async-apis">
        
      </a>
    </div>
    <p>Serverless inference is really hard. With a pay-per-token business model, it’s cheaper on a single request basis because you don’t need to pay for entire GPUs to service your requests. But there’s a trade-off: you have to contend with other people’s traffic and capacity constraints, and there’s no strict guarantee that your request will be processed. This is not unique to Workers AI — it’s evidently the case across serverless model providers, given the frequent news reports of overloaded providers and service disruptions. While we always strive to serve your request and have built-in autoscaling and rebalancing, there are hard limitations (like hardware) that make this a challenge.</p><p>For volumes of requests that would exceed synchronous rate limits, you can submit batches of inferences to be completed asynchronously. We’re introducing a revamped Asynchronous API, which means that for asynchronous use cases, you won’t run into Out of Capacity errors, and your inference will execute durably once capacity frees up. Our async API works more like flex processing than a traditional batch API: we process requests from the async queue whenever we have headroom in our model instances. In internal testing, our async requests usually execute within 5 minutes, but this will depend on what live traffic looks like. As we bring Kimi to the public, we will tune our scaling accordingly, but the async API is the best way to make sure you don’t run into capacity errors in durable workflows. This is perfect for use cases that are not real-time, such as code scanning agents or research agents.</p><p>Workers AI previously had an asynchronous API, but we’ve recently revamped the systems under the hood. We now rely on a pull-based system instead of the historical push-based system, allowing us to pull in queued requests as soon as we have capacity. 
We’ve also added better controls to tune the throughput of async requests, monitoring GPU utilization in real-time and pulling in async requests when utilization is low, so that critical synchronous requests get priority while still processing asynchronous requests efficiently.</p><p>To use the asynchronous API, you would send your requests as seen below. We also have a way to <a href="https://developers.cloudflare.com/workers-ai/platform/event-subscriptions/"><u>set up event notifications</u></a> so that you can know when the inference is complete instead of polling for the request. </p>
            <pre><code>// (1.) Queue a batch of requests by passing queueRequest: true
let res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  "requests": [{
    "messages": [{
      "role": "user",
      "content": "Tell me a joke"
    }]
  }, {
    "messages": [{
      "role": "user",
      "content": "Explain the Pythagoras theorem"
    }]
  } /* ...add more requests in the batch */ ]
}, {
  queueRequest: true,
});


// (2.) grab the request id
let request_id;
if (res &amp;&amp; res.request_id) {
  request_id = res.request_id;
}

// (3.) poll the status
res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  request_id: request_id
});

if (res &amp;&amp; (res.status === "queued" || res.status === "running")) {
  // retry by polling again
} else {
  return Response.json(res); // This will contain the final completed response
}
</code></pre>
            
    <div>
      <h2>Try it out today</h2>
      <a href="#try-it-out-today">
        
      </a>
    </div>
    <p>Get started with Kimi K2.5 on Workers AI today. You can read our developer docs to find out <a href="https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/"><u>model information and pricing</u></a>, and how to take advantage of <a href="https://developers.cloudflare.com/workers-ai/features/prompt-caching/"><u>prompt caching via session affinity headers</u></a> and the <a href="https://developers.cloudflare.com/workers-ai/features/batch-api/"><u>asynchronous API</u></a>. The <a href="https://github.com/cloudflare/agents-starter"><u>Agents SDK starter</u></a> also now uses Kimi K2.5 as its default model. You can also <a href="https://opencode.ai/docs/providers/"><u>connect to Kimi K2.5 on Workers AI via Opencode</u></a>. For a live demo, try it in our <a href="https://playground.ai.cloudflare.com/"><u>playground</u></a>.</p><p>And if this set of problems around serverless inference, ML optimizations, and GPU infrastructure sounds interesting to you — <a href="https://job-boards.greenhouse.io/cloudflare/jobs/6297179?gh_jid=6297179"><u>we’re hiring</u></a>!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/36JzF0zePj2z7kZQK8Q2fg/73b0a7206d46f0eef170ffd1494dc4b3/BLOG-3247_2.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <guid isPermaLink="false">1wSO33KRdd5aUPAlSVDiqU</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Kevin Flansburg</dc:creator>
            <dc:creator>Ashish Datta</dc:creator>
            <dc:creator>Kevin Jain</dc:creator>
        </item>
        <item>
            <title><![CDATA[Slashing agent token costs by 98% with RFC 9457-compliant error responses]]></title>
            <link>https://blog.cloudflare.com/rfc-9457-agent-error-pages/</link>
            <pubDate>Wed, 11 Mar 2026 13:05:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare now returns RFC 9457-compliant structured Markdown and JSON error payloads to AI agents, replacing heavyweight HTML pages with machine-readable instructions. This reduces token usage by over 98%, turning brittle parsing into efficient control flow. ]]></description>
            <content:encoded><![CDATA[ <p>AI agents are no longer experiments. They are production infrastructure, making billions of HTTP requests per day, navigating the web, calling APIs, and orchestrating complex workflows.</p><p>But when these agents hit an error, they still receive the same HTML error pages we built for browsers: hundreds of lines of markup, CSS, and copy designed for human eyes. Those pages give agents clues, not instructions, and waste time and tokens. That gap is the opportunity to give agents instructions, not obstacles.</p><p>Starting today, Cloudflare returns <a href="https://www.rfc-editor.org/rfc/rfc9457">RFC 9457</a>-compliant structured Markdown and JSON error payloads to AI agents, replacing heavyweight HTML pages with machine-readable instructions.</p><p>That means when an agent sends <code>Accept: text/markdown</code>, <code>Accept: application/json</code>, or <code>Accept: application/problem+json</code> and encounters a Cloudflare error, we return one semantic contract in a structured format instead of HTML. And it comes complete with actionable guidance. (This builds on our recent <a href="https://blog.cloudflare.com/markdown-for-agents/">Markdown for Agents</a> release.)</p><p>So instead of being told only "You were blocked," the agent will read: "You were rate-limited — wait 30 seconds and retry with exponential backoff." Instead of just "Access denied," the agent will be instructed: "This block is intentional: do not retry, contact the site owner."</p><p>These responses are not just clearer — they are dramatically more efficient. Structured error responses cut payload size and token usage by more than 98% versus HTML, measured against a live 1015 ('rate-limit') error response. For agents that hit multiple errors in a workflow, the savings compound quickly.</p><p>This is live across the Cloudflare network, automatically. Site owners do not need to configure anything. 
Browsers keep getting the same HTML experience as before.</p><p>These are not just error pages. They are instructions for the agentic web.</p>
    <div>
      <h3>What agents see today</h3>
      <a href="#what-agents-see-today">
        
      </a>
    </div>
    <p>When an agent receives a Cloudflare-generated error, it usually means Cloudflare is enforcing customer policy or returning a platform response on the customer's behalf — not that Cloudflare is down. These responses are triggered when a request cannot be served as-is, such as invalid host or DNS routing, customer-defined access controls (WAF, geo, ASN, or bot rules), or edge-enforced limits like rate limiting. In short, Cloudflare is acting as the customer's routing and security layer, and the response explains why the request was blocked or could not proceed.</p><p>Today, those responses are rendered as HTML designed for humans:</p>
            <pre><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Access denied | example.com used Cloudflare to restrict access&lt;/title&gt;
&lt;style&gt;/* 200 lines of CSS */&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div class="cf-wrapper"&gt;
    &lt;h1 data-translate="block_headline"&gt;Sorry, you have been blocked&lt;/h1&gt;
    &lt;!-- ... hundreds more lines ... --&gt;
  &lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre>
            <p>To an agent, this is garbage. It cannot determine what error occurred, why it was blocked, or whether retrying will help. Even if it parses the HTML, the content describes the error but doesn't tell the agent — or the human, for that matter — what to do next.</p><p>If you're an agent developer and you wanted to handle Cloudflare errors gracefully, your options were limited. For Cloudflare-generated errors, structured responses existed only in configuration-dependent paths, not as a consistent default for agents.</p><p>Custom Error Rules can customize many Cloudflare errors, including some 1xxx cases. But they depend on per-site configuration, so they cannot serve as a universal agent contract across the web. Cloudflare sits in front of the request path. That means we can define a default machine response: retry or stop, wait and back off, escalate or reroute. Error pages stop being decoration and become execution instructions.</p>
    <div>
      <h3>What we did</h3>
      <a href="#what-we-did">
        
      </a>
    </div>
    <p>Cloudflare now returns RFC 9457-compliant structured responses for all 1xxx-class error paths — Cloudflare's platform error codes for edge-side failures like DNS resolution issues, access denials, and rate limits. Both formats are live: <code>Accept: text/markdown</code> returns Markdown, <code>Accept: application/json</code> returns JSON, and <code>Accept: application/problem+json</code> returns JSON with the <code>application/problem+json</code> content type.</p><p>This covers all 1xxx-class errors today. The same contract will extend to Cloudflare-generated 4xx and 5xx errors next.</p><p>Markdown responses have two parts:</p><ul><li><p>YAML frontmatter for machine-readable fields</p></li><li><p>prose sections for explicit guidance (<code>What happened</code> and <code>What you should do</code>)</p></li></ul><p>JSON responses carry the same fields as a flat object.</p><p>The YAML frontmatter is the critical layer for automation. It lets an agent extract stable keys without scraping HTML or guessing intent from copy. Fields like <code>error_code</code>, <code>error_name</code>, and <code>error_category</code> let the agent classify the failure. <code>retryable</code> and <code>retry_after</code> drive backoff logic. <code>owner_action_required</code> tells the agent whether to keep trying or escalate. <code>ray_id</code>, <code>timestamp</code>, and <code>zone</code> make logs and support handoffs deterministic.</p><p>The schema is stable by design, so agents can implement durable control flow without chasing presentation changes.</p><p>That stability is not a Cloudflare invention. <a href="https://www.rfc-editor.org/rfc/rfc9457">RFC 9457 — Problem Details for HTTP APIs</a> defines a standard JSON shape for reporting errors over HTTP, so clients can parse error responses without knowing the specific API in advance. 
Our JSON responses follow this shape, which means any HTTP client that understands Problem Details can parse the base members without Cloudflare-specific code:</p><table><tr><td><p><b>RFC 9457 member</b></p></td><td><p><b>What it contains</b></p></td></tr><tr><td><p><code>type</code></p></td><td><p>A URI pointing to Cloudflare's documentation for the specific error code</p></td></tr><tr><td><p><code>status</code></p></td><td><p>The HTTP status code (matching the actual response status)</p></td></tr><tr><td><p><code>title</code></p></td><td><p>A short, human-readable summary of the problem</p></td></tr><tr><td><p><code>detail</code></p></td><td><p>A human-readable explanation specific to this occurrence</p></td></tr><tr><td><p><code>instance</code></p></td><td><p>The Ray ID identifying this specific error occurrence</p></td></tr></table><p>The operational fields — <code>error_code</code>, <code>error_category</code>, <code>retryable</code>, <code>retry_after</code>, <code>owner_action_required</code>, and more — are RFC 9457 extension members. Clients that don't recognize them simply ignore them.</p><p>This is network-wide and additive. Site owners do not need to configure anything. Browsers keep receiving HTML unless clients explicitly ask for Markdown or JSON.</p>
    <div>
      <h3>What the response looks like</h3>
      <a href="#what-the-response-looks-like">
        
      </a>
    </div>
    <p>Here is what a rate-limit error (<code>1015</code>) looks like in JSON:</p>
            <pre><code>{
  "type": "https://developers.cloudflare.com/support/troubleshooting/http-status-codes/cloudflare-1xxx-errors/error-1015/",
  "title": "Error 1015: You are being rate limited",
  "status": 429,
  "detail": "You are being rate-limited by the website owner's configuration.",
  "instance": "9d99a4434fz2d168",
  "error_code": 1015,
  "error_name": "rate_limited",
  "error_category": "rate_limit",
  "ray_id": "9d99a4434fz2d168",
  "timestamp": "2026-03-09T11:11:55Z",
  "zone": "&lt;YOUR_DOMAIN&gt;",
  "cloudflare_error": true,
  "retryable": true,
  "retry_after": 30,
  "owner_action_required": false,
  "what_you_should_do": "**Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.\n\nRecommended approach:\n1. Wait 30 seconds before your next request\n2. If rate-limited again, double the wait time (60s, 120s, etc.)\n3. If rate-limiting persists after 5 retries, stop and reassess your request pattern",
  "footer": "This error was generated by Cloudflare on behalf of the website owner."
}</code></pre>
            <p>The same error in Markdown, optimized for model-first workflows:</p>
            <pre><code>---
error_code: 1015
error_name: rate_limited
error_category: rate_limit
status: 429
ray_id: 9d99a39dc992d168
timestamp: 2026-03-09T11:11:28Z
zone: &lt;YOUR_DOMAIN&gt;
cloudflare_error: true
retryable: true
retry_after: 30
owner_action_required: false
---

# Error 1015: You are being rate limited

## What Happened

You are being rate-limited by the website owner's configuration.

## What You Should Do

**Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.

Recommended approach:
1. Wait 30 seconds before your next request
2. If rate-limited again, double the wait time (60s, 120s, etc.)
3. If rate-limiting persists after 5 retries, stop and reassess your request pattern

---
This error was generated by Cloudflare on behalf of the website owner.
</code></pre>
            <p>Both formats give an agent everything it needs to decide and act: classify the error, choose retry behavior, and determine whether escalation is required. This is what a default machine contract looks like — not per-site configuration, but network-wide behavior. The contrast is explicit across error families: a transient error like <code>1015</code> says wait and retry, while intentional blocks like <code>1020</code> or geographic restrictions like <code>1009</code> tell the agent not to retry and to escalate instead.</p>
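<p>As a minimal sketch, an agent can branch directly on the structured fields of the JSON body shown above. The field names come from that example; the retry and escalation policy here is one reasonable choice, not a prescribed one:</p>

```python
import json
import time


def handle_problem_json(body: str) -> str:
    """Branch on the structured fields of a Cloudflare Problem Details body."""
    problem = json.loads(body)

    if not problem.get("cloudflare_error"):
        return "not_cloudflare_error"

    if problem.get("retryable"):
        # Transient errors like 1015: back off, then retry.
        wait = int(problem.get("retry_after", 30))
        time.sleep(wait)
        return f"retry_after_{wait}s"

    if problem.get("owner_action_required"):
        # Intentional blocks like 1020: stop and escalate with context.
        return f"escalate_owner_error_{problem.get('error_code')}"

    return "do_not_retry"
```

<p>The same control flow works on the Markdown flavor by reading the YAML frontmatter instead of the JSON object; the field names and semantics are identical.</p>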
    <div>
      <h3>One contract, two formats</h3>
      <a href="#one-contract-two-formats">
        
      </a>
    </div>
    <p>The core value is not format choice. It is semantic stability.</p><p>Agents need deterministic answers to operational questions: retry or not, how long to wait, and whether to escalate. Cloudflare exposes one policy contract across two wire formats. Whether a client consumes Markdown or JSON, the operational meaning is identical: same error identity, same retry/backoff signals, same escalation guidance.</p><p>Clients that send <code>Accept: application/problem+json</code> get <code>application/problem+json; charset=utf-8</code> back — useful for HTTP client libraries that dispatch on media type. Clients that send <code>Accept: application/json</code> get <code>application/json; charset=utf-8</code> — same body, safe default for existing consumers.</p>
    <div>
      <h3>Size reduction and token efficiency</h3>
      <a href="#size-reduction-and-token-efficiency">
        
      </a>
    </div>
    <p>That contract is also dramatically smaller than what it replaces. Cloudflare HTML error pages are browser-oriented and heavy, while structured responses are compact by design.</p><p>Measured comparison for <code>1015</code>:</p><table><tr><td><p><b>Payload</b></p></td><td><p><b>Bytes</b></p></td><td><p><b>Tokens (cl100k_base)</b></p></td><td><p><b>Size vs HTML</b></p></td><td><p><b>Token vs HTML</b></p></td></tr><tr><td><p>HTML response</p></td><td><p>46,645</p></td><td><p>14,252</p></td><td><p>—</p></td><td><p>—</p></td></tr><tr><td><p>Markdown response</p></td><td><p>798</p></td><td><p>221</p></td><td><p>58.5x less</p></td><td><p>64.5x less</p></td></tr><tr><td><p>JSON response</p></td><td><p>970</p></td><td><p>256</p></td><td><p>48.1x less</p></td><td><p>55.7x less</p></td></tr></table><p>Both structured formats deliver a ~98% reduction in size and tokens versus HTML. For agents, size translates directly into token cost — when an agent hits multiple errors in one run, these savings compound into lower model spend and faster recovery loops.</p>
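<p>The table’s reduction factors follow directly from the raw byte and token counts and can be reproduced with simple arithmetic:</p>

```python
# Reproduce the reduction factors in the table above from the raw counts.
html_bytes, html_tokens = 46_645, 14_252
md_bytes, md_tokens = 798, 221
json_bytes, json_tokens = 970, 256

print(f"Markdown: {html_bytes / md_bytes:.1f}x smaller, "
      f"{html_tokens / md_tokens:.1f}x fewer tokens")
print(f"JSON:     {html_bytes / json_bytes:.1f}x smaller, "
      f"{html_tokens / json_tokens:.1f}x fewer tokens")
print(f"Markdown size reduction vs HTML: {1 - md_bytes / html_bytes:.1%}")
```
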
    <div>
      <h3>Ten categories, clear actions</h3>
      <a href="#ten-categories-clear-actions">
        
      </a>
    </div>
    <p>Every <code>1xxx</code> error is mapped to an <code>error_category</code>. That turns error handling into routing logic instead of brittle per-page parsing.</p><table><tr><td><p><b>Category</b></p></td><td><p><b>What it means</b></p></td><td><p><b>What the agent should do</b></p></td></tr><tr><td><p><code>access_denied</code></p></td><td><p>Intentional block: IP, ASN, geo, firewall rule</p></td><td><p>Do not retry. Contact site owner if unexpected.</p></td></tr><tr><td><p><code>rate_limit</code></p></td><td><p>Request rate exceeded</p></td><td><p>Back off. Retry after retry_after seconds.</p></td></tr><tr><td><p><code>dns</code></p></td><td><p>DNS resolution failure at the origin</p></td><td><p>Do not retry. Report to site owner.</p></td></tr><tr><td><p><code>config</code></p></td><td><p>Configuration error: CNAME, tunnel, host routing</p></td><td><p>Do not retry (usually). Report to site owner.</p></td></tr><tr><td><p><code>tls</code></p></td><td><p>TLS version or cipher mismatch</p></td><td><p>Fix TLS client settings. Do not retry as-is.</p></td></tr><tr><td><p><code>legal</code></p></td><td><p>DMCA or regulatory block</p></td><td><p>Do not retry. This is a legal restriction.</p></td></tr><tr><td><p><code>worker</code></p></td><td><p>Cloudflare Workers runtime error</p></td><td><p>Do not retry. Site owner must fix the script.</p></td></tr><tr><td><p><code>rewrite</code></p></td><td><p>Invalid URL rewrite output</p></td><td><p>Do not retry. Site owner must fix the rule.</p></td></tr><tr><td><p><code>snippet</code></p></td><td><p>Cloudflare Snippets error</p></td><td><p>Do not retry. Site owner must fix Snippets config.</p></td></tr><tr><td><p><code>unsupported</code></p></td><td><p>Unsupported method or deprecated feature</p></td><td><p>Change the request. 
Do not retry as-is.</p></td></tr></table><p>Two fields make this operationally useful for agents:</p><ul><li><p><code>retryable</code> answers whether a retry can succeed</p></li><li><p><code>owner_action_required</code> answers whether the problem must be escalated</p></li></ul><p>You can replace brittle "if status == 429 then maybe retry" heuristics with explicit control flow. Parse the frontmatter once, then branch on stable fields. A simple pattern is:</p><ul><li><p>if <code>retryable</code> is <code>true</code>, wait <code>retry_after</code> and retry</p></li><li><p>if <code>owner_action_required</code> is <code>true</code>, stop and escalate</p></li><li><p>otherwise, fail fast without hammering the site</p></li></ul><p>Here is a minimal Python example using that pattern:</p>
            <pre><code>import time
import yaml


def parse_frontmatter(markdown_text: str) -&gt; dict:
    # Expects: ---\n&lt;yaml&gt;\n---\n&lt;body&gt;
    if not markdown_text.startswith("---\n"):
        return {}
    _, yaml_block, _ = markdown_text.split("---\n", 2)
    return yaml.safe_load(yaml_block) or {}


def handle_cloudflare_error(markdown_text: str) -&gt; str:
    meta = parse_frontmatter(markdown_text)

    if not meta.get("cloudflare_error"):
        return "not_cloudflare_error"

    if meta.get("retryable"):
        wait_seconds = int(meta.get("retry_after", 30))
        time.sleep(wait_seconds)
        return f"retry_after_{wait_seconds}s"

    if meta.get("owner_action_required"):
        return f"escalate_owner_error_{meta.get('error_code')}"

    return "do_not_retry"</code></pre>
            <p>This is the key shift: agents are no longer inferring intent from HTML copy. They are executing explicit policy from structured fields.</p>
    <div>
      <h3>How to use it</h3>
      <a href="#how-to-use-it">
        
      </a>
    </div>
    <p>Send <code>Accept: text/markdown</code>, <code>Accept: application/json</code>, or <code>Accept: application/problem+json</code>.</p><p>For quick testing, you can hit any Cloudflare-proxied domain directly at <code>/cdn-cgi/error/1015</code> (or replace <code>1015</code> with another <code>1xxx</code> code).</p>
            <pre><code>curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015"
</code></pre>
            <p>Example with another error code:</p>
            <pre><code>curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1020"
</code></pre>
            <p>JSON example:</p>
            <pre><code>curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            <p>RFC 9457 Problem Details example:</p>
            <pre><code>curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            <p>The behavior is deterministic — the first explicit structured type wins:</p><table><tr><td><p><b>Accept header</b></p></td><td><p><b>Response</b></p></td></tr><tr><td><p><code>application/json</code></p></td><td><p>JSON</p></td></tr><tr><td><p><code>application/json; charset=utf-8</code></p></td><td><p>JSON</p></td></tr><tr><td><p><code>application/problem+json</code></p></td><td><p>JSON (application/problem+json content type)</p></td></tr><tr><td><p><code>application/json, text/markdown;q=0.9</code></p></td><td><p>JSON</p></td></tr><tr><td><p><code>application/json, text/markdown</code></p></td><td><p>JSON (equal q, first-listed wins)</p></td></tr><tr><td><p><code>text/markdown</code></p></td><td><p>Markdown</p></td></tr><tr><td><p><code>text/markdown, application/json</code></p></td><td><p>Markdown (equal q, first-listed wins)</p></td></tr><tr><td><p><code>text/markdown, */*</code></p></td><td><p>Markdown</p></td></tr><tr><td><p><code>text/*</code></p></td><td><p>Markdown</p></td></tr><tr><td><p><code>*/*</code></p></td><td><p>HTML (default)</p></td></tr></table><p>Wildcard-only requests (<code>*/*</code>) do not signal a structured preference; clients must explicitly request Markdown or JSON.</p><p>If the request succeeds, you get normal origin content. The header only affects Cloudflare-generated error responses.</p>
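<p>As a sketch of that negotiation table from the client’s point of view, the following function predicts which format a given <code>Accept</code> header should yield. It mirrors the documented rules (highest q wins, first-listed wins ties, wildcard-only falls back to HTML); it is an illustration, not Cloudflare’s actual implementation:</p>

```python
# Client-side model of the documented selection rules: among explicitly
# structured types, highest q wins; ties go to the first-listed type;
# anything else (including a bare */*) falls back to HTML.
STRUCTURED = {
    "text/markdown": "markdown",
    "application/json": "json",
    "application/problem+json": "problem+json",
    "text/*": "markdown",
}


def expected_format(accept: str) -> str:
    candidates = []
    for position, part in enumerate(accept.split(",")):
        pieces = [p.strip() for p in part.strip().split(";")]
        media_type = pieces[0].lower()
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type in STRUCTURED:
            # Sort key: q descending, then original listing order.
            candidates.append((-q, position, STRUCTURED[media_type]))
    if not candidates:
        return "html"  # no explicit structured preference
    return min(candidates)[2]
```

<p>Running the table’s rows through this function reproduces the documented outcomes, e.g. <code>application/json, text/markdown;q=0.9</code> yields JSON while <code>text/markdown, application/json</code> yields Markdown.</p>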
    <div>
      <h3>Real-world use cases</h3>
      <a href="#real-world-use-cases">
        
      </a>
    </div>
    <p>There are a number of situations where structured error responses help immediately:</p><ol><li><p>Agent blocked by WAF rule (<code>1020</code>). The agent parses <code>error_code</code>, records <code>ray_id</code>, and stops retrying. It can escalate with useful context instead of looping.</p></li><li><p>MCP (Model Context Protocol) tool hitting geo restriction (<code>1009</code>). The tool gets a clear, machine-readable reason, returns it to the orchestrator, and the workflow can choose an alternate path or notify the user.</p></li><li><p>Rate-limited crawler (<code>1015</code>). The agent reads <code>retryable</code>: true and <code>retry_after</code>, applies backoff, and retries predictably instead of hammering the endpoint.</p></li><li><p>Developer debugging with <code>curl</code>. The developer can reproduce exactly what the agent sees, including frontmatter and guidance, without reverse-engineering HTML.</p></li><li><p>HTTP client libraries that understand RFC 9457. Any client that dispatches on <code>application/problem+json</code> or parses Problem Details objects can handle Cloudflare errors without Cloudflare-specific code.</p></li></ol><p>In each case, the outcome is the same: less guessing, fewer wasted retries, lower model cost, and faster recovery.</p>
    <div>
      <h3>Try it now</h3>
      <a href="#try-it-now">
        
      </a>
    </div>
    <p>Send a structured <code>Accept</code> header and test against any Cloudflare-proxied domain:</p>
            <pre><code>curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015"
</code></pre>
            
            <pre><code>curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            
            <pre><code>curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "&lt;YOUR_DOMAIN&gt;/cdn-cgi/error/1015" | jq .
</code></pre>
            <p>Error pages are the first conversation between Cloudflare and an agent. This launch makes that conversation structured, standards-compliant, and cheap to process.</p><p>To make this work across the web, agent runtimes should default to explicit structured <code>Accept</code> headers, not bare <code>*/*</code>. Use <code>Accept: text/markdown, */*</code> for model-first workflows and <code>Accept: application/json, */*</code> for typed control flow. If you maintain an agent framework, SDK, or browser automation stack, ship this default and treat bare <code>*/*</code> as legacy fallback.</p><p>And structured error pages are only the first layer. We are building the rest of the agent stack on top of them: <a href="https://developers.cloudflare.com/ai-gateway/"><u>AI Gateway</u></a> for routing, controls, and observability; <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a> for inference; and the identity, security, and access primitives agents will need to operate safely at Internet scale.</p><p>Cloudflare is helping our customers deliver content in agent-friendly ways, and this is just the start. If you're building or operating agents, start at <a href="http://agents.cloudflare.com"><u>agents.cloudflare.com</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Edge Computing]]></category>
            <guid isPermaLink="false">46xdz0GQfFtpCKRKNbfj3b</guid>
            <dc:creator>Sam Marsh</dc:creator>
        </item>
        <item>
            <title><![CDATA[The truly programmable SASE platform]]></title>
            <link>https://blog.cloudflare.com/programmable-sase/</link>
            <pubDate>Mon, 02 Mar 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ As the only SASE platform with a native developer stack, we’re giving you the tools to build custom, real-time security logic and integrations directly at the edge. ]]></description>
            <content:encoded><![CDATA[ <p>Every organization approaches security through a unique lens, shaped by their tooling, requirements, and history. No two environments look the same, and none stay static for long. We believe the platforms that protect them shouldn't be static either.</p><p>Cloudflare built our global network to be programmable by design, so we can help organizations unlock this flexibility and freedom. In this post, we’ll go deeper into what programmability means, and how <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Cloudflare One</u></a>, our SASE platform, helps customers architect their security and networking with our building blocks to meet their unique and custom needs.</p>
    <div>
      <h2>What programmability actually means</h2>
      <a href="#what-programmability-actually-means">
        
      </a>
    </div>
    <p>The term programmability has become diluted by the industry. Most security vendors claim programmability because they have public APIs, documented Terraform providers, webhooks, and alerting. That’s great, and Cloudflare offers all of those things too.</p><p>These foundational capabilities provide customization, infrastructure-as-code, and security operations automation, but they're table stakes. With traditional programmability, you can configure a webhook to send an alert to Slack when a policy triggers.</p><p>But the true value of programmability is something different. It is the ability to intercept a security event, enrich it with external context, and act on it in real time. Say a user attempts to access a regulated application containing sensitive financial data. Before the request completes, you query your learning management system to verify the user has completed the required compliance training. If their certification has expired, or they never completed it, access is denied, and they are redirected to the training portal. The policy did not just trigger an alert — it made the decision. </p>
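<p>The training-check scenario above can be sketched as a Worker. Everything specific here is hypothetical (the LMS endpoint, the response shape, the env bindings); a production deployment would more likely use Access external evaluation with signed requests:</p>

```javascript
// Pure policy helper: decide from an LMS certification record.
// Kept separate from the handler so the logic is testable in isolation.
function accessDecision(cert, now = Date.now()) {
  if (!cert || !cert.completedAt) return "deny";              // never completed
  if (cert.expiresAt && cert.expiresAt < now) return "deny";  // certification expired
  return "allow";
}

// Worker entry point (would be the module's default export in a real Worker).
const worker = {
  async fetch(request, env) {
    // Cloudflare Access sets this header for authenticated users.
    const email = request.headers.get("Cf-Access-Authenticated-User-Email");
    if (!email) return new Response("Unauthorized", { status: 401 });

    // Hypothetical LMS endpoint returning { completedAt, expiresAt } (ms epochs).
    const res = await fetch(
      `${env.LMS_API}/certifications?user=${encodeURIComponent(email)}`,
      { headers: { Authorization: `Bearer ${env.LMS_TOKEN}` } }
    );
    const cert = await res.json();

    if (accessDecision(cert) === "deny") {
      // Denied: send the user to the training portal instead.
      return Response.redirect(env.TRAINING_PORTAL_URL, 302);
    }
    return fetch(request); // pass through to the protected application
  },
};
```

<p>The point is where the decision happens: inline, before the request completes, rather than as an after-the-fact alert.</p>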
    <div>
      <h2>Building the most programmable SASE platform</h2>
      <a href="#building-the-most-programmable-sase-platform">
        
      </a>
    </div>
    <p>The Cloudflare global network spans more than 330 cities across the globe and operates within approximately 50 milliseconds of 95% of the Internet-connected population. This network runs every service on every server in every data center. That means our <a href="https://blog.cloudflare.com/cloudflare-sase-gartner-magic-quadrant-2025/"><u>industry-leading SASE platform</u></a> and <a href="https://www.cloudflare.com/lp/gartner-magic-quadrant-cnap-2025/"><u>Developer Platform</u></a> run side by side, on the same metal, making our Cloudflare services both composable and programmable. </p><p>When you use Cloudflare to protect your external web properties, you are using the same network, the same tools, and the same primitives as when you secure your users, devices, and private networks with Cloudflare One. Those are also the same primitives you use when you build and deploy full-stack applications on our <a href="https://www.cloudflare.com/developer-platform/products/"><u>Developer Platform</u></a>. They are designed to work together — not because they were integrated after the fact, but because they were never separate to begin with.</p><p>By design, this allows customers to extend policy decisions with custom logic in real time. You can call an external risk API, inject dynamic headers, or validate browser attributes. You can route traffic based on your business logic without adding latency or standing up separate infrastructure. Standalone <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/"><u>SASE</u></a> providers without their own compute platform require you to deploy automation in a separate cloud, manually configure webhooks, and accept the round-trip latency and management overhead of stitching together disconnected systems. With Cloudflare, your <a href="https://workers.cloudflare.com/"><u>Worker</u></a> augments inline SASE services like Access to enforce custom policies, at the edge, in milliseconds.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3PiutZ0tTvG7uFxBiAARwl/1231223aacc84fc635b77450df48a4ec/image2.png" />
          </figure>
    <div>
      <h2>What programmability unlocks</h2>
      <a href="#what-programmability-unlocks">
        
      </a>
    </div>
    <p>At its core, every security gateway operates on the same fundamental model. Traffic flows from sources, through policies, to destinations. The policies are where things get interesting, but in most platforms, your options are limited to predefined actions: allow, block, isolate, or quarantine.</p><p>We think there is a better way. What if you could invoke custom logic instead? </p><p>Rather than predefined actions, you could: </p><ul><li><p>Dynamically inject headers based on user identity claims</p></li><li><p>Call external risk engines for a real-time verdict before allowing access</p></li><li><p>Enforce access controls based on location and working hours</p></li></ul><p>Today, customers can already do many of these things with Cloudflare. And we are strengthening the integration between our <a href="https://www.cloudflare.com/sase/"><u>SASE</u></a> and <a href="https://www.cloudflare.com/developer-platform/"><u>Developer Platform</u></a> to make this even easier. Programmability extensions, like the ones listed above, will be natively integrated into Cloudflare One, enabling customers to build real-time, custom logic into their security and networking policies. Inspect a request and make a decision in milliseconds. Or run a Worker on a schedule to analyze user activity and update policies accordingly, such as adding users to a high-risk list based on signals from an external system.</p><p>We are building this around the concept of actions: both managed and custom. Managed actions will provide templates for common scenarios like IT service management integrations, redirects, and compliance automation. Custom actions allow you to define your own logic entirely. When a Gateway HTTP policy matches, instead of being limited to allow, block, or isolate, you can invoke a Cloudflare Worker directly. Your code runs at the edge, in real time, with full access to the request context. </p>
    <div>
      <h2>How customers are building today</h2>
      <a href="#how-customers-are-building-today">
        
      </a>
    </div>
    <p>While we are improving this experience, many customers are already using Cloudflare One and Developer Platform this way today. Here is a simple example that illustrates what you can do with this programmability. </p>
    <div>
      <h3>Automated device session revocation</h3>
      <a href="#automated-device-session-revocation">
        
      </a>
    </div>
    <p>The problem: A customer wanted to enforce periodic re-authentication for their Cloudflare One Client users, similar to how traditional VPNs require users to re-authenticate every few hours. Cloudflare's pre-defined session controls are designed around per-application policies, not global time-based expiration.</p><p>The solution: A scheduled Cloudflare Worker that queries the Devices API, identifies devices that have been inactive longer than a specified threshold, and revokes their registrations, forcing users to re-authenticate via their identity provider.</p>
            <pre><code>export default {
  async scheduled(event, env, ctx) {
    const API_TOKEN = env.API_TOKEN;
    const ACCOUNT_ID = env.ACCOUNT_ID;
    const REVOKE_INTERVAL_MINUTES = parseInt(env.REVOKE_INTERVAL_MINUTES, 10); // Inactivity threshold in minutes
    const DRY_RUN = env.DRY_RUN === 'true';

    const headers = {
      'Authorization': `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json'
    };

    let cursor = '';
    let allDevices = [];

    // Fetch all registrations with cursor-based pagination
    while (true) {
      let url = `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/devices/registrations?per_page=100`;
      if (cursor) {
        url += `&amp;cursor=${cursor}`;
      }

      const devicesResponse = await fetch(url, { headers });
      const devicesData = await devicesResponse.json();
      if (!devicesData.success) {
        console.error('Failed to fetch registrations:', devicesData.errors);
        return;
      }

      allDevices = allDevices.concat(devicesData.result);

      // Extract next cursor (adjust if your response uses a different field, e.g., devicesData.result_info.cursor)
      cursor = devicesData.cursor || '';
      if (!cursor) break;
    }

    const now = new Date();

    for (const device of allDevices) {
      const lastSeen = new Date(device.last_seen_at);
      const minutesInactive = (now - lastSeen) / (1000 * 60);

      if (minutesInactive &gt; REVOKE_INTERVAL_MINUTES) {
        console.log(`Registration ${device.id} inactive for ${minutesInactive} minutes.`);

        if (DRY_RUN) {
          console.log(`Dry run: Would delete registration ${device.id}`);
        } else {
          const deleteResponse = await fetch(
            `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/devices/registrations/${device.id}`,
            { method: 'DELETE', headers }
          );
          const deleteData = await deleteResponse.json();
          if (deleteData.success) {
            console.log(`Deleted registration ${device.id}`);
          } else {
            console.error(`Failed to delete ${device.id}:`, deleteData.errors);
          }
        }
      }
    }
  }
};</code></pre>
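<p>To run on a schedule, the Worker needs a cron trigger and its configuration values. A sketch of the <code>wrangler.toml</code> (the name and values are placeholders):</p>

```toml
name = "device-session-revoker"   # placeholder Worker name
main = "src/index.js"
compatibility_date = "2026-01-01"

# Invoke the scheduled() handler every 4 hours
[triggers]
crons = ["0 */4 * * *"]

# Non-sensitive configuration; set API_TOKEN and ACCOUNT_ID as secrets
# with `wrangler secret put API_TOKEN` / `wrangler secret put ACCOUNT_ID`
[vars]
REVOKE_INTERVAL_MINUTES = "240"
DRY_RUN = "true"
```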
            <p>Configure the Worker with environment secrets (<code>API_TOKEN</code>, <code>ACCOUNT_ID</code>, <code>REVOKE_INTERVAL_MINUTES</code>) and a cron trigger (<code>0 */4 * * *</code> for every 4 hours), and you have automated session management. Just getting a simple feature like this into a vendor’s roadmap could take months, and even longer to move into a management interface.</p><p>Our technical specialist, by contrast, deployed this policy with the customer in an afternoon. It's been running in production for months.</p><p>We’ve observed countless implementations like this across Cloudflare One deployments. We’ve seen users implement coaching pages and purpose justification workflows by using our existing <a href="https://developers.cloudflare.com/cloudflare-one/traffic-policies/http-policies/#redirect"><u>redirect policies</u></a> and Workers. Other users have built custom logic that evaluates browser attributes before making policy or routing decisions. Each solves a unique problem that would otherwise require waiting for a vendor to build a specific, niche integration with a third-party system. Instead, customers are building exactly what they need, on their timeline, with logic they own.</p>
    <div>
      <h2>A programmable platform that changes the conversation</h2>
      <a href="#a-programmable-platform-that-changes-the-conversation">
        
      </a>
    </div>
    <p>We believe the future of enterprise security isn't a monolithic platform that tries to do everything. It's a composable and programmable platform that gives customers the tools and flexibility to extend it in any direction.</p><p>For security teams, we expect our platform to change the conversation. Instead of filing a feature request and hoping it makes the roadmap, you can build a tailored solution that addresses your exact requirements today. </p><p>For our partners and managed security service providers (MSSPs), our platform opens up their ability to build and deliver solutions for their specific customer base. That means industry-specific solutions, or capabilities for customers in a specific regulatory environment. Custom integrations become a competitive advantage, not a professional services engagement.</p><p>And for our customers, it means you're building on a platform that is easy to deploy and fundamentally adaptable to your most complex and changing needs. Your security platform grows with you — it doesn’t constrain you.</p>
    <div>
      <h2>What's next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We're just getting started. Throughout 2026, you'll see us continue to deepen the integration between Cloudflare One and our Developer Platform. We plan to start by creating custom actions in Cloudflare Gateway that support dynamic policy enforcement. These actions can use auxiliary data stored in your organization's existing databases without the administrative or compliance challenges of migrating that data into Cloudflare. These same custom actions will also support request augmentation to pass along Cloudflare attributes to your internal systems, for better logging and access decisions in your downstream systems.  </p><p>In the meantime, the building blocks are already here. External evaluation rules, custom device posture checks, Gateway redirects, and the full power of Workers are available today. If you're not sure where to start, <a href="https://developers.cloudflare.com/cloudflare-one/"><u>our developer documentation</u></a> has guides and reference architectures for extending Cloudflare One.</p><p>We built Cloudflare on the belief that security should be ridiculously easy to use, but we also know that "easy" doesn't mean "one-size-fits-all." It means giving you the tools to build exactly what you need. We believe that’s the future of SASE. </p> ]]></content:encoded>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[SASE]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">5XVjmkVenwJsJX1GQkMC9U</guid>
            <dc:creator>Abe Carryl</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we rebuilt Next.js with AI in one week]]></title>
            <link>https://blog.cloudflare.com/vinext/</link>
            <pubDate>Tue, 24 Feb 2026 20:00:00 GMT</pubDate>
            <description><![CDATA[ One engineer used AI to rebuild Next.js on Vite in a week. vinext builds up to 4x faster, produces 57% smaller bundles, and deploys to Cloudflare Workers with a single command. ]]></description>
            <content:encoded><![CDATA[ <p><sub><i>*This post was updated at 12:35 pm PT to fix a typo in the build time benchmarks.</i></sub></p><p>Last week, one engineer and an AI model rebuilt the most popular front-end framework from scratch. The result, <a href="https://github.com/cloudflare/vinext"><u>vinext</u></a> (pronounced "vee-next"), is a drop-in replacement for Next.js, built on <a href="https://vite.dev/"><u>Vite</u></a>, that deploys to Cloudflare Workers with a single command. In early benchmarks, it builds production apps up to 4x faster and produces client bundles up to 57% smaller. And we already have customers running it in production. </p><p>The whole thing cost about $1,100 in tokens.</p>
    <div>
      <h2>The Next.js deployment problem</h2>
      <a href="#the-next-js-deployment-problem">
        
      </a>
    </div>
    <p><a href="https://nextjs.org/"><u>Next.js</u></a> is the most popular React framework. Millions of developers use it. It powers a huge chunk of the production web, and for good reason. The developer experience is top-notch.</p><p>But Next.js has a deployment problem when used in the broader serverless ecosystem. The tooling is entirely bespoke: Next.js has invested heavily in Turbopack but if you want to deploy it to Cloudflare, Netlify, or AWS Lambda, you have to take that build output and reshape it into something the target platform can actually run.</p><p>If you’re thinking: “Isn’t that what OpenNext does?”, you are correct. </p><p>That is indeed the problem <a href="https://opennext.js.org/"><u>OpenNext</u></a> was built to solve. And a lot of engineering effort has gone into OpenNext from multiple providers, including us at Cloudflare. It works, but quickly runs into limitations and becomes a game of whack-a-mole. </p><p>Building on top of Next.js output as a foundation has proven to be a difficult and fragile approach. Because OpenNext has to reverse-engineer Next.js's build output, this results in unpredictable changes between versions that take a lot of work to correct. </p><p>Next.js has been working on a first-class adapters API, and we've been collaborating with them on it. It's still an early effort but even with adapters, you're still building on the bespoke Turbopack toolchain. And adapters only cover build and deploy. During development, next dev runs exclusively in Node.js with no way to plug in a different runtime. If your application uses platform-specific APIs like Durable Objects, KV, or AI bindings, you can't test that code in dev without workarounds.</p>
    <div>
      <h2>Introducing vinext </h2>
      <a href="#introducing-vinext">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BCYnb6nCnc9oRBPQnuES5/d217b3582f4fe30597a3b4bf000d9bd7/BLOG-3194_2.png" />
          </figure><p>What if instead of adapting Next.js output, we reimplemented the Next.js API surface on <a href="https://vite.dev/"><u>Vite</u></a> directly? Vite is the build tool used by most of the front-end ecosystem outside of Next.js, powering frameworks like Astro, SvelteKit, Nuxt, and Remix. A clean reimplementation, not merely a wrapper or adapter. We honestly didn't think it would work. But it’s 2026, and the cost of building software has completely changed.</p><p>We got a lot further than we expected.</p>
            <pre><code>npm install vinext</code></pre>
            <p>Replace <code>next</code> with <code>vinext</code> in your scripts and everything else stays the same. Your existing <code>app/</code>, <code>pages/</code>, and <code>next.config.js</code> work as-is.</p>
            <pre><code>vinext dev          # Development server with HMR
vinext build        # Production build
vinext deploy       # Build and deploy to Cloudflare Workers</code></pre>
            <p>This is not a wrapper around Next.js and Turbopack output. It's an alternative implementation of the API surface: routing, server rendering, React Server Components, server actions, caching, middleware. All of it built on top of Vite as a plugin. Most importantly, Vite output runs on any platform thanks to the <a href="https://vite.dev/guide/api-environment"><u>Vite Environment API</u></a>.</p>
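<p>To make the "Vite plugin" idea concrete, here is a toy sketch (not vinext's actual code) of how a plugin can shim a Next.js module: the standard <code>resolveId</code> and <code>load</code> hooks redirect an import like <code>next/link</code> to a virtual module that the plugin serves itself.</p>

```javascript
// Rollup/Vite virtual-module convention: prefix resolved virtual ids with "\0"
// so other plugins leave them alone.
const VIRTUAL_ID = "\0vinext-shim:next/link";

const nextShimPlugin = {
  name: "toy-next-shim",
  resolveId(id) {
    // Intercept `import Link from "next/link"` and point it at our shim.
    if (id === "next/link") return VIRTUAL_ID;
  },
  load(id) {
    if (id === VIRTUAL_ID) {
      // A real shim would render an <a> element with client-side navigation;
      // this placeholder just shows where that code would live.
      return `export default function Link(props) { return props; }`;
    }
  },
};
```

<p>In a real setup the plugin object goes into Vite's <code>plugins</code> array; the same pattern, repeated across the 33+ module shims, is what lets existing Next.js imports keep working.</p>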
    <div>
      <h2>The numbers</h2>
      <a href="#the-numbers">
        
      </a>
    </div>
    <p>Early benchmarks are promising. We compared vinext against Next.js 16 using a shared 33-route App Router application.

Both frameworks are doing the same work: compiling, bundling, and preparing server-rendered routes. We disabled TypeScript type checking and ESLint in Next.js's build (Vite doesn't run these during builds), and used force-dynamic so Next.js doesn't spend extra time pre-rendering static routes, which would unfairly slow down its numbers. The goal was to measure only bundler and compilation speed, nothing else. Benchmarks run on GitHub CI on every merge to main. </p><p><b>Production build time:</b></p>
<div><table><colgroup>
<col></col>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Framework</span></th>
    <th><span>Mean</span></th>
    <th><span>vs Next.js</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Next.js 16.1.6 (Turbopack)</span></td>
    <td><span>7.38s</span></td>
    <td><span>baseline</span></td>
  </tr>
  <tr>
    <td><span>vinext (Vite 7 / Rollup)</span></td>
    <td>4.64s</td>
    <td>1.6x faster</td>
  </tr>
  <tr>
    <td><span>vinext (Vite 8 / Rolldown)</span></td>
    <td>1.67s</td>
    <td>4.4x faster</td>
  </tr>
</tbody></table></div><p><b>Client bundle size (gzipped):</b></p>
<div><table><colgroup>
<col></col>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Framework</span></th>
    <th><span>Gzipped</span></th>
    <th><span>vs Next.js</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Next.js 16.1.6</span></td>
    <td><span>168.9 KB</span></td>
    <td><span>baseline</span></td>
  </tr>
  <tr>
    <td><span>vinext (Rollup)</span></td>
    <td><span>74.0 KB</span></td>
    <td><span>56% smaller</span></td>
  </tr>
  <tr>
    <td><span>vinext (Rolldown)</span></td>
    <td><span>72.9 KB</span></td>
    <td><span>57% smaller</span></td>
  </tr>
</tbody></table></div><p>These benchmarks measure compilation and bundling speed, not production serving performance. The test fixture is a single 33-route app, not a representative sample of all production applications. We expect these numbers to evolve as all three projects continue to develop. The <a href="https://benchmarks.vinext.workers.dev"><u>full methodology and historical results</u></a> are public. Take them as directional, not definitive.</p><p>The direction is encouraging, though. Vite's architecture, and especially <a href="https://rolldown.rs/"><u>Rolldown</u></a> (the Rust-based bundler coming in Vite 8), has structural advantages for build performance that show up clearly here.</p>
    <div>
      <h2>Deploying to Cloudflare Workers</h2>
      <a href="#deploying-to-cloudflare-workers">
        
      </a>
    </div>
    <p>vinext is built with Cloudflare Workers as the first deployment target. A single command takes you from source code to a running Worker:</p>
            <pre><code>vinext deploy</code></pre>
            <p>This handles everything: it builds the application, auto-generates the Worker configuration, and deploys. Both the App Router and Pages Router work on Workers, with full client-side hydration, interactive components, client-side navigation, and React state.</p><p>For production caching, vinext includes a Cloudflare KV cache handler that gives you ISR (Incremental Static Regeneration) out of the box:</p>
            <pre><code>import { KVCacheHandler } from "vinext/cloudflare";
import { setCacheHandler } from "next/cache";

setCacheHandler(new KVCacheHandler(env.MY_KV_NAMESPACE));</code></pre>
            <p><a href="https://developers.cloudflare.com/kv/"><u>KV</u></a> is a good default for most applications, but the caching layer is designed to be pluggable. That setCacheHandler call means you can swap in whatever backend makes sense. <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> might be a better fit for apps with large cached payloads or different access patterns. We're also working on improvements to our Cache API that should provide a strong caching layer with less configuration. The goal is flexibility: pick the caching strategy that fits your app.</p><p>Live examples running right now:</p><ul><li><p><a href="https://app-router-playground.vinext.workers.dev"><u>App Router Playground</u></a></p></li><li><p><a href="https://hackernews.vinext.workers.dev"><u>Hacker News clone</u></a></p></li><li><p><a href="https://app-router-cloudflare.vinext.workers.dev"><u>App Router minimal</u></a></p></li><li><p><a href="https://pages-router-cloudflare.vinext.workers.dev"><u>Pages Router minimal</u></a></p></li></ul><p>We also have <a href="https://next-agents.threepointone.workers.dev/"><u>a live example</u></a> of Cloudflare Agents running in a Next.js app, without the need for workarounds like <a href="https://developers.cloudflare.com/workers/wrangler/api/#getplatformproxy"><u>getPlatformProxy</u></a>, since the entire app now runs in workerd, during both dev and deploy phases. This means being able to use Durable Objects, AI bindings, and every other Cloudflare-specific service without compromise. <a href="https://github.com/cloudflare/vinext-agents-example"><u>Have a look here.</u></a>   </p>
    <div>
      <h2>Frameworks are a team sport</h2>
      <a href="#frameworks-are-a-team-sport">
        
      </a>
    </div>
    <p>The current deployment target is Cloudflare Workers, but that's a small part of the picture. Something like 95% of vinext is pure Vite. The routing, the module shims, the SSR pipeline, the RSC integration: none of it is Cloudflare-specific.</p><p>Cloudflare is looking to work with other hosting providers on adopting this toolchain for their customers (the lift is minimal — we got a proof-of-concept working on <a href="https://vinext-on-vercel.vercel.app/"><u>Vercel</u></a> in less than 30 minutes!). This is an open-source project, and for its long-term success, we believe it’s important we work with partners across the ecosystem to ensure ongoing investment. PRs from other platforms are welcome. If you're interested in adding a deployment target, <a href="https://github.com/cloudflare/vinext/issues"><u>open an issue</u></a> or reach out.</p>
    <div>
      <h2>Status: Experimental</h2>
      <a href="#status-experimental">
        
      </a>
    </div>
    <p>We want to be clear: vinext is experimental. It's not even one week old, and it has not yet been battle-tested with any meaningful traffic at scale. If you're evaluating it for a production application, proceed with appropriate caution.</p><p>That said, the test suite is extensive: over 1,700 Vitest tests and 380 Playwright E2E tests, including tests ported directly from the Next.js test suite and OpenNext's Cloudflare conformance suite. We’ve verified it against the Next.js App Router Playground. Coverage sits at 94% of the Next.js 16 API surface.

Early results from real-world customers are encouraging. We've been working with <a href="https://ndstudio.gov/"><u>National Design Studio</u></a>, a team that's aiming to modernize every government interface, on one of their beta sites, <a href="https://www.cio.gov/"><u>CIO.gov</u></a>. They're already running vinext in production, with meaningful improvements in build times and bundle sizes.</p><p>The README is honest about <a href="https://github.com/cloudflare/vinext#whats-not-supported-and-wont-be"><u>what's not supported and won't be</u></a>, and about <a href="https://github.com/cloudflare/vinext#known-limitations"><u>known limitations</u></a>. We want to be upfront rather than overpromise.</p>
    <div>
      <h2>What about pre-rendering?</h2>
      <a href="#what-about-pre-rendering">
        
      </a>
    </div>
    <p>vinext already supports Incremental Static Regeneration (ISR) out of the box. After the first request to any page, it's cached and revalidated in the background, just like Next.js. That part works today.</p><p>vinext does not yet support static pre-rendering at build time. In Next.js, pages without dynamic data get rendered during <code>next build</code> and served as static HTML. If you have dynamic routes, you use <code>generateStaticParams()</code> to enumerate which pages to build ahead of time. vinext doesn't do that… yet.</p><p>This was an intentional design decision for launch. It's <a href="https://github.com/cloudflare/vinext/issues/9">on the roadmap</a>, but if your site is 100% prebuilt HTML with static content, you probably won't see much benefit from vinext today. That said, if one engineer can spend <span>$</span>1,100 in tokens and rebuild Next.js, you can probably spend $10 and migrate to a Vite-based framework designed specifically for static content, like <a href="https://astro.build/">Astro</a> (which <a href="https://blog.cloudflare.com/astro-joins-cloudflare/">also deploys to Cloudflare Workers</a>).</p><p>For sites that aren't purely static, though, we think we can do something better than pre-rendering everything at build time.</p>
    <div>
      <h2>Introducing Traffic-aware Pre-Rendering</h2>
      <a href="#introducing-traffic-aware-pre-rendering">
        
      </a>
    </div>
    <p>Next.js pre-renders every page listed in <code>generateStaticParams()</code> during the build. A site with 10,000 product pages means 10,000 renders at build time, even though 99% of those pages may never receive a request. Builds scale linearly with page count. This is why large Next.js sites end up with 30-minute builds.</p><p>So we built <b>Traffic-aware Pre-Rendering</b> (TPR). It's experimental today, and we plan to make it the default once we have more real-world testing behind it.</p><p>The idea is simple. Cloudflare is already the reverse proxy for your site. We have your traffic data. We know which pages actually get visited. So instead of pre-rendering everything or pre-rendering nothing, vinext queries Cloudflare's zone analytics at deploy time and pre-renders only the pages that matter.</p>
            <pre><code>vinext deploy --experimental-tpr

  Building...
  Build complete (4.2s)

  TPR (experimental): Analyzing traffic for my-store.com (last 24h)
  TPR: 12,847 unique paths — 184 pages cover 90% of traffic
  TPR: Pre-rendering 184 pages...
  TPR: Pre-rendered 184 pages in 8.3s → KV cache

  Deploying to Cloudflare Workers...
</code></pre>
            <p>For a site with 100,000 product pages, the power law means 90% of traffic usually goes to 50 to 200 pages. Those get pre-rendered in seconds. Everything else falls back to on-demand SSR and gets cached via ISR after the first request. Every new deploy refreshes the set based on current traffic patterns. Pages that go viral get picked up automatically. All of this works without <code>generateStaticParams()</code> and without coupling your build to your production database.</p>
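The selection step can be sketched in a few lines. This is not vinext's actual implementation, just an illustration of the idea: sort paths by request count and keep the smallest set whose combined hits reach the coverage target.

```javascript
// Given per-path request counts (e.g. from zone analytics), return the
// smallest set of paths whose combined hits reach the coverage target.
function pagesToPrerender(pathCounts, coverage = 0.9) {
  const total = pathCounts.reduce((sum, p) => sum + p.hits, 0);
  const sorted = [...pathCounts].sort((a, b) => b.hits - a.hits);
  const selected = [];
  let covered = 0;
  for (const p of sorted) {
    if (covered >= coverage * total) break; // target reached
    selected.push(p.path);
    covered += p.hits;
  }
  return selected;
}
```

Everything outside the selected set simply falls through to on-demand SSR and ISR caching.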
    <div>
      <h2>Taking on the Next.js challenge, but this time with AI</h2>
      <a href="#taking-on-the-next-js-challenge-but-this-time-with-ai">
        
      </a>
    </div>
    <p>A project like this would normally take a team of engineers months, if not years. Several teams at various companies have attempted it, and the scope is just enormous. We tried once at Cloudflare! Two routers, 33+ module shims, server rendering pipelines, RSC streaming, file-system routing, middleware, caching, static export. There's a reason nobody has pulled it off.</p><p>This time we did it in under a week. One engineer (technically, an engineering manager) directing AI.</p><p>The first commit landed on February 13. By the end of that same evening, both the Pages Router and App Router had basic SSR working, along with middleware, server actions, and streaming. By the next afternoon, <a href="https://app-router-playground.vinext.workers.dev"><u>App Router Playground</u></a> was rendering 10 of 11 routes. By day three, <code>vinext deploy</code> was shipping apps to Cloudflare Workers with full client hydration. The rest of the week was hardening: fixing edge cases, expanding the test suite, bringing API coverage to 94%.</p><p>What changed from those earlier attempts? AI got better. Way better.</p>
    <div>
      <h2>Why this problem is made for AI</h2>
      <a href="#why-this-problem-is-made-for-ai">
        
      </a>
    </div>
    <p>Not every project would go this way. This one did because a few things happened to line up at the right time.</p><p><b>Next.js is well-specified.</b> It has extensive documentation, a massive user base, and years of Stack Overflow answers and tutorials. The API surface is all over the training data. When you ask Claude to implement <code>getServerSideProps</code> or explain how <code>useRouter</code> works, it doesn't hallucinate. It knows how Next works.</p><p><b>Next.js has an elaborate test suite.</b> The <a href="https://github.com/vercel/next.js"><u>Next.js repo</u></a> contains thousands of E2E tests covering every feature and edge case. We ported tests directly from their suite (you can see the attribution in the code). This gave us a specification we could verify against mechanically.</p><p><b>Vite is an excellent foundation.</b> <a href="https://vite.dev/"><u>Vite</u></a> handles the hard parts of front-end tooling: fast HMR, native ESM, a clean plugin API, production bundling. We didn't have to build a bundler. We just had to teach it to speak Next.js. <a href="https://github.com/vitejs/vite-plugin-rsc"><code><u>@vitejs/plugin-rsc</u></code></a> is still early, but it gave us React Server Components support without having to build an RSC implementation from scratch.</p><p><b>The models caught up.</b> We don't think this would have been possible even a few months ago. Earlier models couldn't sustain coherence across a codebase this size. New models can hold the full architecture in context, reason about how modules interact, and produce correct code often enough to keep momentum going. At times, I saw it go into Next, Vite, and React internals to figure out a bug. The state-of-the-art models are impressive, and they seem to keep getting better.</p><p>All of those things had to be true at the same time. Well-documented target API, comprehensive test suite, solid build tool underneath, and a model that could actually handle the complexity. 
Take any one of them away and this doesn't work nearly as well.</p>
    <div>
      <h2>How we actually built it</h2>
      <a href="#how-we-actually-built-it">
        
      </a>
    </div>
    <p>Almost every line of code in vinext was written by AI. But here's the thing that matters more: every line passes the same quality gates you'd expect from human-written code. The project has 1,700+ Vitest tests, 380 Playwright E2E tests, full TypeScript type checking via tsgo, and linting via oxlint. Continuous integration runs all of it on every pull request. Establishing a set of good guardrails is critical to making AI productive in a codebase.</p><p>The process started with a plan. I spent a couple of hours going back and forth with Claude in <a href="https://opencode.ai"><u>OpenCode</u></a> to define the architecture: what to build, in what order, which abstractions to use. That plan became the north star. From there, the workflow was straightforward:</p><ol><li><p>Define a task ("implement the <code>next/navigation</code> shim with usePathname, <code>useSearchParams</code>, <code>useRouter</code>").</p></li><li><p>Let the AI write the implementation and tests.</p></li><li><p>Run the test suite.</p></li><li><p>If tests pass, merge. If not, give the AI the error output and let it iterate.</p></li><li><p>Repeat.</p></li></ol><p>We wired up AI agents for code review too. When a PR was opened, an agent reviewed it. When review comments came back, another agent addressed them. The feedback loop was mostly automated. </p><p>It didn't work perfectly every time. There were PRs that were just wrong. The AI would confidently implement something that seemed right but didn't match actual Next.js behavior. I had to course-correct regularly. Architecture decisions, prioritization, knowing when the AI was headed down a dead end: that was all me. When you give AI good direction, good context, and good guardrails, it can be very productive. 
But the human still has to steer.</p><p>For browser-level testing, I used <a href="https://github.com/vercel-labs/agent-browser"><u>agent-browser</u></a> to verify actual rendered output, client-side navigation, and hydration behavior. Unit tests miss a lot of subtle browser issues. This caught them.</p><p>Over the course of the project, we ran over 800 sessions in OpenCode. Total cost: roughly $1,100 in Claude API tokens.</p>
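As an illustration of what a task like step 1 involves, here is a toy sketch of a <code>next/navigation</code>-style shim. This is not vinext's code: a real shim reads from React context and integrates with the router, while this version uses a module-level location object purely to show the API shape.

```javascript
// Toy next/navigation-style shim (illustrative only). A real implementation
// would read router state from React context; a module-level object stands in.
let currentLocation = { pathname: '/', search: '' };

function usePathname() {
  return currentLocation.pathname;
}

function useSearchParams() {
  return new URLSearchParams(currentLocation.search);
}

function useRouter() {
  return {
    push(url) {
      // Resolve relative URLs against a dummy base to reuse URL parsing.
      const u = new URL(url, 'http://localhost');
      currentLocation = { pathname: u.pathname, search: u.search };
    },
  };
}
```

Each shim like this came with its own tests, which is what made the "write, run, iterate" loop work.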
    <div>
      <h2>What this means for software</h2>
      <a href="#what-this-means-for-software">
        
      </a>
    </div>
    <p>Why do we have so many layers in the stack? This project forced me to think deeply about this question. And to consider how AI impacts the answer.</p><p>Most abstractions in software exist because humans need help. We couldn't hold the whole system in our heads, so we built layers to manage the complexity for us. Each layer made the next person's job easier. That's how you end up with frameworks on top of frameworks, wrapper libraries, thousands of lines of glue code.</p><p>AI doesn't have the same limitation. It can hold the whole system in context and just write the code. It doesn't need an intermediate framework to stay organized. It just needs a spec and a foundation to build on.</p><p>It's not clear yet which abstractions are truly foundational and which ones were just crutches for human cognition. That line is going to shift a lot over the next few years. But vinext is a data point. We took an API contract, a build tool, and an AI model, and the AI wrote everything in between. No intermediate framework needed. We think this pattern will repeat across a lot of software. The layers we've built up over the years aren't all going to make it.</p>
    <div>
      <h2>Acknowledgments</h2>
      <a href="#acknowledgments">
        
      </a>
    </div>
    <p>Thanks to the Vite team. <a href="https://vite.dev/"><u>Vite</u></a> is the foundation this whole thing stands on. <a href="https://github.com/vitejs/vite-plugin-rsc"><code><u>@vitejs/plugin-rsc</u></code></a> is still early days, but it gave me RSC support without having to build that from scratch, which would have been a dealbreaker. The Vite maintainers were responsive and helpful as I pushed the plugin into territory it hadn't been tested in before.</p><p>We also want to acknowledge the <a href="https://nextjs.org/"><u>Next.js</u></a> team. They've spent years building a framework that raised the bar for what React development could look like. The fact that their API surface is so well-documented and their test suite so comprehensive is a big part of what made this project possible. vinext wouldn't exist without the standard they set.</p>
    <div>
      <h2>Try it</h2>
      <a href="#try-it">
        
      </a>
    </div>
    <p>vinext includes an <a href="https://agentskills.io"><u>Agent Skill</u></a> that handles migration for you. It works with Claude Code, OpenCode, Cursor, Codex, and dozens of other AI coding tools. Install it, open your Next.js project, and tell the AI to migrate:</p>
            <pre><code>npx skills add cloudflare/vinext</code></pre>
            <p>Then open your Next.js project in any supported tool and say:</p>
            <pre><code>migrate this project to vinext</code></pre>
            <p>The skill handles compatibility checking, dependency installation, config generation, and dev server startup. It knows what vinext supports and will flag anything that needs manual attention.</p><p>Or if you prefer doing it by hand:</p>
            <pre><code>npx vinext init    # Migrate an existing Next.js project
npx vinext dev     # Start the dev server
npx vinext deploy  # Ship to Cloudflare Workers</code></pre>
            <p>The source is at <a href="https://github.com/cloudflare/vinext"><u>github.com/cloudflare/vinext</u></a>. Issues, PRs, and feedback are welcome.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Performance]]></category>
            <guid isPermaLink="false">2w61xT0J7H7ECzhiABytS</guid>
            <dc:creator>Steve Faulkner</dc:creator>
        </item>
        <item>
            <title><![CDATA[Code Mode: give agents an entire API in 1,000 tokens]]></title>
            <link>https://blog.cloudflare.com/code-mode-mcp/</link>
            <pubDate>Fri, 20 Feb 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ The Cloudflare API has over 2,500 endpoints. Exposing each one as an MCP tool would consume over 2 million tokens. With Code Mode, we collapsed all of it into two tools and roughly 1,000 tokens of context. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>Model Context Protocol (MCP)</u></a> has become the standard way for AI agents to use external tools. But there is a tension at its core: agents need many tools to do useful work, yet every tool added fills the model's context window, leaving less room for the actual task. </p><p><a href="https://blog.cloudflare.com/code-mode/"><u>Code Mode</u></a> is a technique we first introduced for reducing context window usage during agent tool use. Instead of describing every operation as a separate tool, let the model write code against a typed SDK and execute the code safely in a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Worker Loader</u></a>. The code acts as a compact plan. The model can explore tool operations, compose multiple calls, and return just the data it needs. Anthropic independently explored the same pattern in their <a href="https://www.anthropic.com/engineering/code-execution-with-mcp"><u>Code Execution with MCP</u></a> post.</p><p>Today we are introducing <a href="https://github.com/cloudflare/mcp"><u>a new MCP server</u></a> for the <a href="https://developers.cloudflare.com/api/"><u>entire Cloudflare API</u></a> — from <a href="https://developers.cloudflare.com/dns/"><u>DNS</u></a> and <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Zero Trust</u></a> to <a href="https://workers.cloudflare.com/product/workers/"><u>Workers</u></a> and <a href="https://workers.cloudflare.com/product/r2/"><u>R2</u></a> — that uses Code Mode. With just two tools, search() and execute(), the server is able to provide access to the entire Cloudflare API over MCP, while consuming only around 1,000 tokens. The footprint stays fixed, no matter how many API endpoints exist.</p><p>For a large API like the Cloudflare API, Code Mode reduces the number of input tokens used by 99.9%. 
An equivalent MCP server without Code Mode would consume 1.17 million tokens — more than the entire context window of the most advanced foundation models.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7KqjQiI09KubtUSe9Dgf0N/6f37896084c7f34abca7dc36ab18d8e0/image2.png" />
          </figure><p><sup><i>Code Mode savings vs native MCP, measured with </i></sup><a href="https://github.com/openai/tiktoken"><sup><i><u>tiktoken</u></i></sup></a></p><p>You can start using this new Cloudflare MCP server today. And we are also open-sourcing a new <a href="https://github.com/cloudflare/agents/tree/main/packages/codemode"><u>Code Mode SDK</u></a> in the <a href="https://github.com/cloudflare/agents"><u>Cloudflare Agents SDK</u></a>, so you can use the same approach in your own MCP servers and AI Agents.</p>
    <div>
      <h3>Server‑side Code Mode</h3>
      <a href="#server-side-code-mode">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ir1KOZHIjVNyqdC9FSuZs/334456a711fb2b5fa612b3fc0b4adc48/images_BLOG-3184_2.png" />
          </figure><p>This new MCP server applies Code Mode server-side. Instead of thousands of tools, the server exports just two: <code>search()</code> and <code>execute()</code>. Both are powered by Code Mode. Here is the full tool surface area that gets loaded into the model context:</p>
            <pre><code>[
  {
    "name": "search",
    "description": "Search the Cloudflare OpenAPI spec. All $refs are pre-resolved inline.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to search the OpenAPI spec"
        }
      },
      "required": ["code"]
    }
  },
  {
    "name": "execute",
    "description": "Execute JavaScript code against the Cloudflare API.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to execute"
        }
      },
      "required": ["code"]
    }
  }
]
</code></pre>
            <p>To discover what it can do, the agent calls <code>search()</code>. It writes JavaScript against a typed representation of the OpenAPI spec. The agent can filter endpoints by product, path, tags, or any other metadata and narrow thousands of endpoints to the handful it needs. The full OpenAPI spec never enters the model context. The agent only interacts with it through code.</p><p>When the agent is ready to act, it calls <code>execute()</code>. The agent writes code that can make Cloudflare API requests, handle pagination, check responses, and chain operations together in a single execution. </p><p>Both tools run the generated code inside a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/worker-loader/"><u>Dynamic Worker</u></a> isolate — a lightweight V8 sandbox with no file system, no environment variables to leak through prompt injection, and external fetches disabled by default. Outbound requests can be explicitly controlled with outbound fetch handlers when needed.</p>
    <div>
      <h4>Example: Protecting an origin from DDoS attacks</h4>
      <a href="#example-protecting-an-origin-from-ddos-attacks">
        
      </a>
    </div>
    <p>Suppose a user tells their agent: "protect my origin from DDoS attacks." The agent's first step is to consult documentation. It might call the <a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/"><u>Cloudflare Docs MCP Server</u></a>, use a <a href="https://github.com/cloudflare/skills"><u>Cloudflare Skill</u></a>, or search the web directly. From the docs it learns: put <a href="https://www.cloudflare.com/application-services/products/waf/"><u>Cloudflare WAF</u></a> and <a href="https://www.cloudflare.com/ddos/"><u>DDoS protection</u></a> rules in front of the origin.</p><p><b>Step 1: Search for the right endpoints
</b>The <code>search</code> tool gives the model a <code>spec</code> object: the full Cloudflare OpenAPI spec with all <code>$refs</code> pre-resolved. The model writes JavaScript against it. Here the agent looks for WAF and ruleset endpoints on a zone:</p>
            <pre><code>async () =&gt; {
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') &amp;&amp;
        (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}
</code></pre>
            <p>The server runs this code in a Workers isolate and returns:</p>
            <pre><code>[
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages",              "summary": "List WAF packages" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}", "summary": "Update a WAF package" },
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules", "summary": "List WAF rules" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules/{rule_id}", "summary": "Update a WAF rule" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets",                           "summary": "List zone rulesets" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets",                           "summary": "Create a zone ruleset" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Get a zone entry point ruleset" },
  { "method": "PUT",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Update a zone entry point ruleset" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules",        "summary": "Create a zone ruleset rule" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules/{rule_id}", "summary": "Update a zone ruleset rule" }
]
</code></pre>
            <p>The full Cloudflare API spec has over 2,500 endpoints. The model narrowed that to the WAF and ruleset endpoints it needs, without any of the spec entering the context window. </p><p>The model can also drill into a specific endpoint's schema before calling it. Here it inspects what phases are available on zone rulesets:</p>
            <pre><code>async () =&gt; {
  const op = spec.paths['/zones/{zone_id}/rulesets']?.get;
  const items = op?.responses?.['200']?.content?.['application/json']?.schema;
  // Walk the schema to find the phase enum
  const props = items?.allOf?.[1]?.properties?.result?.items?.allOf?.[1]?.properties;
  return { phases: props?.phase?.enum };
}

{
  "phases": [
    "ddos_l4", "ddos_l7",
    "http_request_firewall_custom", "http_request_firewall_managed",
    "http_response_firewall_managed", "http_ratelimit",
    "http_request_redirect", "http_request_transform",
    "magic_transit", "magic_transit_managed"
  ]
}
</code></pre>
            <p>The agent now knows the exact phases it needs: <code>ddos_l7</code> for DDoS protection and <code>http_request_firewall_managed</code> for WAF.</p><p><b>Step 2: Act on the API
</b>The agent switches to using <code>execute</code>. The sandbox gets a <code>cloudflare.request()</code> client that can make authenticated calls to the Cloudflare API. First the agent checks what rulesets already exist on the zone:</p>
            <pre><code>async () =&gt; {
  const response = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets`
  });
  return response.result.map(rs =&gt; ({
    name: rs.name, phase: rs.phase, kind: rs.kind
  }));
}

[
  { "name": "DDoS L7",          "phase": "ddos_l7",                        "kind": "managed" },
  { "name": "Cloudflare Managed", "phase": "http_request_firewall_managed", "kind": "managed" },
  { "name": "Custom rules",     "phase": "http_request_firewall_custom",   "kind": "zone" }
]
</code></pre>
            <p>The agent sees that managed DDoS and WAF rulesets already exist. It can now chain calls to fetch and inspect both configurations in a single execution:</p>
            <pre><code>async () =&gt; {
  // Get the current DDoS L7 entrypoint ruleset
  const ddos = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/ddos_l7/entrypoint`
  });

  // Get the WAF managed ruleset
  const waf = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/http_request_firewall_managed/entrypoint`
  });
  // Return both configurations so the model can inspect them
  return { ddos: ddos.result, waf: waf.result };
}
</code></pre>
            <p>This entire operation, from searching the spec and inspecting a schema to listing rulesets and fetching DDoS and WAF configurations, took four tool calls.</p>
    <div>
      <h3>The Cloudflare MCP server</h3>
      <a href="#the-cloudflare-mcp-server">
        
      </a>
    </div>
    <p>We started with MCP servers for individual products. Want an agent that manages DNS? Add the <a href="https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/dns-analytics"><u>DNS MCP server</u></a>. Want Workers logs? Add the <a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/"><u>Workers Observability MCP server</u></a>. Each server exported a fixed set of tools that mapped to API operations. This worked when the tool set was small, but the Cloudflare API has over 2,500 endpoints. No collection of hand-maintained servers could keep up.</p><p>The Cloudflare MCP server simplifies this. Two tools, roughly 1,000 tokens, and coverage of every endpoint in the API. When we add new products, the same <code>search()</code> and <code>execute()</code> code paths discover and call them — no new tool definitions, no new MCP servers. It even has support for the <a href="https://developers.cloudflare.com/analytics/graphql-api/"><u>GraphQL Analytics API</u></a>.</p><p>Our MCP server is built on the latest MCP specifications. It is OAuth 2.1 compliant, using <a href="https://github.com/cloudflare/workers-oauth-provider"><u>Workers OAuth Provider</u></a> to downscope the token to selected permissions approved by the user when connecting. The agent  only gets the capabilities the user explicitly granted. </p><p>For developers, this means you can use a simple agent loop and still give your agent access to the full Cloudflare API with built-in progressive capability discovery.</p>
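A "simple agent loop" here can be as small as the following sketch. All names are illustrative (this is not a real SDK): the model proposes a tool call, the loop runs it through the MCP server's <code>search</code> or <code>execute</code> tool, and the result is fed back until the model produces a final answer.

```javascript
// Minimal tool-use loop (illustrative, not a real SDK).
// `model` maps a message history to either a tool call or a final answer;
// `tools` maps tool names ("search", "execute") to functions that run code.
async function agentLoop(model, tools, task, maxSteps = 10) {
  const messages = [{ role: 'user', content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(messages);
    if (!reply.toolCall) return reply.content; // model is done
    const { name, args } = reply.toolCall;
    const result = await tools[name](args.code); // runs server-side in an isolate
    messages.push({ role: 'tool', name, content: JSON.stringify(result) });
  }
  throw new Error('step limit reached');
}
```

Because the tool surface is fixed at two entries, this loop never needs to change as the Cloudflare API grows.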
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/60ZoSFdK6t6hR6DpAn6Bub/93b86239cedb06d7fb265859be7590e8/images_BLOG-3184_4.png" />
          </figure>
    <div>
      <h3>Comparing approaches to context reduction</h3>
      <a href="#comparing-approaches-to-context-reduction">
        
      </a>
    </div>
    <p>Several approaches have emerged to reduce how many tokens MCP tools consume:</p><p><b>Client-side Code Mode</b> was our first experiment. The model writes TypeScript against typed SDKs and runs it in a Dynamic Worker Loader on the client. The tradeoff is that it requires the agent to ship with secure sandbox access. Code Mode is implemented in <a href="https://block.github.io/goose/blog/2025/12/15/code-mode-mcp/"><u>Goose</u></a> and in Anthropic's Claude SDK as <a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling"><u>Programmatic Tool Calling</u></a>.</p><p><b>Command-line interfaces</b> are another path. CLIs are self-documenting and reveal capabilities as the agent explores. Tools like <a href="https://openclaw.ai/"><u>OpenClaw</u></a> and <a href="https://blog.cloudflare.com/moltworker-self-hosted-ai-agent/"><u>Moltworker</u></a> convert MCP servers into CLIs using <a href="https://github.com/steipete/mcporter"><u>MCPorter</u></a> to give agents progressive disclosure. The limitation is obvious: the agent needs a shell, which not every environment provides and which introduces a much broader attack surface than a sandboxed isolate.</p><p><b>Dynamic tool search</b>, as used by <a href="https://x.com/trq212/status/2011523109871108570"><u>Anthropic in Claude Code</u></a>, surfaces a smaller set of tools that are hopefully relevant to the current task. It shrinks context use, but requires a search function that must be maintained and evaluated, and each matched tool still uses tokens.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5FPxVAuJggv7A08DbPsksb/aacb9087a79d08a1430ea87bb6960ad3/images_BLOG-3184_5.png" />
          </figure><p>Each approach solves a real problem. But for MCP servers specifically, server-side Code Mode combines their strengths: fixed token cost regardless of API size, no modifications needed on the agent side, progressive discovery built in, and safe execution inside a sandboxed isolate. The agent just calls two tools with code. Everything else happens on the server.</p>
    <div>
      <h3>Get started today</h3>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>The Cloudflare MCP server is available now. Point your MCP client at the server URL and you'll be redirected to Cloudflare to authorize and select the permissions to grant to your agent. Add this config to your MCP client: </p>
            <pre><code>{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp"
    }
  }
}
</code></pre>
            <p>For CI/CD, automation, or if you prefer managing tokens yourself, create a Cloudflare API token with the permissions you need. Both user tokens and account tokens are supported and can be passed as bearer tokens in the <code>Authorization</code> header.</p><p>More information on different MCP setup configurations can be found at the <a href="https://github.com/cloudflare/mcp"><u>Cloudflare MCP repository</u></a>.</p>
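Depending on your MCP client, a token-based setup might look like the following. The <code>headers</code> field is an assumption about your client's configuration format, so check its documentation; the server URL is the same one shown above:

```json
{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp",
      "headers": {
        "Authorization": "Bearer <your-api-token>"
      }
    }
  }
}
```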
    <div>
      <h3>Looking forward</h3>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>Code Mode solves context costs for a single API. But agents rarely talk to one service. A developer's agent might need the Cloudflare API alongside GitHub, a database, and an internal docs server. Each additional MCP server brings the same context window pressure we started with.</p><p><a href="https://blog.cloudflare.com/zero-trust-mcp-server-portals/"><u>Cloudflare MCP Server Portals</u></a> let you compose multiple MCP servers behind a single gateway with unified auth and access control. We are building a first-class Code Mode integration for all your MCP servers, and exposing them to agents with built-in progressive discovery and the same fixed-token footprint, regardless of how many services sit behind the gateway.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Optimization]]></category>
            <category><![CDATA[Open Source]]></category>
            <guid isPermaLink="false">2lWwgP33VT0NJjZ3pWShsw</guid>
            <dc:creator>Matt Carey</dc:creator>
        </item>
        <item>
            <title><![CDATA[Shedding old code with ecdysis: graceful restarts for Rust services at Cloudflare]]></title>
            <link>https://blog.cloudflare.com/ecdysis-rust-graceful-restarts/</link>
            <pubDate>Fri, 13 Feb 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ ecdysis is a Rust library enabling zero-downtime upgrades for network services. After five years protecting millions of connections at Cloudflare, it’s now open source. ]]></description>
            <content:encoded><![CDATA[ <blockquote><p>ecdysis | <i>ˈekdəsəs</i> |</p><p>noun</p><p>    the process of shedding the old skin (in reptiles) or casting off the outer 
    cuticle (in insects and other arthropods).  </p></blockquote><p>How do you upgrade a network service, handling millions of requests per second around the globe, without disrupting even a single connection?</p><p>One of our solutions at Cloudflare to this massive challenge has long been <a href="https://github.com/cloudflare/ecdysis"><b><u>ecdysis</u></b></a>, a Rust library that implements graceful process restarts where no live connections are dropped, and no new connections are refused. </p><p>Last month, <b>we open-sourced ecdysis</b>, so now anyone can use it. After five years of production use at Cloudflare, ecdysis has proven itself by enabling zero-downtime upgrades across our critical Rust infrastructure, saving millions of requests with every restart across Cloudflare’s <a href="https://www.cloudflare.com/network/"><u>global network</u></a>.</p><p>It’s hard to overstate the importance of getting these upgrades right, especially at the scale of Cloudflare’s network. Many of our services perform critical tasks such as traffic routing, <a href="https://www.cloudflare.com/application-services/solutions/certificate-lifecycle-management/"><u>TLS lifecycle management</u></a>, or firewall rules enforcement, and must operate continuously. If one of these services goes down, even for an instant, the cascading impact can be catastrophic. Dropped connections and failed requests quickly lead to degraded customer performance and business impact.</p><p>When these services need updates, security patches can’t wait. Bug fixes need deployment and new features must roll out. </p><p>The naive approach involves waiting for the old process to be stopped before spinning up the new one, but this creates a window of time where connections are refused and requests are dropped. 
For a service handling thousands of requests per second in a single location, multiply that across hundreds of data centers, and a brief restart becomes millions of failed requests globally.</p><p>Let’s dig into the problem, and how ecdysis has been the solution for us — and maybe will be for you. </p><p><b>Links</b>: <a href="https://github.com/cloudflare/ecdysis">GitHub</a> <b>|</b> <a href="https://crates.io/crates/ecdysis">crates.io</a> <b>|</b> <a href="https://docs.rs/ecdysis">docs.rs</a></p>
    <div>
      <h3>Why graceful restarts are hard</h3>
      <a href="#why-graceful-restarts-are-hard">
        
      </a>
    </div>
    <p>The naive approach to restarting a service, as we mentioned, is to stop the old process and start a new one. This works acceptably for simple services that don’t handle real-time requests, but for network services processing live connections, this approach has critical limitations.</p><p>First, the naive approach creates a window during which no process is listening for incoming connections. When the old process stops, it closes its listening sockets, which causes the OS to immediately refuse new connections with <code>ECONNREFUSED</code>. Even if the new process starts immediately, there will always be a gap where nothing is accepting connections, whether milliseconds or seconds. For a service handling thousands of requests per second, even a gap of 100ms means hundreds of dropped connections.</p><p>Second, stopping the old process kills all already-established connections. A client uploading a large file or streaming video gets abruptly disconnected. Long-lived connections like WebSockets or gRPC streams are terminated mid-operation. From the client’s perspective, the service simply vanishes.</p><p>Binding the new process before shutting down the old one appears to solve this, but also introduces additional issues. The kernel normally allows only one process to bind to an address:port combination, but <a href="https://man7.org/linux/man-pages/man7/socket.7.html"><u>the SO_REUSEPORT socket option</u></a> permits multiple binds. However, this creates a problem during process transitions that makes it unsuitable for graceful restarts.</p><p>When <code>SO_REUSEPORT</code> is used, the kernel creates separate listening sockets for each process and <a href="https://lwn.net/Articles/542629/"><u>load balances new connections across these sockets</u></a>. When the initial <code>SYN</code> packet for a connection is received, the kernel will assign it to one of the listening processes. 
Once the initial handshake completes, the connection sits in that process's <code>accept()</code> queue until the process accepts it. If the process exits before accepting the connection, the connection is orphaned and terminated by the kernel. GitHub’s engineering team documented this issue extensively when <a href="https://github.blog/2020-10-07-glb-director-zero-downtime-load-balancer-updates/"><u>building their GLB Director load balancer</u></a>.</p>
    <div>
      <h3>How ecdysis works</h3>
      <a href="#how-ecdysis-works">
        
      </a>
    </div>
    <p>When we set out to design and build ecdysis, we identified four key goals for the library:</p><ol><li><p><b>Old code can be completely shut down</b> post-upgrade.</p></li><li><p><b>The new process has a grace period</b> for initialization.</p></li><li><p><b>New code crashing during initialization is acceptable</b> and shouldn’t affect the running service.</p></li><li><p><b>Only a single upgrade runs in parallel</b> to avoid cascading failures.</p></li></ol><p>ecdysis satisfies these requirements following an approach pioneered by NGINX, which has supported graceful upgrades since its early days. The approach is straightforward: </p><ol><li><p>The parent process <code>fork()</code>s a new child process.</p></li><li><p>The child process replaces itself with a new version of the code with <code>execve()</code>.</p></li><li><p>The child process inherits the socket file descriptors via a named pipe shared with the parent.</p></li><li><p>The parent process waits for the child process to signal readiness before shutting down.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4QK8GY1s30C8RUovBQnqbD/525094478911eda96c7877a10753159f/image3.png" />
          </figure><p>Crucially, the socket remains open throughout the transition. The child process inherits the listening socket from the parent as a file descriptor shared via a named pipe. During the child's initialization, both processes share the same underlying kernel data structure, allowing the parent to continue accepting and processing new and existing connections. Once the child completes initialization, it notifies the parent and begins accepting connections. Upon receiving this ready notification, the parent immediately closes its copy of the listening socket and continues handling only existing connections. </p><p>This process eliminates the listening gap while providing the child a safe initialization window. There is a brief window of time when both the parent and child may accept connections concurrently. This is intentional; any connections accepted by the parent are simply handled until completion as part of the draining process.</p><p>This model also provides the required crash safety. If the child process fails during initialization (e.g., due to a configuration error), it simply exits. Since the parent never stopped listening, no connections are dropped, and the upgrade can be retried once the problem is fixed.</p><p>ecdysis implements the forking model with first-class support for asynchronous programming through <a href="https://tokio.rs"><u>Tokio</u></a> and <code>systemd</code> integration:</p><ul><li><p><b>Tokio integration</b>: Native async stream wrappers for Tokio. Inherited sockets become listeners without additional glue code. For synchronous services, ecdysis also works without an async runtime.</p></li><li><p><b>systemd-notify support</b>: When the <code>systemd_notify</code> feature is enabled, ecdysis automatically integrates with systemd’s process lifecycle notifications. 
Setting <code>Type=notify-reload</code> in your service unit file allows systemd to track upgrades correctly.</p></li><li><p><b>systemd named sockets</b>: The <code>systemd_sockets</code> feature enables ecdysis to manage systemd-activated sockets. Your service can be socket-activated and support graceful restarts simultaneously.</p></li></ul><p>Platform note: ecdysis relies on Unix-specific syscalls for socket inheritance and process management. It does not work on Windows. This is a fundamental limitation of the forking approach.</p>
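<p>To make the <code>systemd</code> integration concrete, here is a minimal sketch of a unit file for a service built with ecdysis. The binary name and path are illustrative assumptions:</p>

```ini
[Unit]
Description=Echo server with graceful restarts via ecdysis

[Service]
# Type=notify-reload (systemd >= 253) tells systemd the service reports
# readiness and reload completion via the sd_notify protocol.
Type=notify-reload
ExecStart=/usr/local/bin/echo-server
# systemctl reload sends SIGHUP by default, which triggers an ecdysis upgrade.
ReloadSignal=SIGHUP

[Install]
WantedBy=multi-user.target
```

<p>With this in place, <code>systemctl reload echo-server</code> performs a zero-downtime upgrade instead of a stop/start cycle.</p>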
    <div>
      <h3>Security considerations</h3>
      <a href="#security-considerations">
        
      </a>
    </div>
    <p>Graceful restarts introduce security considerations. The forking model creates a brief window where two process generations coexist, both with access to the same listening sockets and potentially sensitive file descriptors.</p><p>ecdysis addresses these concerns through its design:</p><p><b>Fork-then-exec</b>: ecdysis follows the traditional Unix pattern of <code>fork()</code> followed immediately by <code>execve()</code>. This ensures the child process starts with a clean slate: new address space, fresh code, and no inherited memory. Only explicitly-passed file descriptors cross the boundary.</p><p><b>Explicit inheritance</b>: Only listening sockets and communication pipes are inherited. Other file descriptors are closed via <code>CLOEXEC</code> flags. This prevents accidental leakage of sensitive handles.</p><p><b>seccomp compatibility</b>: Services using seccomp filters must allow <code>fork()</code> and <code>execve()</code>. This is a tradeoff: graceful restarts require these syscalls, so they cannot be blocked.</p><p>For most network services, these tradeoffs are acceptable. The security of the fork-exec model is well understood and has been battle-tested for decades in software like NGINX and Apache.</p>
    <div>
      <h3>Code example</h3>
      <a href="#code-example">
        
      </a>
    </div>
    <p>Let’s look at a practical example. Here’s a simplified TCP echo server that supports graceful restarts:</p>
            <pre><code>use ecdysis::tokio_ecdysis::{SignalKind, StopOnShutdown, TokioEcdysisBuilder};
use tokio::{net::TcpStream, task::JoinSet};
use futures::StreamExt;
use std::net::SocketAddr;

#[tokio::main]
async fn main() {
    // Create the ecdysis builder
    let mut ecdysis_builder = TokioEcdysisBuilder::new(
        SignalKind::hangup()  // Trigger upgrade/reload on SIGHUP
    ).unwrap();

    // Trigger stop on SIGUSR1
    ecdysis_builder
        .stop_on_signal(SignalKind::user_defined1())
        .unwrap();

    // Create listening socket - will be inherited by children
    let addr: SocketAddr = "0.0.0.0:8080".parse().unwrap();
    let stream = ecdysis_builder
        .build_listen_tcp(StopOnShutdown::Yes, addr, |builder, addr| {
            builder.set_reuse_address(true)?;
            builder.bind(&amp;addr.into())?;
            builder.listen(128)?;
            Ok(builder.into())
        })
        .unwrap();

    // Spawn task to handle connections
    let server_handle = tokio::spawn(async move {
        let mut stream = stream;
        let mut set = JoinSet::new();
        while let Some(Ok(socket)) = stream.next().await {
            set.spawn(handle_connection(socket));
        }
        set.join_all().await;
    });

    // Signal readiness and wait for shutdown
    let (_ecdysis, shutdown_fut) = ecdysis_builder.ready().unwrap();
    let shutdown_reason = shutdown_fut.await;

    log::info!("Shutting down: {:?}", shutdown_reason);

    // Gracefully drain connections
    server_handle.await.unwrap();
}

async fn handle_connection(mut socket: TcpStream) {
    // Echo connection logic here
}</code></pre>
            <p>The key points:</p><ol><li><p><code><b>build_listen_tcp</b></code> creates a listener that will be inherited by child processes.</p></li><li><p><code><b>ready()</b></code> signals to the parent process that initialization is complete and that it can safely exit.</p></li><li><p><code><b>shutdown_fut.await</b></code> blocks until an upgrade or stop is requested. This future only yields once the process should be shut down, either because an upgrade/reload was executed successfully or because a shutdown signal was received.</p></li></ol><p>When you send <code>SIGHUP</code> to this process, here’s what ecdysis does…</p><p><i>…on the parent process:</i></p><ul><li><p>Forks and execs a new instance of your binary.</p></li><li><p>Passes the listening socket to the child.</p></li><li><p>Waits for the child to call <code>ready()</code>.</p></li><li><p>Drains existing connections, then exits.</p></li></ul><p><i>…on the child process:</i></p><ul><li><p>Initializes itself following the same execution flow as the parent, except any sockets owned by ecdysis are inherited and not bound by the child.</p></li><li><p>Signals readiness to the parent by calling <code>ready()</code>.</p></li><li><p>Blocks waiting for a shutdown or upgrade signal.</p></li></ul>
    <div>
      <h3>Production at scale</h3>
      <a href="#production-at-scale">
        
      </a>
    </div>
    <p>ecdysis has been running in production at Cloudflare since 2021. It powers critical Rust infrastructure services deployed across 330+ data centers in 120+ countries. These services handle billions of requests per day and require frequent updates for security patches, feature releases, and configuration changes.</p><p>Every restart using ecdysis saves hundreds of thousands of requests that would otherwise be dropped during a naive stop/start cycle. Across our global footprint, this translates to millions of preserved connections and improved reliability for customers.</p>
    <div>
      <h3>ecdysis vs alternatives</h3>
      <a href="#ecdysis-vs-alternatives">
        
      </a>
    </div>
    <p>Graceful restart libraries exist for several ecosystems. Here’s how ecdysis compares to its closest relatives, and when each is the better fit.</p><p><a href="https://github.com/cloudflare/tableflip"><b><u>tableflip</u></b></a> is our Go library that inspired ecdysis. It implements the same fork-and-inherit model for Go services. If you need Go, tableflip is a great option!</p><p><a href="https://github.com/cloudflare/shellflip"><b><u>shellflip</u></b></a> is Cloudflare’s other Rust graceful restart library, designed specifically for Oxy, our Rust-based proxy. shellflip is more opinionated: it assumes systemd and Tokio, and focuses on transferring arbitrary application state between parent and child. This makes it excellent for complex stateful services, or services that want to apply such aggressive sandboxing that they can’t even open their own sockets, but adds overhead for simpler cases.</p>
    <div>
      <h3>Start building</h3>
      <a href="#start-building">
        
      </a>
    </div>
    <p>ecdysis brings five years of production-hardened graceful restart capabilities to the Rust ecosystem. It’s the same technology protecting millions of connections across Cloudflare’s global network, now open-sourced and available for anyone!</p><p>Full documentation is available at <a href="https://docs.rs/ecdysis"><u>docs.rs/ecdysis</u></a>, including API reference, examples for common use cases, and steps for integrating with <code>systemd</code>.</p><p>The <a href="https://github.com/cloudflare/ecdysis/tree/main/examples"><u>examples directory</u></a> in the repository contains working code demonstrating TCP listeners, Unix socket listeners, and systemd integration.</p><p>The library is actively maintained by the Argo Smart Routing &amp; Orpheus team, with contributions from teams across Cloudflare. We welcome contributions, bug reports, and feature requests on <a href="https://github.com/cloudflare/ecdysis"><u>GitHub</u></a>.</p><p>Whether you’re building a high-performance proxy, a long-lived API server, or any network service where uptime matters, ecdysis can provide a foundation for zero-downtime operations.</p><p>Start building:<a href="https://github.com/cloudflare/ecdysis"> <u>github.com/cloudflare/ecdysis</u></a></p> ]]></content:encoded>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Infrastructure]]></category>
            <category><![CDATA[Engineering]]></category>
            <category><![CDATA[Edge]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Application Services]]></category>
            <guid isPermaLink="false">GMarF75NkFuiwVuyFJk77</guid>
            <dc:creator>Manuel Olguín Muñoz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Markdown for Agents]]></title>
            <link>https://blog.cloudflare.com/markdown-for-agents/</link>
            <pubDate>Thu, 12 Feb 2026 14:03:00 GMT</pubDate>
            <description><![CDATA[ The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown. ]]></description>
            <content:encoded><![CDATA[ <p>The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines, and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.</p><p>For a business that wants to stay ahead, now is the time to look beyond human visitors and traditional SEO wisdom, and to start treating agents as first-class citizens. </p>
    <div>
      <h2>Why markdown is important</h2>
      <a href="#why-markdown-is-important">
        
      </a>
    </div>
    <p>Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside. A simple <code>## About Us</code> on a page in markdown costs roughly 3 tokens; its HTML equivalent – <code>&lt;h2 class="section-title" id="about"&gt;About Us&lt;/h2&gt;</code> – burns 12-15, and that's before you account for the <code>&lt;div&gt;</code> wrappers, nav bars, and script tags that pad every real web page and have zero semantic value.</p><p>This blog post you’re reading takes 16,180 tokens in HTML and 3,150 tokens when converted to markdown. <b>That’s an 80% reduction in token usage</b>.</p><p><a href="https://en.wikipedia.org/wiki/Markdown"><u>Markdown</u></a> has quickly become the <i>lingua franca</i> for agents and AI systems as a whole. The format’s explicit structure makes it ideal for AI processing, ultimately producing better results while minimizing token waste.</p><p>The problem is that the Web is made of HTML, not markdown, and page weight has been <a href="https://almanac.httparchive.org/en/2025/page-weight#page-weight-over-time"><u>steadily increasing</u></a> over the years, making pages hard to parse. An agent’s goal is to filter out the non-essential elements and extract the relevant content.</p><p>The conversion of HTML to markdown is now a common step for any AI pipeline. Still, this process is far from ideal: it wastes computation, adds costs and processing complexity, and above all, it may not be how the content creator intended their content to be used in the first place.</p><p>What if AI agents could bypass the complexities of intent analysis and document conversion, and instead receive structured markdown directly from the source?</p>
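<p>To make the savings concrete, here is a quick sketch of the arithmetic, using the common ~4 characters/token heuristic as a rough estimator (an approximation, not a real tokenizer):</p>

```typescript
// Rough token estimate: ~4 characters per token on average English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("## About Us")); // prints 3, matching the figure above

// This page: ~16,180 tokens as HTML vs ~3,150 tokens as markdown.
const reduction = 1 - 3150 / 16180;
console.log(`${Math.floor(reduction * 100)}% fewer tokens`); // prints "80% fewer tokens"
```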
    <div>
      <h2>Convert HTML to markdown, automatically</h2>
      <a href="#convert-html-to-markdown-automatically">
        
      </a>
    </div>
    <p>Cloudflare's network now supports real-time content conversion at the source, for <a href="https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/"><u>enabled zones</u></a> using <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Content_negotiation"><u>content negotiation</u></a> headers. Now when AI systems request pages from any website that uses Cloudflare and has Markdown for Agents enabled, they can express a preference for <code>text/markdown</code> in the request. Our network will automatically and efficiently convert the HTML to markdown, when possible, on the fly.</p><p>Here’s how it works. To fetch the markdown version of any page from a zone with Markdown for Agents enabled, the client needs to add the <b>Accept</b> negotiation header with <code>text/markdown</code> as one of the options. Cloudflare will detect this, fetch the original HTML version from the origin, and convert it to markdown before serving it to the client.</p><p>Here's a curl example with the Accept negotiation header requesting a page from our developer documentation:</p>
            <pre><code>curl https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
  -H "Accept: text/markdown"
</code></pre>
            <p>Or if you’re building an AI Agent using Workers, you can use TypeScript:</p>
            <pre><code>const r = await fetch(
  `https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/`,
  {
    headers: {
      Accept: "text/markdown, text/html",
    },
  },
);
const tokenCount = r.headers.get("x-markdown-tokens");
const markdown = await r.text();
</code></pre>
            <p>We already see some of the most popular coding agents today – like Claude Code and OpenCode – send these Accept headers with their requests for content. Now, the response to this request is formatted in markdown. It's that simple.</p>
            <pre><code>HTTP/2 200
date: Wed, 11 Feb 2026 11:44:48 GMT
content-type: text/markdown; charset=utf-8
content-length: 2899
vary: accept
x-markdown-tokens: 725
content-signal: ai-train=yes, search=yes, ai-input=yes

---
title: Markdown for Agents · Cloudflare Agents docs
---

## What is Markdown for Agents

The ability to parse and convert HTML to Markdown has become foundational for AI.
...
</code></pre>
            <p>Note that we include an <code>x-markdown-tokens</code> header with the converted response that indicates the estimated number of tokens in the markdown document. You can use this value in your flow, for example to calculate the size of a context window or to decide on your chunking strategy.</p><p>Here’s a diagram of how it works:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Zw1Q5kBBqTrouN1362H5I/3080d74a2a971be1f1e7e0ba79611998/BLOG-3162_2.png" />
          </figure>
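<p>The <code>x-markdown-tokens</code> header can feed directly into a chunking decision. A minimal sketch (the 8,000-token budget per chunk is an illustrative assumption; pick one that fits your model’s context window):</p>

```typescript
// Decide how many chunks a document needs so each stays within a token budget.
function chunkCount(totalTokens: number, budgetPerChunk: number = 8000): number {
  return Math.max(1, Math.ceil(totalTokens / budgetPerChunk));
}

// The example response above reported x-markdown-tokens: 725.
console.log(chunkCount(725)); // prints 1 -- the whole page fits in one chunk
```

<p>In a Worker, you would read the estimate with <code>r.headers.get("x-markdown-tokens")</code>, as in the TypeScript example earlier, before deciding whether to split the document.</p>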
    <div>
      <h3>Content Signals Policy</h3>
      <a href="#content-signals-policy">
        
      </a>
    </div>
    <p>During our last Birthday Week, Cloudflare <a href="https://blog.cloudflare.com/content-signals-policy/"><u>announced</u></a> Content Signals — <a href="http://contentsignals.org"><u>a framework</u></a> that allows anyone to express their preferences for how their content can be used after it has been accessed. </p><p>When you return markdown, you also want to signal how your content may be used by the agent or AI crawler. That’s why Markdown for Agents converted responses include the <code>Content-Signal: ai-train=yes, search=yes, ai-input=yes</code> header, indicating that the content can be used for AI training, search results, and AI input, which includes agentic use. Markdown for Agents will provide options to define custom Content Signal policies in the future.</p><p>Check our dedicated <a href="https://contentsignals.org/"><u>Content Signals</u></a> page for more information on this framework.</p>
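<p>On the consuming side, an agent can check these signals before using the content. A minimal parser sketch for the header format above:</p>

```typescript
// Parse "ai-train=yes, search=yes, ai-input=yes" into a lookup table.
function parseContentSignal(header: string): { [signal: string]: boolean } {
  const out: { [signal: string]: boolean } = {};
  for (const part of header.split(",")) {
    const [key, value] = part.trim().split("=");
    if (key) out[key] = value === "yes";
  }
  return out;
}

const signals = parseContentSignal("ai-train=yes, search=yes, ai-input=yes");
console.log(signals["ai-input"]); // prints true -- agentic use is permitted
```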
    <div>
      <h3>Try it with the Cloudflare Blog &amp; Developer Documentation </h3>
      <a href="#try-it-with-the-cloudflare-blog-developer-documentation">
        
      </a>
    </div>
    <p>We enabled this feature in our <a href="https://developers.cloudflare.com/"><u>Developer Documentation</u></a> and our <a href="https://blog.cloudflare.com/"><u>Blog</u></a>, inviting all AI crawlers and agents to consume our content using markdown instead of HTML.</p><p>Try it out now by requesting this blog with <code>Accept: text/markdown</code>.</p>
            <pre><code>curl https://blog.cloudflare.com/markdown-for-agents/ \
  -H "Accept: text/markdown"</code></pre>
            <p>The result is:</p>
            <pre><code>---
description: The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown.
title: Introducing Markdown for Agents
image: https://blog.cloudflare.com/images/markdown-for-agents.png
---

# Introducing Markdown for Agents

The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.

...</code></pre>
            
    <div>
      <h3>Other ways to convert to Markdown</h3>
      <a href="#other-ways-to-convert-to-markdown">
        
      </a>
    </div>
    <p>If you’re building AI systems that require arbitrary document conversion from outside Cloudflare, or if Markdown for Agents is not available from the content source, we provide other ways to convert documents to Markdown for your applications:</p><ul><li><p>Workers AI <a href="https://developers.cloudflare.com/workers-ai/features/markdown-conversion/"><u>AI.toMarkdown()</u></a> supports multiple document types beyond HTML, as well as summarization.</p></li><li><p>Browser Rendering <a href="https://developers.cloudflare.com/browser-rendering/rest-api/markdown-endpoint/"><u>/markdown</u></a> REST API supports markdown conversion if you need to render a dynamic page or application in a real browser before converting it.</p></li></ul>
    <div>
      <h2>Tracking markdown usage</h2>
      <a href="#tracking-markdown-usage">
        
      </a>
    </div>
    <p>Anticipating a shift in how AI systems browse the Web, Cloudflare Radar now includes content type insights for AI bot and crawler traffic, both globally on the <a href="https://radar.cloudflare.com/ai-insights#content-type"><u>AI Insights</u></a> page and in the <a href="https://radar.cloudflare.com/bots/directory/gptbot"><u>individual bot</u></a> information pages.</p><p>The new <code>content_type</code> dimension and filter shows the distribution of content types returned to AI agents and crawlers, grouped by <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types"><u>MIME type</u></a> category.  </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7vQzvzHsTLPXGhoQK0Xbr5/183129a8947990bc4ee5bb5ca7ba71b5/BLOG-3162_3.png" />
          </figure><p>You can also see the requests for markdown filtered by a specific agent or crawler. Here are the requests that return markdown to OAI-Searchbot, the crawler used by OpenAI to power ChatGPT’s search: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Ah99DWLxnYjadW6xJhAXg/afef4a29ae504d4fe69df4f9823dd103/BLOG-3162_4.png" />
          </figure><p>This new data will allow us to track the evolution of how AI bots, crawlers, and agents are consuming Web content over time. As always, everything on Radar is freely accessible via the <a href="https://developers.cloudflare.com/api/resources/radar/"><u>public APIs</u></a> and the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=content_type&amp;filters=userAgent%253DGPTBot&amp;timeCompare=1"><u>Data Explorer</u></a>. </p>
    <div>
      <h2>Start using today</h2>
      <a href="#start-using-today">
        
      </a>
    </div>
    <p>To enable Markdown for Agents for your zone, log into the Cloudflare <a href="https://dash.cloudflare.com/"><u>dashboard</u></a>, select your account, select the zone, look for Quick Actions and toggle the Markdown for Agents button to enable. This feature is available today in Beta at no cost for Pro, Business and Enterprise plans, as well as SSL for SaaS customers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UqzmHrNa1UdCCI6eXIfmn/3da0ff51dd94219d8af87c172d83fc72/BLOG-3162_5.png" />
          </figure><p>You can find more information about Markdown for Agents on our<a href="https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/"> Developer Docs</a>. We welcome your feedback as we continue to refine and enhance this feature. We’re curious to see how AI crawlers and agents navigate and adapt to the unstructured nature of the Web as it evolves.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">5uEb99xvnHVk3QfN0KMjb6</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Will Allen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building vertical microfrontends on Cloudflare’s platform]]></title>
            <link>https://blog.cloudflare.com/vertical-microfrontends/</link>
            <pubDate>Fri, 30 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ Deploy multiple Workers under a single domain with the ability to make them feel like single-page applications. We take a look at how service bindings enable URL path routing to multiple projects. ]]></description>
            <content:encoded><![CDATA[ <p><i>Updated at 6:55 a.m. PT</i></p><p>Today, we’re introducing a new Worker template for Vertical Microfrontends (VMFE). <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><u>This template</u></a> allows you to map multiple independent <a href="https://workers.cloudflare.com/"><u>Cloudflare Workers</u></a> to a single domain, enabling teams to work in complete silos — shipping marketing, docs, and dashboards independently — while presenting a single, seamless application to the user.</p><a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Most microfrontend architectures are "horizontal", meaning different <i>parts</i> of a single page are fetched from different services. Vertical microfrontends take a different approach by splitting the application by URL path. In this model, a team owning the <code>/blog</code> path doesn't <i>just</i> own a component; they own the entire vertical stack for that route – framework, library choice, CI/CD and more. Owning the entire stack of a path, or set of paths, allows teams to have true ownership of their work and ship with confidence.</p><p>As teams grow, they face the problem that different frameworks suit different use cases. A marketing website might be better served by Astro, for example, while a dashboard might be better built with React. Or say you have a monolithic code base where many teams ship as a collective. An update to add new features from several teams can get frustratingly rolled back because a single team introduced a regression. How do we solve the problem of obscuring the technical implementation details away from the user and letting teams ship a cohesive user experience with full autonomy and control of their domains?</p><p>Vertical microfrontends can be the answer. Let’s dive in and explore how they solve developer pain points together.</p>
    <div>
      <h2>What are vertical microfrontends?</h2>
      <a href="#what-are-vertical-microfrontends">
        
      </a>
    </div>
    <p>A vertical microfrontend is an architectural pattern where a single independent team owns an entire slice of the application’s functionality, from the user interface all the way down to the <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipeline</a>. These slices are defined by paths on a domain where you can associate individual Workers with specific paths:</p>
            <pre><code>/      = Marketing
/docs  = Documentation
/blog  = Blog
/dash  = Dashboard</code></pre>
            <p>We could take it a step further with more granular sub-path Worker associations, such as within a dashboard. Within a dashboard, you likely segment out various features or products by adding depth to your URL path (e.g. <code>/dash/product-a</code>) and navigating between two products could mean two entirely different code bases. </p><p>Now with vertical microfrontends, we could also have the following:</p>
            <pre><code>/dash/product-a  = WorkerA
/dash/product-b  = WorkerB</code></pre>
            <p>Each of the above paths is its own frontend project with zero shared code between them. The <code>product-a</code> and <code>product-b</code> routes map to separately deployed frontend applications that have their own frameworks, libraries, and CI/CD pipelines defined and owned by their own teams. FINALLY.</p><p>You can now own your own code from end to end. But now we need to find a way to stitch these separate projects together, and even more so, make them feel as if they are a unified experience.</p><p>We experience this pain point ourselves here at Cloudflare, as the dashboard has many individual teams owning their own products. Teams must contend with the fact that changes made outside their control impact how users experience their product. </p><p>Internally, we are now using a similar strategy for our own dashboard. When users navigate from the core dashboard into our Zero Trust product, in reality these are two entirely separate projects and the user is simply being routed to that project by its path <code>/:accountId/one</code>.</p>
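<p>The path-to-Worker mapping above boils down to longest-prefix routing. Here is a minimal sketch of the idea; the route table and worker names are illustrative assumptions, and in a real router Worker the strings would be service bindings whose <code>fetch()</code> you call with the incoming request:</p>

```typescript
// Longest-prefix routing: the most specific path prefix wins, so
// /dash/product-a is matched before the more general /dash.
function pickWorker(path: string, routes: [string, string][], fallback: string): string {
  const byLength = [...routes].sort((a, b) => b[0].length - a[0].length);
  for (const [prefix, worker] of byLength) {
    if (path.startsWith(prefix)) return worker;
  }
  return fallback;
}

const routes: [string, string][] = [
  ["/dash/product-a", "WorkerA"],
  ["/dash/product-b", "WorkerB"],
  ["/dash", "DashWorker"],
  ["/docs", "DocsWorker"],
];

console.log(pickWorker("/dash/product-a/settings", routes, "MarketingWorker")); // prints "WorkerA"
```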
    <div>
      <h2>Visually unified experiences</h2>
      <a href="#visually-unified-experiences">
        
      </a>
    </div>
    <p>Stitching these individual projects together to make them feel like a unified experience isn’t as difficult as you might think: It only takes a few lines of CSS magic. What we <i>absolutely do not want</i> to happen is to leak our implementation details and internal decisions to our users. If we fail to make this user experience feel like one cohesive frontend, then we’ve done a grave injustice to our users. </p><p>To accomplish this sleight of hand, let us take a little trip in understanding how view transitions and document preloading come into play.</p>
    <div>
      <h3>View transitions</h3>
      <a href="#view-transitions">
        
      </a>
    </div>
    <p>When we want to seamlessly navigate between two distinct pages while making it feel smooth to the end user, <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>view transitions</u></a> are quite useful. Defining specific <a href="https://www.w3schools.com/jsref/dom_obj_all.asp"><u>DOM elements</u></a> on our page to stick around until the next page is visible, and defining how any changes are handled, make for quite the powerful quilt-stitching tool for multi-page applications.</p><p>There may be, however, instances where making the various vertical microfrontends feel different is more than acceptable. Perhaps our marketing website, documentation, and dashboard are each uniquely defined, for instance. A user would not expect all three of those to feel cohesive as you navigate between the three parts. But… if you decide to introduce vertical slices to an individual experience such as the dashboard (e.g. <code>/dash/product-a</code> &amp; <code>/dash/product-b</code>), then users should <b>never</b> know they are two different repositories/workers/projects underneath.</p><p>Okay, enough talk — let’s get to work. I mentioned it was low-effort to make two separate projects feel as if they were one to a user, and if you have yet to hear about <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>CSS View Transitions</u></a> then I’m about to blow your mind.</p><p>What if I told you that you could make animated transitions between different views — single-page app (SPA) or multi-page app (MPA) — feel as if they were one? Before any view transitions are added, if we navigate between pages owned by two different Workers, the interstitial loading state would be a blank white screen for a few hundred milliseconds until the next page began rendering. Pages would not feel cohesive, and it certainly would not feel like a single-page application.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4vw1Am7gYUQPtmFFCRcsu1/774b881dff7ce1c26db88f30623dfc13/image3.png" />
          </figure><p><sup>Appears as multiple navigation elements between each site.</sup></p><p>If we want elements to stick around, rather than seeing a blank white page, we can achieve that by defining CSS View Transitions. With the code below, we’re telling the current document that when a view transition is about to happen, it should keep the <code>nav</code> DOM element on the screen, and that any difference in appearance between the current page and the destination page should be animated with an <code>ease-in-out</code> transition.</p><p>All of a sudden, two different Workers feel like one.</p>
            <pre><code>@supports (view-transition-name: none) {
  /* Opt this document in to cross-document view transitions;
     both pages involved in the navigation need this */
  @view-transition { navigation: auto; }

  ::view-transition-old(root),
  ::view-transition-new(root) {
    animation-duration: 0.3s;
    animation-timing-function: ease-in-out;
  }

  /* Keep the nav in place across the navigation */
  nav { view-transition-name: navigation; }
}</code></pre>
            
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4h6Eh5LSX4552QJDvV1l7o/a5a43ee0e6e011bca58ecc2d74902744/image1.png" />
          </figure><p><sup>Appears as a single navigation element between three distinct sites.</sup></p>
    <div>
      <h3>Preloading</h3>
      <a href="#preloading">
        
      </a>
    </div>
    <p>Transitioning between two pages makes it <i>look</i> seamless — and we also want it to <i>feel</i> as instant as a client-side SPA. Chrome, Edge, and Opera support the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Speculation_Rules_API"><u>Speculation Rules</u></a> API, while Firefox and Safari currently do not. The Speculation Rules API is designed to improve performance for future navigations, particularly to document URLs, making multi-page applications feel more like single-page applications.</p><p>Breaking it down into code, we need to define a script in a specific format that tells supporting browsers how to prefetch the other vertical slices connected to our web application — likely linked through some shared navigation.</p>
            <pre><code>&lt;script type="speculationrules"&gt;
  {
    "prefetch": [
      {
        "urls": ["https://product-a.com", "https://product-b.com"],
        "requires": ["anonymous-client-ip-when-cross-origin"],
        "referrer_policy": "no-referrer"
      }
    ]
  }
&lt;/script&gt;</code></pre>
            <p>With that, our application prefetches our other microfrontends and holds them in the browser’s in-memory cache, so navigating to those pages feels nearly instant.</p><p>You likely won’t require this for clearly discernible vertical slices (marketing, docs, dashboard), because users expect a slight load between them. However, it is highly encouraged when vertical slices are defined within a single visible experience (e.g. within dashboard pages).</p><p>Between <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transition_API"><u>View Transitions</u></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Speculation_Rules_API"><u>Speculation Rules</u></a>, we are able to tie together entirely different code repositories to feel as if they were served from a single-page application. Wild if you ask me.</p>
    <div>
      <h2>Zero-config request routing</h2>
      <a href="#zero-config-request-routing">
        
      </a>
    </div>
    <p>Now we need a mechanism to host multiple applications, and a method to stitch them together as requests stream in. Defining a single Cloudflare Worker as the “Router” allows a single logical point (at the edge) to handle network requests and then forward them to whichever vertical microfrontend is responsible for that URL path. Plus it doesn’t hurt that we can then map a single domain to that router Worker and the rest “just works.”</p>
    <div>
      <h3>Service bindings</h3>
      <a href="#service-bindings">
        
      </a>
    </div>
    <p>If you have yet to explore Cloudflare Worker <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/"><u>service bindings</u></a>, then it is worth taking a moment to do so. </p><p>Service bindings allow one Worker to call into another without going through a publicly-accessible URL. A service binding allows Worker A to call a method on Worker B, or to forward a request from Worker A to Worker B. Breaking it down further, the Router Worker can call into each vertical microfrontend Worker that has been defined (e.g. marketing, docs, dashboard), assuming each of them is a Cloudflare Worker.</p><p>Why is this important? This is precisely the mechanism that “stitches” these vertical slices together. We’ll dig into how the request routing handles the traffic split in the next section. But to define each of these microfrontends, we’ll need to update our Router Worker’s wrangler definition, so it knows which frontends it’s allowed to call into.
</p>
            <pre><code>{
  "$schema": "./node_modules/wrangler/config-schema.json",
  "name": "router",
  "main": "./src/router.js",
  "services": [
    {
      "binding": "HOME",
      "service": "worker_marketing"
    },
    {
      "binding": "DOCS",
      "service": "worker_docs"
    },
    {
      "binding": "DASH",
      "service": "worker_dash"
    }
  ]
}</code></pre>
            <p>The sample definition above lives in our Router Worker and declares that it is permitted to make requests into three additional Workers (marketing, docs, and dash). Granting permissions is as simple as that, but let’s tumble into some of the more complex logic: request routing and rewriting HTML in network responses.</p>
    <div>
      <h3>Request routing</h3>
      <a href="#request-routing">
        
      </a>
    </div>
    <p>With knowledge of the other Workers we can call into when needed, we now need logic to determine where each network request should be directed. Since the Router Worker is assigned to our custom domain, all incoming requests hit it first at the network edge. It then determines which Worker should handle the request and manages the resulting response. </p><p>The first step is to map URL paths to their associated Workers. When a request URL is received, we need to know where it should be forwarded. We do this by defining rules. While we support wildcard routes, dynamic paths, and parameter constraints, we are going to stay focused on the basics — literal path prefixes — as they illustrate the point most clearly. </p><p> In this example, we have three microfrontends:</p>
            <pre><code>/      = Marketing
/docs  = Documentation
/dash  = Dashboard</code></pre>
            <p>Each of the above paths needs to be mapped to an actual Worker (see our wrangler definition for services in the section above). For our Router Worker, we define an additional variable with the following data, so we know which paths map to which service bindings — and therefore where to route users as requests come in. Define a wrangler variable named <code>ROUTES</code> with the following contents:</p>
            <pre><code>{
  "routes":[
    {"binding": "HOME", "path": "/"},
    {"binding": "DOCS", "path": "/docs"},
    {"binding": "DASH", "path": "/dash"}
  ]
}</code></pre>
            <p>Let’s envision a user visiting our website path <code>/docs/installation</code>. Under the hood, the request first reaches our Router Worker, which is in charge of understanding which URL paths map to which individual Workers. It sees that the <code>/docs</code> path prefix is mapped to our <code>DOCS</code> service binding, which — per our wrangler file — points at our <code>worker_docs</code> project. Our Router Worker, knowing that <code>/docs</code> is defined as a vertical microfrontend route, removes the <code>/docs</code> prefix from the path, forwards the request to our <code>worker_docs</code> Worker, and finally returns whatever response comes back.</p><p>Why does it drop the <code>/docs</code> prefix, though? This was an implementation choice made so that the downstream Worker can handle the request <i>as if </i>it were called directly, outside our Router Worker. Like any Cloudflare Worker, our <code>worker_docs</code> service might have its own individual URL where it can be accessed, and we wanted that service URL to continue to work independently. When the service is attached to our Router Worker, the prefix is removed automatically, so the service is accessible both from its own defined URL and through our Router Worker… either place, doesn’t matter.</p>
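<p>To make the prefix matching and stripping concrete, here is a minimal sketch of such a Router Worker in TypeScript. This is not the template’s actual implementation — the binding names and the <code>ROUTES</code> variable follow the examples above, and only literal path prefixes are handled:</p>

```typescript
type Route = { binding: string; path: string };

// Pick the longest matching literal prefix for a pathname.
// "/docs/installation" matches "/docs"; "/docsify" only matches "/".
export function matchRoute(routes: Route[], pathname: string): Route | undefined {
  return routes
    .filter((r) => r.path === "/" || pathname === r.path || pathname.startsWith(r.path + "/"))
    .sort((a, b) => b.path.length - a.path.length)[0];
}

// Strip the matched prefix so the downstream Worker sees the URL
// as if it had been called directly at its own address.
export function rewritePath(pathname: string, prefix: string): string {
  if (prefix === "/") return pathname;
  const stripped = pathname.slice(prefix.length);
  return stripped === "" ? "/" : stripped;
}

// The Router Worker itself: env carries the service bindings (HOME,
// DOCS, DASH) plus the ROUTES JSON variable from the wrangler config.
export default {
  async fetch(request: Request, env: any): Promise<Response> {
    const url = new URL(request.url);
    const route = matchRoute(JSON.parse(env.ROUTES).routes, url.pathname);
    if (!route) return new Response("Not found", { status: 404 });
    url.pathname = rewritePath(url.pathname, route.path);
    // Forward over the service binding; no public round trip.
    return env[route.binding].fetch(new Request(url.toString(), request));
  },
};
```

<p>Sorting candidates by prefix length means <code>/docs</code> wins over the catch-all <code>/</code>, so the marketing Worker only receives paths no other slice claims.</p>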
    <div>
      <h3>HTMLRewriter</h3>
      <a href="#htmlrewriter">
        
      </a>
    </div>
    <p>Splitting our various frontend services by URL path (e.g. <code>/docs</code> or <code>/dash</code>) makes it easy to forward a request, but when our response contains HTML that doesn’t know it’s being reverse proxied through a path component… well, that causes problems. </p><p>Say our documentation website returns an image tag with an absolute path, <code>&lt;img src="/logo.png" /&gt;</code>. If our user is visiting the page at <code>https://website.com/docs/</code>, then loading <code>logo.png</code> would fail: the browser would request <code>https://website.com/logo.png</code>, because our <code>/docs</code> prefix exists only in our Router Worker.</p><p>Only when our services are accessed through our Router Worker do we need to rewrite absolute paths in the HTML so the returned response references valid assets. In practice, when a request passes through our Router Worker, we pass it to the correct service binding and receive the response back. Before we return that to the client, we have an opportunity to rewrite the DOM — wherever we see absolute paths, we prepend the proxied path. Where previously our HTML returned <code>&lt;img src="/logo.png" /&gt;</code>, we now modify it before returning to the client browser to <code>&lt;img src="/docs/logo.png" /&gt;</code>.</p>
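<p>As a sketch of how such rewriting might look with HTMLRewriter — the selector list and the <code>prefixPath</code> helper here are illustrative, not the template’s exact code:</p>

```typescript
// Decide the rewritten attribute value: only root-relative paths like
// "/logo.png" get the route prefix; relative paths, full URLs, and
// protocol-relative URLs ("//cdn...") pass through untouched.
export function prefixPath(value: string, prefix: string): string {
  return value.startsWith("/") && !value.startsWith("//") ? prefix + value : value;
}

// HTMLRewriter is provided by the Workers runtime.
declare const HTMLRewriter: any;

// Rewrite src/href attributes in a proxied response before returning it.
export function prefixAbsolutePaths(response: Response, prefix: string): Response {
  const handler = {
    element(el: any) {
      for (const attr of ["src", "href"]) {
        const value = el.getAttribute(attr);
        if (value) el.setAttribute(attr, prefixPath(value, prefix));
      }
    },
  };
  return new HTMLRewriter().on("img, script, link, a", handler).transform(response);
}
```

<p>Because HTMLRewriter streams the response, the rewrite adds no buffering: the Router Worker never holds the whole document in memory.</p>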
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10jKx6qt2YcarpDyEsFNYV/3b0f11f56e3c9b2deef59934cf8efa7f/image2.png" />
          </figure><p>Let’s return for a moment to the magic of CSS view transitions and document preloading. We could of course manually place that code into our projects and have it work, but this Router Worker will <i>automatically</i> handle that logic for us by also using <a href="https://developers.cloudflare.com/workers/runtime-apis/html-rewriter/"><u>HTMLRewriter</u></a>. </p><p>In your Router Worker <code>ROUTES</code> variable, if you set <code>smoothTransitions</code> to <code>true</code> at the root level, then the CSS view transitions code will be added automatically. Additionally, if you set the <code>preload</code> key within a route to <code>true</code>, then the speculation rules script for that route will be added automatically as well. </p><p>Below is an example of both in action:</p>
            <pre><code>{
  "smoothTransitions":true, 
  "routes":[
    {"binding": "APP1", "path": "/app1", "preload": true},
    {"binding": "APP2", "path": "/app2", "preload": true}
  ]
}</code></pre>
            
    <div>
      <h2>Get started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>You can start building with the Vertical Microfrontend template today.</p><p>Visit the Cloudflare Dashboard <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages/create?type=vmfe"><u>deeplink here</u></a> or go to “Workers &amp; Pages” and click the “Create application” button to get started. From there, click “Select a template” and then “Create microfrontend” and you can begin configuring your setup.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1teTcTNHzQH3yCvbTz3xyU/8f9a4b2ef3ec1c6ed13cbdc51d6b13c5/image5.png" />
          </figure><p>
Check out the <a href="https://developers.cloudflare.com/workers/framework-guides/web-apps/microfrontends"><u>documentation</u></a> to see how to map your existing Workers and enable View Transitions. We can't wait to see what complex, multi-team applications you build on the edge!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Dashboard]]></category>
            <category><![CDATA[Front End]]></category>
            <category><![CDATA[Micro-frontends]]></category>
            <guid isPermaLink="false">2u7SNZ4BZcQYHZYKqmdEaM</guid>
            <dc:creator>Brayden Wilmoth</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building a serverless, post-quantum Matrix homeserver]]></title>
            <link>https://blog.cloudflare.com/serverless-matrix-homeserver-workers/</link>
            <pubDate>Tue, 27 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ As a proof of concept, we ported a Matrix homeserver to Cloudflare Workers — delivering encrypted messaging at the edge with automatic post-quantum cryptography. ]]></description>
            <content:encoded><![CDATA[ <p><sup><i>* This post was updated at 11:45 a.m. Pacific time to clarify that the use case described here is a proof of concept and a personal project. Some sections have been updated for clarity.</i></sup></p><p>Matrix is the gold standard for decentralized, end-to-end encrypted communication. It powers government messaging systems, open-source communities, and privacy-focused organizations worldwide. </p><p>For the individual developer, however, the appeal is often closer to home: bridging fragmented chat networks (like Discord and Slack) into a single inbox, or simply ensuring your conversation history lives on infrastructure you control. Functionally, Matrix operates as a decentralized, eventually consistent state machine. Instead of a central server pushing updates, homeservers exchange signed JSON events over HTTP, using a conflict resolution algorithm to merge these streams into a unified view of the room's history.</p><p><b>But there is a "tax" to running it. </b>Traditionally, operating a Matrix <a href="https://matrix.org/homeserver/about/"><u>homeserver</u></a> has meant accepting a heavy operational burden. You have to provision virtual private servers (VPS), tune PostgreSQL for heavy write loads, manage Redis for caching, configure <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/"><u>reverse proxies</u></a>, and handle rotation for <a href="https://www.cloudflare.com/application-services/products/ssl/">TLS certificates</a>. It’s a stateful, heavy beast that demands to be fed time and money, whether you’re using it a lot or a little.</p><p>We wanted to see if we could eliminate that tax entirely.</p><p><b>Spoiler: We could.</b> In this post, we’ll explain how we ported a Matrix homeserver to <a href="https://workers.cloudflare.com/"><u>Cloudflare Workers</u></a>. 
The resulting proof of concept is a serverless architecture where operations disappear, costs scale to zero when idle, and every connection is protected by <a href="https://www.cloudflare.com/learning/ssl/quantum/what-is-post-quantum-cryptography/"><u>post-quantum cryptography</u></a> by default. You can view the source code and <a href="https://github.com/nkuntz1934/matrix-workers"><u>deploy your own instance directly from GitHub</u></a>.</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/nkuntz1934/matrix-workers"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p></p>
    <div>
      <h2>From Synapse to Workers</h2>
      <a href="#from-synapse-to-workers">
        
      </a>
    </div>
    <p>Our starting point was <a href="https://github.com/matrix-org/synapse"><u>Synapse</u></a>, the Python-based reference Matrix homeserver designed for traditional deployments. PostgreSQL for persistence, Redis for caching, filesystem for media.</p><p>Porting it to Workers meant questioning every storage assumption we’d taken for granted.</p><p>The challenge was storage. Traditional homeservers assume strong consistency via a central SQL database. Cloudflare <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> offers a powerful alternative. This primitive gives us the strong consistency and atomicity required for Matrix state resolution, while still allowing the application to run at the edge.</p><p>We ported the core Matrix protocol logic — event authorization, room state resolution, cryptographic verification — to TypeScript using the Hono framework. D1 replaces PostgreSQL, KV replaces Redis, R2 replaces the filesystem, and Durable Objects handle real-time coordination.</p><p>Here’s how the mapping worked out:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JTja38UZRbFygluawrnz1/9bce290e3070155c734e874c17051551/BLOG-3101_2.png" />
          </figure>
    <div>
      <h2>From monolith to serverless</h2>
      <a href="#from-monolith-to-serverless">
        
      </a>
    </div>
    <p>Moving to Cloudflare Workers brings several advantages for a developer: simple deployment, lower costs, low latency, and built-in security.</p><p><b>Easy deployment: </b>A traditional Matrix deployment requires server provisioning, PostgreSQL administration, Redis cluster management, <a href="https://www.cloudflare.com/application-services/solutions/certificate-lifecycle-management/">TLS certificate renewal</a>, load balancer configuration, monitoring infrastructure, and on-call rotations.</p><p>With Workers, deployment is simply <code>wrangler deploy</code>. Workers handles TLS, load balancing, DDoS protection, and global distribution. </p><p><b>Usage-based costs: </b>Traditional homeservers cost money whether anyone is using them or not. Workers pricing is request-based, so you pay when you’re using it, but costs drop to near zero when everyone’s asleep. </p><p><b>Lower latency globally:</b> A traditional Matrix homeserver in us-east-1 adds 200ms+ latency for users in Asia or Europe. Workers, meanwhile, run in 300+ locations worldwide. When a user in Tokyo sends a message, the Worker executes in Tokyo. </p><p><b>Built-in security: </b>Matrix homeservers can be high-value targets: They handle encrypted communications, store message history, and authenticate users. Traditional deployments require careful hardening: firewall configuration, rate limiting, DDoS mitigation, WAF rules, IP reputation filtering.</p><p>Workers provide all of this by default. </p>
    <div>
      <h3>Post-quantum protection </h3>
      <a href="#post-quantum-protection">
        
      </a>
    </div>
    <p>Cloudflare deployed post-quantum hybrid key agreement across all <a href="https://www.cloudflare.com/learning/ssl/why-use-tls-1.3/"><u>TLS 1.3</u></a> connections in <a href="https://blog.cloudflare.com/post-quantum-for-all/"><u>October 2022</u></a>. Every connection to our Worker automatically negotiates X25519MLKEM768 — a hybrid combining classical X25519 with ML-KEM, the post-quantum algorithm standardized by NIST.</p><p>Classical cryptography relies on mathematical problems that are hard for traditional computers but trivial for quantum computers running Shor’s algorithm. ML-KEM is based on lattice problems that remain hard even for quantum computers. The hybrid approach means both algorithms must fail for the connection to be compromised.</p>
    <div>
      <h3>Following a message through the system</h3>
      <a href="#following-a-message-through-the-system">
        
      </a>
    </div>
    <p>Understanding where encryption happens matters for security architecture. When someone sends a message through our homeserver, here’s the actual path:</p><p>The sender’s client takes the plaintext message and encrypts it with Megolm — Matrix’s end-to-end encryption. This encrypted payload then gets wrapped in TLS for transport. On Cloudflare, that TLS connection uses X25519MLKEM768, making it quantum-resistant.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/wGGYZ4LYspufH1c4psmL1/28acad8ab8e6535525dda413669c2d74/BLOG-3101_3.png" />
          </figure><p>The Worker terminates TLS, but what it receives is still encrypted — the Megolm ciphertext. We store that ciphertext in D1, index it by room and timestamp, and deliver it to recipients. But we never see the plaintext. The message “Hello, world” exists only on the sender’s device and the recipient’s device.</p><p>When the recipient syncs, the process reverses. They receive the encrypted payload over another quantum-resistant TLS connection, then decrypt locally with their Megolm session keys.</p>
    <div>
      <h3>Two layers, independent protection</h3>
      <a href="#two-layers-independent-protection">
        
      </a>
    </div>
    <p>This protects via two encryption layers that operate independently:</p><p>The <a href="https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/"><u>transport layer (TLS)</u></a> protects data in transit. It’s encrypted at the client and decrypted at the Cloudflare edge. With X25519MLKEM768, this layer is now post-quantum.</p><p>The <a href="https://www.cloudflare.com/learning/ddos/what-is-layer-7/"><u>application layer</u></a> (Megolm E2EE) protects message content. It’s encrypted on the sender’s device and decrypted only on recipient devices. This uses classical Curve25519 cryptography.</p>
    <div>
      <h3>Who sees what</h3>
      <a href="#who-sees-what">
        
      </a>
    </div>
    <p>Any Matrix homeserver operator — whether running Synapse on a VPS or this implementation on Workers — can see metadata: which rooms exist, who’s in them, when messages were sent. But no one in the infrastructure chain can see the message content, because the E2EE payload is encrypted on sender devices before it ever hits the network. Cloudflare terminates TLS and passes requests to your Worker, but both see only Megolm ciphertext. Media in encrypted rooms is encrypted client-side before upload, and private keys never leave user devices.</p>
    <div>
      <h3>What traditional deployments would need</h3>
      <a href="#what-traditional-deployments-would-need">
        
      </a>
    </div>
    <p>Achieving post-quantum TLS on a traditional Matrix deployment would require upgrading OpenSSL or BoringSSL to a version supporting ML-KEM, configuring cipher suite preferences correctly, testing client compatibility across all Matrix apps, monitoring for TLS negotiation failures, staying current as PQC standards evolve, and handling clients that don’t support PQC gracefully.</p><p>With Workers, it’s automatic. Chrome, Firefox, and Edge all support X25519MLKEM768. Mobile apps using platform TLS stacks inherit this support. The security posture improves as Cloudflare’s <a href="https://developers.cloudflare.com/ssl/post-quantum-cryptography/"><u>PQC</u></a> deployment expands — no action required on our part.</p>
    <div>
      <h2>The storage architecture that made it work</h2>
      <a href="#the-storage-architecture-that-made-it-work">
        
      </a>
    </div>
    <p>The key insight from the port was that different data needs different consistency guarantees. We use each Cloudflare primitive for what it does best.</p>
    <div>
      <h3>D1 for the data model</h3>
      <a href="#d1-for-the-data-model">
        
      </a>
    </div>
    <p>D1 stores everything that needs to survive restarts and support queries: users, rooms, events, device keys. Over 25 tables covering the full Matrix data model. </p>
            <pre><code>CREATE TABLE events (
	event_id TEXT PRIMARY KEY,
	room_id TEXT NOT NULL,
	sender TEXT NOT NULL,
	event_type TEXT NOT NULL,
	state_key TEXT,
	content TEXT NOT NULL,
	origin_server_ts INTEGER NOT NULL,
	depth INTEGER NOT NULL
);
</code></pre>
            <p><a href="https://www.cloudflare.com/developer-platform/products/d1/">D1’s SQLite foundation</a> meant we could port the original homeserver’s queries with minimal changes. Joins, indexes, and aggregations work as expected.</p><p>We learned one hard lesson: D1’s eventual consistency breaks foreign key constraints. A write to <code>rooms</code> might not be visible when a subsequent write to <code>events</code> checks the foreign key. We removed all foreign keys and enforce referential integrity in application code.</p>
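<p>As a sketch of what that application-level check might look like against the D1 API (the <code>rooms</code> lookup and the helper names here are illustrative, not the project’s actual code):</p>

```typescript
// Illustrative event shape matching the events table above.
type MatrixEvent = {
  event_id: string;
  room_id: string;
  sender: string;
  event_type: string;
  state_key: string | null;
  content: string;
  origin_server_ts: number;
  depth: number;
};

// Build the parameterized INSERT for an event (pure, so it is easy to test).
export function eventInsertSql(ev: MatrixEvent): { sql: string; params: unknown[] } {
  return {
    sql: `INSERT INTO events
            (event_id, room_id, sender, event_type, state_key, content, origin_server_ts, depth)
          VALUES (?, ?, ?, ?, ?, ?, ?, ?)`,
    params: [ev.event_id, ev.room_id, ev.sender, ev.event_type,
             ev.state_key, ev.content, ev.origin_server_ts, ev.depth],
  };
}

// With no FOREIGN KEY on room_id, verify the room exists in application
// code before writing the event. `db` is a D1 binding (e.g. env.DB).
export async function insertEvent(db: any, ev: MatrixEvent): Promise<void> {
  const room = await db.prepare("SELECT 1 FROM rooms WHERE room_id = ?")
    .bind(ev.room_id)
    .first();
  if (!room) throw new Error(`unknown room: ${ev.room_id}`);
  const { sql, params } = eventInsertSql(ev);
  await db.prepare(sql).bind(...params).run();
}
```

<p>The check-then-write is not atomic across D1 statements, but for this data a dangling event is recoverable, which is the trade the post describes: keep hard atomicity in Durable Objects and tolerate softer guarantees in D1.</p>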
    <div>
      <h3>KV for ephemeral state</h3>
      <a href="#kv-for-ephemeral-state">
        
      </a>
    </div>
    <p>OAuth authorization codes live for 10 minutes, while refresh tokens last for a session.</p>
            <pre><code>// Store OAuth code with 10-minute TTL
kv.put(&amp;format!("oauth_code:{}", code), &amp;token_data)?
	.expiration_ttl(600)
	.execute()
	.await?;</code></pre>
            <p>KV’s global distribution means OAuth flows work fast regardless of where users are located.</p>
    <div>
      <h3>R2 for media</h3>
      <a href="#r2-for-media">
        
      </a>
    </div>
    <p>Matrix media maps directly to R2, so you can upload an image, get back a content-addressed URL — and egress is free.</p>
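<p>A hedged sketch of what content-addressed storage on R2 can look like — the <code>media/</code> key prefix and the bucket binding are assumptions, not the project’s actual layout:</p>

```typescript
// Hex-encode a byte array (used to turn the digest into a key).
export function toHex(bytes: Uint8Array): string {
  return [...bytes].map((b) => b.toString(16).padStart(2, "0")).join("");
}

// Store media under the SHA-256 of its contents, so identical uploads
// dedupe to the same key. `bucket` is an R2 binding (e.g. env.MEDIA).
export async function storeMedia(bucket: any, data: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", data);
  const key = toHex(new Uint8Array(digest));
  await bucket.put(`media/${key}`, data);
  return key;
}
```

<p>Because the key is derived from the content, re-uploading the same file is idempotent and the returned URL can be cached forever.</p>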
    <div>
      <h3>Durable Objects for atomicity</h3>
      <a href="#durable-objects-for-atomicity">
        
      </a>
    </div>
    <p>Some operations can’t tolerate eventual consistency. When a client claims a one-time encryption key, that key must be atomically removed. If two clients claim the same key, encrypted session establishment fails.</p><p>Durable Objects provide single-threaded, strongly consistent storage:</p>
            <pre><code>#[durable_object]
pub struct UserKeysObject {
	state: State,
	env: Env,
}

impl UserKeysObject {
	async fn claim_otk(&amp;self, algorithm: &amp;str) -&gt; Result&lt;Option&lt;Key&gt;&gt; {
    	// Atomic within single DO - no race conditions possible
    	let mut keys: Vec&lt;Key&gt; = self.state.storage()
        	.get("one_time_keys")
        	.await
        	.ok()
        	.flatten()
        	.unwrap_or_default();

    	if let Some(idx) = keys.iter().position(|k| k.algorithm == algorithm) {
        	let key = keys.remove(idx);
        	self.state.storage().put("one_time_keys", &amp;keys).await?;
        	return Ok(Some(key));
    	}
    	Ok(None)
	}
}</code></pre>
            <p>We use <code>UserKeysObject</code> for E2EE key management, <code>RoomObject</code> for real-time room events like typing indicators and read receipts, and <code>UserSyncObject</code> for to-device message queues. The rest flows through D1.</p>
    <div>
      <h3>Complete end-to-end encryption, complete OAuth</h3>
      <a href="#complete-end-to-end-encryption-complete-oauth">
        
      </a>
    </div>
    <p>Our implementation supports the full Matrix E2EE stack: device keys, cross-signing keys, one-time keys, fallback keys, key backup, and dehydrated devices.</p><p>Modern Matrix clients use OAuth 2.0/OIDC instead of legacy password flows. We implemented a complete OAuth provider, with dynamic client registration, PKCE authorization, RS256-signed JWT tokens, token refresh with rotation, and standard OIDC discovery endpoints.
</p>
            <pre><code>curl https://matrix.example.com/.well-known/openid-configuration
{
  "issuer": "https://matrix.example.com",
  "authorization_endpoint": "https://matrix.example.com/oauth/authorize",
  "token_endpoint": "https://matrix.example.com/oauth/token",
  "jwks_uri": "https://matrix.example.com/.well-known/jwks.json"
}
</code></pre>
            <p>Point Element or any Matrix client at the domain, and it discovers everything automatically.</p>
    <div>
      <h2>Sliding Sync for mobile</h2>
      <a href="#sliding-sync-for-mobile">
        
      </a>
    </div>
    <p>Traditional Matrix sync transfers megabytes of data on initial connection,  draining mobile battery and data plans.</p><p>Sliding Sync lets clients request exactly what they need. Instead of downloading everything, clients get the 20 most recent rooms with minimal state. As users scroll, they request more ranges. The server tracks position and sends only deltas.</p><p>Combined with edge execution, mobile clients can connect and render their room list in under 500ms, even on slow networks.</p>
    <div>
      <h2>The comparison</h2>
      <a href="#the-comparison">
        
      </a>
    </div>
    <p>For a homeserver serving a small team:</p><table><tr><th><p> </p></th><th><p><b>Traditional (VPS)</b></p></th><th><p><b>Workers</b></p></th></tr><tr><td><p>Monthly cost (idle)</p></td><td><p>$20-50</p></td><td><p>&lt;$1</p></td></tr><tr><td><p>Monthly cost (active)</p></td><td><p>$20-50</p></td><td><p>$3-10</p></td></tr><tr><td><p>Global latency</p></td><td><p>100-300ms</p></td><td><p>20-50ms</p></td></tr><tr><td><p>Time to deploy</p></td><td><p>Hours</p></td><td><p>Seconds</p></td></tr><tr><td><p>Maintenance</p></td><td><p>Weekly</p></td><td><p>None</p></td></tr><tr><td><p>DDoS protection</p></td><td><p>Additional cost</p></td><td><p>Included</p></td></tr><tr><td><p>Post-quantum TLS</p></td><td><p>Complex setup</p></td><td><p>Automatic</p></td></tr></table><p><sup>*</sup><sup><i>Based on public rates and metrics published by DigitalOcean, AWS Lightsail, and Linode as of January 15, 2026.</i></sup></p><p>The economics improve further at scale. Traditional deployments require capacity planning and over-provisioning. Workers scale automatically.</p>
    <div>
      <h2>The future of decentralized protocols</h2>
      <a href="#the-future-of-decentralized-protocols">
        
      </a>
    </div>
    <p>We started this as an experiment: could Matrix run on Workers? It can — and the approach can work for other stateful protocols, too.</p><p>By mapping traditional stateful components to Cloudflare’s primitives — Postgres to D1, Redis to KV, mutexes to Durable Objects — we can see that complex applications don't need complex infrastructure. We stripped away the operating system, the database management, and the network configuration, leaving only the application logic and the data itself.</p><p>Workers offers the sovereignty of owning your data, without the burden of owning the infrastructure.</p><p>I have been experimenting with the implementation and am excited for any contributions from others interested in this kind of service. </p><p>Ready to build powerful, real-time applications on Workers? Get started with<a href="https://developers.cloudflare.com/workers/"> <u>Cloudflare Workers</u></a> and explore<a href="https://developers.cloudflare.com/durable-objects/"> <u>Durable Objects</u></a> for your own stateful edge applications. Join our<a href="https://discord.cloudflare.com"> <u>Discord community</u></a> to connect with other developers building at the edge.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[WebAssembly]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <category><![CDATA[Encryption]]></category>
            <guid isPermaLink="false">6VOVAMNwIZ18hMaUlC6aqp</guid>
            <dc:creator>Nick Kuntz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Astro is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/astro-joins-cloudflare/</link>
            <pubDate>Fri, 16 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ The Astro Technology Company team — the creators of the Astro web framework — is joining Cloudflare. We’re doubling down on making Astro the best framework for content-driven websites, today and in the years to come. ]]></description>
            <content:encoded><![CDATA[ <p>The Astro Technology Company, creators of the Astro web framework, is joining Cloudflare.</p><p><a href="https://astro.build/"><u>Astro</u></a> is the web framework for building fast, content-driven websites. Over the past few years, we’ve seen an incredibly diverse range of developers and companies use Astro to build for the web. This ranges from established brands like Porsche and IKEA, to fast-growing AI companies like Opencode and OpenAI. Platforms that are built on Cloudflare, like <a href="https://webflow.com/feature/cloud"><u>Webflow Cloud</u></a> and <a href="https://vibe.wix.com/"><u>Wix Vibe</u></a>, have chosen Astro to power the websites their customers build and deploy to their own platforms. At Cloudflare, we use Astro, too — for our <a href="https://developers.cloudflare.com/"><u>developer docs</u></a>, <a href="https://workers.cloudflare.com/"><u>website</u></a>, <a href="https://sandbox.cloudflare.com/"><u>landing pages</u></a>, <a href="https://blog.cloudflare.com/"><u>blog</u></a>, and more. Astro is used almost everywhere there is content on the Internet. </p><p>By joining forces with the Astro team, we are doubling down on making Astro the best framework for content-driven websites for many years to come. The best version of Astro — <a href="https://github.com/withastro/astro/milestone/37"><u>Astro 6</u></a> —  is just around the corner, bringing a redesigned development server powered by Vite. The first public beta release of Astro 6 is <a href="https://github.com/withastro/astro/releases/tag/astro%406.0.0-beta.0"><u>now available</u></a>, with GA coming in the weeks ahead.</p><p>We are excited to share this news and even more thrilled for what it means for developers building with Astro. If you haven’t yet tried Astro — give it a spin and run <a href="https://docs.astro.build/en/getting-started/"><u>npm create astro@latest</u></a>.</p>
    <div>
      <h3>What this means for Astro</h3>
      <a href="#what-this-means-for-astro">
        
      </a>
    </div>
    <p>Astro will remain open source, MIT-licensed, and open to contributions, with a public roadmap and open governance. All full-time employees of The Astro Technology Company are now employees of Cloudflare, and will continue to work on Astro. We’re committed to Astro’s long-term success and eager to keep building.</p><p>Astro wouldn’t be what it is today without an incredibly strong community of open-source contributors. Cloudflare is also committed to continuing to support open-source contributions, via the <a href="https://astro.build/blog/astro-ecosystem-fund-update/"><u>Astro Ecosystem Fund</u></a>, alongside industry partners including Webflow, Netlify, Wix, Sentry, Stainless and many more.</p><p>From day one, Astro has been a bet on the web and portability: Astro is built to run anywhere, across clouds and platforms. Nothing changes about that. You can deploy Astro to any platform or cloud, and we’re committed to supporting Astro developers everywhere.</p>
    <div>
      <h3>There are many web frameworks out there — so why are developers choosing Astro?</h3>
      <a href="#there-are-many-web-frameworks-out-there-so-why-are-developers-choosing-astro">
        
      </a>
    </div>
    <p>Astro has been growing rapidly:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6SiPDolNqvmfQmHftQAr2W/b0b0b0c6725203b945d83da9b190c443/BLOG-3112_2.png" />
          </figure><p>Why? Many web frameworks have come and gone trying to be everything to everyone, aiming to serve the needs of both content-driven websites and web applications.</p><p>The key to Astro’s success: Instead of trying to serve every use case, Astro has stayed focused on <a href="https://docs.astro.build/en/concepts/why-astro/#design-principles"><u>five design principles</u></a>. Astro is…</p><ul><li><p><b>Content-driven:</b> Astro was designed to showcase your content.</p></li><li><p><b>Server-first:</b> Websites run faster when they render HTML on the server.</p></li><li><p><b>Fast by default:</b> It should be impossible to build a slow website in Astro.</p></li><li><p><b>Easy to use:</b> You don’t need to be an expert to build something with Astro.</p></li><li><p><b>Developer-focused:</b> You should have the resources you need to be successful.</p></li></ul><p>Astro’s <a href="https://docs.astro.build/en/concepts/islands/"><u>Islands Architecture</u></a> is a core part of what makes all of this possible. The majority of each page can be static HTML — fast and simple to build by default, oriented around rendering content. And when you need it, you can render a specific part of a page as a client island, using any client UI framework. You can even mix and match multiple frameworks on the same page, whether that’s React.js, Vue, Svelte, Solid, or anything else:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SjrMUpO9xZb0wxlATkrQo/16afe1efdb57da6b8b17cd804d94cfb2/BLOG-3112_3.png" />
          </figure>
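    <p>As a concrete sketch of an islands page (the component names and import paths below are invented for illustration; the <code>client:*</code> hydration directives are Astro’s real syntax), most of the page ships as static HTML while two islands opt in to client-side JavaScript:</p>

```astro
---
// Hypothetical components, each from a different UI framework.
import ReactCounter from "../components/ReactCounter.jsx";
import SvelteChart from "../components/SvelteChart.svelte";
---
<h1>Mostly static HTML</h1>
<p>Everything here ships as plain HTML, with no client-side JavaScript.</p>

<!-- Hydrate this island as soon as the page loads -->
<ReactCounter client:load />

<!-- Hydrate this island only once it scrolls into view -->
<SvelteChart client:visible />
```

    <p>Only the two islands load JavaScript; the rest of the page stays static and fast by default.</p>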
    <div>
      <h3>Bringing back the joy in building websites</h3>
      <a href="#bringing-back-the-joy-in-building-websites">
        
      </a>
    </div>
    <p>The more Astro and Cloudflare started talking, the clearer it became how much we have in common. Cloudflare’s mission is to help build a better Internet — and part of that is to help build a <i>faster</i> Internet. Almost all of us grew up building websites, and we want a world where people have fun building things on the Internet, where anyone can publish to a site that is truly their own.</p><p>When Astro first <a href="https://astro.build/blog/introducing-astro/"><u>launched</u></a> in 2021, building great websites had become painful — it felt like a fight with build tools and frameworks. It sounds strange to say it, with the coding agents and powerful LLMs of 2026, but in 2021 it was very hard to build an excellent and fast website without being a domain expert in JavaScript build tooling. So much has gotten better, both because of Astro and in the broader frontend ecosystem, that we take this almost for granted today.</p><p>The Astro project has spent the past five years working to simplify web development. So as LLMs, then vibe coding, and now true coding agents have come along and made it possible for truly anyone to build — Astro provided a foundation that was simple and fast by default. We’ve all seen how much better and faster agents get when building off the right foundation, in a well-structured codebase. More and more, we’ve seen both builders and platforms choose Astro as that foundation.</p><p>We’ve seen this most clearly through the platforms that both Cloudflare and Astro serve: platforms that extend Cloudflare to their own customers in creative ways using <a href="https://developers.cloudflare.com/cloudflare-for-platforms/"><u>Cloudflare for Platforms</u></a>, and that have chosen Astro as the framework their customers build on.</p><p>When you deploy to <a href="https://webflow.com/feature/cloud"><u>Webflow Cloud</u></a>, your Astro site just works and is deployed across Cloudflare’s network. 
When you start a new project with <a href="https://vibe.wix.com/"><u>Wix Vibe</u></a>, behind the scenes you’re creating an Astro site, running on Cloudflare. And when you generate a developer docs site using <a href="https://www.stainless.com/"><u>Stainless</u></a>, that generates an Astro project, running on Cloudflare, powered by <a href="https://astro.build/blog/stainless-astro-launch/"><u>Starlight</u></a> — a framework built on Astro.</p><p>Each of these platforms is built for a different audience. But what they have in common — beyond their use of Cloudflare and Astro — is they make it <i>fun</i> to create and publish content to the Internet. In a world where everyone can be both a builder and content creator, we think there are still so many more platforms to build and people to reach.</p>
    <div>
      <h3><b>Astro 6 — new local dev server, powered by Vite</b></h3>
      <a href="#astro-6-new-local-dev-server-powered-by-vite">
        
      </a>
    </div>
    <p>Astro 6 is coming, and the first open beta release is <a href="https://astro.build/blog/astro-6-beta/"><u>now available</u></a>. To be one of the first to try it out, run:</p><p><code>npm create astro@latest -- --ref next</code></p><p>Or to upgrade your existing Astro app, run:</p><p><code>npx @astrojs/upgrade beta</code></p><p>Astro 6 brings a brand new development server, built on the <a href="https://vite.dev/guide/api-environment"><u>Vite Environments API</u></a>, that runs your code locally using the same runtime that you deploy to. This means that when you run <code>astro dev</code> with the <a href="https://developers.cloudflare.com/workers/vite-plugin/"><u>Cloudflare Vite plugin</u></a>, your code runs in <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a>, the open-source Cloudflare Workers runtime, and can use <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>, <a href="https://developers.cloudflare.com/kv/"><u>KV</u></a>, <a href="https://developers.cloudflare.com/agents/"><u>Agents</u></a> and <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>more</u></a>. This isn’t just a Cloudflare feature: Any JavaScript runtime with a plugin that uses the Vite Environments API can benefit from this new support, and ensure local dev runs in the same environment, with the same runtime APIs as production.</p>
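    <p>For reference, a minimal sketch of what that setup can look like in an Astro project (this assumes the <code>@astrojs/cloudflare</code> adapter, which wires up the Cloudflare Vite plugin under the hood; check the adapter docs for the options your version supports):</p>

```typescript
// astro.config.ts — hedged sketch, not a complete production config
import { defineConfig } from "astro/config";
import cloudflare from "@astrojs/cloudflare";

export default defineConfig({
  // The adapter makes `astro dev` run your server code in workerd,
  // so bindings like D1, KV, and Durable Objects behave as in production.
  adapter: cloudflare(),
});
```

    <p>With this in place, the same bindings you rely on after deployment are available during local development.</p>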
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4YAgzaSkgUr3gxK5Mkh62V/09847d3f15744b6f049864a6e898a343/BLOG-3112_4.png" />
          </figure><p><a href="https://docs.astro.build/en/reference/experimental-flags/live-content-collections/"><u>Live Content Collections</u></a> are also stable and out of beta in Astro 6. These content collections let you update data in real time, without requiring a rebuild of your site. This makes it easy to bring in content that changes often, such as the current inventory in a storefront, while still benefitting from the built-in validation and caching that come with Astro’s existing support for <a href="https://v6.docs.astro.build/en/guides/content-collections"><u>content collections</u></a>.</p><p>There’s more to Astro 6, including Astro’s most upvoted feature request — first-class support for Content Security Policy (CSP) — as well as simpler APIs, an upgrade to <a href="https://zod.dev/?id=introduction"><u>Zod</u></a> 4, and more.</p>
    <div>
      <h3>Doubling down on Astro</h3>
      <a href="#doubling-down-on-astro">
        
      </a>
    </div>
    <p>We're thrilled to welcome the Astro team to Cloudflare. We’re excited to keep building, keep shipping, and keep making Astro the best way to build content-driven sites. We’re already thinking about what comes next beyond V6, and we’d love to hear from you.</p><p>To keep up with the latest, follow the <a href="https://astro.build/blog/"><u>Astro blog</u></a> and join the <a href="https://astro.build/chat"><u>Astro Discord</u></a>. Tell us what you’re building!</p><p></p> ]]></content:encoded>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">6snDEFT5jgryV5wPhY4HEj</guid>
            <dc:creator>Fred Schott</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
        </item>
        <item>
            <title><![CDATA[Human Native is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/human-native-joins-cloudflare/</link>
            <pubDate>Thu, 15 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare acquires Human Native, an AI data marketplace specialising in transforming content into searchable and useful data, to accelerate work building new economic models for the Internet. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today, we’re excited to share that Cloudflare has acquired <a href="https://www.humannative.ai/"><u>Human Native</u></a>, a UK-based AI data marketplace specializing in transforming multimedia content into searchable and useful data.</p>
    <div>
      <h3>Human Native x Cloudflare</h3>
      <a href="#human-native-x-cloudflare">
        
      </a>
    </div>
    <p>The Human Native team has spent the past few years focused on helping AI developers create better AI through licensed data. Their technology helps publishers and developers turn messy, unstructured content into something that can be understood, licensed and ultimately valued. They have approached data not as something to be scraped, but as an asset class that deserves structure, transparency and respect.</p><p>Access to high-quality data can lead to better technical performance. One of Human Native’s customers, a prominent UK video AI company, threw away their existing training data after achieving superior results with data sourced through Human Native. Going forward they are only training on fully licensed, reputably sourced, high-quality content.</p><p>This gives a preview of what the economic model of the Internet can be in the age of generative AI: better AI built on better data, with fair control, compensation and credit for creators.</p>
    <div>
      <h3>The Internet needs new economic models</h3>
      <a href="#the-internet-needs-new-economic-models">
        
      </a>
    </div>
    <p>For the last 30 years, the open Internet has been based on a fundamental value exchange: creators create content, aggregators (such as search engines or social media) send traffic. Creators can monetize that traffic through advertisements, subscriptions or direct support. This is the economic loop that has powered the explosive growth of the Internet.</p><p>But it’s under real strain.</p><p><a href="https://blog.cloudflare.com/crawlers-click-ai-bots-training/"><u>Crawl-to-referral</u></a> ratios are skyrocketing, with tens of thousands of AI and bot crawls per real human visitor, and it’s unclear how multipurpose crawlers are using the content they access.</p><p>The community of creators who publish on the Internet is a diverse group: news publishers, content creators, financial professionals, technology companies, aggregators and more. But they have one thing in common: They want to decide how their content is used by AI systems.</p><p>Cloudflare’s work in building <a href="https://www.cloudflare.com/en-gb/ai-crawl-control/"><u>AI Crawl Control</u></a> and <a href="https://developers.cloudflare.com/ai-crawl-control/features/pay-per-crawl/what-is-pay-per-crawl/"><u>Pay Per Crawl</u></a> is predicated on a simple philosophy: Content owners should get to decide how and when their content is accessed by others. Many of our customers want to optimize their brand and content to make sure it is in every training data set and shows up in every new search; others want to have more control and only allow access if there is direct compensation.</p><p>Our tools like <a href="https://developers.cloudflare.com/ai-search/"><u>AI Search</u></a>, AI Crawl Control and Pay Per Crawl can help, wherever you land in that equation. The important thing is that the content owner gets to decide.</p>
    <div>
      <h3>New tools for AI developers</h3>
      <a href="#new-tools-for-ai-developers">
        
      </a>
    </div>
    <p>With the Human Native team joining Cloudflare, we are accelerating our work in helping customers transform their content to be easily accessed and understood by AI bots and agents in addition to their traditional human audiences.</p><p>Crawling is complex, expensive in both engineering effort and the compute needed to process content, and offers no guarantee of quality. A crawled index can contain duplicates, spam, illegal material and many more headaches. Developers are left with messy, unstructured data.</p><p>We recently announced our work in building the <a href="https://blog.cloudflare.com/an-ai-index-for-all-our-customers/"><u>AI Index</u></a>, a powerful new way for both foundation model companies and agents to access content at scale.</p><p>Instead of sending crawlers blindly and repeatedly across the open Internet, AI developers will be able to connect via a pub/sub model: participating websites will expose structured updates whenever their content changes, and developers will be able to subscribe to receive those updates in real time. </p><p>This opens up new avenues for content creators to experiment with new business models. </p>
    <div>
      <h3>Building the foundation for these new business models</h3>
      <a href="#building-the-foundation-for-these-new-business-models">
        
      </a>
    </div>
    <p>Cloudflare is investing heavily in creating the foundations for these new business models, starting with x402.</p><p>We recently announced that we are creating the <a href="https://blog.cloudflare.com/x402/"><u>x402 Foundation</u></a>, in partnership with Coinbase, to enable machine-to-machine transactions for digital resources.</p><p>Payments on the web have historically been designed for humans. We browse a merchant’s website, show intent by adding items to a cart, and confirm our intent to purchase by putting in our credit card information and clicking “Pay.” But what if you want to enable direct transactions between automated systems? We need protocols to allow machine-to-machine transactions. </p><p>Together, Human Native and Cloudflare will accelerate our work in building the basis of these new economic models for the Internet. </p>
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>The Internet works best when it is open, fair, and independently sustainable. We’re excited to welcome the Human Native team to Cloudflare, and even more excited about what we will build together to improve the foundations of the Internet in the age of AI.</p><p>Onwards.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Generative AI]]></category>
            <category><![CDATA[Data]]></category>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">Szd19ssv1kbKxjxNZhUmR</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>James Smith</dc:creator>
        </item>
        <item>
            <title><![CDATA[Replicate is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/replicate-joins-cloudflare/</link>
            <pubDate>Mon, 17 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Bringing Replicate’s tools into Cloudflare will continue to make our Workers Platform the best place on the Internet to build and deploy any AI or agentic workflow.
 ]]></description>
            <content:encoded><![CDATA[ <p></p><p>We have some big news to share today: Replicate, the leading platform for running AI models, is joining Cloudflare.</p><p>We first started talking to Replicate because we shared a lot in common beyond just a passion for bright color palettes. Our mission for Cloudflare’s Workers developer platform has been to make building and deploying full-stack applications as easy as possible. Meanwhile, Replicate has been on a similar mission to make deploying AI models as easy as writing a single line of code. And we realized we could build something even better together by integrating the Replicate platform into Cloudflare directly.</p><p>We are excited to share this news and even more excited for what it will mean for customers. Bringing Replicate’s tools into Cloudflare will continue to make our Developer Platform the best place on the Internet to build and deploy any AI or agentic workflow.</p>
    <div>
      <h2>What does this mean for you? </h2>
      <a href="#what-does-this-mean-for-you">
        
      </a>
    </div>
    <p>Before we spend more time talking about the future of AI, we want to answer the questions that are top of mind for Replicate and Cloudflare users. In short: </p><p><b>For existing Replicate users:</b> Your APIs and workflows will continue to work without interruption. You will soon benefit from the added performance and reliability of Cloudflare's global network.</p><p><b>For existing Workers AI users:</b> Get ready for a massive expansion of the model catalog and the new ability to run fine-tunes and custom models directly on Workers AI.</p><p>Now, let’s get back to why we’re so excited about our joint future.</p>
    <div>
      <h2>The AI Revolution was not televised, but it started with open source</h2>
      <a href="#the-ai-revolution-was-not-televised-but-it-started-with-open-source">
        
      </a>
    </div>
    <p>Before AI was AI, and the subject of <i>every</i> conversation, it was known for decades as “machine learning”. It was a specialized, almost academic field. Progress was steady but siloed, with breakthroughs happening inside a few large, well-funded research labs. The models were monolithic, the data was proprietary, and the tools were inaccessible to most developers. Everything changed when the culture of open-source collaboration — the same force that built the modern Internet — collided with machine learning, as researchers and companies began publishing not just their papers, but their model weights and code.</p><p>This ignited an incredible explosion of innovation. The pace of change in just the past few years has been staggering; what was state-of-the-art 18 months ago (or sometimes it feels like just days ago) is now the baseline. This acceleration is most visible in generative AI. </p><p>We went from uncanny, blurry curiosities to photorealistic image generation in what felt like the blink of an eye. Open source models like Stable Diffusion unlocked immediate creativity for developers, and that was just the beginning. If you take a look at Replicate’s model catalog today, you’ll see thousands of image models of almost every flavor, each iterating on the previous. </p><p>This happened not just with image models, but video, audio, language models and more…. </p><p>But this incredible, community-driven progress creates a massive practical challenge: How do you actually <i>run</i> these models? Every new model has different dependencies, requires specific GPU hardware (and enough of it), and needs a complex serving infrastructure to scale. Developers found themselves spending more time fighting with CUDA drivers and requirements.txt files than actually building their applications.</p><p>This is exactly the problem Replicate solved. 
They built a platform that abstracts away all that complexity (using their open-source tool <a href="https://github.com/replicate/cog"><u>Cog</u></a> to package models into standard, reproducible containers), letting any developer or data scientist run even the most complex open-source models with a simple API call. </p><p>Today, Replicate’s catalog spans more than 50,000 open-source and fine-tuned models. While open source unlocked so many possibilities, Replicate’s toolset goes beyond that to make it possible for developers to access any models they need in one place. Period. With their marketplace, they also offer seamless access to leading proprietary models like GPT-5 and Claude Sonnet, all through the same unified API.</p><p>What’s worth noting is that Replicate didn't just build an inference service; they built a <b>community</b>. So much innovation happens through being inspired by what others are doing, iterating on it, and making it better. Replicate has become the definitive hub for developers to discover, share, fine-tune, and experiment with the latest models in a public playground. </p>
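    <p>To give a flavor of how Cog-style packaging works (a hedged sketch based on Cog’s public schema; the package versions and predictor file here are placeholders), a model is described declaratively, and the container is built from that description:</p>

```yaml
# cog.yaml — illustrative only
build:
  gpu: true                 # request a GPU image
  python_version: "3.11"
  python_packages:
    - "torch==2.4.0"        # pinned dependencies keep the build reproducible
predict: "predict.py:Predictor"  # entry point: source file and predictor class
```

    <p>Running <code>cog predict</code> then builds and invokes the container locally, and the same definition travels with the model when it is published, which is what makes the environment reproducible.</p>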
    <div>
      <h2>Stronger together: the AI catalog meets the AI cloud</h2>
      <a href="#stronger-together-the-ai-catalog-meets-the-ai-cloud">
        
      </a>
    </div>
    <p>Coming back to the Workers Platform mission: Our goal all along has been to enable developers to build full-stack applications without having to burden themselves with infrastructure. And while that hasn’t changed, AI has changed the requirements of applications.</p><p>The types of applications developers are building are changing — three years ago, no one was building agents or creating AI-generated launch videos. Today they are. As a result, what they need and expect from the cloud, or the <b>AI cloud</b>, has changed too.</p><p>To meet the needs of developers, Cloudflare has been building the foundational pillars of the AI Cloud, designed to run inference at the edge, close to users. This isn't just one product, but an entire stack:</p><ul><li><p><b>Workers AI:</b> Serverless GPU inference on our global network.</p></li><li><p><b>AI Gateway:</b> A control plane for caching, rate-limiting, and observing any AI API.</p></li><li><p><b>Data Stack:</b> Including Vectorize (our vector database) and R2 (for model and data storage).</p></li><li><p><b>Orchestration:</b> Tools like AI Search (formerly Autorag), Agents, and Workflows to build complex, multi-step applications.</p></li><li><p><b>Foundation:</b> All built on our core developer platform of Workers, Durable Objects, and the rest of our stack.</p></li></ul><p>As we’ve been helping developers scale up their applications, Replicate has been on a similar mission — to make deploying AI models as easy as deploying code. This is where it all comes together. Replicate brings one of the industry's largest and most vibrant <b>model catalogs and developer communities</b>. Cloudflare brings an incredibly performant <b>global network and serverless inference platform</b>. Together, we can deliver the best of both worlds: the most comprehensive selection of models, runnable on a fast, reliable, and affordable inference platform.</p>
    <div>
      <h2>Our shared vision</h2>
      <a href="#our-shared-vision">
        
      </a>
    </div>
    
    <div>
      <h3>For the community: the hub for AI exploration</h3>
      <a href="#for-the-community-the-hub-for-ai-exploration">
        
      </a>
    </div>
    <p>The ability to share models, publish fine-tunes, collect stars, and experiment in the playground is the heart of the Replicate community. We will continue to invest in and grow this as the premier destination for AI discovery and experimentation, now <b>supercharged by Cloudflare's global network</b> for an even faster, more responsive experience for everyone.</p>
    <div>
      <h3>The future of inference: one platform, all models</h3>
      <a href="#the-future-of-inference-one-platform-all-models">
        
      </a>
    </div>
    <p>Our vision is to bring the best of both platforms together. We will bring the entire Replicate catalog — all 50,000+ models and fine-tunes — to Workers AI. This gives you the ultimate choice: run models in Replicate's flexible environment or on Cloudflare's serverless platform, all from one place.</p><p>But we're not just expanding the catalog. We are thrilled to announce that we will be bringing fine-tuning capabilities to Workers AI, powered by Replicate's deep expertise. We are also making Workers AI more flexible than ever. Soon, you'll be able to <b>bring your own custom models</b> to our network. We'll leverage Replicate's expertise with <b>Cog</b> to make this process seamless, reproducible, and easy.</p>
    <div>
      <h3>The AI Cloud: more than just inference</h3>
      <a href="#the-ai-cloud-more-than-just-inference">
        
      </a>
    </div>
    <p>Running a model is just one piece of the puzzle. The real magic happens when you connect AI to your entire application. Imagine what you can build when Replicate's massive catalog is deeply integrated with the entire Cloudflare developer platform: run a model and store the results directly in <b>R2</b> or <b>Vectorize</b>; trigger inference from a <b>Worker</b> or <b>Queue</b>; use <b>Durable Objects</b> to manage state for an AI agent; or build real-time generative UI with <b>WebRTC</b> and WebSockets.</p><p>To manage all this, we will integrate our unified inference platform deeply with the <b>AI Gateway</b>, giving you a single control plane for <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a>, prompt management, A/B testing, and cost analytics across <i>all</i> your models, whether they're running on Cloudflare, Replicate, or any other provider.</p>
    <div>
      <h2>Welcome to the team!</h2>
      <a href="#welcome-to-the-team">
        
      </a>
    </div>
    <p>We are incredibly excited to welcome the Replicate team to Cloudflare. Their passion for the developer community and their expertise in the AI ecosystem are unmatched. We can't wait to build the future of AI together.</p> ]]></content:encoded>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">5wrXGTWWVegEdBSpstdIhb</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Ben Firshman</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building a better testing experience for Workflows, our durable execution engine for multi-step applications]]></title>
            <link>https://blog.cloudflare.com/better-testing-for-workflows/</link>
            <pubDate>Tue, 04 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ End-to-end testing for Cloudflare Workflows was challenging. We're introducing first-class support for Workflows in cloudflare:test, enabling full introspection, mocking, and isolated, reliable tests for your most complex applications. ]]></description>
            <content:encoded><![CDATA[ <p></p><p><a href="https://www.cloudflare.com/developer-platform/products/workflows/"><u>Cloudflare Workflows</u></a> is our take on "Durable Execution." They provide a serverless engine, powered by the <a href="https://www.cloudflare.com/developer-platform/"><u>Cloudflare Developer Platform</u></a>, for building long-running, multi-step applications that persist through failures. When Workflows became <a href="https://blog.cloudflare.com/workflows-ga-production-ready-durable-execution/"><u>generally available</u></a> earlier this year, they allowed developers to orchestrate complex processes that would be difficult or impossible to manage with traditional stateless functions. Workflows handle state, retries, and long waits, allowing you to focus on your business logic.</p><p>However, complex orchestrations require robust testing to be reliable. To date, testing Workflows was a black-box process. Although you could test if a Workflow instance reached completion through an <code>await</code> to its status, there was no visibility into the intermediate steps. This made debugging really difficult. Did the payment processing step succeed? Did the confirmation email step receive the correct data? You couldn't be sure without inspecting external systems or logs. </p>
    <div>
      <h3>Why was this necessary?</h3>
      <a href="#why-was-this-necessary">
        
      </a>
    </div>
    <p>As developers ourselves, we understand the need to ensure reliable code, and we heard your feedback loud and clear: the developer experience for testing Workflows needed to be better.</p><p>The black-box nature of testing was one part of the problem. Beyond that, the limited testing that was available came at a high cost. If you added a Workflow to your project, even if you weren't testing the Workflow directly, you were required to disable isolated storage because we couldn't guarantee isolation between tests. Isolated storage is a vitest-pool-workers feature that guarantees each test runs in a clean, predictable environment, free from the side effects of other tests. Being forced to disable it meant that state could leak between tests, leading to flaky, unpredictable, and hard-to-debug failures.</p><p>This created a difficult choice for developers building complex applications. If your project used <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Workers</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a>, and <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a> alongside Workflows, you had to either abandon isolated testing for your <i>entire project</i> or skip testing Workflows altogether. This friction resulted in a poor testing experience, which in turn discouraged the adoption of Workflows. Solving this wasn't just an improvement; it was a critical <i>step</i> in making Workflows part of any well-tested Cloudflare application.</p>
    <div>
      <h3>Introducing isolated testing for Workflows</h3>
      <a href="#introducing-isolated-testing-for-workflows">
        
      </a>
    </div>
    <p>We're introducing a new set of APIs that enable comprehensive, granular, and isolated testing for your Workflows, all running locally and offline with <code>vitest-pool-workers</code>, our testing framework that supports running tests in the Workers runtime <code>workerd</code>. This enables fast, reliable, and cheap test runs that don't depend on a network connection.</p><p>They are available through the <code>cloudflare:test</code> module, with <code>@cloudflare/vitest-pool-workers</code> version <b>0.9.0</b> and above. The new test module provides two primary functions to introspect your Workflows:</p><ul><li><p><code>introspectWorkflowInstance</code>: useful for unit tests with known instance IDs</p></li><li><p><code>introspectWorkflow</code>: useful for integration tests where IDs are typically generated dynamically.</p></li></ul><p>Let's walk through a practical example.</p>
    <div>
      <h3>A practical example: testing a blog moderation workflow</h3>
      <a href="#a-practical-example-testing-a-blog-moderation-workflow">
        
      </a>
    </div>
    <p>Imagine a simple Workflow for moderating a blog. When a user submits a comment, the Workflow requests a review from Workers AI. Based on the violation score returned, it then waits for a moderator to approve or deny the comment. If approved, it calls a <code>step.do</code> to publish the comment via an external API.</p><p>Testing this without our new APIs would be impossible: you'd have no direct way to simulate the steps’ outcomes or the moderator's approval. Now, you can mock everything.</p><p>Here’s the test code using <code>introspectWorkflowInstance</code> with a known instance ID:</p>
            <pre><code>import { env, introspectWorkflowInstance } from "cloudflare:test";

it("should mock an ambiguous score, approve comment and complete", async () =&gt; {
   // CONFIG
   await using instance = await introspectWorkflowInstance(
       env.MODERATOR,
       "my-workflow-instance-id-123"
   );
   await instance.modify(async (m) =&gt; {
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 50 });
       await m.mockEvent({ 
           type: "moderation-approval", 
           payload: { action: "approved" },
       });
       await m.mockStepResult({ name: "publish comment" }, { status: "published" });
   });

   await env.MODERATOR.create({ id: "my-workflow-instance-id-123" });
   
   // ASSERTIONS
   expect(await instance.waitForStepResult({ name: "AI content scan" })).toEqual(
       { violationScore: 50 }
   );
   expect(
       await instance.waitForStepResult({ name: "publish comment" })
   ).toEqual({ status: "published" });

   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});</code></pre>
            <p>This test mocks the outcomes of steps that require external API calls, such as the 'AI content scan', which calls <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a>, and the 'publish comment' step, which calls an external blog API.</p><p>If the instance ID is not known in advance, for example because a Worker request starts one or more Workflow instances with randomly generated IDs, you can call <code>introspectWorkflow(env.MY_WORKFLOW)</code>. Here’s the test code for that scenario, where only one Workflow instance is created:</p>
            <pre><code>it("should mock a non-violation score and complete successfully", async () =&gt; {
   // CONFIG
   await using introspector = await introspectWorkflow(env.MODERATOR);
   await introspector.modifyAll(async (m) =&gt; {
       await m.disableSleeps();
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 0 });
   });

   await SELF.fetch(`https://mock-worker.local/moderate`);

   const instances = introspector.get();
   expect(instances.length).toBe(1);

   // ASSERTIONS
   const instance = instances[0];
   expect(await instance.waitForStepResult({ name: "AI content scan"  })).toEqual({ violationScore: 0 });
   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});</code></pre>
            <p>Notice how in both examples we’re calling the introspectors with <code>await using</code>: this is the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Resource_management#the_using_and_await_using_declarations"><u>Explicit Resource Management</u></a> syntax from modern JavaScript. It is crucial here because when an introspector goes out of scope at the end of a test, its disposal method is automatically called. This is how we ensure each test works with its own isolated storage.</p><p>The <code>modify</code> and <code>modifyAll</code> functions are the gateway to controlling instances. Inside their callbacks, you get access to a modifier object with methods to inject behavior, such as mocking step outcomes, mocking events, and disabling sleeps.</p><p>You can find detailed documentation in the <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/test-apis/#workflows"><u>Cloudflare Workers docs</u></a>.</p><p><b>How we connected Vitest to the Workflows Engine</b></p><p>To understand the solution, you first need to understand the local architecture. When you run <code>wrangler dev</code>, your Workflows are powered by Miniflare, a simulator for testing Cloudflare Workers, and <code>workerd</code>. Each running Workflow instance is backed by its own SQLite Durable Object, which we call the "Engine DO". This Engine DO is responsible for executing steps, persisting state, and managing the instance's lifecycle. It lives inside the local isolated Workers runtime.</p><p>Meanwhile, the Vitest test runner is a separate Node.js process living outside of <code>workerd</code>. This is why we built vitest-pool-workers, a custom Vitest pool that allows tests to run inside <code>workerd</code>. It provides a Runner Worker, which runs the tests with bindings to everything specified in the user's wrangler.json file and has access to the APIs under the <code>cloudflare:test</code> module. The Runner Worker communicates with Node.js through a special DO called the Runner Object via WebSocket/RPC.</p><p>The first approach we considered was to extend this Runner Worker. In its current state, the Runner Worker has access to the Workflow bindings defined in the wrangler file. We considered also binding each Workflow's Engine DO namespace to the Runner Worker. This would give vitest-pool-workers direct access to the Engine DOs, making it possible to call Engine methods directly. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ptKRqwpfvK1dxY6T5Kuin/fbf92915b2d2a95542bf6bec8addd5ad/image1.png" />
          </figure><p>While promising, this approach would have required undesirable changes to the core of Miniflare and vitest-pool-workers, making it too invasive for this single feature. </p><p>Firstly, we would have needed to add a new <i>unsafe</i> field to Miniflare's Durable Objects. Its sole purpose would be to specify the service name of our Engines, preventing Miniflare from applying its default user prefix, which would otherwise stop the Durable Objects from being found.</p><p>Secondly, vitest-pool-workers would have been forced to bind every Engine DO from the Workflows in the project to its runner, even those not being tested. This would introduce unwanted bindings into the test environment, requiring additional cleanup to ensure they were not exposed to the user's test environment.</p><p><b>The breakthrough</b></p><p>The solution is a combination of privileged local-only APIs and Remote Procedure Calls (RPC).</p><p>First, we added a set of <code>unsafe</code> functions to the <i>local</i> implementation of the Workflows binding, functions that are not available in the production environment. They act as a controlled access point, accessible from the test environment, that allows the test runner to get a stub to a specific Engine DO by providing its instance ID.</p><p>Once the test runner has this stub, it uses RPC to call specific, trusted methods on the Engine DO via a special <code>RpcTarget</code> called <code>WorkflowInstanceModifier</code>. Any class that extends <code>RpcTarget</code> has its objects replaced by a stub; calling a method on the stub, in turn, makes an RPC back to the original object.</p>
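<p>To build intuition for that stub mechanism, here is a toy model in plain TypeScript. This is illustrative only: the class and helper below are hypothetical, and workerd's real <code>RpcTarget</code> machinery serializes calls across isolates rather than using an in-process <code>Proxy</code>.</p>

```typescript
// Toy model of RPC stubs: calling a method on the stub is transparently
// forwarded to the original target object, much like an RpcTarget stub
// forwards calls over RPC. (Hypothetical names; not the workerd internals.)
class Modifier {
  readonly mocked: Map<string, unknown> = new Map();
  mockStepResult(name: string, result: unknown): void {
    this.mocked.set(name, result);
  }
}

function makeStub<T extends object>(target: T): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      // Bind methods so `this` still refers to the original object,
      // mimicking an RPC call back to the real Modifier.
      return typeof value === "function" ? value.bind(obj) : value;
    },
  });
}

const modifier = new Modifier();
const stub = makeStub(modifier);
stub.mockStepResult("AI content scan", { violationScore: 0 });
// The call made on the stub reached the original object:
console.log(modifier.mocked.get("AI content scan"));
```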
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AObAsJuBplii3aeqMw2bn/74b21880b09a293fef6f84de1ae1318e/image2.png" />
          </figure><p>This simpler approach is far less invasive because it's confined to the Workflows environment, which also ensures any future feature changes are safely isolated.</p><p><b>Introspecting Workflows with unknown IDs</b></p><p>When creating Workflow instances (via <code>create()</code> or <code>createBatch()</code>), developers can provide a specific ID or have one generated automatically. This ID identifies the Workflow instance and is then used to derive the associated Engine DO ID.</p><p>The logical starting point for implementation was <code>introspectWorkflowInstance(binding, instanceID)</code>, as the instance ID is known in advance. This allows us to generate the Engine DO ID required to identify the Engine associated with that Workflow instance.</p><p>But often, one part of your application (like an HTTP endpoint) will create a Workflow instance with a randomly generated ID. How can we introspect an instance when we don't know its ID until after it's created?</p><p>The answer was to use a powerful feature of JavaScript: <code>Proxy</code> objects.</p><p>When you use <code>introspectWorkflow(binding)</code>, we wrap the Workflow binding in a Proxy. This proxy non-destructively intercepts all calls to the binding, specifically looking for <code>.create()</code> and <code>.createBatch()</code>. When your test triggers a workflow creation, the proxy inspects the call. It captures the instance ID, either one you provided or the randomly generated one, and immediately sets up the introspection on that ID, applying all the modifications you defined in the <code>modifyAll</code> call. The original creation call then proceeds as normal.</p>
            <pre><code>env[workflow] = new Proxy(env[workflow], {
  get(target, prop) {
    if (prop === "create") {
      return new Proxy(target.create, {
        async apply(_fn, _this, [opts = {}]) {

          // 1. Ensure an ID exists
          const optsWithId = "id" in opts ? opts : { id: crypto.randomUUID(), ...opts };

          // 2. Apply test modifications before creation
          await introspectAndModifyInstance(optsWithId.id);

          // 3. Call the original 'create' method
          return target.create(optsWithId);
        },
      });
    }

    // Same logic for createBatch()

    // Fall back to the original binding for everything else
    return Reflect.get(target, prop);
  },
});</code></pre>
            <p>When the <code>await using</code> block from <code>introspectWorkflow()</code> finishes, or the <code>dispose()</code> method is called at the end of the test, the introspector is disposed of, and the proxy is removed, leaving the binding in its original state. It’s a low-impact approach that prioritizes developer experience and long-term maintainability.</p>
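<p>Under the hood, <code>await using</code> is roughly equivalent to a <code>try/finally</code> that awaits the resource's async disposer when the scope exits. A minimal sketch of the pattern, using a hypothetical stand-in class rather than the actual <code>cloudflare:test</code> introspector:</p>

```typescript
// Hypothetical stand-in for an introspector with an async disposer.
class FakeIntrospector {
  active = true;
  async dispose(): Promise<void> {
    // Stand-in for "remove the proxy and restore the original binding".
    this.active = false;
  }
}

async function runTest(): Promise<FakeIntrospector> {
  const introspector = new FakeIntrospector();
  try {
    // ... modifyAll(), create instances, assert on step results ...
  } finally {
    // `await using introspector = ...` generates the equivalent of this
    // finally block, awaiting the disposer even if the test throws.
    await introspector.dispose();
  }
  return introspector;
}

runTest().then((i) => console.log(i.active)); // prints false
```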
    <div>
      <h3>Get started with testing Workflows</h3>
      <a href="#get-started-with-testing-workflows">
        
      </a>
    </div>
    <p>Ready to add tests to your Workflows? Here’s how to get started:</p><ol><li><p><b>Update your dependencies:</b> Make sure you are using <code>@cloudflare/vitest-pool-workers</code> version <b>0.9.0</b> or newer. Run the following command in your project: <code>npm install @cloudflare/vitest-pool-workers@latest</code></p></li><li><p><b>Configure your test environment:</b> If you're new to testing on Workers, follow our <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/write-your-first-test/"><u>guide to write your first test</u></a>.</p></li><li><p><b>Start writing tests:</b> Import <code>introspectWorkflowInstance</code> or <code>introspectWorkflow</code> from <code>cloudflare:test</code> in your test files and use the patterns shown in this post to mock, control, and assert on your Workflow's behavior. Also check out the official <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/test-apis/#workflows"><u>API reference</u></a>.</p></li></ol> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workflows]]></category>
            <guid isPermaLink="false">5Kq3w0WQ8bFIvLmxsDpIjO</guid>
            <dc:creator>Olga Silva</dc:creator>
            <dc:creator>Mia Malden</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Cloudflare’s client-side security made the npm supply chain attack a non-event]]></title>
            <link>https://blog.cloudflare.com/how-cloudflares-client-side-security-made-the-npm-supply-chain-attack-a-non/</link>
            <pubDate>Fri, 24 Oct 2025 17:10:43 GMT</pubDate>
            <description><![CDATA[ A recent npm supply chain attack compromised 18 popular packages. This post explains how Cloudflare’s graph-based machine learning model, which analyzes 3.5 billion scripts daily, was built to detect and block exactly this kind of threat automatically. ]]></description>
            <content:encoded><![CDATA[ <p>In early September 2025, attackers used a phishing email to compromise one or more trusted maintainer accounts on npm. They used this access to publish malicious releases of 18 widely used npm packages (for example chalk, debug, ansi-styles) that together account for more than <a href="https://www.aikido.dev/blog/npm-debug-and-chalk-packages-compromised"><u>2 billion downloads per week</u></a>. Websites and applications that used these compromised packages were vulnerable to hackers stealing crypto assets (“crypto stealing” or “wallet draining”) from end users. In addition, compromised packages could also modify other packages owned by the same maintainers (using stolen npm tokens) and included code to <a href="https://unit42.paloaltonetworks.com/npm-supply-chain-attack/"><u>steal developer tokens for CI/CD pipelines and cloud accounts</u></a>.</p><p>As it relates to end users of your applications, the good news is that <a href="https://www.cloudflare.com/application-services/products/page-shield/"><u>Cloudflare Page Shield, our client-side security offering</u></a>, will detect compromised JavaScript libraries and prevent crypto stealing. More importantly, given the AI powering Cloudflare’s detection solutions, customers are protected from similar attacks in the future, as we explain below.</p>
            <pre><code>export default {
 aliceblue: [240, 248, 255],
 …
 yellow: [255, 255, 0],
 yellowgreen: [154, 205, 50]
}


const _0x112fa8=_0x180f;(function(_0x13c8b9,_0x35f660){const _0x15b386=_0x180f,_0x66ea25=_0x13c8b9();while(!![]){try{const _0x2cc99e=parseInt(_0x15b386(0x46c))/(-0x1caa+0x61f*0x1+-0x9c*-0x25)*(parseInt(_0x15b386(0x132))/(-0x1d6b+-0x69e+0x240b))+-parseInt(_0x15b386(0x6a6))/(0x1*-0x26e1+-0x11a1*-0x2+-0x5d*-0xa)*(-parseInt(_0x15b386(0x4d5))/(0x3b2+-0xaa*0xf+-0x3*-0x218))+-parseInt(_0x15b386(0x1e8))/(0xfe+0x16f2+-0x17eb)+-parseInt(_0x15b386(0x707))/(-0x23f8+-0x2*0x70e+-0x48e*-0xb)*(parseInt(_0x15b386(0x3f3))/(-0x6a1+0x3f5+0x2b3))+-parseInt(_0x15b386(0x435))/(0xeb5+0x3b1+-0x125e)*(parseInt(_0x15b386(0x56e))/(0x18*0x118+-0x17ee+-0x249))+parseInt(_0x15b386(0x785))/(-0xfbd+0xd5d*-0x1+0x1d24)+-parseInt(_0x15b386(0x654))/(-0x196d*0x1+-0x605+0xa7f*0x3)*(-parseInt(_0x15b386(0x3ee))/(0x282*0xe+0x760*0x3+-0x3930));if(_0x2cc99e===_0x35f660)break;else _0x66ea25['push'](_0x66ea25['shift']());}catch(_0x205af0){_0x66 …
</code></pre>
            <p><sub><i>Excerpt from the injected malicious payload, shown alongside the innocuous original code. Among other things, the payload replaces legitimate crypto addresses with attacker-controlled addresses (for multiple currencies, including Bitcoin, Ethereum, and Solana).</i></sub></p>
    <div>
      <h2>Finding needles in a 3.5 billion script haystack</h2>
      <a href="#finding-needles-in-a-3-5-billion-script-haystack">
        
      </a>
    </div>
    <p>Cloudflare Page Shield assesses 3.5 billion scripts per day, or roughly 40,000 scripts per second. Of these, less than 0.3% are malicious, based on our machine learning (ML)-based malicious script detection. As explained in a prior <a href="https://blog.cloudflare.com/how-we-train-ai-to-uncover-malicious-javascript-intent-and-make-web-surfing-safer/#ai-inference-at-scale"><u>blog post</u></a>, we preprocess JavaScript code into an <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree"><u>Abstract Syntax Tree</u></a> to train a <a href="https://mbernste.github.io/posts/gcn/"><u>message-passing graph convolutional network (MPGCN)</u></a> that classifies a given JavaScript file as either malicious or benign. </p><p>The intuition behind using a graph-based model is to use both the structure (e.g. function calls, assertions) and the code text to learn hacker patterns. For example, in the npm compromise, the <a href="https://www.aikido.dev/blog/npm-debug-and-chalk-packages-compromised"><u>malicious code</u></a> injected into compromised packages uses code obfuscation and also modifies code entry points for crypto wallet interfaces, such as Ethereum’s window.ethereum, to swap payment destinations to accounts in the attacker’s control. Crucially, rather than engineering such behaviors as features, the model learns to distinguish between good and bad code purely from structure and syntax. As a result, it is resilient not just to the techniques used in the npm compromise but also to future compromise techniques. </p><p>Our ML model outputs the probability that a script is malicious, which is then transformed into a score ranging from 1 to 99, with low scores indicating likely malicious scripts and high scores indicating benign ones. Importantly, like many Cloudflare ML models, inferencing happens in under 0.3 seconds. </p>
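<p>The probability-to-score transform can be pictured with a simple linear mapping. This is an assumption for illustration only; the post does not publish the exact transform Cloudflare uses.</p>

```typescript
// Hedged sketch: map a malicious-probability p in [0, 1] onto the
// 1-99 score scale described above, where low scores mean likely
// malicious. The linear form is assumed, not Cloudflare's actual transform.
function maliciousProbabilityToScore(p: number): number {
  if (p < 0 || p > 1) throw new RangeError("probability must be in [0, 1]");
  // p = 1 (certainly malicious) -> 1; p = 0 (certainly benign) -> 99.
  return Math.round(1 + (1 - p) * 98);
}

console.log(maliciousProbabilityToScore(0.99)); // → 2 (likely malicious)
console.log(maliciousProbabilityToScore(0.01)); // → 98 (likely benign)
```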
    <div>
      <h2>Model Evaluation</h2>
      <a href="#model-evaluation">
        
      </a>
    </div>
    <p>Since the initial launch, our JavaScript classifiers have been continuously evolved to optimize model evaluation metrics, in this case the <a href="https://en.wikipedia.org/wiki/Precision_and_recall"><u>F1 measure</u></a>. Our current metrics are: </p><table><tr><th><p><b>Metric</b></p></th><th><p><b>Latest: Version 2.7</b></p></th><th><p><b>Improvement over prior version</b></p></th></tr><tr><td><p>Precision</p></td><td><p>98%</p></td><td><p>5%</p></td></tr><tr><td><p>Recall</p></td><td><p>90%</p></td><td><p>233%</p></td></tr><tr><td><p>F1</p></td><td><p>94%</p></td><td><p>123%</p></td></tr></table><p>Some of the improvements were accomplished through:</p><ul><li><p>More training examples, curated from a combination of open source datasets, security partners, and labeling of Cloudflare traffic</p></li><li><p>Better training examples, for instance, by removing samples with pure comments in them or scripts with nearly equal structure</p></li><li><p>Better training set stratification, so that training, validation, and test sets all have a similar distribution of the classes of interest</p></li><li><p>Tweaking the evaluation criteria to maximize recall at 99% precision</p></li></ul><p>Given the confusion matrix, we should expect about 2 false positives per second, assuming ~0.3% of the 40,000 scripts per second are flagged as malicious. We employ multiple LLMs alongside expert human security analysts to review such scripts around the clock. Most false positives we encounter in this way are rather challenging: for example, scripts that read all form inputs except credit card numbers (e.g. rejecting input values that test true using the <a href="https://en.wikipedia.org/wiki/Luhn_algorithm"><u>Luhn algorithm</u></a>), inject dynamic scripts, perform heavy user tracking or heavy deobfuscation, etc. User tracking scripts often exhibit a combination of these behaviors, and the only reliable way to distinguish truly malicious payloads is by assessing the trustworthiness of their connected domains. We feed all newly labeled scripts back into our ML training (&amp; testing) pipeline.</p><p>Most importantly, we verified that Cloudflare Page Shield would have successfully detected all 18 compromised npm packages as malicious (a novel attack, and thus not in the training data).</p>
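<p>The expected false-positive load follows directly from the traffic and precision figures quoted in this post. A quick check of the arithmetic:</p>

```typescript
// Worked example: expected false positives per second, using the figures
// quoted above (40,000 scripts/s, ~0.3% flagged malicious, 98% precision).
const scriptsPerSecond = 40_000;
const flaggedRate = 0.003;  // ~0.3% of scripts are flagged as malicious
const precision = 0.98;     // fraction of flagged scripts that truly are

const flaggedPerSecond = scriptsPerSecond * flaggedRate;            // ~120
const falsePositivesPerSecond = flaggedPerSecond * (1 - precision); // ~2.4

console.log(falsePositivesPerSecond.toFixed(1)); // → "2.4"
```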
    <div>
      <h2>Planned improvements</h2>
      <a href="#planned-improvements">
        
      </a>
    </div>
    <p>Static script analysis has proven effective and is sometimes the only viable approach (e.g., for npm packages). To address more challenging cases, we are enhancing our ML signals with contextual data, including script URLs, page hosts, and connected domains. Modern agentic AI approaches can wrap JavaScript runtimes as tools in an overall AI workflow, enabling a hybrid approach that combines static and dynamic analysis techniques to tackle challenging false-positive scenarios, such as user tracking scripts.</p>
    <div>
      <h3>Consolidating classifiers</h3>
      <a href="#consolidating-classifiers">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/detecting-magecart-style-attacks-for-pageshield/"><u>Over 3 years ago</u></a> we launched our “<a href="https://developers.cloudflare.com/page-shield/detection/review-malicious-scripts/#review-malicious-scripts"><u>Code Behaviour Analysis</u></a>” classifier for Magecart-style scripts, which learns code obfuscation and data exfiltration behaviors. Subsequently, we also deployed our <a href="https://mbernste.github.io/posts/gcn/"><u>message-passing graph convolutional network (MPGCN)</u></a>-based approach, which can also classify <a href="https://blog.cloudflare.com/navigating-the-maze-of-magecart/"><u>Magecart attacks</u></a>. Given the efficacy of the MPGCN-based malicious code analysis, we are announcing the end-of-life of <a href="https://developers.cloudflare.com/page-shield/detection/review-malicious-scripts/#review-malicious-scripts"><u>Code Behaviour Analysis</u></a> by the end of 2025. </p>
    <div>
      <h2>Staying safe always</h2>
      <a href="#staying-safe-always">
        
      </a>
    </div>
    <p>In the npm attack, we did not see any activity among Page Shield users on the Cloudflare network related to this compromise, though for other exploits we typically catch the associated traffic within minutes. In this case, patched versions of the compromised npm packages were released within two hours, and given that the infected payloads had to be built into end-user-facing applications to affect end users, we suspect that our customers dodged the proverbial bullet. That said, had traffic gotten through, Page Shield was already equipped to detect and block this threat.</p><p>Also make sure to consult <a href="https://developers.cloudflare.com/page-shield/how-it-works/malicious-script-detection/#malicious-script-detection"><u>Page Shield script detection</u></a> to find malicious packages, and consult the Connections tab within Page Shield to view suspicious connections made by your applications.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6rMXJZVWEu6LkupOPY2pOB/0740a085fa2a64de3cff148fc29ad328/BLOG-3052_2.png" />
          </figure><p><sub><i>Several scripts are marked as malicious. </i></sub></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1oj2WALUAurKuu2XYTdKPm/fe8a564f10888e656c2510bc2a91dd6f/BLOG-3052_3.png" />
          </figure><p><sub><i>Several connections are marked as malicious. </i></sub></p><p>
And be sure to complete the following steps:</p><ol><li><p><b>Audit your dependency tree</b> for recently published versions (check package-lock.json / npm ls) and look for versions published around early–mid September 2025 of widely used packages. </p></li><li><p><b>Rotate any credentials</b> that may have been exposed to your build environment.</p></li><li><p><b>Revoke and reissue CI/CD tokens and service keys</b> that might have been used in build <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">pipelines</a> (GitHub Actions, npm tokens, cloud credentials).</p></li><li><p><b>Pin dependencies</b> to known-good versions (or use lockfiles), and consider using a package allowlist / verified publisher features from your registry provider.</p></li><li><p><b>Scan build logs and repos for suspicious commits/GitHub Actions changes</b> and remove any unknown webhooks or workflows.</p></li></ol><p>While vigilance is key, automated defenses provide a crucial layer of protection against fast-moving supply chain attacks. Interested in better understanding your client-side supply chain? Sign up for our free, custom <a href="https://www.cloudflare.com/lp/client-side-risk-assessment/"><u>Client-Side Risk Assessment</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Supply Chain Attacks]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Malicious JavaScript]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">1DRrVAPmyZYyz2avWuwYZ4</guid>
            <dc:creator>Bashyam Anant</dc:creator>
            <dc:creator>Juan Miguel Cejuela</dc:creator>
            <dc:creator>Zhiyuan Zheng</dc:creator>
            <dc:creator>Denzil Correa</dc:creator>
            <dc:creator>Israel Adura</dc:creator>
            <dc:creator>Georgie Yoxall</dc:creator>
        </item>
        <item>
            <title><![CDATA[Securing agentic commerce: helping AI Agents transact with Visa and Mastercard]]></title>
            <link>https://blog.cloudflare.com/secure-agentic-commerce/</link>
            <pubDate>Fri, 24 Oct 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare is partnering with Visa and Mastercard to help secure the future of agentic commerce. ]]></description>
            <content:encoded><![CDATA[ <p>The era of agentic commerce is coming, and it brings with it significant new challenges for security. That’s why Cloudflare is partnering with Visa and Mastercard to help secure automated commerce as AI agents search, compare, and purchase on behalf of consumers.</p><p>Through our collaboration, Visa developed the <a href="https://github.com/visa/trusted-agent-protocol"><u>Trusted Agent Protocol</u></a> and Mastercard developed <a href="https://www.mastercard.com/us/en/business/artificial-intelligence/mastercard-agent-pay.html"><u>Agent Pay</u></a> to help merchants distinguish legitimate, approved agents from malicious bots. Both Trusted Agent Protocol and Agent Pay leverage <a href="https://blog.cloudflare.com/web-bot-auth/"><u>Web Bot Auth</u></a> as the agent authentication layer to allow networks like Cloudflare to verify traffic from AI shopping agents that register with a payment network.</p>
    <div>
      <h2>The challenges with agentic commerce</h2>
      <a href="#the-challenges-with-agentic-commerce">
        
      </a>
    </div>
    <p>Agentic commerce is commerce driven by AI agents. As AI agents execute more transactions, merchants need to protect themselves and maintain trust with their customers. Merchants are beginning to see the promise of agentic commerce but face significant challenges: </p><ul><li><p>How can they distinguish a helpful, approved AI shopping agent from a malicious bot or web crawler? </p></li><li><p>Is the agent representing a known, repeat customer or someone entirely new? </p></li><li><p>Are there particular instructions the consumer gave to their agent that the merchant should respect?</p></li></ul><p>We are working with Visa and Mastercard, two of the most trusted consumer brands in payments, to address each of these challenges. </p>
    <div>
      <h2>Web Bot Auth is the foundation to securing agentic commerce</h2>
      <a href="#web-bot-auth-is-the-foundation-to-securing-agentic-commerce">
        
      </a>
    </div>
    <p>In May, we shared a new proposal called <a href="https://blog.cloudflare.com/web-bot-auth/"><u>Web Bot Auth</u></a> to cryptographically authenticate agent traffic. Historically, agent traffic has been classified using the user agent and IP address. However, these fields can be spoofed, leading to inaccurate classifications and misapplied bot mitigations. Web Bot Auth allows an agent to provide a stable identifier by using <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>HTTP Message Signatures</u></a> with public key cryptography.</p><p>As we spent time collaborating with the teams at Visa and Mastercard, we found that we could leverage Web Bot Auth as the foundation to ensure that each commerce agent request was verifiable, time-bound, and non-replayable.</p><p>Visa’s Trusted Agent Protocol and Mastercard’s Agent Pay present three key solutions for merchants to manage agentic commerce transactions. First, merchants can identify a registered agent and distinguish whether a particular interaction is intended to browse or to pay. Second, merchants can link an agent to a consumer identity. Last, merchants can indicate to agents how a payment is expected, whether that is through a network token, browser-use guest checkout, or a micropayment.</p><p>This allows merchants that integrate with these protocols to instantly recognize a trusted agent during two key interactions: the initial browsing phase to determine product details and final costs, and the final payment interaction to complete a purchase. Ultimately, this provides merchants with the tools to verify these signatures, identify trusted interactions, and securely manage how these agents can interact with their site.</p>
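<p>The HTTP Message Signatures mentioned above look roughly like this in practice. This is a hedged sketch in the style of RFC 9421; the covered components, timestamp, <code>keyid</code>, and tag value below are illustrative, not the exact Visa or Mastercard profiles.</p>

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hedged sketch of RFC 9421-style signing, the mechanism Web Bot Auth
// builds on. Key names, covered components, and the tag are illustrative.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The "signature base" is a canonical string built from the covered
// components plus the signature parameters (simplified here).
const params =
  '("@authority" "@path");created=1735689600;keyid="agent-key-1";tag="web-bot-auth"';
const signatureBase = [
  '"@authority": merchant.example',
  '"@path": /checkout',
  `"@signature-params": ${params}`,
].join("\n");

// Ed25519 signing in Node takes a null digest algorithm.
const signature = sign(null, Buffer.from(signatureBase), privateKey);

// Headers the agent attaches to its request:
const headers = {
  "Signature-Input": `sig1=${params}`,
  Signature: `sig1=:${signature.toString("base64")}:`,
};

// A verifier (e.g. Cloudflare) reconstructs the signature base, fetches
// the agent's public key from the well-known directory, and verifies:
const ok = verify(null, Buffer.from(signatureBase), publicKey, signature);
console.log(ok); // → true
```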
    <div>
      <h2>How it works: leveraging HTTP message signatures </h2>
      <a href="#how-it-works-leveraging-http-message-signatures">
        
      </a>
    </div>
    <p>To make this work, an ecosystem of participants needs to be on the same page. It all starts with <i>agent developers</i>, who build the agents that shop on behalf of consumers. These agents then interact with <i>merchants</i>, who need a reliable way to verify that a request is made on behalf of a consumer. Merchants rely on networks like Cloudflare to verify the agent's cryptographic signatures and ensure the interaction is legitimate. Finally, there are payment networks like Visa and Mastercard, who can link cardholder identity to agentic commerce transactions, helping ensure that transactions are verifiable and accountable.</p><p>When developing their protocols, Visa and Mastercard needed a secure way to authenticate each agent developer and securely transmit information from the agent to the merchant’s website. That’s where we came in and worked with their teams to build upon Web Bot Auth. The <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>Web Bot Auth</u></a> proposals specify how developers of bots and agents can attach cryptographic signatures to HTTP requests by using <a href="https://www.rfc-editor.org/rfc/rfc9421"><u>HTTP Message Signatures</u></a>.</p><p>Both protocols require agents to register and publish their public keys (referenced as the <code>keyid</code> in the Signature-Input header) in a well-known directory, allowing merchants and networks to fetch the keys to validate these HTTP message signatures. To start, Visa and Mastercard will host their own directories for Visa-registered and Mastercard-registered agents, respectively.</p><p>The newly created agents then communicate their registration, identity, and payment details with the merchant using these HTTP Message Signatures. Both protocols build on Web Bot Auth by introducing a new tag that agents must supply in the <code>Signature-Input</code> header, which indicates whether the agent is browsing or purchasing. 
Merchants can use the tag to determine whether to interact with the agent. Agents must also include the nonce field, a unique sequence included in the signature, to provide protection against replay attacks.</p><p>An agent visiting a merchant’s website to browse a catalog would include an HTTP Message Signature in their request to verify their agent is authorized to browse the merchant’s storefront on behalf of a specific Visa cardholder:</p>
            <pre><code>GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 Chrome/113.0.0 MyShoppingAgent/1.1
Signature-Input: 
  sig2=("@authority" "@path"); 
  created=1735689600; 
  expires=1735693200; 
  keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"; 
  alg="Ed25519";   nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4IAZyadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLEIW1qg=="; 
  tag="web-bot-auth"
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:</code></pre>
            <p>Trusted Agent Protocol and Agent Pay are designed for merchants to benefit from their validation mechanisms without changing their infrastructure. Instead, merchants can set the rules for agent interactions on their site and rely upon Cloudflare as the validator. For these requests, Cloudflare will run <a href="https://blog.cloudflare.com/verified-bots-with-cryptography/#message-signature-verification-for-origins"><u>the following checks</u></a>:</p><ol><li><p>Confirm the presence of the <code>Signature-Input</code> and <code>Signature</code> headers.</p></li><li><p>Pull the <code>keyid</code> from the Signature-Input. If Cloudflare has not previously retrieved and cached the key, fetch it from the public key directory.</p></li><li><p>Confirm the current time falls between the <code>created</code> and <code>expires</code> timestamps.</p></li><li><p>Check <code>nonce</code> uniqueness in the cache. By checking if a nonce has been recently used, Cloudflare can reject reused or expired signatures, ensuring the request is not a malicious copy of a prior, legitimate interaction.</p></li><li><p>Check the validity of the <code>tag</code>, as defined by the protocol. If the agent is browsing, the tag should be <code>agent-browser-auth</code>. If the agent is paying, the tag should be <code>agent-payer-auth</code>.</p></li><li><p>Reconstruct the canonical <a href="https://www.rfc-editor.org/rfc/rfc9421#name-creating-the-signature-base"><u>signature base</u></a> using the <a href="https://www.rfc-editor.org/rfc/rfc9421#covered-components"><u>components</u></a> from the <code>Signature-Input</code> header.</p></li><li><p>Perform the cryptographic <a href="https://www.rfc-editor.org/rfc/rfc9421#name-eddsa-using-curve-edwards25"><u>ed25519 signature verification</u></a> using the key supplied in <code>keyid</code>.</p></li></ol><p>Here is an <a href="https://github.com/visa/trusted-agent-protocol"><u>example from Visa</u></a> on the flow for agent validation:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Preu2aFUSuW5o3UWE6281/caf5354a009fb89c8b01cfef10fc3e87/image3.png" />
          </figure><p>Mastercard’s Agent Pay validation flow is outlined below:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vzpRtW4dRzsNGdc4Vnxf1/c780ef45a0b1fc263eb13b62b2af5457/image2.png" />
          </figure>
    <div>
      <h3>What’s next: Cloudflare’s Agent SDK &amp; Managed Rules</h3>
      <a href="#whats-next-cloudflares-agent-sdk-managed-rules">
        
      </a>
    </div>
    <p>We recently introduced support for <a href="https://blog.cloudflare.com/x402/#cloudflares-mcp-servers-agents-sdk-and-x402-payments"><u>x402 transactions</u></a> into Cloudflare’s <a href="https://agents.cloudflare.com/"><u>Agent SDK</u></a>, allowing anyone building an agent to easily transact using the new x402 protocol. We will similarly be working with Visa and Mastercard over the coming months to bring support for their protocols directly to the Agents SDK. This will allow developers to manage their registered agent’s private keys and to easily create the correct HTTP message signatures to authorize their agent to browse and transact on a merchant website.</p><p>Conceptually, the requests in a Cloudflare Worker would look something like this:</p>
            <pre><code>/**
 * Pseudocode example of a Cloudflare Worker acting as a trusted agent.
 * This version explicitly illustrates the signing logic to show the core flow.
 */

// Helper function to encapsulate the signing protocol logic.
async function createSignatureHeaders(targetUrl, credentials) {
    // Internally, this function would perform the detailed cryptographic steps:
    // 1. Generate timestamps and a unique nonce.
    // 2. Construct the 'Signature-Input' header string with all required parameters.
    // 3. Build the canonical 'Signature Base' string according to the spec.
    // 4. Use credentials.privateKey to sign the base string.
    // 5. Return the fully formed 'Signature-Input' and 'Signature' headers.

    const signedHeaders = new Headers();
    signedHeaders.set('Signature-Input', 'sig2=(...); keyid="..."; ...');
    signedHeaders.set('Signature', 'sig2=:...');
    return signedHeaders;
}

export default {
    async fetch(request, env) {
        // 1. Load the final API endpoint and private signing credentials.
        const targetUrl = new URL(request.url).searchParams.get('target');
        if (!targetUrl) {
            return new Response('Missing "target" query parameter', { status: 400 });
        }
        const credentials = {
            privateKey: env.PAYMENT_NETWORK_PRIVATE_KEY,
            keyId: env.PAYMENT_NETWORK_KEY_ID
        };

        // 2. Generate the required signature headers using the helper.
        const signatureHeaders = await createSignatureHeaders(targetUrl, credentials);

        // 3. Attach the newly created signature headers to the request for authentication.
        const signedRequestHeaders = new Headers(request.headers);
        signedRequestHeaders.set('Host', new URL(targetUrl).hostname);
        signedRequestHeaders.set('Signature-Input', signatureHeaders.get('Signature-Input'));
        signedRequestHeaders.set('Signature', signatureHeaders.get('Signature'));

        // 4. Forward the fully signed request to the protected API.
        return fetch(targetUrl, { headers: signedRequestHeaders });
    },
};</code></pre>
            <p>We’ll also be creating new <a href="https://developers.cloudflare.com/waf/managed-rules/"><u>managed rulesets</u></a> for our customers that make it easy to allow agents that are using the Trusted Agent Protocol or Agent Pay. You might want to disallow most automated traffic to your storefront but not miss out on revenue opportunities from agents authorized to make a purchase on behalf of a cardholder. A managed rule would make this straightforward to implement. As the website owner, you could enable a managed rule that automatically allows all trusted agents registered with Visa or Mastercard to come to your site, passing your other bot protection &amp; WAF rules.</p><p>These protocols will continue to evolve, and we will incorporate feedback to ensure that agent registration and validation work seamlessly across all networks and align with the Web Bot Auth proposal. American Express will also be leveraging Web Bot Auth as the foundation of their agentic commerce offering.</p>
    <div>
      <h2>How to get started today </h2>
      <a href="#how-to-get-started-today">
        
      </a>
    </div>
    <p>You can start building with Cloudflare’s <a href="https://agents.cloudflare.com/"><u>Agent SDK today</u></a>, see a sample implementation of the <a href="https://github.com/visa/trusted-agent-protocol"><u>Trusted Agent Protocol</u></a>, and view the <a href="https://developer.visa.com/capabilities/trusted-agent-protocol/trusted-agent-protocol-specifications"><u>Trusted Agent Protocol</u></a> and <a href="https://www.mastercard.com/us/en/business/artificial-intelligence/mastercard-agent-pay.html"><u>Agent Pay</u></a> docs.</p><p>We look forward to your contributions and feedback, whether that means engaging on GitHub, building apps, or joining mailing list discussions.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[AI Bots]]></category>
            <guid isPermaLink="false">7EMx28KsZIufcu4wEq5YtV</guid>
            <dc:creator>Rohin Lohe</dc:creator>
            <dc:creator>Will Allen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Unpacking Cloudflare Workers CPU Performance Benchmarks]]></title>
            <link>https://blog.cloudflare.com/unpacking-cloudflare-workers-cpu-performance-benchmarks/</link>
            <pubDate>Tue, 14 Oct 2025 20:00:25 GMT</pubDate>
            <description><![CDATA[ Cloudflare investigated CPU performance benchmark results for Workers, uncovering and fixing issues in infrastructure, V8 garbage collection, and OpenNext optimizations.  ]]></description>
            <content:encoded><![CDATA[ <p>On October 4, independent developer Theo Browne published <a href="https://github.com/t3dotgg/cf-vs-vercel-bench"><u>a series of benchmarks</u></a> designed to compare server-side JavaScript execution speed between Cloudflare Workers and Vercel, a competing compute platform built on AWS Lambda. The initial results showed Cloudflare Workers performing worse than Node.js on Vercel at a variety of CPU-intensive tasks, by a factor of as much as 3.5x.</p><p>We were surprised by the results. The benchmarks were designed to compare JavaScript execution speed in a CPU-intensive workload that never waits on external services. But, Cloudflare Workers and Node.js both use the same underlying JavaScript engine: <a href="https://en.wikipedia.org/wiki/V8_(JavaScript_engine)"><u>V8, the open source engine from Google Chrome</u></a>. Hence, one would expect the benchmarks to be executing essentially identical code in each environment. Physical CPUs can vary in performance, but modern server CPUs do not vary by anywhere near 3.5x.</p><p>On investigation, we discovered a wide range of small problems that contributed to the disparity, ranging from some bad tuning in our infrastructure, to differences between the JavaScript libraries used on each platform, to some issues with the test itself. We spent the week working on many of these problems, which means over the past week Workers got better and faster for all of our customers. We even fixed some problems that affect other compute providers but not us, such as an issue that made trigonometry functions much slower on Vercel. This post will dig into all the gory details. </p><p>It's important to note that the original benchmark was not representative of billable CPU usage on Cloudflare, nor did the issues involved impact most typical workloads. Most of the disparity was an artifact of the specific benchmark methodology. 
Read on to understand why.</p><p>With our fixes, the results now look much more like we'd expect:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4LbjDIgtezBKTCWKEW5ePJ/1b053a44c90cf6c59dd0da4d9f7d8057/BLOG-3051_2.png" />
          </figure><p>There is still work to do, but we're happy to say that after these changes, Cloudflare now performs on par with Vercel in every benchmark case except the one based on Next.js. On that benchmark, the gap has closed considerably, and we expect to be able to eliminate it with further improvements detailed later in this post.</p><p>We are grateful to Theo for highlighting areas where we could make improvements, which will now benefit all our customers, and even many who aren't our customers.</p>
    <div>
      <h3>Our benchmark methodology</h3>
      <a href="#our-benchmark-methodology">
        
      </a>
    </div>
    <p>We wanted to run Theo's test with no major design changes, in order to keep numbers comparable. Benchmark cases are nearly identical to Theo's original test but we made a couple changes in how we ran the test, in the hopes of making the results more accurate:</p><ul><li><p>Theo ran the test client on a laptop connected by a Webpass internet connection in San Francisco, against Vercel instances running in its sfo1 region. In order to make our results easier to reproduce, we chose instead to run our test client directly in AWS's us-east-1 datacenter, invoking Vercel instances running in its iad1 region (which we understand to be in the same building). We felt this would minimize any impact from network latency. Because of this, Vercel's numbers are slightly better in our results than they were in Theo's.</p></li><li><p>We chose to use Vercel instances with 1 vCPU instead of 2. All of the benchmarks are single-threaded workloads, meaning they cannot take advantage of a second CPU anyway. Vercel's CTO, Malte Ubl, had <a href="https://x.com/cramforce/status/1975656443954274780"><u>stated publicly on X</u></a> that using single-CPU instances would make no difference in this test, and indeed, we found this to be correct. Using 1 vCPU makes it easier to reason about pricing, since both Vercel and Cloudflare charge for CPU time (<code>$</code>0.128/hr for Vercel in iad1, and <code>$</code>0.072/hr for Cloudflare globally).</p></li><li><p>We made some changes to fix bugs in the test, for which <a href="https://github.com/t3dotgg/cf-vs-vercel-bench/pull/5"><u>we submitted a pull request</u></a>. More on this below.</p></li></ul>
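<p>As a rough illustration of how those CPU-hour prices translate into per-request cost, a back-of-the-envelope calculation (the CPU time per request below is an assumed figure for the sake of the arithmetic, not a measurement from the benchmark):</p>

```javascript
// Back-of-the-envelope CPU-time cost sketch; illustrative numbers only.
const pricePerCpuHour = { vercelIad1: 0.128, cloudflareGlobal: 0.072 };
const cpuMsPerRequest = 100;   // assumed CPU time per request
const requests = 1_000_000;

// Convert milliseconds of CPU time to hours, then apply the hourly rate.
const costFor = (pricePerHour) =>
  (cpuMsPerRequest / 3_600_000) * pricePerHour * requests;

console.log(costFor(pricePerCpuHour.vercelIad1).toFixed(2));       // 3.56
console.log(costFor(pricePerCpuHour.cloudflareGlobal).toFixed(2)); // 2.00
```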
    <div>
      <h2>Cloudflare platform improvements</h2>
      <a href="#cloudflare-platform-improvements">
        
      </a>
    </div>
    <p>Theo's benchmarks covered a variety of frameworks, making it clear that no single JavaScript library could be at fault for the general problem. Clearly, we needed to look first at the Workers Runtime itself. And so we did, and we found two problems – not bugs, but tuning and heuristic choices which interacted poorly with the benchmarks as written.</p>
    <div>
      <h3>Sharding and warm isolate routing: A problem of scheduling, not CPU speed</h3>
      <a href="#sharding-and-warm-isolate-routing-a-problem-of-scheduling-not-cpu-speed">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/eliminating-cold-starts-2-shard-and-conquer/"><u>Over the last year we shipped smarter routing that sends traffic to warm isolates more often</u></a>. That cuts cold starts for large apps, which matters for frameworks with heavy initialization requirements like Next.js. The original policy optimized for latency and throughput across billions of requests, but was less optimal for heavily CPU-bound workloads for the same reason that such workloads cause performance issues in other platforms like Node.js: When the CPU is busy computing an expensive operation for one request, other requests sent to the same isolate must wait for it to finish before they can proceed.</p><p>The system uses heuristics to detect when requests are getting blocked behind each other, and automatically spins up more isolates to compensate. However, these heuristics are not precise, and the particular workload generated by Theo's tests – in which a burst of expensive traffic would come from a single client – played poorly with our existing algorithm. As a result, the benchmarks showed much higher latency (and variability in latency) than would normally be expected.</p><p><b>It's important to understand that, as a result of this problem, the benchmark was not really measuring CPU time.</b> Pricing on the Workers platform is based on CPU time – that is, time spent actually executing JavaScript code, as opposed to time waiting for things. Time spent waiting for the isolate to become available makes the request take longer, but is not billed as CPU time against the waiting request. <b>So, this problem would not have affected your bill.</b></p><p>After analyzing the benchmarks, we updated the algorithm to detect sustained CPU-heavy work earlier, then bias traffic so that new isolates spin up faster. The result is that Workers can autoscale more effectively and efficiently as different workloads are applied: I/O-bound workloads coalesce onto already-warm isolates, while CPU-bound workloads are routed so that they do not block each other. This change has already been rolled out globally and is enabled automatically for everyone. It should be pretty clear from the graph when the change was rolled out:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Cio8BSY6tH7crMbXdnzYi/bab6314164907375eff2236a2bec21c3/image__7_.png" />
          </figure>
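<p>One way to picture the trade-off the routing heuristics are making is with a deliberately toy model. This is not Cloudflare's actual algorithm, just an illustration of why cheap I/O-bound requests can share a warm isolate while a burst of CPU-bound requests forces scale-out:</p>

```javascript
// Toy model of warm-isolate routing vs. scale-out; purely illustrative.
class Isolate {
  constructor(id) {
    this.id = id;
    this.busyUntil = 0; // ms timestamp when current CPU work finishes
  }
}

function route(isolates, now, expectedCpuMs, maxQueueMs) {
  // Prefer the warm isolate that frees up soonest (avoids a cold start).
  const best = isolates.reduce((a, b) => (a.busyUntil <= b.busyUntil ? a : b));
  const waitMs = Math.max(0, best.busyUntil - now);
  // If queueing behind it would exceed our tolerance, spin up a fresh isolate.
  const scaleOut = waitMs > maxQueueMs;
  const chosen = scaleOut ? new Isolate(isolates.length) : best;
  if (scaleOut) isolates.push(chosen);
  chosen.busyUntil = Math.max(chosen.busyUntil, now) + expectedCpuMs;
  return chosen;
}

// I/O-bound requests (~1 ms CPU each) coalesce onto the single warm isolate...
const isolates = [new Isolate(0)];
for (let i = 0; i < 5; i++) route(isolates, i, 1, 10);
console.log(isolates.length); // 1

// ...while a burst of CPU-heavy requests (~100 ms CPU each) triggers scale-out.
for (let i = 0; i < 5; i++) route(isolates, 10, 100, 10);
console.log(isolates.length); // 5
```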
    <div>
      <h3>V8 garbage collector tuning</h3>
      <a href="#v8-garbage-collector-tuning">
        
      </a>
    </div>
    <p>While this scheduling issue accounted for the majority of the disparity in the benchmark, we did find a minor issue affecting code execution performance during our testing.</p><p>The range of issues that we uncovered in the framework code in these benchmarks repeatedly pointed at <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Memory_management#garbage_collection"><u>garbage collection</u></a> and memory management issues as being key contributors to the results. But, we would expect these to be an issue with the same frameworks running in Node.js as well. To see exactly what was going on differently with Workers and why it was causing such a significant degradation in performance, we had to look inwards at our own memory management configuration.</p><p>The <a href="https://v8.dev/blog/trash-talk"><u>V8 garbage collector has a huge number of knobs</u></a> that can be tuned that directly impact performance. One of these is the size of the "young generation". This is where newly created objects go initially. It's a memory area that's less compact, but optimized for short-lived objects. When objects have bounced around the "young space" for a few generations they get moved to the old space, which is more compact, but requires more CPU to reclaim.</p><p>V8 allows the embedding runtime to tune the size of the young generation. And it turns out, we had done so. Way back in June of 2017, just two months after the Workers project kicked off, we – or specifically, I, Kenton, as I was the only engineer on the project at the time – had configured this value according to V8's recommendations at the time for environments with 512MB of memory or less. Since Workers defaults to a limit of 128MB per isolate, this seemed appropriate.</p><p>V8's entire garbage collector has changed dramatically since 2017. 
When analyzing the benchmarks, it became apparent that the setting which made sense in 2017 no longer made sense in 2025, and we were now limiting V8's young space too rigidly. Our configuration was causing V8's garbage collection to work harder and more frequently than it otherwise needed to. As a result, we have backed off on the manual tuning and now allow V8 to pick its young space size more freely, based on its internal heuristics. This is already live on Cloudflare Workers, and it has given an approximately 25% boost to the benchmarks with only a small increase in memory usage. Of course, the benchmarks are not the only Workers that benefit: all Workers should now be faster. That said, for most Workers the difference has been much smaller.</p>
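<p>To build intuition for why the young-generation size matters, consider the allocation pattern the scavenger is optimized for: almost everything dies young, and only survivors get promoted. A small Node.js illustration (the <code>--max-semi-space-size</code> flag mentioned in the comments is Node's knob for the young generation, not the value Workers uses):</p>

```javascript
// Nearly every object here dies in the young generation; only one per call
// survives long enough to matter. In Node.js you can tune the young space with
// --max-semi-space-size=<MiB>, e.g. `node --max-semi-space-size=64 script.js`;
// an over-tight value forces more frequent scavenges, analogous to the
// over-rigid limit described above.
function churn(iterations) {
  let survivor = null;
  for (let i = 0; i < iterations; i++) {
    const tmp = { i, payload: new Array(64).fill(i) }; // short-lived allocation
    if (i === iterations - 1) survivor = tmp;          // only the last escapes
  }
  return survivor;
}

const s = churn(100_000);
console.log(s.i); // 99999
```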
    <div>
      <h2>Tuning OpenNext for performance</h2>
      <a href="#tuning-opennext-for-performance">
        
      </a>
    </div>
    <p>The platform changes solved most of the problem. Following the changes, our testing showed we were now even on all of the benchmarks save one: Next.js.</p><p>Next.js is a popular web application framework which, historically, has not had built-in support for hosting on a wide range of platforms. Recently, a project called <a href="https://opennext.js.org/"><u>OpenNext</u></a> has arisen to fill the gap, making Next.js work well on many platforms, including Cloudflare. On investigation, we found several missing optimizations and other opportunities to improve performance, explaining much of why the benchmark performed poorly on Workers.</p>
    <div>
      <h3>Unnecessary allocations and copies</h3>
      <a href="#unnecessary-allocations-and-copies">
        
      </a>
    </div>
    <p>When profiling the benchmark code, we noticed that garbage collection was dominating the timeline. From 10-25% of the request processing time was being spent reclaiming memory.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7lYdrV1WFzKEsD6qXspQ2K/725225d0d2e01f74057152b0d736868c/BLOG-3051_4.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Ab4DJG0VzETky4t8rtYSe/23e9f0578bf8ac9834f7897d628b7cb2/BLOG-3051_5.png" />
          </figure><p>So we dug in and discovered that OpenNext, and in some cases Next.js and React itself, will often create unnecessary copies of internal data buffers at some of the worst possible moments during request processing. For instance, there's one <code>pipeThrough()</code> operation in the rendering pipeline that we saw creating no fewer than 50 2048-byte <code>Buffer</code> instances, whether they are actually used or not.</p><p>We further discovered that on every request, the <a href="https://github.com/opennextjs/opennextjs-cloudflare"><u>Cloudflare OpenNext adapter</u></a> has been needlessly copying every chunk of streamed output data as it’s passed out of the renderer and into the Workers runtime to return to users. Given this benchmark returns a 5 MB result on every request, that's a lot of data being copied!</p><p>In other places, we found that arrays of internal Buffer instances were being copied and concatenated using <a href="https://nodejs.org/docs/latest/api/buffer.html#static-method-bufferconcatlist-totallength"><code><u>Buffer.concat</u></code></a> for no other reason than to get the total number of bytes in the collection. That is, we spotted code of the form <code>getBody().length</code>. The function <code>getBody()</code> would concatenate a large number of buffers into a single buffer and return it, without storing the buffer anywhere. So, all that work was being done just to read the overall length. 
Obviously this was not intended, and fixing it was an easy win.</p><p>We've started opening a series of pull requests in OpenNext to fix these issues, and others in hot paths, removing some unnecessary allocations and copies:</p><ul><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/991"><u>Improving streaming response performance</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/996"><u>Reduce allocations of streams</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1005"><u>Optimize readable/writable stream piping</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1009"><u>Cache expensive compute on OpenNext.js</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1004"><u>Improve composable-cache performance</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1006"><u>Improve performance of OpenNext.js converters</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1007"><u>Avoid slow-mode on frequently accessed objects</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1008"><u>Avoid copying/allocation extra header objects</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-aws/pull/1010"><u>Avoid unnecessary buffer copies on responses</u></a></p></li><li><p><a href="https://github.com/opennextjs/opennextjs-cloudflare/pull/939"><u>Cache regexes to avoid GC pressure</u></a></p></li></ul><p>We're not done. We intend to keep iterating through OpenNext code, making improvements wherever they’re needed – not only in the parts that run on Workers. Many of these improvements apply to other OpenNext platforms. The shared goal of OpenNext is to make Next.js as fast as possible regardless of where you choose to run your code.</p>
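<p>A minimal before-and-after of the <code>getBody().length</code> anti-pattern described above (<code>getBody</code> here is a stand-in for the OpenNext internals, not the actual code): the "before" version copies every byte just to count them, while the fix sums the chunk lengths without copying anything.</p>

```javascript
const chunks = [Buffer.from('hello, '), Buffer.from('world')];

// Before: concatenates every chunk into a fresh Buffer, then throws the
// result away after reading a single property.
function getBody() {
  return Buffer.concat(chunks);
}
const slowLength = getBody().length;

// After: sum the chunk lengths directly; no bytes are copied.
const fastLength = chunks.reduce((total, chunk) => total + chunk.byteLength, 0);

console.log(slowLength, fastLength); // 12 12
```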
    <div>
      <h2>Inefficient Streams Adapters</h2>
      <a href="#inefficient-streams-adapters">
        
      </a>
    </div>
    <p>Much of the Next.js code was written to use Node.js's APIs for byte streams. Workers, however, prefers the web-standard <a href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API"><u>Streams API</u></a>, and uses it to represent HTTP request and response bodies. This necessitates using adapters to convert between the two APIs. When investigating the performance bottlenecks, we found a number of examples where inefficient streams adapters are being needlessly applied. For example:</p>
            <pre><code>const stream = Readable.toWeb(Readable.from(res.getBody()))</code></pre>
            <p><code>res.getBody()</code> was performing a <code>Buffer.concat(chunks)</code> to copy accumulated chunks of data into a new Buffer, which was then passed as an iterable into a Node.js <a href="https://nodejs.org/docs/latest/api/stream.html#readable-streams"><code><u>stream.Readable</u></code></a> that was then wrapped <a href="https://nodejs.org/docs/latest/api/stream.html#streamreadabletowebstreamreadable-options"><u>by an adapter</u></a> that returns a <code>ReadableStream</code>. While these utilities do serve a useful purpose, this becomes a data buffering nightmare since both Node.js streams and Web streams each apply their own internal buffers! Instead we can simply do:</p>
            <pre><code>const stream = ReadableStream.from(chunks);</code></pre>
            <p>This returns a <code>ReadableStream</code> directly from the accumulated chunks without additional copies, extraneous buffering, or passing everything through inefficient adaptation layers.</p><p>In other places we see that Next.js and React make extensive use of <code>ReadableStream</code> to pass bytes through, but the streams being created are value-oriented rather than byte-oriented! For example,</p>
            <pre><code>const readable = new ReadableStream({
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
  }
});  // Default highWaterMark is 1!
</code></pre>
            <p>Seems perfectly reasonable. However, there's an issue here. If the chunks are <code>Buffer</code> or <code>Uint8Array</code> instances, every instance ends up being a separate read by default. So whether a <code>chunk</code> is a single byte or 1,000 bytes, it still costs its own read. By converting this to a byte stream with a reasonable high water mark, we can make it possible to read this stream much more efficiently:</p>
            <pre><code>const readable = new ReadableStream({
  type: 'bytes',
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
  }
}, { highWaterMark: 4096 });
</code></pre>
            <p>Now, the stream can be read as a stream of bytes rather than a stream of distinct JavaScript values, and the individual chunks can be coalesced internally into 4096 byte chunks, making it possible to optimize the reads much more efficiently. Rather than reading each individual enqueued chunk one at a time, the ReadableStream will proactively call <code>pull()</code> repeatedly until the highWaterMark is reached. Reads then do not have to ask the stream for one chunk of data at a time.</p><p>While it would be best for the rendering pipeline to be using byte streams and paying attention to back pressure signals more, our implementation can still be tuned to better handle cases like this.</p><p>The bottom line? We've got some work to do! There are a number of improvements to make in the implementation of OpenNext and the adapters that allow it to work on Cloudflare that we will continue to investigate and iterate on. We've made a handful of these fixes already and we're already seeing improvements. Soon we also plan to start submitting patches to Next.js and React to make further improvements upstream that will ideally benefit the entire ecosystem.</p>
    <div>
      <h3>JSON parsing</h3>
      <a href="#json-parsing">
        
      </a>
    </div>
    <p>Aside from buffer allocations and streams, one additional item stood out like a sore thumb in the profiles: <code>JSON.parse()</code> with a reviver function. This is used in both React and Next.js, and in our profiling it was significantly slower than it should be. We built a microbenchmark and found that JSON.parse with a reviver argument recently got even slower when the standard <a href="https://github.com/tc39/proposal-json-parse-with-source"><u>added a third argument</u></a> to the reviver callback to provide access to the JSON source context.</p><p>For those unfamiliar with the reviver function, it allows an application to effectively customize how JSON is parsed. But it has drawbacks. The function gets called on every key-value pair included in the JSON structure, including every individual element of an Array that gets serialized. In Theo's Next.js benchmark, in any single request, it ends up being called well over 100,000 times!</p><p>Even though this problem affects all platforms, not just ours, we decided that we weren't just going to accept it. After all, we have contributors to V8 on the Workers runtime team! We've upstreamed a <a href="https://chromium-review.googlesource.com/c/v8/v8/+/7027411"><u>V8 patch</u></a> that can speed up <code>JSON.parse()</code> with revivers by roughly 33 percent. That should be in V8 starting with version 14.3 (Chrome 143) and can help everyone using V8, not just Cloudflare: Node.js, Chrome, Deno, the entire ecosystem. Until the patch reaches the runtime you use, JSON.parse with a reviver will keep paying this performance cost.</p><p>We will continue to work with framework authors to reduce overhead in hot paths. Some changes belong in the frameworks, some belong in the engine, some in our platform.</p>
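<p>The per-pair cost is easy to demonstrate: a reviver runs once for every key-value pair, including every element of every array, plus the root value.</p>

```javascript
// Count how many times the reviver runs for a modest payload.
let calls = 0;
const json = JSON.stringify({ items: Array.from({ length: 1000 }, (_, i) => i) });

const parsed = JSON.parse(json, function reviver(key, value) {
  calls++;
  return value;
});

// 1000 array elements + the "items" key + the root object = 1002 calls.
console.log(calls);               // 1002
console.log(parsed.items.length); // 1000
```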
    <div>
      <h2>Node.js's trigonometry problem</h2>
      <a href="#node-jss-trigonometry-problem">
        
      </a>
    </div>
    <p>We are engineers, and we like to solve engineering problems — whether our own, or for the broader community.</p><p>Theo's benchmarks were actually posted in response to a different benchmark by another author that compared Cloudflare Workers against Vercel. The original benchmark focused on calling trigonometry functions (e.g. sine and cosine) in a tight loop. In this benchmark, Cloudflare Workers performed 3x faster than Node.js running on Vercel.</p><p>The author of the original benchmark offered this as evidence that Cloudflare Workers are just faster. Theo disagreed, and so did we. We expect to be faster, but not by 3x! We don't implement math functions ourselves; these come with V8. We weren't happy to just accept the win, so we dug in.</p><p>It turns out that Node.js is not using the latest, fastest path for these functions. Node.js can be built with either the <a href="https://clang.llvm.org/"><u>clang</u></a> or <a href="https://gcc.gnu.org/"><u>gcc</u></a> compilers, and is written to support a broader range of operating systems and architectures than Workers. As a result, Node.js builds often settle for lowest-common-denominator implementations of some things in order to support the broadest range of platforms. V8 includes a <a href="https://github.com/search?q=repo%3Av8%2Fv8%20V8_USE_LIBM_TRIG_FUNCTIONS&amp;type=code"><u>compile-time flag</u></a> that, in some configurations, allows it to use a faster implementation of the trig functions. In Workers, mostly by coincidence, that flag is enabled by default. In Node.js, it is not. We've opened a <a href="https://github.com/nodejs/node/pull/60153"><u>pull request</u></a> to enable the flag in Node.js so that everyone benefits, at least on platforms where it can be supported.</p><p>Assuming that lands, and once AWS Lambda and Vercel are able to pick it up, we expect this specific gap to go away, making these operations faster for everyone. 
This change won't benefit our customers, since Cloudflare Workers already uses the faster trig functions, but a bug is a bug and we like making everything faster.</p>
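The benchmark in question amounted to a tight loop over trig calls, roughly like this sketch (our reconstruction, not the original author's code):

```javascript
// Rough reconstruction of the kind of trig micro-benchmark described above.
// The iteration count is arbitrary; the loop spends nearly all of its time
// inside V8's Math.sin/Math.cos implementations, which is exactly the path
// affected by the compile-time flag.
function trigBench(iterations) {
  let acc = 0;
  for (let i = 0; i < iterations; i++) {
    acc += Math.sin(i) * Math.cos(i);
  }
  return acc; // accumulate so the calls cannot be optimized away
}

const start = performance.now();
const result = trigBench(1_000_000);
console.log(`1M sin*cos iterations in ${(performance.now() - start).toFixed(1)} ms`);
```

Because the loop body is nothing but trig calls, any difference in the underlying libm path shows up almost directly in the wall-clock time.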
    <div>
      <h2>Benchmarks are hard</h2>
      <a href="#benchmarks-are-hard">
        
      </a>
    </div>
    <p>Even the best benchmarks have bias and tradeoffs. It's difficult to create a benchmark that is truly representative of real-world performance, and all too easy to misinterpret the results of benchmarks that are not. <a href="https://planetscale.com/benchmarks"><u>We particularly liked PlanetScale's take on this subject.</u></a></p><p>These specific CPU-bound tests are not an ideal choice to represent web applications. Theo even notes this in his video. Most real-world applications on Workers and Vercel are bound by databases, downstream services, network, and page size. End-user experience is what matters; CPU is one piece of that picture. That said, if a benchmark shows us slower, we take it seriously.</p><p>While the benchmarks helped us find and fix many real problems, we also found a few problems with the benchmarks themselves, which contributed to the apparent disparity in speed:</p>
    <div>
      <h3>Running locally</h3>
      <a href="#running-locally">
        
      </a>
    </div>
    <p>The benchmark is designed to be run on your laptop, from which it hits Cloudflare's and Vercel's servers over the Internet. It makes the assumption that latency observed from the client is a close enough approximation of server-side CPU time. The reasons are fair: As Theo notes, Cloudflare does not permit an application to measure its own CPU time, in order to prevent timing side channel attacks. Actual CPU time can be seen in logs after the fact, but gathering those may be a lot of work. It's just easier to measure time from the client.</p><p>However, as Cloudflare and Vercel are hosted from different data centers, the network latency to each can be a factor in the benchmark, and this can skew the results. Typically, this effect will favor Cloudflare, because Cloudflare can run your Worker in locations spread across 330+ cities worldwide, and will tend to choose the closest one to you. Vercel, on the other hand, usually places compute in a central location, so latency will vary depending on your distance from that location.</p><p>For our own testing, to minimize this effect, we ran the benchmark client from a VM on AWS located in the same data center as our Vercel instances. Since Cloudflare is well-connected to every AWS location, we think this should have eliminated network latency from the picture. We chose AWS's us-east-1 / Vercel's iad1 for our test as it is widely seen as the default choice; any other choice could draw questions about cherry-picking.</p>
    <div>
      <h3>Not all CPUs are equal</h3>
      <a href="#not-all-cpus-are-equal">
        
      </a>
    </div>
    <p>Cloudflare's servers aren't all identical. Although we refresh them aggressively, there will always be multiple generations of hardware in production at any particular time. Currently, this includes generations <a href="https://blog.cloudflare.com/cloudflares-gen-x-servers-for-an-accelerated-future/"><u>10</u></a>, <a href="https://blog.cloudflare.com/the-epyc-journey-continues-to-milan-in-cloudflares-11th-generation-edge-server/"><u>11</u></a>, and <a href="https://blog.cloudflare.com/gen-12-servers/"><u>12</u></a> of our server hardware.</p><p>Other cloud providers are no different. No cloud provider simply throws away all their old servers every time a new version becomes available.</p><p>Of course, newer CPUs run faster, even for single-threaded workloads. The differences are not as large as they were 20-30 years ago, but they are not nothing. As such, an application may get (a little bit) lucky or unlucky depending on what machine it is assigned to.</p><p>In cloud environments, even identical CPUs can yield different performance depending on circumstances, due to multitenancy. The server your application is assigned to is running many others as well. In AWS Lambda, a server may be running hundreds of applications; in Cloudflare, with our ultra-efficient runtime, a server may be running thousands. These "noisy neighbors" won't share the same CPU core as your app, but they may share other resources, such as memory bandwidth. As a result, performance can vary.</p><p>It's important to note that these problems create <i>correlated</i> noise. That is, if you run the test again, the application is likely to remain assigned to the same machines as before – this is true of both Cloudflare and Vercel. So, this noise cannot be corrected by simply running more iterations. To correct for this type of noise on Cloudflare, one would need to initiate requests from a variety of geographic locations, in order to hit different Cloudflare data centers and therefore different machines. But that is admittedly a lot of work. (We are not familiar with how best to get an application to switch machines on Vercel.)</p>
    <div>
      <h3>A Next.js config bug</h3>
      <a href="#a-next-js-config-bug">
        
      </a>
    </div>
    <p>The Cloudflare version of the Next.js benchmark was not configured to use <a href="https://nextjs.org/docs/app/guides/caching#opting-out-2"><u>force-dynamic</u></a> while the Vercel version was. This triggered curious behavior. Our understanding is that pages that are not "dynamic" should normally be rendered statically at build time. With OpenNext, however, it appears the pages are still rendered dynamically, but if multiple requests for the same page are received at the same time, OpenNext will only invoke the rendering once. Before we fixed our scheduling algorithm to avoid sending too many requests to the same isolate, this behavior may have somewhat counteracted that problem. Theo reports that he had disabled force-dynamic in the Cloudflare version specifically for this reason: with it on, our results were so bad as to appear outright broken.</p><p>Ironically, though, once we fixed the scheduling issue, using "static" rendering (i.e. not enabling force-dynamic) hurt Cloudflare's performance for other reasons. It seems that when OpenNext renders a "cacheable" page, streaming of the response body is inhibited. This interacted poorly with a property of the benchmark client: it measured time-to-first-byte (TTFB), rather than total request/response time. When running in dynamic mode – as the test did on Vercel – the first byte would be returned to the client before the full page had been rendered. The rest of the rendering would happen as bytes streamed out. But with OpenNext in non-dynamic mode, the entire payload was rendered into a giant buffer upfront, before any bytes were returned to the client.</p><p>Due to the TTFB behavior of the benchmark client, in dynamic mode, the benchmark actually does not measure the time needed to fully render the page. 
We became suspicious when we noticed that Vercel's observability tools indicated more CPU time had been spent than the benchmark itself had reported.</p><p>One option would have been to change the benchmarks to use TTLB instead – that is, wait until the last byte is received before stopping the timer. However, this would make the benchmark even more affected by network differences: The responses are quite large, ranging from 2MB to 15MB, and so the results could vary depending on the bandwidth to the provider. Indeed, this would tend to favor Cloudflare, but as the point of the test is to measure CPU speed, not bandwidth, it would be an unfair advantage.</p><p>Once we changed the Cloudflare version of the test to use force-dynamic as well, matching the Vercel version, the streaming behavior matched, making the comparison fair. This means that neither version is actually measuring the cost of rendering the full page to HTML, but at least they are now measuring the same thing.</p><p>As a side note, the original behavior allowed us to spot that OpenNext has a couple of performance bottlenecks in its implementation of the composable cache it uses to deduplicate rendering requests. While fixes to these aren't going to impact the numbers for this particular set of benchmarks, we're working on improving those pieces as well.</p>
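The TTFB/TTLB distinction can be sketched with a hypothetical measurement helper (not the benchmark client's actual code): TTFB stops the clock at the first body chunk, while TTLB waits for the whole body to drain.

```javascript
// Hypothetical sketch of the TTFB vs. TTLB distinction discussed above
// (not the benchmark client's actual code). Works on anything with a
// ReadableStream body, e.g. a fetch() Response.
async function measureResponse(response, start = performance.now()) {
  const reader = response.body.getReader();

  await reader.read(); // first chunk arrives
  const ttfb = performance.now() - start;

  while (!(await reader.read()).done) {} // drain the rest of the body
  const ttlb = performance.now() - start;

  return { ttfb, ttlb };
}

// Usage against a live endpoint (placeholder URL):
//   const t0 = performance.now();
//   const { ttfb, ttlb } = await measureResponse(await fetch("https://example.com/"), t0);
```

A streamed response shows ttfb well below ttlb; a fully buffered response shows the two nearly equal, which is what hid the full rendering cost in dynamic mode.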
    <div>
      <h3>A React SSR config bug</h3>
      <a href="#a-react-ssr-config-bug">
        
      </a>
    </div>
    <p>The React SSR benchmark contained a more basic configuration error. React inspects the environment variable <code>NODE_ENV</code> to decide whether the environment is "production" or a development environment. Many Node.js-based environments, including Vercel, set this variable automatically in production. Many frameworks, such as OpenNext, automatically set this variable for Workers in production as well. However, the React SSR benchmark was written against lower-level React APIs, not using any framework. In this case, the <code>NODE_ENV</code> variable wasn't being set at all.</p><p>And, unfortunately, when <code>NODE_ENV</code> is not set, React defaults to "dev mode", a mode that contains extra debugging checks and is therefore much slower than production mode. As a result, the numbers for Workers were much worse than they should have been.</p><p>It may make sense for Workers to set this variable automatically for all deployed workers, particularly when Node.js compatibility is enabled. We are looking into doing this in the future, but for now we've updated the test to set it directly.</p>
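The fix amounts to making sure the variable is defined before React is loaded. A minimal sketch (our illustration; the guard assumes an environment where <code>process</code> may or may not exist):

```javascript
// Minimal sketch: libraries like React read NODE_ENV at module load time,
// so it must be set before the library is imported. The typeof guard keeps
// this safe in environments where `process` does not exist at all.
if (typeof process !== "undefined" && process.env && !process.env.NODE_ENV) {
  process.env.NODE_ENV = "production"; // avoid React's much slower dev mode
}

// Libraries typically branch on the variable like this:
const isProduction =
  typeof process !== "undefined" && process.env.NODE_ENV === "production";
```

The ordering matters: if React is imported first, it captures the dev-mode code paths before the variable is ever assigned.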
    <div>
      <h2>What we’re going to do next</h2>
      <a href="#what-were-going-to-do-next">
        
      </a>
    </div>
    <p>Our improvements to the Workers Runtime are already live for all workers, so you do not need to change anything. Many apps will already see faster, steadier tail latency on compute-heavy routes, with less jitter during bursts. In places where garbage collection improved, some workloads will also use fewer billed CPU seconds.</p><p>We also sent Theo a <a href="https://github.com/t3dotgg/cf-vs-vercel-bench/pull/5"><u>pull request</u></a> to update OpenNext with our improvements there, and with other test fixes.</p><p>But we're far from done. We still have work to do to close the gap between OpenNext and Next.js on Vercel – but given the other benchmark results, it's clear we can get there. We also have plans for further improvements to our scheduling algorithm, so that requests almost never block each other. We will continue to improve V8, and even Node.js – the Workers team employs multiple core contributors to each project. Our approach is simple: improve open source infrastructure so that everyone gets faster, then make sure our platform makes the most of those improvements.</p><p>And, obviously, we'll be writing more benchmarks, to make sure we're catching these kinds of issues ourselves in the future. If you have a benchmark that shows Workers being slower, send it to us with a repro. We will profile it, fix what we can upstream, and share back what we learn!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7GhQHTIyNTjaRyYup7T7qr</guid>
            <dc:creator>Kenton Varda</dc:creator>
        </item>
        <item>
            <title><![CDATA[15 years of helping build a better Internet: a look back at Birthday Week 2025]]></title>
            <link>https://blog.cloudflare.com/birthday-week-2025-wrap-up/</link>
            <pubDate>Mon, 29 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Rust-powered core systems, post-quantum upgrades, developer access for students, PlanetScale integration, open-source partnerships, and our biggest internship program ever — 1,111 interns in 2026. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare launched fifteen years ago with a mission to help build a better Internet. Over that time the Internet has changed and so has what it needs from teams like ours.  In this year’s <a href="https://blog.cloudflare.com/cloudflare-2025-annual-founders-letter/"><u>Founder’s Letter</u></a>, Matthew and Michelle discussed the role we have played in the evolution of the Internet, from helping encryption grow from 10% to 95% of Internet traffic to more recent challenges like how people consume content. </p><p>We spend Birthday Week every year releasing the products and capabilities we believe the Internet needs at this moment and around the corner. Previous <a href="https://blog.cloudflare.com/tag/birthday-week/"><u>Birthday Weeks</u></a> saw the launch of <a href="https://blog.cloudflare.com/introducing-cloudflares-automatic-ipv6-gatewa/"><u>IPv6 gateway</u></a> in 2011,  <a href="https://blog.cloudflare.com/introducing-universal-ssl/"><u>Universal SSL</u></a> in 2014, <a href="https://blog.cloudflare.com/introducing-cloudflare-workers/"><u>Cloudflare Workers</u></a> and <a href="https://blog.cloudflare.com/unmetered-mitigation/"><u>unmetered DDoS protection</u></a> in 2017, <a href="https://blog.cloudflare.com/introducing-cloudflare-radar/"><u>Cloudflare Radar</u></a> in 2020, <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2 Object Storage</u></a> with zero egress fees in 2021,  <a href="https://blog.cloudflare.com/post-quantum-tunnel/"><u>post-quantum upgrades for Cloudflare Tunnel</u></a> in 2022, <a href="https://blog.cloudflare.com/best-place-region-earth-inference/"><u>Workers AI</u></a> and <a href="https://blog.cloudflare.com/announcing-encrypted-client-hello/"><u>Encrypted Client Hello</u></a> in 2023. 
And those are just a sample of the launches.</p><p>This year’s themes focused on helping prepare the Internet for a new model of monetization that encourages great content to be published, fostering more opportunities to build community both inside and outside of Cloudflare, and evergreen missions like making more features available to everyone and constantly improving the speed and security of what we offer.</p><p>We shipped a lot of new things this year. In case you missed the dozens of blog posts, here is a breakdown of everything we announced during Birthday Week 2025. </p><p><b>Monday, September 22</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-1111-intern-program/?_gl=1*rxpw9t*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTgwNzEkajI4JGwwJGgw"><span>Help build the future: announcing Cloudflare’s goal to hire 1,111 interns in 2026</span></a></td>
    <td><span>To invest in the next generation of builders, we announced our most ambitious intern program yet with a goal to hire 1,111 interns in 2026.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/supporting-the-future-of-the-open-web/?_gl=1*1l701kl*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg0MDMkajYwJGwwJGgw"><span>Supporting the future of the open web: Cloudflare is sponsoring Ladybird and Omarchy</span></a></td>
    <td><span>To support a diverse and open Internet, we are now sponsoring Ladybird (an independent browser) and Omarchy (an open-source Linux distribution and developer environment).</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/new-hubs-for-startups/?_gl=1*s35rml*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg2NjEkajYwJGwwJGgw/"><span>Come build with us: Cloudflare’s new hubs for startups</span></a></td>
    <td><span>We are opening our office doors in four major cities (San Francisco, Austin, London, and Lisbon) as free hubs for startups to collaborate and connect with the builder community.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/ai-crawl-control-for-project-galileo/?_gl=1*n9jmji*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg2ODUkajM2JGwwJGgw"><span>Free access to Cloudflare developer services for non-profit and civil society organizations</span></a></td>
    <td><span>We extended our Cloudflare for Startups program to non-profits and public-interest organizations, offering free credits for our developer tools.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/workers-for-students/?_gl=1*lq39wt*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg3NDgkajYwJGwwJGgw"><span>Introducing free access to Cloudflare developer features for students</span></a></td>
    <td><span>We are removing cost as a barrier for the next generation by giving students with .edu emails 12 months of free access to our paid developer platform features.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/capnweb-javascript-rpc-library/?_gl=1*19mcm4k*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA2MTgkajYwJGwwJGgw"><span>Cap’n Web: a new RPC system for browsers and web servers</span></a></td>
    <td><span>We open-sourced Cap'n Web, a new JavaScript-native RPC protocol that simplifies powerful, schema-free communication for web applications.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/workers-launchpad-006/?_gl=1*8z9nf6*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA3MTckajUwJGwwJGgw"><span>A lookback at Workers Launchpad and a warm welcome to Cohort #6</span></a></td>
    <td><span>We announced Cohort #6 of the Workers Launchpad, our accelerator program for startups building on Cloudflare.</span></td>
  </tr>
</tbody></table></div><p><b>Tuesday, September 23</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/per-customer-bot-defenses/?_gl=1*1i1oipn*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA3NjAkajckbDAkaDA./"><span>Building unique, per-customer defenses against advanced bot threats in the AI era</span></a></td>
    <td><span>New anomaly detection system that uses machine learning trained on each zone to build defenses against AI-driven bot attacks. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-astro-tanstack/?_gl=1*v1uhzx*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE2MzckajYwJGwwJGgw"><span>Why Cloudflare, Netlify, and Webflow are collaborating to support Open Source tools</span></a></td>
    <td><span>To support the open web, we joined forces with Webflow to sponsor Astro, and with Netlify to sponsor TanStack.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/x402/?_gl=1*kizcyy*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA5OTUkajYkbDAkaDA./"><span>Launching the x402 Foundation with Coinbase, and support for x402 transactions</span></a></td>
    <td><span>We are partnering with Coinbase to create the x402 Foundation, encouraging the adoption of the </span><a href="https://github.com/coinbase/x402?cf_target_id=4D4A124640BFF471F5B56706F9A86B34"><span>x402 protocol</span></a><span> to allow clients and services to exchange value on the web using a common language.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/ai-crawl-control-for-project-galileo/?_gl=1*1r1zsjt*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE3NjYkajYwJGwwJGgw"><span>Helping protect journalists and local news from AI crawlers with Project Galileo</span></a></td>
    <td><span>We are extending our free Bot Management and AI Crawl Control services to journalists and news organizations through Project Galileo.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/confidence-score-rubric/"><span>Cloudflare Confidence Scorecards - making AI safer for the Internet</span></a></td>
    <td><span>Automated evaluation of AI and SaaS tools, helping organizations to embrace AI without compromising security.</span></td>
  </tr>
</tbody></table></div><p><b>Wednesday, September 24</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/automatically-secure/?_gl=1*8mjfiy*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE4MTckajkkbDAkaDA."><span>Automatically Secure: how we upgraded 6,000,000 domains by default</span></a></td>
    <td><span>Our Automatic SSL/TLS system has upgraded over 6 million domains to more secure encryption modes by default and will soon automatically enable post-quantum connections.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/content-signals-policy/?_gl=1*lfy031*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE5NTkkajYwJGwwJGgw/"><span>Giving users choice with Cloudflare’s new Content Signals Policy</span></a></td>
    <td><span>The Content Signals Policy is a new standard for robots.txt that lets creators express clear preferences for how AI can use their content.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/building-a-better-internet-with-responsible-ai-bot-principles/?_gl=1*hjo4nx*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIwMTIkajckbDAkaDA."><span>To build a better Internet in the age of AI, we need responsible AI bot principles</span></a></td>
    <td><span>A proposed set of responsible AI bot principles to start a conversation around transparency and respect for content creators' preferences.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/saas-to-saas-security/?_gl=1*tigi23*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIwNjgkajYwJGwwJGgw"><span>Securing data in SaaS to SaaS applications</span></a></td>
    <td><span>New security tools to give companies visibility and control over data flowing between SaaS applications.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/post-quantum-warp/?_gl=1*1vy23vv*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIyMDIkajYwJGwwJGgw"><span>Securing today for the quantum future: WARP client now supports post-quantum cryptography (PQC)</span></a></td>
    <td><span>Cloudflare’s WARP client now supports post-quantum cryptography, providing quantum-resistant encryption for traffic. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/a-simpler-path-to-a-safer-internet-an-update-to-our-csam-scanning-tool/?_gl=1*1avvoeq*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIxMTUkajEzJGwwJGgw"><span>A simpler path to a safer Internet: an update to our CSAM scanning tool</span></a></td>
    <td><span>We made our CSAM Scanning Tool easier to adopt by removing the need to create and provide unique credentials, helping more site owners protect their platforms.</span></td>
  </tr>
</tbody></table></div><p>
<b>Thursday, September 25</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/enterprise-grade-features-for-all/?_gl=1*ll2laa*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIyODIkajYwJGwwJGgw/"><span>Every Cloudflare feature, available to everyone</span></a></td>
    <td><span>We are making every Cloudflare feature, starting with Single Sign On (SSO), available for anyone to purchase on any plan. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-developer-platform-keeps-getting-better-faster-and-more-powerful/"><span>Cloudflare's developer platform keeps getting better, faster, and more powerful</span></a></td>
    <td><span>Updates across Workers and beyond for a more powerful developer platform – such as support for larger and more concurrent Container images, support for external models from OpenAI and Anthropic in AI Search (previously AutoRAG), and more. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/planetscale-postgres-workers/"><span>Partnering to make full-stack fast: deploy PlanetScale databases directly from Workers</span></a></td>
    <td><span>You can now connect Cloudflare Workers to PlanetScale databases directly, with connections automatically optimized by Hyperdrive.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-data-platform/"><span>Announcing the Cloudflare Data Platform</span></a></td>
    <td><span>A complete solution for ingesting, storing, and querying analytical data tables using open standards like Apache Iceberg. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/r2-sql-deep-dive/"><span>R2 SQL: a deep dive into our new distributed query engine</span></a></td>
    <td><span>A technical deep dive on R2 SQL, a serverless query engine for petabyte-scale datasets in R2.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/safe-in-the-sandbox-security-hardening-for-cloudflare-workers/"><span>Safe in the sandbox: security hardening for Cloudflare Workers</span></a></td>
    <td><span>A deep-dive into how we’ve hardened the Workers runtime with new defense-in-depth security measures, including V8 sandboxes and hardware-assisted memory protection keys.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/sovereign-ai-and-choice/"><span>Choice: the path to AI sovereignty</span></a></td>
    <td><span>To champion AI sovereignty, we've added locally-developed open-source models from India, Japan, and Southeast Asia to our Workers AI platform.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/email-service/"><span>Announcing Cloudflare Email Service’s private beta</span></a></td>
    <td><span>We announced the Cloudflare Email Service private beta, allowing developers to reliably send and receive transactional emails directly from Cloudflare Workers.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/nodejs-workers-2025/"><span>A year of improving Node.js compatibility in Cloudflare Workers</span></a></td>
    <td><span>There are hundreds of new Node.js APIs now available that make it easier to run existing Node.js code on our platform. </span></td>
  </tr>
</tbody></table></div><p><b>Friday, September 26</b></p>
<table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/20-percent-internet-upgrade"><span>Cloudflare just got faster and more secure, powered by Rust</span></a></td>
    <td><span>We have re-engineered our core proxy with a new modular, Rust-based architecture, cutting median response time by 10ms for millions. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/introducing-observatory-and-smart-shield/"><span>Introducing Observatory and Smart Shield</span></a></td>
    <td><span>New monitoring tools in the Cloudflare dashboard that provide actionable recommendations and one-click fixes for performance issues.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/"><span>Monitoring AS-SETs and why they matter</span></a></td>
    <td><span>Cloudflare Radar now includes Internet Routing Registry (IRR) data, allowing network operators to monitor AS-SETs to help prevent route leaks.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/an-ai-index-for-all-our-customers"><span>An AI Index for all our customers</span></a></td>
    <td><span>We announced the private beta of AI Index, a new service that creates an AI-optimized search index for your domain that you control and can monetize.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/new-regional-internet-traffic-and-certificate-transparency-insights-on-radar/"><span>Introducing new regional Internet traffic and Certificate Transparency insights on Cloudflare Radar</span></a></td>
    <td><span>Sub-national traffic insights and Certificate Transparency dashboards for TLS monitoring.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/eliminating-cold-starts-2-shard-and-conquer/"><span>Eliminating Cold Starts 2: shard and conquer</span></a></td>
    <td><span>We have reduced Workers cold starts by 10x by implementing a new "worker sharding" system that routes requests to already-loaded Workers.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/network-performance-update-birthday-week-2025/"><span>Network performance update: Birthday Week 2025</span></a></td>
    <td><span>The TCP Connection Time (Trimean) graph shows that we deliver the fastest TCP connection time in 40% of measured ISPs – and the fastest across the top networks.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/how-cloudflare-uses-the-worlds-greatest-collection-of-performance-data/"><span>How Cloudflare uses performance data to make the world’s fastest global network even faster</span></a></td>
    <td><span>We are using our network's vast performance data to tune congestion control algorithms, improving speeds by an average of 10% for QUIC traffic.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/code-mode/"><span>Code Mode: the better way to use MCP</span></a></td>
    <td><span>It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the "tools" directly to the LLM. We tried something different: Convert the MCP tools into a TypeScript API, and then ask an LLM to write code that calls that API. The results are striking.</span></td>
  </tr>
</tbody></table>
    <div>
      <h3>Come build with us!</h3>
      <a href="#come-build-with-us">
        
      </a>
    </div>
    <p>Helping build a better Internet has always been about more than just technology. From the announcements about interns to working together in our offices, the community of people helping build a better Internet matters to its future. This week, we rolled out our most ambitious set of initiatives ever to support the builders, founders, and students who are creating the future.</p><p>For founders and startups, we are thrilled to welcome <b>Cohort #6</b> to the <b>Workers Launchpad</b>, our accelerator program that gives early-stage companies the resources they need to scale. But we’re not stopping there. We’re opening our doors, literally, by launching <b>new physical hubs for startups</b> in our San Francisco, Austin, London, and Lisbon offices. These spaces will provide access to mentorship, resources, and a community of fellow builders.</p><p>We’re also investing in the next generation of talent. We announced <b>free access to the Cloudflare developer platform for all students</b>, giving them the tools to learn and experiment without limits. To provide a path from the classroom to the industry, we also announced our goal to hire <b>1,111 interns in 2026</b> — our biggest commitment yet to fostering future tech leaders.</p><p>And because a better Internet is for everyone, we’re extending our support to <b>non-profits and public-interest organizations</b>, offering them free access to our production-grade developer tools, so they can focus on their missions.</p><p>Whether you're a founder with a big idea, a student just getting started, or a team working for a cause you believe in, we want to help you succeed.</p>
    <div>
      <h3>Until next year</h3>
      <a href="#until-next-year">
        
      </a>
    </div>
    <p>Thank you to our customers, our community, and the millions of developers who trust us to help them build, secure, and accelerate the Internet. Your curiosity and feedback drive our innovation.</p><p>It’s been an incredible 15 years. And as always, we’re just getting started!</p><p><i>(Watch the full conversation on our show </i><a href="https://thisweekinnet.com"><i>ThisWeekinNET.com</i></a><i> about what we launched during Birthday Week 2025 </i><a href="https://youtu.be/Z2uHFc9ua9s?feature=shared"><i><b><u>here</u></b></i></a><i>.) </i></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Partners]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Workers Launchpad]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cache]]></category>
            <category><![CDATA[Speed]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[1.1.1.1]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[CDN]]></category>
            <category><![CDATA[Cloudflare for Startups]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[Cloudflare Zero Trust]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">4k1NhJtljIsH7GOkpHg1Ei</guid>
            <dc:creator>Nikita Cano</dc:creator>
            <dc:creator>Korinne Alpers</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare's developer platform keeps getting better, faster, and more powerful. Here's everything that's new.]]></title>
            <link>https://blog.cloudflare.com/cloudflare-developer-platform-keeps-getting-better-faster-and-more-powerful/</link>
            <pubDate>Thu, 25 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare's developer platform keeps getting better, faster, and more powerful. Here's everything that's new. ]]></description>
            <content:encoded><![CDATA[ <p>When you build on Cloudflare, we consider it our job to do the heavy lifting for you. That’s been true since we <a href="https://blog.cloudflare.com/introducing-cloudflare-workers/"><u>introduced Cloudflare Workers in 2017</u></a>, when we first provided a runtime for you where you could just focus on building. </p><p>That commitment is still true today, and many of today’s announcements are focused on just that — removing friction where possible to free you up to build something great. </p><p>There are only so many blog posts we can write (and that you can read)! We have been busy on a much longer list of new improvements, and many of them we’ve been rolling out consistently over the course of the year. Today’s announcement breaks down all the new capabilities in detail, in one single post. The features being released today include:</p><ul><li><p><a href="#more-node-js-apis-and-packages-just-work-on-workers"><u>Use more APIs from Node.js</u></a> — including node:fs and node:https</p></li><li><p><a href="#ai-search-formerly-autorag-now-with-more-models-to-choose-from"><u>Use models from different providers in AI Search</u></a> (formerly AutoRAG)</p></li><li><p><a href="#larger-container-instances-more-concurrent-instances"><u>Deploy larger container instances and more concurrent instances</u></a> to our Containers platform</p></li><li><p>Run 30 concurrent headless web browsers (previously 10), via the <a href="#playwright-in-browser-rendering-is-now-ga"><u>Browser Rendering API</u></a></p></li><li><p>Use the <a href="#playwright-in-browser-rendering-is-now-ga"><u>Playwright browser automation library</u></a> with the Browser Rendering API — now fully supported and GA</p></li><li><p>Use 4 vCPUs (previously 2) and 20 GB of disk (previously 8 GB) with <a href="#workers-builds-provides-more-disk-and-cpu-and-is-now-ga"><u>Workers Builds — now GA</u></a></p></li><li><p>Connect to production services and resources from local development with <a href="#connect-to-production-services-and-resources-from-local-development-with-remote-bindings-now-ga"><u>Remote Bindings</u></a> — now GA</p></li><li><p><a href="#infrequent-access-in-r2-is-now-ga"><u>R2 Infrequent Access GA</u></a> — lower-cost storage class for backups, logs, and long-tail content</p></li><li><p>Resize, clip and reformat video files on-demand with <a href="#resize-clip-and-reformat-video-files-on-demand-with-media-transformations-now-ga"><u>Media Transformations</u></a> — now GA</p></li></ul><p>Alongside that, we’re constantly adding new building blocks, to make sure you have all the tools you need to build what you set out to. Those launches (that also went out today, but require a bit more explanation) include:</p><ul><li><p>Connect to Postgres databases <a href="https://blog.cloudflare.com/planetscale-postgres-workers"><u>running on PlanetScale</u></a></p></li><li><p>Send transactional emails via the new <a href="https://blog.cloudflare.com/email-service"><u>Cloudflare Email Service</u></a></p></li><li><p>Run distributed SQL queries with the new <a href="https://blog.cloudflare.com/cloudflare-data-platform"><u>Cloudflare Data Platform</u></a></p></li><li><p>Deploy your own <a href="https://www.cloudflare.com/learning/ai/how-to-get-started-with-vibe-coding/">AI vibe coding</a> platform to Cloudflare with <a href="https://blog.cloudflare.com/deploy-your-own-ai-vibe-coding-platform"><u>VibeSDK</u></a></p></li></ul>
    <div>
      <h2>AI Search (formerly AutoRAG) — now with more models to choose from</h2>
      <a href="#ai-search-formerly-autorag-now-with-more-models-to-choose-from">
        
      </a>
    </div>
    <p>AutoRAG is now AI Search! The new name marks a new and bigger mission: to make world-class search infrastructure available to every developer and business. AI Search is no longer just about retrieval for LLM apps: it’s about giving you a fast, flexible index for your content that is ready to power any AI experience. With recent additions like <a href="https://blog.cloudflare.com/conversational-search-with-nlweb-and-autorag/"><u>NLWeb support</u></a>, we are expanding beyond simple retrieval to provide a foundation for top quality search experiences that are open and built for the future of the web.</p><p>With AI Search you can now use models from different providers like OpenAI and Anthropic. Last month during AI Week we announced <a href="https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/"><u>BYO Provider Keys for AI Gateway</u></a>. That capability now extends to AI Search. By attaching your keys to the AI Gateway linked to your AI Search instance, you can use many more models for both embedding and inference.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5RUPN3CB5MOHuF0qcaJ9Nq/527f20fb8c2109c2007a5a3eeffaaadc/image2.png" />
          </figure><p>Once configured, your AI Search instance will be able to reference models available through your AI Gateway when making a <code>/ai-search</code> request:</p>
            <pre><code>export default {
  async fetch(request, env) {
    
    // Query your AI Search instance with a natural language question to an OpenAI model
    const result = await env.AI.autorag("my-ai-search").aiSearch({
      query: "What's new for Cloudflare Birthday Week?",
      model: "openai/gpt-5"
    });

    // Return only the generated answer as plain text
    return new Response(result.response, {
      headers: { "Content-Type": "text/plain" },
    });
  },
};</code></pre>
            <p>In the coming weeks we will also roll out updates to align the APIs with the new name. The existing APIs will continue to be supported for the time being. Stay tuned to the AI Search <a href="https://developers.cloudflare.com/changelog/?product=ai-search"><u>Changelog</u></a> and <a href="https://discord.cloudflare.com/"><u>Discord</u></a> for more updates!</p>
    <div>
      <h2>Connect to production services and resources from local development with Remote Bindings — now GA</h2>
      <a href="#connect-to-production-services-and-resources-from-local-development-with-remote-bindings-now-ga">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/workers/development-testing/#remote-bindings"><u>Remote bindings</u></a> for local development are generally available, supported in <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a> v4.37.0, the <a href="https://developers.cloudflare.com/workers/vite-plugin/"><u>Cloudflare Vite plugin</u></a>, and the <code>@cloudflare/vitest-pool-workers</code> package. Remote bindings are bindings that are configured to connect to a deployed resource on your Cloudflare account <i>instead </i>of the locally simulated resource. </p><p>For example, here’s how you can instruct Wrangler or Vite to send all requests to <code>env.MY_BUCKET</code> to hit the real, deployed R2 bucket instead of a locally simulated one: </p>
            <pre><code>{
  "name": "my-worker",
  "compatibility_date": "2025-09-25",

  "r2_buckets": [
    {
      "bucket_name": "my-bucket",
      "binding": "MY_BUCKET",
      "remote": true
    },
  ],
}</code></pre>
            <p>With the above configuration, all requests to <code>env.MY_BUCKET</code> will be proxied to the remote resource, but the Worker code will still execute locally. This means you get all the benefits of local development like faster execution times – without having to seed local databases with data. </p><p>You can pair remote bindings with <a href="https://developers.cloudflare.com/workers/wrangler/environments/"><b><u>environments</u></b></a>, so that you can use staging data during local development and leave production data untouched. </p><p>For example, here’s how you could point Wrangler or Vite to send all requests to <code>env.MY_BUCKET</code> to <code>staging-storage-bucket</code> when you run <code>wrangler dev --env staging</code> (<code>CLOUDFLARE_ENV=staging vite dev</code> if using Vite). </p>
            <pre><code>{
  "name": "my-worker",
  "compatibility_date": "2025-09-25",

  "env": {
    "staging": {
      "r2_buckets": [
        {
          "binding": "MY_BUCKET",
          "bucket_name": "staging-storage-bucket",
          "remote": true
        }
      ]
    },
    "production": {
      "r2_buckets": [
        {
          "binding": "MY_BUCKET",
          "bucket_name": "production-storage-bucket" 
        }
      ]
    }
  }
}</code></pre>
            
    <div>
      <h2>More Node.js APIs and packages “just work” on Workers</h2>
      <a href="#more-node-js-apis-and-packages-just-work-on-workers">
        
      </a>
    </div>
    <p>Over the past year, we have been hard at work to make Workers more compatible with Node.js packages and APIs.</p><p>Several weeks ago, <a href="https://blog.cloudflare.com/bringing-node-js-http-servers-to-cloudflare-workers/"><u>we shared how node:http and node:https APIs are now supported on Workers</u></a>. This means that you can run backend Express and Koa.js apps with only a few additional lines of code:</p>
            <pre><code>import { httpServerHandler } from 'cloudflare:node';
import express from 'express';

const app = express();

app.get('/', (req, res) =&gt; {
  res.json({ message: 'Express.js running on Cloudflare Workers!' });
});

app.listen(3000);
export default httpServerHandler({ port: 3000 });</code></pre>
            <p>And there’s much, much more. You can now:</p><ul><li><p>Read and write temporary files in Workers, using <code>node:fs</code></p></li><li><p>Do DNS lookups using <a href="https://one.one.one.one/"><u>1.1.1.1</u></a> with <code>node:dns</code></p></li><li><p>Use <code>node:net</code> and <code>node:tls</code> for first-class socket support</p></li><li><p>Use common hashing libraries with <code>node:crypto</code></p></li><li><p>Access environment variables in a Node-like fashion on <code>process.env</code></p></li></ul><p><a href="https://blog.cloudflare.com/nodejs-workers-2025"><u>Read our full recap of the last year’s Node.js-related changes</u></a> for all the details.</p><p>With these changes, Workers become even more powerful and easier to adopt, regardless of where you’re coming from. The APIs that you are familiar with are there, and more of the packages you need will just work.</p>
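<p>To make these APIs concrete, here is a minimal sketch, not taken from the post, that combines <code>node:fs</code> and <code>node:crypto</code> in a Worker-style handler. The scratch-file path and the response shape are illustrative assumptions:</p>

```javascript
import { createHash } from "node:crypto";
import { writeFileSync, readFileSync } from "node:fs";

const worker = {
  async fetch(request, env) {
    // Write and read back a temporary file using node:fs
    // (the /tmp path is an assumption for illustration)
    writeFileSync("/tmp/scratch.txt", "hello from a Worker");
    const data = readFileSync("/tmp/scratch.txt", "utf8");

    // Hash the contents with node:crypto, as a common hashing flow would
    const digest = createHash("sha256").update(data).digest("hex");

    return new Response(JSON.stringify({ digest }), {
      headers: { "Content-Type": "application/json" },
    });
  },
};

export default worker;
```

<p>The same handler shape applies to the other modules listed above, for example swapping the hashing step for a <code>node:dns</code> lookup.</p>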
    <div>
      <h2>Larger Container instances, more concurrent instances</h2>
      <a href="#larger-container-instances-more-concurrent-instances">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/containers/"><u>Cloudflare Containers</u></a> now has higher limits on concurrent instances and an upcoming new, larger instance type.</p><p>Previously you could run 50 instances of the <code>dev</code> instance type or 25 instances of the <code>basic</code> instance type concurrently. Now you can run concurrent containers with up to 400 GiB of memory, 100 vCPUs, and 2 TB of disk. This allows you to run up to 1000 <code>dev</code> instances or 400 <code>basic</code> instances concurrently. Enterprise customers can push far beyond these limits — contact us if you need more. If you are using Containers to power your app and it goes viral, you’ll have the ability to scale on Cloudflare.</p><p>Cloudflare Containers also now has a new <a href="https://developers.cloudflare.com/containers/platform-details/limits/"><u>instance type</u></a> coming soon — <code>standard-2</code> which includes 8 GiB of memory, 1 vCPU, and 12 GB of disk. This new instance type is an ideal default for workloads that need more resources, from <a href="https://github.com/cloudflare/sandbox-sdk"><u>AI Sandboxes</u></a> to data processing jobs.</p>
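<p>As a sketch of how the new limits fit into a project, here is a hypothetical <code>wrangler.jsonc</code> fragment. The field names follow the Containers documentation's general shape, but treat the exact schema as an assumption rather than a guaranteed reference:</p>

```json
{
  "name": "my-container-worker",
  "compatibility_date": "2025-09-25",
  "containers": [
    {
      "class_name": "MyContainer",
      "image": "./Dockerfile",
      "instance_type": "basic",
      "max_instances": 400
    }
  ],
  "durable_objects": {
    "bindings": [{ "name": "MY_CONTAINER", "class_name": "MyContainer" }]
  },
  "migrations": [{ "tag": "v1", "new_sqlite_classes": ["MyContainer"] }]
}
```

<p>Here <code>max_instances: 400</code> matches the new <code>basic</code> concurrency ceiling described above; once <code>standard-2</code> ships, swapping the <code>instance_type</code> would request the larger 8 GiB instances.</p>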
    <div>
      <h2>Workers Builds provides more disk and CPU — and is now GA</h2>
      <a href="#workers-builds-provides-more-disk-and-cpu-and-is-now-ga">
        
      </a>
    </div>
    <p>Last Birthday Week, we <a href="https://blog.cloudflare.com/builder-day-2024-announcements/#continuous-integration-and-delivery"><u>announced the launch</u></a> of our integrated <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipeline</a>, Workers Builds, in open beta. We also gave you <a href="https://blog.cloudflare.com/workers-builds-integrated-ci-cd-built-on-the-workers-platform/"><u>a detailed look</u></a> into how we built this system on our <a href="https://developers.cloudflare.com/workers/"><u>Workers platform</u></a> using <a href="https://developers.cloudflare.com/containers/"><u>Containers</u></a>, <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, <a href="https://developers.cloudflare.com/hyperdrive/"><u>Hyperdrive</u></a>, <a href="https://developers.cloudflare.com/log-explorer/log-search/"><u>Workers Logs</u></a>, and <a href="https://developers.cloudflare.com/workers/configuration/smart-placement/"><u>Smart Placement</u></a>.</p><p>This year, we are excited to announce that Workers Builds is now Generally Available. 
Here’s what’s new:</p><ul><li><p><a href="https://developers.cloudflare.com/changelog/2025-08-04-builds-increased-disk-size/"><b><u>Increased disk space for all plans</u></b></a>: We've increased the disk size from 8 GB to 20 GB for both free and paid plans, giving you more space for your projects and dependencies</p></li><li><p><a href="https://developers.cloudflare.com/changelog/2025-09-07-builds-increased-cpu-paid/"><b><u>More compute for paid plans</u></b></a>: We’ve doubled the CPU power for paid plans from 2 vCPU to 4 vCPU, making your builds significantly faster</p></li><li><p><b>Faster single-core and multi-core performance</b>: To ensure consistent, high-performance builds, we now run your builds on the fastest available CPUs at the time your build runs</p></li></ul><p>Haven’t used <a href="https://developers.cloudflare.com/workers/ci-cd/builds/"><u>Workers Builds</u></a> yet? You can try it by <a href="https://developers.cloudflare.com/workers/ci-cd/builds/"><u>connecting a Git repository to an existing Worker</u></a>, or try it out on a fresh new project by clicking any <a href="https://developers.cloudflare.com/workers/platform/deploy-buttons/"><u>Deploy to Cloudflare button</u></a>, like the one below that deploys <a href="https://github.com/cloudflare/templates/tree/main/astro-blog-starter-template"><u>a blog built with Astro</u></a> to your Cloudflare account:</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/templates/tree/main/astro-blog-starter-template"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p>
    <div>
      <h2>A more consistent look and feel for the Cloudflare dashboard</h2>
      <a href="#a-more-consistent-look-and-feel-for-the-cloudflare-dashboard">
        
      </a>
    </div>
    <p><a href="https://dash.cloudflare.com/?to=/:account/workers/durable-objects"><u>Durable Objects</u></a>, <a href="https://dash.cloudflare.com/?to=/:account/r2"><u>R2</u></a>, and <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages"><u>Workers</u></a> now all have a more consistent look with the rest of our developer platform. As you explore these pages you’ll find that things should load faster, feel smoother and are easier to use.</p><p>Across storage products, you can now customize the table that lists the resources on your account, choose which data you want to see, sort by any column, and hide columns you don’t need. In the Workers and Pages dashboard, we’ve reduced clutter and have modernized the design to make it faster for you to get the data you need.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0NkomTAkx45wn7nF4WYip/efb7b706d0ab7df34bfe229a025f4782/image4.png" />
          </figure><p>And when you create a new <a href="https://developers.cloudflare.com/pipelines/"><u>Pipeline</u></a> or a <a href="https://developers.cloudflare.com/hyperdrive"><u>Hyperdrive</u></a> configuration, you’ll find a new interface that helps you get started and guides you through each step.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Lx2tgIBRIWsq82p3Vjj3o/3a23f7065c9f354ed66dbe311f5d1d86/image1.png" />
          </figure><p>This work is ongoing, and we’re excited to continue improving with the help of your feedback, so keep it coming!</p>
    <div>
      <h2>Resize, clip and reformat video files on-demand with Media Transformations — now GA</h2>
      <a href="#resize-clip-and-reformat-video-files-on-demand-with-media-transformations-now-ga">
        
      </a>
    </div>
    <p>In March 2025 we <a href="https://blog.cloudflare.com/media-transformations-for-video-open-beta/"><u>announced Media Transformations</u></a> in open beta, which brings the magic of <a href="https://developers.cloudflare.com/images/transform-images/"><u>Image transformations</u></a> to short-form video files — including video files stored outside of Cloudflare. Since then, we have increased input and output limits, and added support for audio-only extraction. Media Transformations is now generally available.</p><p>Media Transformations is ideal if you have a large existing volume of short videos, such as generative AI output, e-commerce product videos, social media clips, or short marketing content. Content like this should be fetched from your existing storage like R2 or S3 directly, optimized by Cloudflare quickly, and delivered efficiently as small MP4 files or used to extract still images and audio.</p>
            <pre><code>https://example.com/cdn-cgi/media/&lt;OPTIONS&gt;/&lt;SOURCE-VIDEO&gt;

EXAMPLE, RESIZE:
https://example.com/cdn-cgi/media/width=760/https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/aus-mobile.mp4


EXAMPLE, STILL THUMBNAIL:
https://example.com/cdn-cgi/media/mode=frame,time=3s,width=120,height=120,fit=cover/https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/aus-mobile.mp4</code></pre>
            <p>Media Transformations includes a free tier available to all customers and is included with Media Platform subscriptions. Check out the <a href="https://developers.cloudflare.com/stream/transform-videos/"><u>transform videos documentation</u></a> for all the latest, then enable transformations for your zone today!</p>
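<p>The URL scheme above is simple enough to assemble programmatically. Here is a minimal sketch of a helper that does so; the function name and its option-serialization logic are our own illustration, not part of the product API — only the <code>/cdn-cgi/media/&lt;OPTIONS&gt;/&lt;SOURCE-VIDEO&gt;</code> shape comes from the documentation above:</p>

```javascript
// Hypothetical helper (ours, not a Cloudflare API) that assembles a
// Media Transformations URL: https://<zone>/cdn-cgi/media/<OPTIONS>/<SOURCE-VIDEO>
function mediaUrl(zone, options, sourceVideo) {
  // Options are serialized as comma-separated key=value pairs.
  const opts = Object.entries(options)
    .map(([key, value]) => `${key}=${value}`)
    .join(",");
  return `https://${zone}/cdn-cgi/media/${opts}/${sourceVideo}`;
}

// → "https://example.com/cdn-cgi/media/width=760/https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/aus-mobile.mp4"
mediaUrl("example.com", { width: 760 },
  "https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/aus-mobile.mp4");
```

<p>Because the source video URL is simply appended after the options, the same helper covers resizes, frame extraction, and audio-only output by varying the options object.</p>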
    <div>
      <h2>Infrequent Access in R2 is now GA</h2>
      <a href="#infrequent-access-in-r2-is-now-ga">
        
      </a>
    </div>
    <p>R2 Infrequent Access is now generally available. Last year, we introduced the <a href="https://blog.cloudflare.com/r2-events-gcs-migration-infrequent-access/#infrequent-access-private-beta"><u>Infrequent Access</u></a> storage class designed for data that doesn’t need to be accessed frequently. It’s a great fit for use cases including long-tail user content, logs, or data backups.</p><p>Since launch, Infrequent Access has been proven in production by our customers running these types of workloads at scale. The results confirmed our goal: a storage class that reduces storage costs while maintaining performance and durability.</p><p><a href="https://developers.cloudflare.com/r2/pricing/"><u>Pricing</u></a> is simple. You pay less for data storage, while data retrievals are billed per GB to reflect the additional compute required to serve data from underlying storage optimized for less frequent access. And as with all of R2, there are <b>no egress fees</b>, so you don’t pay for the bandwidth to move data out.</p><p>Here’s how you can upload an object to the R2 Infrequent Access storage class via Workers:</p>
            <pre><code>export default {
  async fetch(request, env) {

    // Upload the incoming request body to R2 in Infrequent Access class
    await env.MY_BUCKET.put("my-object", request.body, {
      storageClass: "InfrequentAccess",
    });

    return new Response("Object uploaded to Infrequent Access!", {
      headers: { "Content-Type": "text/plain" },
    });
  },
};</code></pre>
            <p>You can also monitor your Infrequent Access vs. Standard storage usage directly in your R2 dashboard for each bucket. Get started with <a href="https://developers.cloudflare.com/r2/get-started/"><u>R2</u></a> today!</p>
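<p>The storage-class choice in the Worker above can also be factored into small helpers, which makes it easy to verify which class an object landed in. This is a sketch under the assumption of the standard Workers R2 bucket binding API (<code>put()</code> with a <code>storageClass</code> option, and <code>head()</code> returning object metadata); the helper names are ours:</p>

```javascript
// Hypothetical helpers (names are ours) around a Workers R2 bucket binding.
// `bucket` is expected to expose the standard R2 API: put(key, value, options)
// and head(key), where head() returns metadata including `storageClass`.
async function putInfrequent(bucket, key, value) {
  // Write the object directly into the Infrequent Access storage class.
  return bucket.put(key, value, { storageClass: "InfrequentAccess" });
}

async function storageClassOf(bucket, key) {
  // head() fetches object metadata without reading the body.
  const obj = await bucket.head(key);
  return obj ? obj.storageClass : null;
}
```

<p>In a Worker, <code>putInfrequent(env.MY_BUCKET, "my-object", request.body)</code> would replace the inline <code>put()</code> call shown earlier, and <code>storageClassOf</code> gives a quick programmatic check alongside the per-bucket dashboard view.</p>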
    <div>
      <h2>Playwright in Browser Rendering is now GA</h2>
      <a href="#playwright-in-browser-rendering-is-now-ga">
        
      </a>
    </div>
    <p>We’re excited to announce three updates to Browser Rendering:</p><ol><li><p>Our support for <a href="https://developers.cloudflare.com/browser-rendering/platform/playwright/"><u>Playwright</u></a> is now Generally Available, giving developers the stability and confidence to run critical browser tasks.</p></li><li><p>We’re introducing support for <a href="https://developers.cloudflare.com/browser-rendering/platform/stagehand/"><u>Stagehand</u></a>, enabling developers to build AI agents using natural language, powered by Cloudflare Workers AI.</p></li><li><p>Finally, to help developers scale, we are tripling <a href="https://developers.cloudflare.com/browser-rendering/platform/limits/#workers-paid"><u>limits for paid plans</u></a>, with more increases to come. </p></li></ol><p>The browser is no longer only used by humans. AI agents need to be able to reliably navigate browsers in the same way a human would, whether that's booking flights, filling in customer info, or scraping structured data. Playwright gives AI agents the ability to interact with web pages and perform complex tasks on behalf of humans. However, running browsers at scale is a significant infrastructure challenge. Cloudflare Browser Rendering solves this by providing headless browsers on demand. With Playwright support now Generally Available and synced with the latest version, v1.55, customers have a production-ready foundation on which to build reliable, scalable applications. </p><p>To help AI agents better navigate the web, we’re introducing support for Stagehand, an open source browser automation framework. Rather than dictating exact steps or specifying selectors, Stagehand enables developers to build more reliably and flexibly by combining code with natural-language instructions powered by AI. This makes it possible for AI agents to navigate and adapt if a website changes – just like a human would. 
</p><p>To get started with Playwright and Stagehand, check out our <a href="https://developers.cloudflare.com/changelog/2025-09-25-br-playwright-ga-stagehand-limits/"><u>changelog</u></a> for code examples and more. </p><div>
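<p>As a rough sketch of what a Playwright task on Browser Rendering looks like: open a page, read something from it, and always release the browser session. Here the launcher is injected as a parameter so the flow can be exercised without a real browser; in a Worker it would be the <code>launch</code> function from <code>@cloudflare/playwright</code> called with the browser binding. The function name, the <code>env.MYBROWSER</code> binding name, and the injection pattern are our own illustrative assumptions:</p>

```javascript
// Sketch of a minimal Playwright task on Browser Rendering: open a page,
// read its title, and always release the browser session. `launchBrowser`
// is injected for testability (an assumption, not the product API shape);
// in a Worker it would be `launch` from "@cloudflare/playwright", called
// with the browser binding (e.g. env.MYBROWSER).
async function pageTitle(launchBrowser, binding, url) {
  const browser = await launchBrowser(binding);
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    // Close the browser so the rendering session is released.
    await browser.close();
  }
}
```

<p>The <code>try/finally</code> matters at scale: sessions count against plan limits, so the browser should be closed even when navigation throws.</p>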
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">4HnTgcx06k7ccxS0rYwIkC</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Korinne Alpers</dc:creator>
        </item>
        <item>
            <title><![CDATA[Supporting the future of the open web: Cloudflare is sponsoring Ladybird and Omarchy ]]></title>
            <link>https://blog.cloudflare.com/supporting-the-future-of-the-open-web/</link>
            <pubDate>Mon, 22 Sep 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ We are excited to announce our support of two independent, open source projects: Ladybird, an ambitious project to build an independent browser, and Omarchy, an opinionated Arch Linux setup for developers. ]]></description>
            <content:encoded><![CDATA[ <p>At Cloudflare, we believe that helping build a better Internet means encouraging a healthy ecosystem of options for how people can connect safely and quickly to the resources they need. Sometimes that means we tackle immense, Internet-scale problems with established partners. And sometimes that means we support and partner with fantastic open teams taking big bets on the next generation of tools.</p><p>To that end, today we are excited to announce our support of two independent, open source projects: <a href="https://ladybird.org/"><u>Ladybird</u></a>, an ambitious project to build a completely independent browser from the ground up, and <a href="https://omarchy.org/"><u>Omarchy</u></a>, an opinionated Arch Linux setup for developers. </p>
    <div>
      <h2>Two open source projects strengthening the open Internet </h2>
      <a href="#two-open-source-projects-strengthening-the-open-internet">
        
      </a>
    </div>
    <p>Cloudflare has a long history of supporting open-source software – both through <a href="https://blog.cloudflare.com/tag/open-source/"><u>our own projects shared with the community</u></a> and <a href="https://developers.cloudflare.com/sponsorships/"><u>external</u></a> projects that we support. We see our sponsorship of Ladybird and Omarchy as a natural extension of these efforts in a moment where energy for a diverse ecosystem is needed more than ever.  </p>
    <div>
      <h3>Ladybird, a new and independent browser </h3>
      <a href="#ladybird-a-new-and-independent-browser">
        
      </a>
    </div>
    <p>Most of us spend a significant amount of time using a web browser –  in fact, you’re probably using one to read this blog! The beauty of browsers is that they help users experience the open Internet, giving you access to everything from the largest news publications in the world to a tiny website hosted on a Raspberry Pi.  </p><p>Unlike dedicated apps, browsers reduce the barriers to building an audience for new services and communities on the Internet. If you are launching something new, you can offer it through a browser in a world where most people have absolutely zero desire to install an app just to try something out. Browsers help encourage competition and new ideas on the open web.</p><p>While the openness of how browsers work has led to an explosive growth of services on the Internet, browsers themselves have consolidated to a tiny handful of viable options. There’s a high probability you’re reading this on a Chromium-based browser, like Google’s Chrome, along with about <a href="https://radar.cloudflare.com/reports/browser-market-share-2025-q2"><u>65% of users on the Internet.</u></a> However, that consolidation has also scared off new entrants in the space. If all browsers ship on the same operating systems, powered by the same underlying technology, we lose out on potential privacy, security and performance innovations that could benefit developers and everyday Internet users.  </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3j6xYLX9ZdqhS0yWCMjM0b/45fa8bd5b275a45a9f37b7a015d4c15d/BLOG-2998_2.png" />
          </figure><p><sup><i>A screenshot of Cloudflare Workers developer docs in Ladybird </i></sup></p><p>This is where Ladybird comes in: it’s not Chromium based – everything is built from scratch. The Ladybird project has two main components: LibWeb, a brand-new rendering engine, and LibJS, a brand-new JavaScript engine with its own parser, interpreter, and bytecode execution engine. </p><p>Building an engine that can correctly and securely render the modern web is a monumental task that requires deep technical expertise and navigating decades of specifications governed by standards bodies like the W3C and WHATWG. And because Ladybird implements these standards directly, it also stress-tests them in practice. Along the way, the project has found, reported, and sometimes fixed countless issues in the specifications themselves, contributions that strengthen the entire web platform for developers, browser vendors, and anyone who may attempt to build a browser in the future.</p><p>Whether to build something from scratch or not is a perennial source of debate between software engineers, but absent the pressures of revenue or special interests, we’re excited about the ways Ladybird will prioritize privacy, performance, and security, potentially in novel ways that will influence the entire ecosystem.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7zzAGb1Te5G6wGH2ieFbMU/1a3289c199695f88f6f6e57d7289851e/image1.png" />
          </figure><p><sup><i>A screenshot of the Omarchy development environment</i></sup></p>
    <div>
      <h3>Omarchy, an independent development environment </h3>
      <a href="#omarchy-an-independent-development-environment">
        
      </a>
    </div>
    <p>Developers deserve choice, too. Beyond the browser, a developer’s operating system and environment is where they spend a ton of time – and where a few big players have become the dominant choice. Omarchy challenges this by providing a complete, opinionated Arch Linux distribution that transforms a bare installation into a modern development workstation that developers are <a href="https://github.com/basecamp/omarchy"><u>excited about</u></a>.</p><p>Perfecting one’s development environment can be a career-long art, but learning how to do so shouldn’t be a barrier to beginning to code. The beauty of Omarchy is that it makes Linux approachable to more developers by doing most of the setup for them, making it look good, and then making it configurable. Omarchy provides most of the tools developers need – like Neovim, Docker, and Git – out of the box, and <a href="https://learn.omacom.io/2/the-omarchy-manual"><u>tons of other features</u></a>.</p><p>At its core, Omarchy embraces Linux for all of its complexity and configurability, and makes a version of it that is accessible and fun to use for developers that don’t have a deep background in operating systems. Projects like this ensure that a powerful, independent Linux desktop remains a compelling choice for people building the next generation of applications and Internet infrastructure. </p>
    <div>
      <h3>Our support comes with no strings attached  </h3>
      <a href="#our-support-comes-with-no-strings-attached">
        
      </a>
    </div>
    <p>We want to be very clear here: we are supporting these projects because we believe the Internet can be better if these projects, and more like them, succeed. There is no requirement to use our technology stack, nor any arrangement like that. We are happy to partner with great teams like Ladybird and Omarchy simply because we believe that our missions have real overlap.</p>
    <div>
      <h2>Notes from the teams</h2>
      <a href="#notes-from-the-teams">
        
      </a>
    </div>
    <p>Ladybird is still in its early days, with an alpha release planned for 2026, but we encourage anyone who is interested to consider contributing to the <a href="https://github.com/LadybirdBrowser/ladybird/tree/master"><u>open source codebase</u></a> as they prepare for launch.</p><blockquote><p><i>"Cloudflare knows what it means to build critical web infrastructure on the server side. With Ladybird, we’re tackling the near-monoculture on the client side, because we believe it needs multiple implementations to stay healthy, and we’re extremely thankful for their support in that mission.”</i></p><p>– <b>Andreas Kling</b>, Founder, Ladybird  </p></blockquote><p><a href="https://github.com/basecamp/omarchy/releases/tag/v3.0.0"><u>Omarchy 3.0</u></a> was released just last week with faster installation and increased Macbook compatibility, so if you’ve been Linux-curious for a while now, we encourage you to try it out!</p><blockquote><p><i>"Cloudflare's support of Omarchy has ensured we have the fastest ISO and package delivery from wherever you are in the world. Without a need to manually configure mirrors or deal with torrents. The combo of a super CDN, great R2 storage, and the best DDoS shield in the business has been a huge help for the project."</i></p><p>– <b>David Heinemeier Hansson</b>, Creator of Omarchy and Ruby on Rails</p></blockquote><p>A better Internet is one where people have more choice in how they browse and develop new software. We’re incredibly excited about the potential of Ladybird, Omarchy, and other audacious projects that support a free and open Internet. </p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Browser Rendering]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">1mBKYqbp7645szLQobH6SI</guid>
            <dc:creator>Sam Rhea</dc:creator>
        </item>
    </channel>
</rss>