The Cloudflare Blog

Announcing Claude Managed Agents on Cloudflare

Mike Nomitch — Tue, 19 May 2026 13:00:00 GMT

Cloudflare and Anthropic have collaborated to integrate Claude Managed Agents with Cloudflare Sandboxes. Our new integration gives you more control over your agent sandboxes, secures connections to private services, and improves observability.

In the past year, Cloudflare’s Developer Platform has expanded to give more developers the tools they need to run agents at scale. This includes:

Sandboxes for full stateful Linux microVMs at scale
Agents SDK, providing simple and customizable agent framework
Browser Run, which gives agents fully programmable and observable browsers
Dynamic Workers, allowing for dynamic sandboxed code execution at massive scale

Our goal is to make Cloudflare the simplest, most secure, and most programmable cloud for agents.

Integrating with Claude Managed Agents is another step in this direction. You can run your agent loop on the Claude Platform, while using Cloudflare to execute code, secure connections, and run custom tool calls.

To get going in just minutes, we’ve created a default deployment template that gives you the following:

Enhanced security - Run all agent traffic through customizable proxies. This allows you to securely inject credentials, prevent data exfiltration, and better observe how your agents interact with the outside world.
Sandbox control and observability - Get detailed sandbox metrics and logs. SSH into running machines. Customize sandbox images.
Lightweight sandboxes - Writing and executing untrusted code can be done in a traditional microVM or a lightweight isolate. This lets you hit massive scale, boot sandboxes in milliseconds, and minimize infrastructure spend.
Private service connectivity - Connect agents to private internal services without ever exposing them to the Internet.
Browser Control and Observability - Get an audit trail of every agent’s browser sessions, including session recording and human-in-the-loop flows.
Email - Give each of your agents its own email address and ability to send emails.
Custom tools - Extend your agents with tools without needing additional infrastructure. Just write functions and deploy.

You get all of this out of the box when deploying the integration, and you can easily customize if you need more.

Let’s take a brief look at Claude Managed Agents, see how to integrate a Cloudflare-based environment, then explore how to get the most out of Claude on Cloudflare.

An overview of Claude Managed Agents

Claude Managed Agents allow developers to easily define and run agents on the Anthropic platform. In these managed environments, Claude can read files, run commands, browse the web, and execute code. The harness supports built-in prompt caching, compaction, and various agent-first performance optimizations.

Until now, using Claude Managed Agents has meant running the entire stack on Anthropic-provided infrastructure. While this is great for some developers, others may need more control over their infrastructure choice, whether this is for security, compliance, or performance reasons. Self-managed environments for Claude Agents provide just that.

Anthropic describes this as “decoupling the brain from the hands.” The core agent loop runs in Anthropic (the “brain”), but the infrastructure for running and executing code (the “hands”) can be run anywhere, including Cloudflare.

The Cloudflare environment

Our new integration gives your agents a Cloudflare-based environment for running and executing code within minutes.

Follow the onboarding guide to get started. Then fork the repo and customize your integration as you see fit.

After setup, when a Claude Agent starts a session, it sends a message to your new Cloudflare-based control plane. The Workers-based control plane gives each agent session a sandboxed environment for executing code, developing applications, running CLI tools, and more. State is automatically persisted across session sleeps.

^{Sandboxes write files and execute code in response to the Claude-based Agent loop}

You can optionally configure sandbox instance sizes or customize the container image that runs within VM-based sandboxes. Each sandbox can be observed in the Cloudflare dashboard, sandbox logs can be queried or shipped to external providers like Datadog or Splunk, and the control plane ships with a built-in UI, making it easy to track the state of sandboxes or SSH into specific machines.

^{Get interactive shell sessions into your agent’s sandbox}

Enabling agents at Internet scale

What if your agent backend booted in a few milliseconds, and you didn’t have to pay for the resources of a full VM when running the agent?

The industry needs a lightweight primitive for sandboxing as we adopt agents at scale, and we’re building just that.

But as models get better, we expect more and more workflows to be managed by agents. Each of your customers should be able to run many agents simultaneously; each of your employees should have tens of agents running at once. If we’re constantly running a full microVM per agent, we’ll be unnecessarily burning a ton of resources and money to enable this scale.

That’s why we’re providing a faster and cheaper sandbox for your Claude Agents. This sandbox is based on the AgentsSDK. You can execute arbitrary code in Dynamic Workers using Codemode, and you still get a file system, but your agent is doing all of this within a V8 isolate instead of a microVM.

If you need agents to act as a developer, building full applications and running Linux-based tools, you can still reach for a microVM-based sandbox. For this, we provide Cloudflare Containers, which Claude Managed Agents can also use.

But if you want a faster, cheaper, and more scalable alternative you can use isolates instead of microVMs easily. Just select “isolate” for backend type when setting up an Agent.

^{Setting up an “isolate” backend gives you a lightweight V8 isolate sandbox instead of a microVM}

If you want to handle bursts of tens of thousands of concurrent agents or more, running with isolates will allow you to scale in a way that no VM-based solution allows.

Securing your agentic workloads

Agents are far more powerful when they connect to your organization’s context. This usually means accessing private services and data.

As we’ve written before, sandboxed workloads on Cloudflare can use an outbound proxy for fully dynamic, customizable, and zero-trust authentication between sandboxes and external services. This lets you inject secrets into requests outside the sandbox, so the agent never has access to them. This protects against exfiltration attacks.

And sometimes internal services shouldn’t ever be exposed to the open Internet. We recently launched Cloudflare Mesh and Cloudflare Workers VPC to better connect to these private services, whether they’re running on a cloud provider like AWS or on-premises. This allows you to connect to internal services using post-quantum encrypted networking without a VPN or bastion host.

Claude Managed Agents can easily connect to private services with header injection or private VPC/Mesh tunnels. This is done via customizable outbound proxies. You can define egress policies that expose only the services you choose to the agent sandboxes that you choose. You can allowlist specific endpoints, perform zero-trust injection of encrypted credentials, access private services via Cloudflare Mesh, and even write custom proxy middleware.

^{The integration uses outbound Workers to handle egress however you see fit}

You’re able to apply policies per tenant, per agent, or based on whatever metadata is useful. This gives you full control over how your agents connect to external services.

Doing more with the Cloudflare Developer Platform

Agents need more than just a code execution environment. Cloudflare’s Developer Platform provides the tools you need by default to let your agents do more.

^{Sandboxes can make tool calls on Cloudflare and safely access external services.}

Here are a few of the tools you’ll find most useful as you deploy agents on Cloudflare:

Browser Run via Claude

One of the most common tools agents need is a browser. While curl can get you pretty far, when you want an agent to act like a human, this often means interacting with the web like one: rendering JS-heavy applications, taking screenshots for QA validation, filling out forms, etc. Browser Run is Cloudflare’s tool to give agents browsers.

^{A Browser Run session recording lets you watch how your agents used a browser. One of many built-in tools.}

The Claude Managed Agents integration ships with multiple browser-related tools that can be enabled immediately. These include browser_search, browser_execute, screenshot, browse, fetch_to_markdown, and a Cloudflare-specific implementation of web_fetch allows your agent to control a browser that runs on Cloudflare infrastructure. This not only lets your agent do more, but it also makes it easy to audit every action your agent’s browser is taking on the web, apply allowlists and denylist to browser sessions, and save recordings of browser sessions for future debugging.

Agent inboxes

The integration also comes with built-in support for email with the send_email, email_read, and email_list tools.

You can also kick off new sessions via email, or configure the agent to send emails using any domain and address configured with the Cloudflare Email Service. This allows the agent to act on your behalf when it needs to, reply to context in forwarded emails, and autonomously interact with others via email.

Custom tools and more

Other built-in tools include call_service, which uses Cloudflare Mesh or Workers VPC to connect to private services, and image_generate, which uses Workers AI to generate images on Cloudflare. This pairs well with Claude providing text-based inference.

Additionally, we encourage forking the repo to easily add customized tools. For example, you could add a custom tool to host a public file on Cloudflare’s R2 object storage. Just add the relevant binding in wrangler config, write a zod definition, and short function in custom-tools.js:

defineTool({
  name: "r2_host_file",
  description: "Upload from sandbox to R2 and get a public URL.",
  inputSchema: z.object({
    key: z.string().describe("Object key"),
    content: z.string().describe("UTF-8 file body"),
    contentType: z.string().describe("MIME type"),
  }),
  run: async ({ key, content, contentType }, { env }) => {
    await env.PUBLIC_BUCKET.put(
      key, content, { httpMetadata: { contentType }}
    );
    return `${env.PUB_R2_URL.replace(/\/$/, "")}/${encodeURI(key)}`;
  }
}),

The Cloudflare Developer Platform provides all sorts of possibilities for extending your agents: give each agent session a git-backed repo with Artifacts, run edge inference with Workers AI, host applications written on the fly with Dynamic Workers, and more.

You don’t have to worry about infrastructure or scaling – just write a few lines of code and hit deploy.

Claude + Cloudflare

We’re excited to be working together with Anthropic to bring Cloudflare’s flexibility, scale, and security to more users. Whether you want to run tens of millions of agents using isolates, securely connect to private services with Workers VPC, or write custom tools that take advantage of all of Cloudflare, our new integration makes it easy.

See the Getting Started with Managed Agents guide to get Claude Managed Agents set up with Cloudflare in just minutes.

Agents have their own computers with Sandboxes GA

Kate Reznykova — Mon, 13 Apr 2026 13:08:35 GMT

When we launched Cloudflare Sandboxes last June, the premise was simple: AI agents need to develop and run code, and they need to do it somewhere safe.

If an agent is acting like a developer, this means cloning repositories, building code in many languages, running development servers, etc. To do these things effectively, they will often need a full computer (and if they don’t, they can reach for something lightweight!).

Many developers are stitching together solutions using VMs or existing container solutions, but there are lots of hard problems to solve:

Burstiness - With each session needing its own sandbox, you often need to spin up many sandboxes quickly, but you don’t want to pay for idle compute on standby.
Quick state restoration - Each session should start quickly and re-start quickly, resuming past state.
Security - Agents need to access services securely, but can’t be trusted with credentials.
Control - It needs to be simple to programmatically control sandbox lifecycle, execute commands, handle files, and more.
Ergonomics - You need to give a simple interface for both humans and agents to do common operations.

We’ve spent time solving these issues so you don’t have to. Since our initial launch we’ve made Sandboxes an even better place to run agents at scale. We’ve worked with our initial partners such as Figma, who run agents in containers with Figma Make:

“Figma Make is built to help builders and makers of all backgrounds go from idea to production, faster. To deliver on that goal, we needed an infrastructure solution that could provide reliable, highly-scalable sandboxes where we could run untrusted agent- and user-authored code. Cloudflare Containers is that solution.”
- Alex Mullans, AI and Developer Platforms at Figma

We want to bring Sandboxes to even more great organizations, so today we are excited to announce that Sandboxes and Cloudflare Containers are both generally available.

Let’s take a look at some of the recent changes to Sandboxes:

Secure credential injection lets you make authenticated calls without the agent ever having credential access
PTY support gives you and your agent a real terminal
Persistent code interpreters give your agent a place to execute stateful Python, JavaScript, and TypeScript out of the box
Background processes and live preview URLs provide a simple way to interact with development servers and verify in-flight changes
Filesystem watching improves iteration speed as agents make changes
Snapshots let you quickly recover an agent's coding session
Higher limits and Active CPU Pricing let you deploy a fleet of agents at scale without paying for unused CPU cycles

Sandboxes 101

Before getting into some of the recent changes, let’s quickly look at the basics.

A Cloudflare Sandbox is a persistent, isolated environment powered by Cloudflare Containers. You ask for a sandbox by name. If it's running, you get it. If it's not, it starts. When it's idle, it sleeps automatically and wakes when it receives a request. It’s easy to programmatically interact with the sandbox using methods like exec, gitClone, writeFile and more.

import { getSandbox } from "@cloudflare/sandbox";
export { Sandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: Env) {
    // Ask for a sandbox by name. It starts on demand.
    const sandbox = getSandbox(env.Sandbox, "agent-session-47");

    // Clone a repository into it.
    await sandbox.gitCheckout("https://github.com/org/repo", {
      targetDir: "/workspace",
      depth: 1,
    });

    // Run the test suite. Stream output back in real time.
    return sandbox.exec("npm", ["test"], { stream: true });
  },
};

As long as you provide the same ID, subsequent requests can get to this same sandbox from anywhere in the world.

Secure credential injection

One of the hardest problems in agentic workloads is authentication. You often need agents to access private services, but you can't fully trust them with raw credentials.

Sandboxes solve this by injecting credentials at the network layer using a programmable egress proxy. This means that sandbox agents never have access to credentials and you can fully customize auth logic as you see fit:

class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "my-internal-vcs.dev": (request, env, ctx) => {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}

For a deep dive into how this works — including identity-aware credential injection, dynamically modifying rules, and integrating with Workers bindings — read our recent blog post on Sandbox auth.

A real terminal, not a simulation

Early agent systems often modeled shell access as a request-response loop: run a command, wait for output, stuff the transcript back into the prompt, repeat. It works, but it is not how developers actually use a terminal.

Humans run something, watch output stream in, interrupt it, reconnect later, and keep going. Agents benefit from that same feedback loop.

In February, we shipped PTY support. A pseudo-terminal session in a Sandbox, proxied over WebSocket, compatible with xterm.js.

Just call sandbox.terminal to serve the backend:

// Worker: upgrade a WebSocket connection into a live terminal session
export default {
  async fetch(request: Request, env: Env) {
    const url = new URL(request.url);
    if (url.pathname === "/terminal") {
      const sandbox = getSandbox(env.Sandbox, "my-session");
      return sandbox.terminal(request, { cols: 80, rows: 24 });
    }
    return new Response("Not found", { status: 404 });
  },
};

And use xterm addon to call it from the client:

// Browser: connect xterm.js to the sandbox shell
import { Terminal } from "xterm";
import { SandboxAddon } from "@cloudflare/sandbox/xterm";

const term = new Terminal();
const addon = new SandboxAddon({
  getWebSocketUrl: ({ origin }) => `${origin}/terminal`,
});

term.loadAddon(addon);
term.open(document.getElementById("terminal-container")!);
addon.connect({ sandboxId: "my-session" });

This allows agents and developers to use a full PTY to debug those sessions live.

Each terminal session gets its own isolated shell, its own working directory, its own environment. Open as many as you need, just like you would on your own machine. Output is buffered server-side, so reconnecting replays what you missed.

A code interpreter that remembers

For data analysis, scripting, and exploratory workflows, we also ship a higher-level abstraction: a persistent code execution context.

The key word is “persistent.” Many code interpreter implementations run each snippet in isolation, so state disappears between calls. You can't set a variable in one step and read it in the next.

Sandboxes allow you to create “contexts” that persist state. Variables and imports persist across calls the same way they would in a Jupyter notebook:

// Create a Python context. State persists for its lifetime.
const ctx = await sandbox.createCodeContext({ language: "python" });

// First execution: load data
await sandbox.runCode(`
  import pandas as pd
  df = pd.read_csv('/workspace/sales.csv')
  df['margin'] = (df['revenue'] - df['cost']) / df['revenue']
`, { context: ctx });

// Second execution: df is still there
const result = await sandbox.runCode(`
  df.groupby('region')['margin'].mean().sort_values(ascending=False)
`, { context: ctx, onStdout: (line) => console.log(line.text) });

// result contains matplotlib charts, structured json output, and Pandas tables in HTML

Start a server. Get a URL. Ship it.

Agents are more useful when they can build something and show it to the user immediately. Sandboxes support background processes, readiness checks, and preview URLs. This lets an agent start a development server and share a live link without leaving the conversation.

// Start a dev server as a background process
const server = await sandbox.startProcess("npm run dev", {
  cwd: "/workspace",
});

// Wait until the server is actually ready — don't just sleep and hope
await server.waitForLog(/Local:.*localhost:(\d+)/);

// Expose the running service with a public URL
const { url } = await sandbox.exposePort(3000);

// url is a live public URL the agent can share with the user
console.log(`Preview: ${url}`);

With waitForPort() and waitForLog(), agents can sequence work based on real signals from the running program instead of guesswork. This is much nicer than a common alternative, which is usually some version of sleep(2000) followed by hope.

Watch the file system and react immediately

Modern development loops are event-driven. Save a file, rerun the build. Edit a config, restart the server. Change a test, rerun the suite.

We shipped sandbox.watch() in March. It returns an SSE stream backed by native inotify, the kernel mechanism Linux uses for filesystem events.

import { parseSSEStream, type FileWatchSSEEvent } from '@cloudflare/sandbox';

const stream = await sandbox.watch('/workspace/src', {
  recursive: true,
  include: ['*.ts', '*.tsx']
});

for await (const event of parseSSEStream(stream)) {
  if (event.type === 'modify' && event.path.endsWith('.ts')) {
    await sandbox.exec('npx tsc --noEmit', { cwd: '/workspace' });
  }
}

This is one of those primitives that quietly changes what agents can do. An agent that can observe the filesystem in real time can participate in the same feedback loops as a human developer.

Waking up quickly with snapshots

Imagine a (human) developer working on their laptop. They git clone a repo, run npm install, write code, push a PR, then close their laptop while waiting for code review. When it’s time to resume work, they just re-open the laptop and continue where they left off.

If an agent wants to replicate this workflow on a naive container platform, you run into a snag. How do you resume where you left off quickly? You could keep a sandbox running, but then you pay for idle compute. You could start fresh from the container image, but then you have to wait for a long git clone and npm install.

Our answer is snapshots, which will be rolling out in the coming weeks.

A snapshot preserves a container's full disk state, OS config, installed dependencies, modified files, data files and more. Then it lets you quickly restore it later.

You can configure a Sandbox to automatically snapshot when it goes to sleep.

class AgentDevEnvironment extends Sandbox {
  sleepAfter = "5m";
  persistAcrossSessions = {type: "disk"}; // you can also specify individual directories
}

You can also programmatically take a snapshot and manually restore it. This is useful for checkpointing work or forking sessions. For instance, if you wanted to run four instances of an agent in parallel, you could easily boot four sandboxes from the same state.

class AgentDevEnvironment extends Sandbox {}

async forkDevEnvironment(baseId, numberOfForks) {
  const baseInstance = await getSandbox(baseId);
  const snapshotId = await baseInstance.snapshot();

  const forks = Array.from({ length: numberOfForks }, async (_, i) => {
    const newInstance = await getSandbox(`${baseId}-fork-${i}`);
    return newInstance.start({ snapshot: snapshotId });
  });

  await Promise.all(forks);
}

Snapshots are stored in R2 within your account, giving you durability and location-independence. R2's tiered caching system allows for fast restores across all of Region: Earth.

In future releases, live memory state will also be captured, allowing running processes to resume exactly where they left off. A terminal and an editor will reopen in the exact state they were in when last closed.

If you are interested in restoring session state before snapshots go live, you can use the backup and restore methods today. These also persist and restore directories using R2, but are not as performant as true VM-level snapshots. Though they still can lead to considerable speed improvements over naively recreating session state.

^{Booting a sandbox, cloning ‘axios’, and npm installing takes 30 seconds. Restoring from a backup takes two seconds.}

Stay tuned for the official snapshot release.

Higher limits and Active CPU Pricing

Since our initial launch, we’ve been steadily increasing capacity. Users on our standard pricing plan can now run 15,000 concurrent instances of the lite instance type, 6,000 instances of basic, and over 1,000 concurrent larger instances. Reach out to run even more!

We also changed our pricing model to be more cost effective running at scale. Sandboxes now only charge for actively used CPU cycles. This means that you aren’t paying for idle CPU while your agent is waiting for an LLM to respond.

This is what a computer looks like

Nine months ago, we shipped a sandbox that could run commands and access a filesystem. That was enough to prove the concept.

What we have now is different in kind. A Sandbox today is a full development environment: a terminal you can connect a browser to, a code interpreter with persistent state, background processes with live preview URLs, a filesystem that emits change events in real time, egress proxies for secure credential injection, and a snapshot mechanism that makes warm starts nearly instant.

When you build on this, a satisfying pattern emerges: agents that do real engineering work. Clone a repo, install it, run the tests, read the failures, edit the code, run the tests again. The kind of tight feedback loop that makes a human engineer effective — now the agent gets it too.

We're at version 0.8.9 of the SDK. You can get started today:

npm i @cloudflare/sandbox@latest

Dynamic, identity-aware, and secure Sandbox auth

Mike Nomitch — Mon, 13 Apr 2026 13:00:00 GMT

As AI Large Language Models and harnesses like OpenCode and Claude Code become increasingly capable, we see more users kicking off sandboxed agents in response to chat messages, Kanban updates, vibe coding UIs, terminal sessions, GitHub comments, and more.

The sandbox is an important step beyond simple containers, because it gives you a few things:

Security: Any untrusted end user (or a rogue LLM) can run in the sandbox and not compromise the host machine or other sandboxes running alongside it. This is traditionally (but not always) accomplished with a microVM.
Speed: An end user should be able to pick up a new sandbox quickly and restore the state from a previously used one quickly.
Control: The trusted platform needs to be able to take actions within the untrusted domain of the sandbox. This might mean mounting files in the sandbox, or controlling which requests access it, or executing specific commands.

Today, we’re excited to add another key component of control to our Sandboxes and all Containers: outbound Workers. These are programmatic egress proxies that allow users running sandboxes to easily connect to different services, add observability, and, importantly for agents, add flexible and safe authentication.

How it works

Here’s a quick look at adding a secret key to a header using an outbound Worker:

class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "github.com": (request, env, ctx) => {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}

Any time code running in the sandbox makes a request to “github.com”, the request is proxied through the handler. This allows you to do anything you want on each request, including logging, modifying, or cancelling it. In this case, we’re safely injecting a secret (more on this later). The proxy runs on the same machine as any sandbox, has access to distributed state, and can be easily modified with simple JavaScript.

We’re excited about all the possibilities this adds to Sandboxes, especially around authentication for agents. Before going into details, let’s back up and take a quick tour of traditional forms of auth, and why we think there’s something better.

Common auth for agentic workloads

The core issue with agentic auth is that we can’t fully trust the workload. While our LLMs aren’t nefarious (at least not yet), we still need to be able to apply protections to ensure they don’t use data inappropriately or take actions they shouldn’t.

A few common methods exist to provide auth to agents, and each has downsides:

Standard API tokens are the most basic method of authentication, typically injected into applications via environment variables or in mounted secrets files. This is the arguably simplest method, but least secure. You have to trust that the sandbox won’t somehow be compromised or accidentally exfiltrate the token while making a request. Since you can’t fully trust the agent, you’ll need to set up token expiry and rotation, which can be a hassle.

Workload identity tokens, such as OIDC tokens, can solve some of this pain. Rather than granting the agent a token with general permissions, you can grant it a token that attests its identity. Now, rather than the agent having direct access to some service with a token, it can exchange an identity token for a very short-lived access token. The OIDC token can be invalidated after a specific agent’s workflow completes, and expiry is easier to manage. One of the biggest downsides of workload identity tokens is the potential inflexibility of integrations. Many services don’t have first-class support for OIDC, so in order to get working integrations with upstream services, platforms will need to roll their own token-exchanging services. This makes adoption difficult.

Custom proxies provide maximum flexibility, and can be paired with workload identity tokens. If you can pass some or all of your sandbox egress through a trusted piece of code, you can insert whatever rules you need. Maybe the upstream service your agent is communicating with has a bad RBAC story, and it can’t provide granular permissions. No problem, just write the controls and permissions yourself! This is a great option for agents that you need to lock down with granular controls. However, how do you intercept all of a sandbox’s traffic? How do you set up a proxy that is dynamic and easily programmable? How do you proxy traffic efficiently? These aren’t easy problems to solve.

With those imperfect methods in mind, what does an ideal auth mechanism look like?

Ideally, it is:

Zero trust. No token is ever granted to an untrusted user for any amount of time.
Simple. Easy to author. Doesn’t involve a complex system of minting, rotating, and decrypting tokens.
Flexible. We don’t rely on the upstream system to provide the granular access we need. We can apply whatever rules we want.
Identity-aware. We can identify the sandbox making the call and apply specific rules for it.
Observable. We can easily gather information about what calls are being made.
Performant. We aren’t round-tripping to a centralized or slow source of truth.
Transparent. The sandboxed workload doesn’t have to know about it. Things just work.
Dynamic. We can change rules on the fly.

We believe outbound Workers for Sandboxes fit the bill on all of these. Let’s see how.

Outbound Workers in practice

Basics: restriction and observability

First, we’ll look at a very basic example: logging requests and denying specific actions.

In this case, we’ll use the outbound function, which intercepts all outgoing HTTP requests from the sandbox. With a few lines of JavaScript, it’s easy to ensure only GETs are made and log then deny any disallowed methods.

class MySandboxedApp extends Sandbox {
  static outbound = (req, env, ctx) => {
    // Deny any non-GET action and log
    if (req.method !== 'GET') {
      console.log(`Container making ${req.method} request to: ${req.url}`);
      return new Response('Not Allowed', { status: 405, statusText: 'Method Not Allowed'});
    }

    // Proceed if it is a GET request
    return fetch(req);
  };
}

This proxy runs on Workers and runs on the same machine as the sandboxed VM. Workers were built for quick response times, often sitting in front of cached CDN traffic, so additional latency is extremely minimal.

Because this is running on Workers, we get observability out of the box. You can view logs and outbound requests in the Workers dashboard or export them to your application performance monitoring tool of choice.

Zero trust credential injection

How would we use this to enforce a zero trust environment for our agent? Let’s imagine we want to make a request to a private GitHub instance, but we never want our LLM to access a private token.

We can use outboundByHost to define functions for specific domains or IPs. In this case, we’ll inject a protected credential if the domain is “my-internal-vcs.dev”. The sandboxed agent never has access to these credentials.

class OpenCodeInABox extends Sandbox {
  static outboundByHost = {
    "my-internal-vcs.dev": (request, env, ctx) => {
      const headersWithAuth = new Headers(request.headers);
      headersWithAuth.set("x-auth-token", env.SECRET);
      return fetch(request, { headers: headersWithAuth });
    }
  }
}

It is also easy to conditionalize the response based on the identity of the container. You don’t have to inject the same tokens for every sandbox instance.

 static outboundByHost = {
  "my-internal-vcs.dev": (request, env, ctx) => {
    // note: KV is encrypted at rest and in transit
    const authKey = await env.KEYS.get(ctx.containerId);

    const requestWithAuth = new Request(request);
    requestWithAuth.headers.set("x-auth-token", authKey);
    return fetch(requestWithAuth);
  }
}

Using the Cloudflare Developer Platform

As you may have noticed in the last example, another major advantage of using outbound Workers is that it makes integration into the Workers ecosystem easier. Previously, if a user wanted to access R2, they would have to inject an R2 credential, then make a call from their container to the public R2 API. Same for KV, Agents, other Containers, other Worker services, etc.

Now, you just call any binding from your outbound Workers.

class MySandboxedApp extends Sandbox {
  static outboundByHost = {
    "my.kv": async (req, env, ctx) => {
      const key = keyFromReq(req);
      const myResult = await env.KV.get(key);
      return new Response(myResult);
    },
    "objects.cf": async (req, env, ctx) => {
      const prefix = ctx.containerId
      const path = pathFromRequest(req);
      const object = await env.R2.get(`${prefix}/${path}`);
      const myResult = await env.KV.get(key);
      return new Response(myResult);
    },
  };
}

Rather than parsing tokens and setting up policies, we can easily conditionalize access with code and whatever logic we want. In the R2 example, we also were able to use the sandbox’s ID to further scope access with ease.

Making controls dynamic

Networking control should also be dynamic. On many platforms, config for Container and VM networking is static, looking something like this:

{
  defaultEgress: "block",
  allowedDomains: ["github.com", "npmjs.org"]
}

This is better than nothing, but we can do better. For many sandboxes, we might want to apply a policy on start, but then override it with another once specific operations have been performed.

For instance, we can boot a sandbox, grab our dependencies via NPM and Github, and then lock down egress after that. This ensures that we open up the network for as little time as possible.

To achieve this, we can use outboundHandlers, which allows us to define arbitrary outbound handlers that can be applied programmatically using the setOutboundHandler method. Each of these also takes params, allowing you to customize behavior from code. In this case, we will allow some hostnames with the custom “allowHosts” policy, then turn off HTTP.

class MySandboxedApp extends Sandbox {
  static outboundHandlers = {
    async allowHosts(req, env, { params }) {
     const url = new URL(request.url);
     const allowedHostname = params.allowedHostnames.includes(url.hostname);

      if (allowedHostname) {
        return await fetch(newRequest);
      } else {
        return new Response(null, { status: 403, statusText: "Forbidden" });
      }
    }
    
    async noHttp(req) {
      return new Response(null, { status: 403, statusText: "Forbidden" });
    }
  }
}

async setUpSandboxes(req, env) {
  const sandbox = await env.SANDBOX.getByName(userId);
  await sandbox.setOutboundHandler("allowHosts", {
    allowedHostnames: ["github.com", "npmjs.org"]
  });
  await sandbox.gitClone(userRepoURL)
  await sandbox.exec("npm install")
  await sandbox.setOutboundHandler("noHttp");
}

This could be extended even further. Your agent might ask the end user a question like “Do you want to allow POST requests to cloudflare.com?” based on whatever tools it needs at that time. With dynamic outbound Workers, you can easily modify the sandbox rules on the fly to provide this level of control.

TLS support with MITM Proxying

To do anything useful with requests beyond allowing or denying them, you need to have access to the content. This means that if you’re making HTTPS requests, they need to be decrypted by the Workers proxy.

To achieve this, a unique ephemeral certificate authority (CA) and private key are created for each Sandbox instance, and the CA is placed into the sandbox. By default, sandbox instances will trust this CA, while standard container instances can opt into trusting it, for instance by calling sudo update-ca-certificates.

export class MyContainer extends Container {
  interceptHttps = true;
}

MyContainer.outbound = (req, env, ctx) => {
  // All HTTP(S) requests will trigger this hook.
  return fetch(req);
};

TLS traffic is proxied by a Cloudflare isolated network process by performing a TLS handshake. It creates a leaf CA from an ephemeral and unique private key and uses the SNI extracted in the ClientHello. It will then invoke in the same machine the configured Worker to handle the HTTPS request.

Our ephemeral private key and CA will never leave our container runtime sidecar process, and is never shared across other container sidecar processes.

With this in place, outbound Workers act as a truly transparent proxy. The sandbox doesn't need any awareness of specific protocols or domains — all HTTP and HTTPS traffic flows through the outbound handler for filtering or modification.

Under the hood

To enable the functionality shown above in both Container and Sandbox, we added new methods to the ctx.container object: interceptOutboundHttp and interceptOutboundHttps, which intercept outgoing requests on specific hostnames (with basic glob matching), IP ranges, and it can be used to intercept all outbound requests. These methods are called with a WorkerEntrypoint, which gets set up as the front door to the outbound Worker.

export class MyWorker extends WorkerEntrypoint {
 fetch() {
   return new Response(this.ctx.props.message);
 }
}

// ... inside your container DurableObject ...
this.ctx.container.start({ enableInternet: false });
const outboundWorker = this.ctx.exports.MyWorker({ props: { message: 'hello' } });
await this.ctx.container.interceptOutboundHttp('15.0.0.1:80', outboundWorker);

// From now on, all HTTP requests to 15.0.0.1:80 return "hello"
await this.waitForContainerToBeHealthy();

// You can decide to return another message now...
const secondOutboundWorker = this.ctx.exports.MyWorker({ props: { message: 'switcheroo' } });
await this.ctx.container.interceptOutboundHttp('15.0.0.1:80', secondOutboundWorker);
// all HTTP requests to 15.0.0.1 now show "switcheroo", even on connections that were
// open before this interceptOutboundHttp

// You can even set hostnames, CIDRs, for both IPv4 and IPv6
await this.ctx.container.interceptOutboundHttp('example.com', secondOutboundWorker);
await this.ctx.container.interceptOutboundHttp('*.example.com', secondOutboundWorker);
await this.ctx.container.interceptOutboundHttp('123.123.123.123/23', secoundOutboundWorker);

All proxying to Workers happens locally on the same machine that runs the sandbox VM. Even though communication between container and Worker is “authless”, it is secure.

These methods can be called at any time, before or after starting the container, even while connections are still open. Connections that send multiple HTTP requests will automatically pick up a new entrypoint, so updating outbound Workers will not break existing TCP connections or interrupt HTTP requests.

Local development with wrangler dev also has support for egress interception. To make it possible, we automatically spawn a sidecar process inside the local container’s network namespace. We called this sidecar component proxy-everything. Once proxy-everything is attached, it applies the appropriate TPROXY nftable rules, routing matching traffic from the local Container to workerd, Cloudflare’s open source JavaScript runtime, which runs the outbound Worker. This allows the local development experience to mirror what happens in prod, so testing and development remain simple.

Giving outbound Workers a try

If you haven’t tried Cloudflare Sandboxes, check out the Getting Started guide. If you are a current user of Containers or Sandboxes, start using outbound Workers now by reading the documentation and upgrading to @cloudflare/containers@0.3.0 or @cloudflare/sandbox@0.8.9.

Watch on Cloudflare TV

Python Workers redux: fast cold starts, packages, and a uv-first workflow

Dominik Picheta — Mon, 08 Dec 2025 06:00:00 GMT

^{Note: This post was updated with additional details regarding AWS Lambda.}

Last year we announced basic support for Python Workers, allowing Python developers to ship Python to region: Earth in a single command and take advantage of the Workers platform.

Since then, we’ve been hard at work making the Python experience on Workers feel great. We’ve focused on bringing package support to the platform, a reality that’s now here — with exceptionally fast cold starts and a Python-native developer experience.

This means a change in how packages are incorporated into a Python Worker. Instead of offering a limited set of built-in packages, we now support any package supported by Pyodide, the WebAssembly runtime powering Python Workers. This includes all pure Python packages, as well as many packages that rely on dynamic libraries. We also built tooling around uv to make package installation easy.

We’ve also implemented dedicated memory snapshots to reduce cold start times. These snapshots result in serious speed improvements over other serverless Python vendors. In cold start tests using common packages, Cloudflare Workers start over 2.4x faster than AWS Lambda without SnapStart and 3x faster than Google Cloud Run.

In this blog post, we’ll explain what makes Python Workers unique and share some of the technical details of how we’ve achieved the wins described above. But first, for those who may not be familiar with Workers or serverless platforms – and especially those coming from a Python background — let us share why you might want to use Workers at all.

Deploying Python globally in 2 minutes

Part of the magic of Workers is simple code and easy global deployments. Let's start by showing how you can deploy a FastAPI app across the world with fast cold starts in less than two minutes.

A simple Worker using FastAPI can be implemented in a handful of lines:

from fastapi import FastAPI
from workers import WorkerEntrypoint
import asgi

app = FastAPI()

@app.get("/")
async def root():
   return {"message": "This is FastAPI on Workers"}

class Default(WorkerEntrypoint):
   async def fetch(self, request):
       return await asgi.fetch(app, request.js_object, self.env)

To deploy something similar, just make sure you have uv and npm installed, then run the following:

$ uv tool install workers-py
$ pywrangler init --template \
    https://github.com/cloudflare/python-workers-examples/03-fastapi
$ pywrangler deploy

With just a little code and a pywrangler deploy, you’ve now deployed your application across Cloudflare’s edge network that extends to 330 locations across 125 countries. No worrying about infrastructure or scaling.

And for many use cases, Python Workers are completely free. Our free tier offers 100,000 requests per day and 10ms CPU time per invocation. For more information, check out the pricing page in our documentation.

For more examples, check out the repo in GitHub. And read on to find out more about Python Workers.

So what can you do with Python Workers?

Now that you’ve got a Worker, just about anything is possible. You write the code, so you get to decide. Your Python Worker receives HTTP requests and can make requests to any server on the public Internet.

You can set up cron triggers, so your Worker runs on a regular schedule. Plus, if you have more complex requirements, you can make use of Workflows for Python Workers, or even long-running WebSocket servers and clients using Durable Objects.

Here are more examples of the sorts of things you can do using Python Workers:

Faster package cold starts

Serverless platforms like Workers save you money by only running your code when it’s necessary to do so. This means that if your Worker isn’t receiving requests, it may be shut down and will need to be restarted once a new request comes in. This typically incurs a resource overhead we refer to as the “cold start.” It’s important to keep these as short as possible to minimize latency for end users.

In standard Python, booting the runtime is expensive, and our initial implementation of Python Workers focused on making the runtime boot fast. However, we quickly realized that this wasn’t enough. Even if the Python runtime boots quickly, in real-world scenarios the initial startup usually includes loading modules from packages, and unfortunately, in Python many popular packages can take several seconds to load.

We set out to make cold starts fast, regardless of whether packages were loaded.

To measure realistic cold start performance, we set up a benchmark that imports common packages, as well as a benchmark running a “hello world” using a bare Python runtime. Standard Lambda is able to start just the runtime quickly, but once you need to import packages, the cold start times shoot up. In order to optimize for faster cold starts with packages, you can use SnapStart on Lambda (which we will be adding to the linked benchmarks shortly). This incurs a cost to store the snapshot and an additional cost on every restore. Python Workers will automatically apply memory snapshots for free for every Python Worker.

Here are the average cold start times when loading three common packages (httpx, fastapi and pydantic):

Platform	Mean Cold Start (secs)
Cloudflare Python Workers	1.027
AWS Lambda (without SnapStart)	2.502
Google Cloud Run	3.069

In this case, Cloudflare Python Workers have 2.4x faster cold starts than AWS Lambda without SnapStart and 3x faster cold starts than Google Cloud Run. We achieved these low cold start numbers by using memory snapshots, and in a later section we explain how we did so.

We are regularly running these benchmarks. Go here for up-to-date data and more info on our testing methodology.

We’re architecturally different from these other platforms — namely, Workers is isolate-based. Because of that, our aims are high, and we are planning for a zero cold start future.

Package tooling integrated with uv

The diverse package ecosystem is a large part of what makes Python so amazing. That’s why we’ve been hard at work ensuring that using packages in Workers is as easy as possible.

We realised that working with the existing Python tooling is the best path towards a great development experience. So we picked the uv package and project manager, as it’s fast, mature, and gaining momentum in the Python ecosystem.

We built our own tooling around uv called pywrangler. This tool essentially performs the following actions:

Reads your Worker’s pyproject.toml file to determine the dependencies specified in it
Includes your dependencies in a python_modules folder that lives in your Worker

Pywrangler calls out to uv to install the dependencies in a way that is compatible with Python Workers, and calls out to wrangler when developing locally or deploying Workers.

Effectively this means that you just need to run pywrangler dev and pywrangler deploy to test your Worker locally and deploy it.

Type hints

You can generate type hints for all of the bindings defined in your wrangler config using pywrangler types. These type hints will work with Pylance or with recent versions of mypy.

To generate the types, we use wrangler types to create typescript type hints, then we use the typescript compiler to generate an abstract syntax tree for the types. Finally, we use the TypeScript hints — such as whether a JS object has an iterator field — to generate mypy type hints that work with the Pyodide foreign function interface.

Decreasing cold start duration using snapshots

Python startup is generally quite slow and importing a Python module can trigger a large amount of work. We avoid running Python startup during a cold start using memory snapshots.

When a Worker is deployed, we execute the Worker’s top-level scope and then take a memory snapshot and store it alongside your Worker. Whenever we are starting a new isolate for the Worker, we restore the memory snapshot and the Worker is ready to handle requests, with no need to execute any Python code in preparation. This improves cold start times considerably. For instance, starting a Worker that imports fastapi, httpx and pydantic without snapshots takes around 10 seconds. With snapshots, it takes 1 second.

The fact that Pyodide is built on WebAssembly enables this. We can easily capture the full linear memory of the runtime and restore it.

Memory snapshots and Entropy

WebAssembly runtimes do not require features like address space layout randomization for security, so most of the difficulties with memory snapshots on a modern operating system do not arise. Just like with native memory snapshots, we still have to carefully handle entropy at startup to avoid using the XKCD random number generator (we’re very into actual randomness).

By snapshotting memory, we might inadvertently lock in a seed value for randomness. In this case, future calls for “random” numbers would consistently return the same sequence of values across many requests.

Avoiding this is particularly challenging because Python uses a lot of entropy at startup. These include the libc functions getentropy() and getrandom() and also reading from /dev/random and /dev/urandom. All of these functions share the same implementation in terms of the JavaScript crypto.getRandomValues() function.

In Cloudflare Workers, crypto.getRandomValues() has always been disabled at startup in order to allow us to switch to using memory snapshots in the future. Unfortunately, the Python interpreter cannot bootstrap without calling this function. And many packages also require entropy at startup time. There are essentially two purposes for this entropy:

Hash seeds for hash randomization
Seeds for pseudorandom number generators

Hash randomization we do at startup time and accept the cost that each specific Worker has a fixed hash seed. Python has no mechanism to allow replacing the hash seed after startup.

For pseudorandom number generators (PRNG), we take the following approach:

At deploy time:

Seed the PRNG with a fixed “poison seed”, then record the PRNG state.
Replace all APIs that call into the PRNG with an overlay that fails the deployment with a user error.
Execute the top level scope of user code.
Capture the snapshot.

At run time:

Assert that the PRNG state is unchanged. If it changed, we forgot the overlay for some method. Fail the deployment with an internal error.
After restoring the snapshot, reseed the random number generator before executing any handlers.

With this, we can ensure that PRNGs can be used while the Worker is running, but stop Workers from using them during initialization and pre-snapshot.

Memory snapshots and WebAssembly state

An additional difficulty arises when creating memory snapshots on WebAssembly: The memory snapshot we are saving consists only of the WebAssembly linear memory, but the full state of the Pyodide WebAssembly instance is not contained in the linear memory.

There are two tables outside of this memory.

One table holds the values of function pointers. Traditional computers use a “Von Neumann” architecture, which means that code exists in the same memory space as data, so that calling a function pointer is a jump to some memory address. WebAssembly has a “Harvard architecture” where code lives in a separate address space. This is key to most of the security guarantees of WebAssembly and in particular why WebAssembly does not need address space layout randomization. A function pointer in WebAssembly is an index into the function pointer table.

A second table holds all JavaScript objects referenced from Python. JavaScript objects cannot be directly stored into memory because the JavaScript virtual machine forbids directly obtaining a pointer to a JavaScript object. Instead, they are stored into a table and represented in WebAssembly as an index into the table.

We need to ensure that both of these tables are in exactly the same state after we restore a snapshot as they were when we captured the snapshot.

The function pointer table is always in the same state when the WebAssembly instance is initialized and is updated by the dynamic loader when we load dynamic libraries — native Python packages like numpy.

To handle dynamic loading:

When taking the snapshot, we patch the loader to record the load order of dynamic libraries, the address in memory where the metadata for each library is allocated, and the function pointer table base address for relocations.
When restoring the snapshot, we reload the dynamic libraries in the same order, and we use a patched memory allocator to place the metadata in the same locations. We assert that the current size of the function pointer table matches the function pointer table base we recorded for the dynamic library.

All of this ensures that each function pointer has the same meaning after we’ve restored the snapshot as it had when we took the snapshot.

To handle the JavaScript references, we implemented a fairly limited system. If a JavaScript object is accessible from globalThis by a series of property accesses, we record those property accesses and replay them when restoring the snapshot. If any reference exists to a JavaScript object that is not accessible in this way, we fail deployment of the Worker. This is good enough to deal with all the existing Python packages with Pyodide support, which do top level imports like:

from js import fetch

Reducing cold start frequency using sharding

Another important characteristic of our performance strategy for Python Workers is sharding. There is a very detailed description of what went into its implementation here. In short, we now route requests to existing Worker instances, whereas before we might have chosen to start a new instance.

Sharding was actually enabled for Python Workers first and proved to be a great test bed for it. A cold start is far more expensive in Python than in JavaScript, so ensuring requests are routed to an already-running isolate is especially important.

Where do we go from here?

This is just the start. We have many plans to make Python Workers better:

More developer-friendly tooling
Even faster cold starts by utilising our isolate architecture
Support for more packages
Support for native TCP sockets, native WebSockets, and more bindings

To learn more about Python Workers, check out the documentation available here. To get help, be sure to join our Discord.

Simple, scalable, and global: Containers are coming to Cloudflare Workers in June 2025

Mike Nomitch — Fri, 11 Apr 2025 14:00:00 GMT

It is almost the end of Developer Week and we haven’t talked about containers: until now. As some of you may know, we’ve been working on a container platform behind the scenes for some time.

In late June, we plan to release Containers in open beta, and today we’ll give you a sneak peek at what makes it unique.

Workers are the simplest way to ship software around the world with little overhead. But sometimes you need to do more. You might want to:

Run user-generated code in any language
Execute a CLI tool that needs a full Linux environment
Use several gigabytes of memory or multiple CPU cores
Port an existing application from AWS, GCP, or Azure without a major rewrite

Cloudflare Containers let you do all of that while being simple, scalable, and global.

Through a deep integration with Workers and an architecture built on Durable Objects, Workers can be your:

API Gateway: Letting you control routing, authentication, caching, and rate-limiting before requests reach a container
Service Mesh: Creating private connections between containers with a programmable routing layer
Orchestrator: Allowing you to write custom scheduling, scaling, and health checking logic for your containers

Instead of having to deploy new services, write custom Kubernetes operators, or wade through control plane configuration to extend the platform, you just write code.

Let’s see what it looks like.

Deploying different application types

A stateful workload: executing AI-generated code

First, let’s take a look at a stateful example.

Imagine you are building a platform where end-users can run code generated by an LLM. This code is untrusted, so each user needs their own secure sandbox. Additionally, you want users to be able to run multiple requests in sequence, potentially writing to local files or saving in-memory state.

To do this, you need to create a container on-demand for each user session, then route subsequent requests to that container. Here’s how you can accomplish this:

First, you write some basic Wrangler config, then you route requests to containers via your Worker:

import { Container } from "cloudflare:workers";

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (url.pathname.startsWith("/execute-code")) {
      const { sessionId, messages } = await request.json();
      // pass in prompt to get the code from Llama 4
      const codeToExecute = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", { messages });

      // get a different container for each user session
      const id = env.CODE_EXECUTOR.idFromName(sessionId);
      const sandbox = env.CODE_EXECUTOR.get(id);

      // execute a request on the container
      return sandbox.fetch("/execute-code", { method: "POST", body: codeToExecute });
    }

    // ... rest of Worker ...
  },
};

// define your container using the Container class from cloudflare:workers
export class CodeExecutor extends Container {
  defaultPort = 8080;
  sleepAfter = "1m";
}

Then, deploy your code with a single command: wrangler deploy. This builds your container image, pushes it to Cloudflare’s registry, readies containers to boot quickly across the globe, and deploys your Worker.

$ wrangler deploy

That’s it.

How does it work?

Your Worker creates and starts up containers on-demand. Each time you call env.CODE_EXECUTOR.get(id) with a unique ID, it sends requests to a unique container instance. The container will automatically boot on the first fetch, then put itself to sleep after a configurable timeout, in this case 1 minute. You only pay for the time that the container is actively running.

When you request a new container, we boot one in a Cloudflare location near the incoming request. This means that low-latency workloads are well-served no matter the region. Cloudflare takes care of all the pre-warming and caching so you don’t have to think about it.

This allows each user to run code in their own secure environment.

Stateless and global: FFmpeg everywhere

Stateless and autoscaling applications work equally well on Cloudflare Containers.

Imagine you want to run a container that takes a video file and turns it into an animated GIF using FFmpeg. Unlike the previous example, any container can serve any request, but you still don’t want to send bytes across an ocean and back unnecessarily. So, ideally the app can be deployed everywhere.

To do this, you declare a container in Wrangler config and turn on autoscaling. This specific configuration ensures that one instance is always running and if CPU usage increases beyond 75% of capacity, additional instances are added:

"containers": [
  {
    "class_name": "GifMaker",
    "image": "./Dockerfile", // container source code can be alongside Worker code
    "instance_type": "basic",
    "autoscaling": {
      "minimum_instances": 1,
      "cpu_target": 75,
    }
  }
],
// ...rest of wrangler.jsonc...

To route requests, you just call env.GIF_MAKER.fetch and requests are automatically sent to the closest container:

import { Container } from "cloudflare:workers";

export class GifMaker extends Container {
  defaultPort: 1337,
}

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (url.pathname === "/make-gif") {
      return env.GIF_MAKER.fetch(request)
    }

    // ... rest of Worker ...
  },
};

Going beyond the basics

From the examples above, you can see that getting a basic container service running on Workers just takes a few lines of config and a little Workers code. There’s no need to worry about capacity, artifact registries, regions, or scaling.

For more advanced use, we’ve designed Cloudflare Containers to run on top of Durable Objects and work in tandem with Workers. Let’s take a look at the underlying architecture and see some of the advanced use cases it enables.

Durable Objects as programmable sidecars

Routing to containers is enabled using Durable Objects under the hood. In the examples above, the Container class from cloudflare:workers just wraps a container-enabled Durable Object and provides helper methods for common patterns. In the rest of this post, we’ll look at examples using Durable Objects directly, as this should shed light on the platform’s underlying design.

Each Durable Object acts as a programmable sidecar that can proxy requests to the container and manages its lifecycle. This allows you to control and extend your containers in ways that are hard on other platforms.

You can manually start, stop, and execute commands on a specific container by calling RPC methods on its Durable Object, which now has a new object at this.ctx.container:

class MyContainer extends DurableObject {
  // these RPC methods are callable from a Worker
  async customBoot(entrypoint, envVars) {
    this.ctx.container.start({ entrypoint, env: envVars });
  }

  async stopContainer() {
    const SIGTERM = 15;
    this.ctx.container.signal(SIGTERM);
  }

  async startBackupScript() {
    await this.ctx.container.exec(["./backup"]);
  }
}

You can also monitor your container and run hooks in response to Container status changes.

For instance, say you have a CI job that runs builds in a Container. You want to post a message to a Queue based on the exit status. You can easily program this behavior:

class BuilderContainer extends DurableObject {
  constructor(ctx, env) {
    super(ctx, env)
    async function onContainerExit() {
      await this.env.QUEUE.send({ status: "success", message: "Build Complete" });
    }

    async function onContainerError(err) {
      await this.env.QUEUE.send({ status: "error", message: err});
    }

    this.ctx.container.start();
    this.ctx.container.monitor().then(onContainerExit).catch(onContainerError); 
  }

  async isRunning() { return this.ctx.container.running; }
}

And lastly, if you have state that needs to be loaded into a container each time it runs, you can use status hooks to persist state from the container before it sleeps and to reload state into the container after it starts:

import { startAndWaitForPort } from "./helpers"

class MyContainer extends DurableObject {
  constructor(ctx, env) {
    super(ctx, env)
    this.ctx.blockConcurrencyWhile(async () => {
      this.ctx.storage.sql.exec('CREATE TABLE IF NOT EXISTS state (value TEXT)');
      this.ctx.storage.sql.exec("INSERT INTO state (value) SELECT '' WHERE NOT EXISTS 
(SELECT * FROM state)");
      await startAndWaitForPort(this.ctx.container, 8080);
      await this.setupContainer();
      this.ctx.container.monitor().then(this.onContainerExit); 
    });
  }

  async setupContainer() {
    const initialState = this.ctx.storage.sql.exec('SELECT * FROM state LIMIT 1').one().value;
    return this.ctx.container
      .getTcpPort(8080)
      .fetch("http://container/state", { body: initialState, method: 'POST' });
  }

  async onContainerExit() {
    const response = await this.ctx.container
      .getTcpPort(8080)
      .fetch('http://container/state');
    const newState = await response.text();
    this.ctx.storage.sql.exec('UPDATE state SET value = ?', newState);
  }
}

Building around your Containers with Workers

Not only do Durable Objects allow you to have fine-grained control over the Container lifecycle, the whole Workers platform allows you to extend routing and scheduling behavior as you see fit.

Using Workers as an API gateway

Workers provide programmable ingress logic from over 300 locations around the world. In this sense, they provide similar functionality to an API gateway.

For instance, let’s say you want to route requests to a different version of a container based on information in a header. This is accomplished in a few lines of code:

export default {
  async fetch(request, env) {
    const isExperimental = request.headers.get("x-version") === "experimental";
    
    if (isExperimental) {
      return env.MY_SERVICE_EXPERIMENTAL.fetch(request);
    } else {
      return env.MY_SERVICE_STANDARD.fetch(request);
    }
  },
};

Or you want to rate limit and authenticate requests to the container:

async fetch(request, env) {
  const url = new URL(request.url);

  if (url.pathname.startsWith('/api/')) {
    const token = request.headers.get("token");

    const isAuthenticated = await authenticateRequest(token);
    if (!isAuthenticated) {
      return new Response("Not authenticated", { status: 401 });
    }

    const { withinRateLimit } = await env.MY_RATE_LIMITER.limit({ key: token });
    if (!withinRateLimit) {
      return new Response("Rate limit exceeded for token", { status: 429 });
    }

    return env.MY_APP.fetch(request);
  }
  // ...
}

Using Workers as a service mesh

By default, Containers are private and can only be accessed via Workers, which can connect to one of many container ports. From within the container, you can expose a plain HTTP port, but requests will still be encrypted from the end user to the moment we send the data to the container’s TCP port in the host. Due to the communication being relayed through the Cloudflare network, the container does not need to set up TLS certificates to have secure connections in its open ports.

You can connect to the container through a WebSocket from the client too. See this repository for a full example of using Websockets.

Just as the Durable Object can act as proxy to the container, it can act as a proxy from the container as well. When setting up a container, you can toggle Internet access off and ensure that outgoing requests pass through Workers.

// ... when starting the container...
this.ctx.container.start({ 
  workersAddress: '10.0.0.2:8080',
  enableInternet: false, // 'enableInternet' is false by default
});

// ... container requests to '10.0.0.2:8080' securely route to a different service...
override async onContainerRequest(request: Request) {
  const containerId = this.env.SUB_SERVICE.idFromName(request.headers['X-Account-Id']);
  return this.env.SUB_SERVICE.get(containerId).fetch(request);
}

You can ensure all traffic in and out of your container is secured and encrypted end to end without having to deal with networking yourself.

This allows you to protect and connect containers within Cloudflare’s network… or even when connecting to external private networks.

Using Workers as an orchestrator

You might require custom scheduling and scaling logic that goes beyond what Cloudflare provides out of the box.

We don’t want you having to manage complex chains of API calls or writing an operator to get the logic you need. Just write some Worker code.

For instance, imagine your containers have a long startup period that involves loading data from an external source. You need to pre-warm containers manually, and need control over the specific region to prewarm. Additionally, you need to set up manual health checks that are accessible via Workers. You’re able to achieve this fairly simply with Workers and Durable Objects.

import { Container, DurableObject } from "cloudflare:workers";

// A singleton Durable Object to manage and scale containers

class ContainerManager extends DurableObject {
  scale(region, instanceCount) {
    for (let i = 0; i < instanceCount; i++) {
      const containerId = env.CONTAINER.idFromName(`instance-${region}-${i}`);
      // spawns a new container with a location hint
      await env.CONTAINER.get(containerId, { locationHint: region }).start();
    }
  }

  async setHealthy(containerId, isHealthy) {
    await this.ctx.storage.put(containerId, isHealthy);
  }
}

// A Container class for the underlying compute

class MyContainer extends Container {
  defaultPort = 8080;

  async onContainerStart() {
    // run healthcheck every 500ms
    await this.scheduleEvery(0.5, 'healthcheck');
  }

  async healthcheck() {
    const manager = this.env.MANAGER.get(
      this.env.MANAGER.idFromName("manager")
    );
    const id = this.ctx.id.toString();

    await this.container.fetch("/_health")
      .then(() => manager.setHealthy(id, true))
      .catch(() => manager.setHealthy(id, false));
  }
}

The ContainerManager Durable Object exposes the scale RPC call, which you can call as needed with a region and instanceCount which scales up the number of active Container instances in a given region using a location hint. The this.schedule code executes a manually defined healthcheck method on the Container and tracks its state in the Manager for use by other logic in your system.

These building blocks let users handle complex scheduling logic themselves. For a more detailed example using standard Durable Objects, see this repository.

We are excited to see the patterns you come up with when orchestrating complex applications built with containers, and trust that between Workers and Durable Objects, you’ll have the tools you need.

Integrating with more of Cloudflare’s Developer Platform

Since it is Developer Week 2025, we would be remiss to not talk about Workflows, which just went GA, and Agents, which just got even better.

Let’s finish up by taking a quick look at how you can integrate Containers with these two tools.

Running a short-lived job with Workflows & R2

You need to download a large file from R2, compress it, and upload it. You want to ensure that this succeeds, but don’t want to write retry logic and error handling yourself. Additionally, you don’t want to deal with rotating R2 API tokens or worry about network connections — it should be secure by default.

This is a perfect opportunity for a Workflow using Containers. The container can do the heavy lifting of compressing files, Workers can stream the data to and from R2, and the Workflow can ensure durable execution.

export class EncoderWorkflow extends WorkflowEntrypoint {
  async run(event: WorkflowEvent, step: WorkflowStep) {
    const id = this.env.ENCODER.idFromName(event.instanceId);
    const container = this.env.ENCODER.get(id);

    await step.do('init container', async () => {
      await container.init();
    });

    await step.do('compress the object with zstd', async () => {
      await container.ensureHealthy();
      const object = await this.env.ARTIFACTS.get(event.payload.r2Path);
      const result = await container.fetch('http://encoder/zstd', {
        method: 'POST', body: object.body 
      });
      await this.env.ARTIFACTS.put(`results${event.payload.r2Path}`, result.body);
    });

    await step.do('cleanup container', async () => {
      await container.destroy();
    });
  }
}

Calling a Container from an Agent

Lastly, imagine you have an AI agent that needs to spin up cloud infrastructure (you like to live dangerously). To do this, you want to use Terraform, but since it’s run from the command line, you can’t run it on Workers.

By defining a tool, you can enable your Agent to run the shell commands from a container:

// Make tools that call to a container from an agent

const createExternalResources = tool({
  description: "runs Terraform in a container to create resources",
  parameters: z.object({ sessionId: z.number(), config: z.string() }),
  execute: async ({ sessionId, config }) => {
    return this.env.TERRAFORM_RUNNER.get(sessionId).applyConfig(config);
  },
});

// Expose RPC Methods that call to the container

class TerraformRunner extends DurableObject {
  async applyConfig(config) {
    await this.ctx.container.getTcpPort(8080).fetch(APPLY_URL, {
      method: 'POST',
      body: JSON.stringify({ config }),
    });
  }

  // ...rest of DO...
}

Containers are so much more powerful when combined with other tools. Workers make it easy to do so in a secure and simple way.

Pay for what you use and use the right tool

The deep integration between Workers and Containers also makes it easy to pick the right tool for the job with regards to cost.

With Cloudflare Containers, you only pay for what you use. Charges start when a request is sent to the container or it is manually started. Charges stop after the container goes to sleep, which can happen automatically after a configurable timeout. This makes it easy to scale to zero, and allows you to get high utilization even with highly-variable traffic.

Containers are billed for every 10ms that they are actively running at the following rates:

Memory: $0.0000025 per GB-second
CPU: $0.000020 per vCPU-second
Disk $0.00000007 per GB-second

After 1 TB of free data transfer per month, egress from a Container will be priced per-region. We'll be working out the details between now and the beta, and will be launching with clear, transparent pricing across all dimensions so you know where you stand.

Workers are lighter weight than containers and save you money by not charging when waiting on I/O. This means that if you can, running on a Worker helps you save on cost. Luckily, on Cloudflare it is easy to route requests to the right tool.

Cost comparison

Comparing containers and functions services on paper is always going to be an apples to oranges exercise, and results can vary so much depending on use case. But to share a real example of our own, a year ago when Cloudflare acquired Baselime, Baselime was a heavy user of AWS Lambda. By moving to Cloudflare, they lowered their cloud compute bill by 80%.

Below we wanted to share one representative example that compares costs for an application that uses both containers and serverless functions together. It’d be easy for us to come up with a contrived example that uses containers sub-optimally on another platform, for the wrong types of workloads. We won’t do that here. We know that navigating cloud costs can be challenging, and that cost is a critical part of deciding what type of compute to use for which pieces of your application.

In the example below, we’ll compare Cloudflare Containers + Workers against Google Cloud Run, a very well-regarded container platform that we’ve been impressed by.

Example application

Imagine that you run an application that serves 50 million requests per month, and each request consumes an average 500 ms of wall-time. Requests to this application are not all the same though — half the requests require a container, and the other half can be served just using serverless functions.

Requests per month	Wall-time (duration)	Compute required	Cloudflare	Google Cloud
25 million	500ms	Container + serverless functions	Containers + Workers	Google Cloud Run + Google Cloud Run Functions
25 million	500ms	Serverless functions	Workers	Google Cloud Run Functions

Container pricing

On both Cloud Run and Cloudflare Containers, a container can serve multiple requests. On some platforms, such as AWS Lambda, each container instance is limited to a single request, pushing cost up significantly as request count grows. In this scenario, 50 requests can run simultaneously on a container with 4 GB memory and half of a vCPU. This means that to serve 25 million requests of 500ms each, we need 625,000 seconds worth of compute

In this example, traffic is bursty and we want to avoid paying for idle-time, so we’ll use Cloud Run’s request-based pricing.

	Price per vCPU second	Price per GB-second of memory	Price per 1m requests	Monthly Price for Compute + Requests
Cloudflare Containers	$0.000020	$0.0000025	$0.30	$20.00
Google Cloud Run	$0.000024	$0.0000025	$0.40	$23.75

^{* Comparison does not include free tiers for either provider and uses a single Tier 1 GCP region}

Compute pricing for both platforms are comparable. But as we showed earlier in this post, Containers on Cloudflare run anywhere, on-demand, without configuring and managing regions. Each container has a programmable sidecar with its own database, backed by Durable Objects. It’s the depth of integration with the rest of the platform that makes containers on Cloudflare uniquely programmable.

Function pricing

The other requests can be served with less compute, and code written in JavaScript, TypeScript, Python or Rust, so we’ll use Workers and Cloud Run Functions.

These 25 million requests also run for 500 ms each, and each request spends 480 ms waiting on I/O. This means that Workers will only charge for 20 ms of “CPU-time”, the time that the Worker actually spends using compute. This ratio of low CPU time to high wall time is extremely common when building AI apps that make inference requests, or even when just building REST APIs and other business logic. Most time is spent waiting on I/O. Based on our data, we typically see Workers use less than 5 ms of CPU time per request vs seconds of wall time (waiting on APIs or I/O).

The Cloud Run Function will use an instance with 0.083 vCPU and 128 MB memory and charge on both CPU-s and GiB-s for the full 500 ms of wall-time.

	Total Price for “wall-time”	Total Price for “CPU-time”	Total Price for Compute + Requests
Cloudflare Workers	N/A	$0.83	$8.33
Google Cloud Run Functions	$1.44	N/A	$11.44

^{* Comparison does not include free tiers and uses a single Tier 1 GCP region.}

This comparison assumes you have configured Google Cloud Run Functions with a max of 20 concurrent requests per instance. On Google Cloud Run Functions, the maximum number of concurrent requests an instance can handle varies based on the efficiency of your function, and your own tolerance for tail latency that can be introduced by traffic spikes.

Workers automatically scale horizontally, don’t require you to configure concurrency settings (and hope to get it right), and can run in over 300 locations.

A holistic view of costs

The most important cost metric is the total cost of developing and running an application. And the only way to get the best results is to use the right compute for the job. So the question boils down to friction and integration. How easily can you integrate the ideal building blocks together?

As more and more software makes use of generative AI, and makes inference requests to LLMs, modern applications must communicate and integrate with a myriad of services. Most systems are increasingly real-time and chatty, often holding open long-lived connections, performing tasks in parallel. Running an instance of an application in a VM or container and calling it a day might have worked 10 years ago, but when we talk to developers in 2025, they are most often bringing many forms of compute to the table for particular use cases.

This shows the importance of picking a platform where you can seamlessly shift traffic from one source of compute to another. If you want to rate-limit, serve server-side rendered pages, API responses and static assets, handle authentication and authorization, make inference requests to AI models, run core business logic via Workflows, or ingest streaming data, just handle the request in Workers. Save the heavier compute only for where it is actually the only option. With Cloudflare Workers and Containers, this is as simple as an if-else statement in your Worker. This makes it easy to pick the right tool for the job.

Coming June 2025

We are collecting feedback and putting the finishing touches on our APIs now, and will release the open beta to the public in late June 2025.

From day one of building Cloudflare Workers, it’s been our goal to build an integrated platform, where Cloudflare products work together as a system, rather than just as a collection of separate products. We’ve taken this same approach with Containers, and aim to make Cloudflare not only the best place to deploy containers across the globe, but the best place to deploy the types of complete applications that developers are building, that use containers in tandem with serverless functions, Workflows, Agents, and much more.

We’re excited to get this into your hands soon. Stay on the lookout this summer.

Our container platform is in production. It has GPUs. Here’s an early look

Brendan Irvine-Broque — Fri, 27 Sep 2024 13:00:00 GMT

We’ve been working on something new — a platform for running containers across Cloudflare’s network. We already use it in production for Workers AI, Workers Builds, Remote Browsing Isolation, and the Browser Rendering API. Today, we want to share an early look at how it’s built, why we built it, and how we use it ourselves.

In 2024, Cloudflare Workers celebrates its 7th birthday. When we first announced Workers, it was a completely new model for running compute in a multi-tenant way — on isolates, as opposed to containers. While, at the time, Workers was a pretty bare-bones functions-as-a-service product, we took a big bet that this was going to become the way software was going to be written going forward. Since introducing Workers, in addition to expanding our developer products in general to include storage and AI, we have been steadily adding more compute capabilities to Workers:

2020	Cron Triggers
2021	Durable Objects Write Workers in Rust Service Bindings
2022	Queues Email Workers Durable Objects Alarms
2023	Workers TCP Socket API Hyperdrive Smart Placement Workers AI
2024	Python Workers JavaScript-native RPC Node.js compatibility SQLite in Durable Objects

With each of these, we’ve faced a question — can we build this natively into the platform, in a way that removes, rather than adds complexity? Can we build it in a way that lets developers focus on building and shipping, rather than managing infrastructure, so that they don’t have to be a distributed systems engineer to build distributed systems?

In each instance, the answer has been YES. We try to solve problems in a way that simplifies things for developers in the long run, even if that is the harder path for us to take ourselves. If we didn’t, you’d be right to ask — why not self-host and manage all of this myself? What’s the point of the cloud if I’m still provisioning and managing infrastructure? These are the questions many are asking today about the earlier generation of cloud providers.

Pushing ourselves to build platform-native products and features helped us answer this question. Particularly because some of these actually use containers behind the scenes, even though as a developer you never interact with or think about containers yourself.

If you’ve used AI inference on GPUs with Workers AI, spun up headless browsers with Browser Rendering, or enqueued build jobs with the new Workers Builds, you’ve run containers on our network, without even knowing it. But to do so, we needed to be able to run untrusted code across Cloudflare’s network, outside a v8 isolate, in a way that fits what we promise:

You shouldn’t have to think about regions or data centers. Routing, scaling, load balancing, scheduling, and capacity are our problem to solve, not yours, with tools like Smart Placement.
You should be able to build distributed systems without being a distributed systems engineer.
Every millisecond matters — Cloudflare has to be fast.

There wasn’t an off-the-shelf container platform that solved for what we needed, so we built it ourselves — from scheduling to IP address management, pulling and caching images, to improving startup times and more. Our container platform powers many of our newest products, so we wanted to share how we built it, optimized it, and well, you can probably guess what’s next.

Global scheduling — “The Network is the Computer”

Cloudflare serves the entire world — region: earth. Rather than asking developers to provision resources in specific regions, data centers and availability zones, we think “The Network is the Computer”. When you build on Cloudflare, you build software that runs on the Internet, not just in a data center.

When we started working on this, Cloudflare’s architecture was to just run every service via systemd on every server (we call them “metals” — we run our own hardware), allowing all services to take advantage of new capacity we add to our network. That fits running NGINX and a few dozen other services, but cannot fit a world where we need to run many thousands of different compute heavy, resource hungry workloads. We’d run out of space just trying to load all of them! Consider a canonical AI workload — deploying Llama 3.1 8B to an inference server. If we simply ran a Llama 3.1 8B service on every Cloudflare metal, we’d have no flexibility to use GPUs for the many other models that Workers AI supports.

We needed something that would allow us to still take advantage of the full capacity of Cloudflare’s entire network, not just the capacity of individual machines. And ideally not put that burden on the developer.

The answer: we built a control plane on our own Developer Platform that lets us schedule a container anywhere on Cloudflare’s Network:

The global scheduler is built on Cloudflare Workers, Durable Objects, and KV, and decides which Cloudflare location to schedule the container to run in. Each location then runs its own scheduler, which decides which metals within that location to schedule the container to run on. Location schedulers monitor compute capacity, and expose this to the global scheduler. This allows Cloudflare to dynamically place workloads based on capacity and hardware availability (e.g. multiple types of GPUs) across our network.

Why does global scheduling matter?

When you run compute on a first generation cloud, the “contract” between the developer and the platform is that the developer must specify what runs where. This is regional scheduling, the status quo.

Let’s imagine for a second if we applied regional scheduling to running compute on Cloudflare’s network, with locations in 330+ cities, across 120+ countries. One of the obvious reasons people tell us they want to run on Cloudflare is because we have compute in places where others don’t, within 50ms of 95% of the world’s Internet-connected population. In South America, other clouds have one region in one city. Cloudflare has 19:

Running anywhere means you can be faster, highly available, and have more control over data location. But with regional scheduling, the more locations you run in, the more work you have to do. You configure and manage load balancing, routing, auto-scaling policies and more. Balancing performance and cost in a multi-region setup is literally a full-time job (or more) at most companies who have reached meaningful scale on traditional clouds.

But most importantly, no matter what tools you bring, you were the one who told the cloud provider, “run this container over here”. The cloud platform can’t move it for you, even if moving it would make your workload faster. This prevents the platform from adding locations, because for each location, it has to convince developers to take action themselves to move their compute workloads to the new location. Each new location carries a risk that developers won’t migrate workloads to it, or migrate too slowly, delaying the return on investment.

Global scheduling means Cloudflare can add capacity and use it immediately, letting you benefit from it. The “contract” between us and our customers isn’t tied to a specific datacenter or region, so we have permission to move workloads around to benefit customers. This flexibility plays an essential role in all of our own uses of our container platform, starting with GPUs and AI.

GPUs everywhere: Scheduling large images with Workers AI

In late 2023, we launched Workers AI, which provides fast, easy to use, and affordable GPU-backed AI inference.

The more efficiently we can use our capacity, the better pricing we can offer. And the faster we can make changes to which models run in which Cloudflare locations, the closer we can move AI inference to the application, lowering Time to First Token (TTFT). This also allows us to be more resilient to spikes in inference requests.

AI models that rely on GPUs present three challenges though:

Models have different GPU memory needs. GPU memory is the most scarce resource, and different GPUs have different amounts of memory.
Not all container runtimes, such as Firecracker, support GPU drivers and other dependencies.
AI models, particularly LLMs, are very large. Even a smaller parameter model, like @cf/meta/llama-3.1-8b-instruct, is at least 5 GB. The larger the model, the more bytes we need to pull across the network when scheduling a model to run in a new location.

Let’s dive into how we solved each of these…

First, GPU memory needs. The global scheduler knows which Cloudflare locations have blocks of GPU memory available, and then delegates scheduling the workload on a specific metal to the local scheduler. This allows us to prioritize placement of AI models that use a large amount of GPU memory, and then move smaller models to other machines in the same location. By doing this, we maximize the overall number of locations that we run AI models in, and maximize our efficiency.

Second, container runtimes and GPU support. Thankfully, from day one we built our container platform to be runtime agnostic. Using a runtime agnostic scheduler, we’re able to support gVisor, Firecracker microVMs, and traditional VMs with QEMU. We are also evaluating adding support for another one: cloud-hypervisor which is based on rust-vmm and has a few compelling advantages for our use case:

GPU passthrough support using VFIO
vhost-user-net support, enabling high throughput between the host network interface and the VM
vhost-user-blk support, adding flexibility to introduce novel network-based storage backed by other Cloudflare Workers products
all the while being a smaller codebase than QEMU and written in a memory-safe language

Our goal isn’t to build a platform that makes you as the developer choose between runtimes, and ask, “should I use Firecracker or gVisor”. We needed this flexibility in order to be able to run workloads with different needs efficiently, including workloads that depend on GPUs. gVisor has GPU support, while Firecracker microVMs currently does not.

gVisor’s main component is an application kernel (called Sentry) that implements a Linux-like interface but is written in a memory-safe language (Go) and runs in userspace. It works by intercepting application system calls and acting as the guest kernel, without the need for translation through virtualized hardware.

The resource footprint of a containerized application running on gVisor is lower than a VM because it does not require managing virtualized hardware and booting up a kernel instance. However, this comes at the price of reduced application compatibility and higher per-system call overhead.

To add GPU support, the Google team introduced nvproxy which works using the same principles as described above for syscalls: it intercepts ioctls destined to the GPU and proxies a subset to the GPU kernel module.

To solve the third challenge, and make scheduling fast with large models, we weren’t satisfied with the status quo. So we did something about it.

Docker pull was too slow, so we fixed it (and cut the time in half)

Many of the images we need to run for AI inference are over 15 GB. Specialized inference libraries and GPU drivers add up fast. For example, when we make a scheduling decision to run a fresh container in Tokyo, naively running docker pull to fetch the image from a storage bucket in Los Angeles would be unacceptably slow. And scheduling speed is critical to being able to scale up and down in new locations in response to changes in traffic.

We had 3 essential requirements:

Pulling and pushing very large images should be fast
We should not rely on a single point of failure
Our teams shouldn’t waste time managing image registries

We needed globally distributed storage, so we used R2. We needed the highest cache hit rate possible, so we used Cloudflare’s Cache, and will soon use Tiered Cache. And we needed a fast container image registry that we could run everywhere, in every Cloudflare location, so we built and open-sourced serverless-registry, which is built on Workers. You can deploy serverless-registry to your own Cloudflare account in about 5 minutes. We rely on it in production.

This is fast, but we can be faster. Our performance bottleneck was, somewhat surprisingly, docker push. Docker uses gzip to compress and decompress layers of images while pushing and pulling. So we started using Zstandard (zstd) instead, which compresses and decompresses faster, and results in smaller compressed files.

In order to build, chunk, and push these images to the R2 registry, we built a custom CLI tool that we use internally in lieu of running docker build and docker push. This makes it easy to use zstd and split layers into 500 MB chunks, which allows uploads to be processed by Workers while staying under body size limits.

Using our custom build and push tool doubled the speed of image pulls. Our 30 GB GPU images now pull in 4 minutes instead of 8. We plan on open sourcing this tool in the near future.

Anycast is the gift that keeps on simplifying — Virtual IPs and the Global State Router

We still had another challenge to solve. And yes, we solved it with anycast. We’re Cloudflare, did you expect anything else?

First, a refresher — Cloudflare operates Unimog, a Layer 4 load balancer that handles all incoming Cloudflare traffic. Cloudflare’s network uses anycast, which allows a single IP address to route requests to a variety of different locations. For most Cloudflare services with anycast, the given IP address will route to the nearest Cloudflare data center, reducing latency. Since Cloudflare runs almost every service in every data center, Unimog can simply route traffic to any Cloudflare metal that is online and has capacity, without needing to map traffic to a specific service that runs on specific metals, only in some locations.

The new compute-heavy, GPU-backed workloads we were taking on forced us to confront this fundamental “everything runs everywhere” assumption. If we run a containerized workflow in 20 Cloudflare locations, how does Unimog know which locations, and which metals, it runs in? You might say “just bring your own load balancer” — but then what happens when you make scheduling decisions to migrate a workload between locations, scale up, or scale down?

Anycast is foundational to how we build fast and simple products on our network, and we needed a way to keep building new types of products this way — where a team can deploy an application, get back a single IP address, and rely on the platform to balance traffic, taking load, container health, and latency into account, without extra configuration. We started letting teams use the container platform without solving this, and it was painfully clear that we needed to do something about it.

So we started integrating directly into our networking stack, building a sidecar service to Unimog. We’ll call this the Global State Router. Here’s how it works:

An eyeball makes a request to a virtual IP address issued by Cloudflare
Request sent to the best location as determined by BGP routing. This is anycast routing.
A small eBPF program sits on the main networking interface and ensures packets bound to a virtual IP address are handled by the Global State Router.
The main Global State Router program contains a mapping of all anycast IPs addresses to potential end destination container IP addresses. It updates this mapping based on container health, readiness, distance, and latency. Using this information, it picks a best-fit container.
Packets are forwarded at the L4 layer.
When a target container’s server receives a packet, its own Global State Router program intercepts the packet and routes it to the local container.

This might sound like just a lower level networking detail, disconnected from developer experience. But by integrating directly with Unimog, we can let developers:

Push a containerized application to Cloudflare.
Provide constraints, health checks, and load metrics that describe what the application needs.
Delegate scheduling and scaling many containers across Cloudflare’s network.
Get back a single IP address that can be used everywhere.

We’re actively working on this, and are excited to continue building on Cloudflare’s anycast capabilities, and pushing to keep the simplicity of running “everywhere” with new categories of workloads.

Low latency & global — Remote Browser Isolation & Browser Rendering

Our container platform actually started because of a very specific challenge, running Remote Browser Isolation across our network. Remote Browser Isolation provides Chromium browsers that run on Cloudflare, in containers, rather than on the end user’s own computer. Only the rendered output is sent to the end user. This provides a layer of protection against zero-day browser vulnerabilities, phishing attacks, and ransomware.

Location is critical — people expect their interactions with a remote browser to feel just as fast as if it ran locally. If the server is thousands of miles away, the remote browser will feel slow. Running across Cloudflare’s network of over 330 locations means the browser is nearly always as close to you as possible.

Imagine a user in Santiago, Chile, if they were to access a browser running in the same city, each interaction would incur negligible additional latency. Whereas a browser in Buenos Aires might add 21 ms, São Paulo might add 48 ms, Bogota might add 67 ms, and Raleigh, NC might add 128 ms. Where the container runs significantly impacts the latency of every user interaction with the browser, and therefore the experience as a whole.

It’s not just browser isolation that benefits from running near the user: WebRTC servers stream video better, multiplayer games have less lag, online advertisements can be served faster, financial transactions can be processed faster. Our container platform lets us run anything we need to near the user, no matter where they are in the world.

Using spare compute — “off-peak” jobs for Workers CI/CD builds

At any hour of the day, Cloudflare has many CPU cores that sit idle. This is compute power that could be used for something else.

Via anycast, most of Cloudflare’s traffic is handled as close as possible to the eyeball (person) that requested it. Most of our traffic originates from eyeballs. And the eyeballs of (most) people are closed and asleep between midnight and 5:00 AM local time. While we use our compute capacity very efficiently during the daytime in any part of the world, overnight we have spare cycles. Consider what a map of the world looks like at nighttime in Europe and Africa:

As shown above, we can run containers during “off-peak” in Cloudflare locations receiving low traffic at night. During this time, the CPU utilization of a typical Cloudflare metal looks something like this:

We have many “background” compute workloads at Cloudflare. These are workloads that don’t actually need to run close to the eyeball because there is no eyeball waiting on the request. The challenge is that many of these workloads require running untrusted code — either a dependency on open-source code that we don’t trust enough to run outside of a sandboxed environment, or untrusted code that customers deploy themselves. And unlike Cron Triggers, which already make a best-effort attempt to use off-peak compute, these other workloads can’t run in v8 isolates.

On Builder Day 2024, we announced Workers Builds in open beta. You connect your Worker to a git repository, and Cloudflare builds and deploys your Worker each time you merge a pull request. Workers Builds run on our containers platform, using otherwise idle “off-peak” compute, allowing us to offer lower pricing, and hold more capacity for unexpected spikes in traffic in Cloudflare locations during daytime hours when load is highest. We preserve our ability to serve requests as close to the eyeball as possible where it matters, while using the full compute capacity of our network.

We developed a purpose-built API for these types of jobs. The Workers Builds service has zero knowledge of where Cloudflare has spare compute capacity on its network — it simply schedules an “off-peak” job to run on the containers platform, by defining a scheduling policy:

scheduling_policy: "off-peak"

Making off-peak jobs faster with prewarmed images

Just because a workload isn’t “eyeball-facing” doesn’t mean speed isn’t relevant. When a build job starts, you still want it to start as soon as possible.

Each new build requires a fresh container though, and we must avoid reusing containers to provide strong isolation between customers. How can we keep build job start times low, while using a new container for each job without over-provisioning?

We prewarm servers with the proper image.

Before a server becomes eligible to receive an “off peak” job, the container platform instructs it to download the correct image. Once the image is downloaded and cached locally, new containers can start quickly in a Firecracker VM after receiving a request for a new build. When a build completes, we throw away the container, and start the next build using a fresh container based on the prewarmed image.

Without prewarming, pulling and unpacking our Workers Build images would take roughly 75 seconds. With prewarming, we’re able to spin up a new container in under 10 seconds. We expect this to get even faster as we introduce optimizations like pre-booting images before new runs, or Firecracker snapshotting, which can restore a VM in under 200ms.

Workers and containers, better together

As more of our own engineering teams rely on our containers platform in production, we’ve noticed a pattern: they want a deeper integration with Workers.

We plan to give it to them.

Let’s take a look at a project deployed on our container platform already, Key Transparency. If the container platform were highly integrated with Workers, what would this team’s experience look like?

Cloudflare regularly audits changes to public keys used by WhatsApp for encrypting messages between users. Much of the architecture is built on Workers, but there are long-running compute-intensive tasks that are better suited for containers.

We don’t want our teams to have to jump through hoops to deploy a container and integrate with Workers. They shouldn’t have to pick specific regions to run in, figure out scaling, expose IPs and handle IP updates, or set up Worker-to-container auth.

We’re still exploring many different ideas and API designs, and we want your feedback. But let’s imagine what it might look like to use Workers, Durable Objects and Containers together.

In this case, an outer layer of Workers handles most business logic and ingress, a specialized Durable Object is configured to run alongside our new container, and the platform ensures the image is loaded on the right metals and can scale to meet demand.

I add a containerized app to the wrangler.toml configuration file of my Worker (or Terraform):

[[container-app]]
image = "./key-transparency/verifier/Dockerfile"
name = "verifier"

[durable_objects]
bindings = { name = "VERIFIER", class_name = "Verifier", container = "verifier" } }

Then, in my Worker, I call the runVerification RPC method of my Durable Object:

fetch(request, env, ctx) {
  const id = new URL(request.url).searchParams.get('id')
  const durableObjectId = env.VERIFIER.idFromName(request.params.id);
  await env.VERIFIER.get(durableObjectId).runVerification()
  //...
}

From my Durable Object I can boot, configure, mount storage buckets as directories, and make HTTP requests to the container:

class Verifier extends DurableObject {
  constructor(state, env) {
    this.ctx.blockConcurrency(async () => {

      // starts the container
      await this.ctx.container.start();

      // configures the container before accepting traffic
      const config = await this.state.storage.get("verifierConfig");
      await this.ctx.container.fetch("/set-config", { method: "PUT", body: config});
    })
  }

  async runVerification(updateId) {
    // downloads & mounts latest updates from R2
    const latestPublicKeyUpdates = await this.env.R2.get(`public-key-updates/${updateId}`);
    await this.ctx.container.mount(`/updates/${updateId}`, latestPublicKeyUpdates);

    // starts verification via HTTP call 
    return await this.ctx.container.fetch(`/verifier/${updateId}`);
  }
}

And… that’s it.

I didn’t have to worry about placement, scaling, service discovery authorization, and I was able to leverage integrations into other services like KV and R2 with just a few lines of code. The container platform took care of routing, placement, and auth. If I needed more instances, I could call the binding with a new ID, and the platform would scale up containers for me.

We are still in the early stages of building these integrations, but we’re excited about everything that containers will bring to Workers and vice versa.

So, what do you want to build?

If you’ve read this far, there’s a non-zero chance you were hoping to get to run a container yourself on our network. While we’re not ready (quite yet) to open up the platform to everyone, now that we’ve built a few GA products on our container platform, we’re looking for a handful of engineering teams to start building, in advance of wider availability in 2025. And we’re continuing to hire engineers to work on this.

We’ve told you about our use cases for containers, and now it’s your turn. If you’re interested, tell us here what you want to build, and why it goes beyond what’s possible today in Workers and on our Developer Platform. What do you wish you could build on Cloudflare, but can’t yet today?

More NPM packages on Cloudflare Workers: Combining polyfills and native code to support Node.js APIs

James M Snell — Mon, 09 Sep 2024 21:00:00 GMT

Today, we are excited to announce a preview of improved Node.js compatibility for Workers and Pages. Broader compatibility lets you use more NPM packages and take advantage of the JavaScript ecosystem when writing your Workers.

Our newest version of Node.js compatibility combines the best features of our previous efforts. Cloudflare Workers have supported Node.js in some form for quite a while. We first announced polyfill support in 2021, and later built-in support for parts of the Node.js API that has expanded over time.

The latest changes make it even better:

You can use far more NPM packages on Workers.
You can use packages that do not use the node: prefix to import Node.js APIs
You can use more Node.js APIs on Workers, including most methods on async_hooks, buffer, dns, os, and events. Many more, such as fs or process are importable with mocked methods.

To give it a try, add the following flag to wrangler.toml, and deploy your Worker with Wrangler:

compatibility_flags = ["nodejs_compat_v2"]

Packages that could not be imported with nodejs_compat, even as a dependency of another package, will now load. This includes popular packages such as body-parser, jsonwebtoken, {}got, passport, md5, knex, mailparser, csv-stringify, cookie-signature, stream-slice, and many more.

This behavior will soon become the default for all Workers with the existing nodejs_compat compatibility flag enabled, and a compatibility date of 2024-09-23 or later. As you experiment with improved Node.js compatibility, share your feedback by opening an issue on GitHub.

Workerd is not Node.js

To understand the latest changes, let’s start with a brief overview of how the Workers runtime differs from Node.js.

Node.js was built primarily for services run directly on a host OS and pioneered server-side JavaScript. Because of this, it includes functionality necessary to interact with the host machine, such as process or fs, and a variety of utility modules, such as crypto.

Cloudflare Workers run on an open source JavaScript/Wasm runtime called workerd. While both Node.js and workerd are built on V8, workerd is designed to run untrusted code in shared processes, exposes bindings for interoperability with other Cloudflare services, including JavaScript-native RPC, and uses web-standard APIs whenever possible.

Cloudflare helped establish WinterCG, the Web-interoperable Runtimes Community Group to improve interoperability of JavaScript runtimes, both with each other and with the web platform. You can build many applications using only web-standard APIs, but what about when you want to import dependencies from NPM that rely on Node.js APIs?

For example, if you attempt to import pg, a PostgreSQL driver, without Node.js compatibility turned on…

import pg from 'pg'

You will see the following error when you run wrangler dev to build your Worker:

✘ [ERROR] Could not resolve "events"
    ../node_modules/.pnpm/pg-cloudflare@1.1.1/node_modules/pg-cloudflare/dist/index.js:1:29:
      1 │ import { EventEmitter } from 'events';
        ╵                              ~~~~~~~~
  The package "events" wasn't found on the file system but is built into node.

This happens because the pg package imports the events module from Node.js, which is not provided by workerd by default.

How can we enable this?

Our first approach – build-time polyfills

Polyfills are code that add functionality to a runtime that does not natively support it. They are often added to provide modern JavaScript functionality to older browsers, but can be used for server-side runtimes as well.

In 2022, we added functionality to Wrangler that injected polyfill implementations of some Node.js APIs into your Worker if you set node_compat = true in your wrangler.toml. For instance, the following code would work with this flag, but not without:

import EventEmitter from 'events';
import { inherits } from 'util';

These polyfills are essentially just additional JavaScript code added to your Worker by Wrangler when deploying the Worker. This behavior is enabled by @esbuild-plugins/node-globals-polyfill which in itself uses rollup-plugin-node-polyfills.

This allows you to import and use some NPM packages, such as pg. However, many modules cannot be polyfilled with fast enough code or cannot be polyfilled at all.

For instance, Buffer is a common Node.js API used to handle binary data. Polyfills exist for it, but JavaScript is often not optimized for the operations it performs under the hood, such as copy, concat, substring searches, or transcoding. While it is possible to implement in pure JavaScript, it could be far faster if the underlying runtime could use primitives from different languages. Similar limitations exist for other popular APIs such as Crypto, AsyncLocalStorage, and Stream.

Our second approach – native support for some Node.js APIs in the Workers runtime

In 2023, we started adding a subset of Node.js APIs directly to the Workers runtime. You can enable these APIs by adding the nodejs_compat compatibility flag to your Worker, but you cannot use polyfills with node_compat = true at the same time.

Also, when importing Node.js APIs, you must use the node: prefix:

import { Buffer } from 'node:buffer';

Since these Node.js APIs are built directly into the Workers runtime, they can be written in C++, which allows them to be faster than JavaScript polyfills. APIs like AsyncLocalStorage, which cannot be polyfilled without safety or performance issues, can be provided natively.

Requiring the node: prefix made imports more explicit and aligns with modern Node.js conventions. Unfortunately, existing NPM packages may import modules without node:. For instance, revisiting the example above, if you import the popular package pg in a Worker with the nodejs_compat flag, you still see the following error:

✘ [ERROR] Could not resolve "events"
    ../node_modules/.pnpm/pg-cloudflare@1.1.1/node_modules/pg-cloudflare/dist/index.js:1:29:
      1 │ import { EventEmitter } from 'events';
        ╵                              ~~~~~~~~
  The package "events" wasn't found on the file system but is built into node.

Many NPM packages still didn’t work in Workers, even if you enabled the nodejs_compat compatibility flag. You had to choose between a smaller set of performant APIs, exposed in a way that many NPM packages couldn’t access, or a larger set of incomplete and less performant APIs. And APIs like process that are exposed as globals in Node.js could still only be accessed by importing them as modules.

The new approach: a hybrid model

What if we could have the best of both worlds, and it just worked?

A subset of Node.js APIs implemented directly in the Workers Runtime
Polyfills for the majority of other Node.js APIs
No node: prefix required
One simple way to opt-in

Improved Node.js compatibility does just that.

Let’s take a look at two lines of code that look similar, but now act differently under the hood when nodejs_compat_v2 is enabled:

import { Buffer } from 'buffer';  // natively implemented
import { isIP } from 'net'; // polyfilled

The first line imports Buffer from a JavaScript module in workerd that is backed by C++ code. Various other Node.js modules are similarly implemented in a combination of Typescript and C++, including AsyncLocalStorage and Crypto. This allows for highly performant code that matches Node.js behavior.

Note that the node: prefix is not needed when importing buffer, but the code would also work with node:buffer.

The second line imports net which Wrangler automatically polyfills using a library called unenv. Polyfills and built-in runtime APIs now work together.

Previously, when you set node_compat = true, Wrangler added polyfills for every Node.js API that it was able to, even if neither your Worker nor its dependencies used that API. When you enable the nodejs_compat_v2 compatibility flag, Wrangler only adds polyfills for Node.js APIs that your Worker or its dependencies actually use. This results in small Worker sizes, even with polyfills.

For some Node.js APIs, there is not yet native support in the Workers runtime nor a polyfill implementation. In these cases, unenv “mocks” the interface. This means it adds the module and its methods to your Worker, but calling methods of the module will either do nothing or will throw an error with a message like:

[unenv] is not implemented yet!

This is more important than it might seem. Because if a Node.js API is “mocked”, NPM packages that depend on it can still be imported. Consider the following code:

// Package name: my-module

import fs from "fs";

export function foo(path) {
  const data = fs.readFileSync(path, 'utf8');
  return data;
}

export function bar() {
  return "baz";
}

import { bar } from "my-module"

bar(); // returns "baz"
foo(); // throws readFileSync is not implemented yet!

Previously, even with the existing nodejs_compat compatibility flag enabled, attempting to import my-module would fail at build time, because the fs module could not be resolved. Now, the fs module can be resolved, methods that do not rely on an unimplemented Node.js API work, and methods that do throw a more specific error – a runtime error that a specific Node.js API method is not yet supported, rather than a build-time error that the module could not be resolved.

This is what enables some packages to transition from “doesn’t even load on Workers” to, “loads, but with some unsupported methods”.

Still missing an API from Node.js? Module aliasing to the rescue

Let’s say you need an NPM package to work on Workers that relies on a Node.js API that isn’t yet implemented in the Workers runtime or as a polyfill in unenv. You can use module aliasing to implement just enough of that API to make things work.

For example, let’s say the NPM package you need to work calls fs.readFile. You can alias the fs module by adding the following to your Worker’s wrangler.toml:

[alias]
"fs" = "./fs-polyfill"

Then, in the fs-polyfill.js file, you can define your own implementation of any methods of the fs module:

export function readFile() {
  console.log("readFile was called");
  // ...
}

Now, the following code, which previously threw the error message “[unenv] readFile is not implemented yet!”, runs without errors:

import { readFile } from 'fs';

export default {
  async fetch(request, env, ctx) {
    readFile();
    return new Response('Hello World!');
  },
};

You can also use module aliasing to provide an implementation of an NPM package that does not work on Workers, even if you only rely on that NPM package indirectly, as a dependency of one of your Worker's dependencies.

For example, some NPM packages, such as cross-fetch, depend on node-fetch, a package that provided a polyfill of the fetch() API before it was built into Node.js. The node-fetch package isn't needed in Workers, because the fetch() API is provided by the Workers runtime. And node-fetch doesn't work on Workers, because it relies on currently unsupported Node.js APIs from the http and https modules.

You can alias all imports of node-fetch to instead point directly to the fetch() API that is built into the Workers runtime using the popular nolyfill package:

[alias]
"node-fetch" = "./fetch-nolyfill"

All your replacement module needs to do in this case is to re-export the fetch API that is built into the Workers runtime:

export default fetch;

Contributing back to unenv

Cloudflare is actively contributing to unenv. We think unenv is solving the problem of cross-runtime compatibility the right way — it adds only the necessary polyfills to your application, based on what APIs you use and what runtime you target. The project supports a variety of runtimes beyond workerd and is already used by other popular projects including Nuxt and Nitro. We want to thank Pooya Parsa and the unenv maintainers and encourage others in the ecosystem to adopt or contribute.

The path forward

Currently, you can enable improved Node.js compatibility by setting the nodejs_compat_v2 flag in wrangler.toml. We plan to make the new behavior the default when using the nodejs_compat flag on September 23rd. This will require updating your compatibility_date.

We are excited about the changes coming to Node.js compatibility, and encourage you to try it today. See the documentation on how to opt-in for your Workers, and please send feedback and report bugs by opening an issue. Doing so will help us identify any gaps in support and ensure that as much of the Node.js ecosystem as possible runs on Workers.